How Python Actually Works
What happens when you type python main.py? What is CPython? Why is Python "slow"? Why does is behave differently from ==? This section answers the internals questions interviewers love.
When you say "Python," you almost certainly mean CPython, the reference implementation written in C. There are others: PyPy (JIT, 4–5× faster), Jython (JVM), IronPython (.NET), MicroPython (embedded).
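You can check which implementation is running with two stdlib calls; a quick sketch:

```python
import platform
import sys

# Both report the implementation executing this code.
print(platform.python_implementation())  # e.g. "CPython"
print(sys.implementation.name)           # e.g. "cpython" (lowercase)
```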
# Step 1 – Lexing: source → tokens
# "x = 5" → [NAME('x'), OP('='), NUMBER(5)]

# Step 2 – Parsing: tokens → AST (Abstract Syntax Tree)
import ast
tree = ast.parse("x = 5 + 3")
print(ast.dump(tree, indent=2))

# Step 3 – Compilation: AST → bytecode (.pyc in __pycache__)
import dis
def add(a, b):
    return a + b
dis.dis(add)
# LOAD_FAST 'a'
# LOAD_FAST 'b'
# BINARY_OP +
# RETURN_VALUE

# Step 4 – The PVM (Python Virtual Machine) executes the bytecode
# The PVM is a loop reading and running bytecode instructions
- "Is Python compiled or interpreted?" → BOTH. Source compiles to bytecode, then the PVM interprets it.
- "What is a .pyc file?" → Cached bytecode. Speeds up re-imports, not re-running.
- "What is PyPy?" → An alternative Python runtime with a JIT compiler. 4–5× faster for CPU-bound loops.
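The compile-then-interpret split is exposed as builtins, so you can run the pipeline by hand; a minimal sketch:

```python
# compile() produces a code object (bytecode); exec() hands it to the PVM.
code = compile("x = 5 + 3", "<string>", "exec")  # source → bytecode
namespace = {}
exec(code, namespace)  # the PVM runs the bytecode

print(namespace["x"])  # 8
print(code.co_consts)  # constants baked into the code object
```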
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)  # True – same values
print(a is b)  # False – different objects

# CPython caches small ints -5 to 256 as singletons
x = 256; y = 256
print(x is y)  # True – cached!
x = 257
y = 257
print(x is y)  # False in the REPL – but may be True in a script,
               # where the compiler deduplicates equal constants

# String interning – short identifier-like strings are interned
a = "hello"; b = "hello"
print(a is b)  # True (interned)
a = "hello world"; b = "hello world"
print(a is b)  # may be False

# Rule: ONLY use 'is' for None, True, False
if x is None:  # ✅ correct
    pass
| Mutable | Immutable |
|---|---|
| list | int, float, bool, complex |
| dict | str |
| set | tuple |
| bytearray | frozenset, bytes |
# Functions receive a reference to the object, not a copy
def modify(lst):
    lst.append(99)  # modifies the ORIGINAL

my_list = [1, 2, 3]
modify(my_list)
print(my_list)  # [1, 2, 3, 99] – CHANGED!

# Strings are immutable – always creates a new object
def modify_str(s):
    s += " world"  # new string object, original unchanged

my_str = "hello"
modify_str(my_str)
print(my_str)  # "hello" – unchanged

# Dict keys must be hashable (immutable)
d = {[1, 2]: "val"}  # TypeError: unhashable type: 'list'
d = {(1, 2): "val"}  # ✅ tuple is hashable

# Tuple containing a mutable – the gotcha
t = ([1, 2], [3, 4])
t[0].append(99)  # works! the tuple's refs are unchanged, the list mutates
print(t)  # ([1, 2, 99], [3, 4])
x = "global"
def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x)  # "local"
    inner()
    print(x)  # "enclosing"
outer()
print(x)  # "global"

# global / nonlocal keywords
count = 0
def inc():
    global count
    count += 1

def make_counter():
    n = 0
    def inc():
        nonlocal n  # modify the enclosing n
        n += 1
        return n
    return inc  # → closure! inc remembers n

counter = make_counter()
print(counter())  # 1
print(counter())  # 2 – n persists across calls!

# ⚠️ Closure loop trap
fns = [lambda: i for i in range(3)]
print([f() for f in fns])  # [2, 2, 2] – all see the final i!
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])  # [0, 1, 2] – captured at creation
- "What is a closure?" → A function that captures variables from its enclosing scope, even after the outer function has returned.
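The captured variable is visible on the function object itself; a quick sketch (reusing the make_counter pattern from above) that verifies that definition:

```python
def make_counter():
    n = 0
    def inc():
        nonlocal n
        n += 1
        return n
    return inc

counter = make_counter()
counter()
# The captured variable lives in a closure cell on the function object:
print(counter.__code__.co_freevars)          # ('n',)
print(counter.__closure__[0].cell_contents)  # 1
```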
Data Types & Variables
Numbers, booleans, None – every type with its gotchas and interview traps.
# int – arbitrary precision, NO overflow!
print(10 ** 100)  # a googol – no problem in Python

binary = 0b1010  # 10
octal = 0o17     # 15
hexa = 0xFF      # 255
big = 1_000_000  # underscores for readability

# float – IEEE 754 double (64-bit)
print(0.1 + 0.2)         # 0.30000000000000004 – floating point!
print(0.1 + 0.2 == 0.3)  # False!
import math
print(math.isclose(0.1 + 0.2, 0.3))  # True – the correct comparison

# For money: use Decimal, NOT float
from decimal import Decimal, getcontext
getcontext().prec = 28
price = Decimal("19.99")  # ALWAYS pass as a string!
tax = Decimal("0.18")
print(price * tax)  # 3.5982 (exact!)

# Division types
print(7 / 2)    # 3.5 – true division (always float)
print(7 // 2)   # 3 – floor division
print(-7 // 2)  # -4 – floors towards -infinity!
print(7 % 2)    # 1 – modulo
print(2 ** 10)  # 1024 – power

# int() truncates (towards zero), // floors (towards -inf)
print(int(-3.9))  # -3 (truncate)
print(-3.9 // 1)  # -4.0 (floor)
# Falsy values in Python:
# False, None, 0, 0.0, 0j, "", [], {}, set(), ()
# Everything else is truthy!
print(bool([]))    # False – empty list
print(bool([0]))   # True – list with ONE item
print(bool(""))    # False – empty string
print(bool("0"))   # True – non-empty string
print(bool(0))     # False
print(bool(0.0))   # False
print(bool(None))  # False

# bool is a subclass of int!
print(isinstance(True, int))  # True
print(True + True)  # 2
print(True * 5)     # 5
print(True == 1)    # True
print(False == 0)   # True

# Short-circuit evaluation
x = None
name = x or "default"  # "default" – x is falsy
y = [1, 2, 3]
first = y and y[0]     # 1 – y is truthy, returns y[0]

# and/or return operands, not booleans!
print(0 or "hello")   # "hello"
print(5 and "hello")  # "hello"
print([] or {})       # {} – both falsy, returns the last
- "Is bool a subclass of int?" → Yes. True == 1, False == 0. True + True == 2.
- "What does 'and' return?" → The first falsy value, or the last value if all are truthy.
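Because bool is an int subclass, summing a generator of comparisons is the idiomatic way to count matches:

```python
data = [3, -1, 7, 0, 12, -5]
# Each comparison yields a bool, and bool is an int subclass,
# so sum() adds 1 per True.
positives = sum(x > 0 for x in data)
print(positives)  # 3
```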
# A variable is a label pointing to an object in memory
a = [1, 2, 3]  # a → list object at memory 0x1234
b = a          # b → the SAME list object at 0x1234
b.append(4)
print(a)  # [1, 2, 3, 4] – BOTH see the change!

# Assignment creates a new binding, not a copy
b = [99]  # b now → a different list at 0x5678
print(a)  # [1, 2, 3, 4] – a unchanged

# Augmented assignment on an immutable creates a new object
x = 5
print(id(x))  # some memory address
x += 1        # x now points to a NEW int(6)!
print(id(x))  # different address

# Augmented assignment on a mutable modifies in place
lst = [1, 2]
print(id(lst))
lst += [3]      # calls lst.__iadd__([3]) – in place
print(id(lst))  # SAME address!

# Multiple assignment, swap (no temp variable needed)
a, b = 1, 2
a, b = b, a  # swap – Python evaluates the right side first

# Chained assignment
x = y = z = 0   # all point to the SAME int(0) object
x = y = z = []  # ⚠️ all point to the SAME list! (gotcha)
Strings – Deep Dive
Immutable sequences of Unicode. Slicing, methods, formatting, and the performance traps.
s = "Python"
print(s[0])     # 'P'
print(s[-1])    # 'n' – negative = from the end
print(s[0:3])   # 'Pyt' – stop is EXCLUSIVE
print(s[2:])    # 'thon' – to the end
print(s[:4])    # 'Pyth' – from the start
print(s[::2])   # 'Pto' – every 2nd char
print(s[::-1])  # 'nohtyP' – REVERSE!

# Strings are immutable
s[0] = "J"       # TypeError!
s = "J" + s[1:]  # must create a new string

# String repetition
print("ha" * 3)  # "hahaha"
print("-" * 40)  # separator line

# Efficient joining
parts = ["Hello", "World", "Python"]
result = " ".join(parts)  # O(n) – correct!
# result = ""; for p in parts: result += p  → O(n²)!

# Check membership
print("Py" in "Python")  # True
print("py" in "Python")  # False – case sensitive
s = " Hello, World! "

# Case
s.upper(); s.lower(); s.title(); s.swapcase()

# Strip
s.strip()      # "Hello, World!" (both ends)
s.lstrip()     # strip the left only
s.rstrip()     # strip the right only
s.strip(" !")  # strip those specific chars from both ends

# Search
s.find("World")     # 8 – index, or -1 if not found
s.index("World")    # 8 – index, raises ValueError if not found
s.count("l")        # 3
s.startswith(" H")  # True
s.endswith(" ")     # True

# Split & replace
"a,b,c".split(",")     # ['a', 'b', 'c']
"a,b,c".split(",", 1)  # ['a', 'b,c'] – max 1 split
"a b c".split()        # ['a', 'b', 'c'] – split on whitespace
"hello".replace("l", "L")     # "heLLo"
"hello".replace("l", "L", 1)  # "heLlo" – max 1 replacement

# Check content
"123".isdigit()     # True
"abc".isalpha()     # True
"abc123".isalnum()  # True
" ".isspace()       # True
"abc".islower()     # True

# Encoding
"Hello".encode("utf-8")   # b'Hello'
b"Hello".decode("utf-8")  # 'Hello'
name = "Ravi"; score = 95.678

# 1. % formatting (legacy, avoid)
"Hello %s, score %.2f" % (name, score)

# 2. .format()
"Hello {}, score {:.2f}".format(name, score)
"Hello {n}, score {s:.2f}".format(n=name, s=score)

# 3. f-strings (Python 3.6+) – FASTEST
f"Hello {name}, score {score:.2f}"
f"{name!r}"        # repr: 'Ravi'
f"{score:010.3f}"  # 000095.678
f"{1_000_000:,}"   # 1,000,000
f"{'center':^20}"  # centered in 20 chars
f"{'left':<20}"    # left-aligned
f"{'right':>20}"   # right-aligned

# Python 3.8+ – self-documenting expressions
x = 42
print(f"{x = }")      # x = 42
print(f"{2 + 2 = }")  # 2 + 2 = 4

# 4. Template strings (safe for user input)
from string import Template
t = Template("Hello $name")
t.substitute(name="Ravi")  # "Hello Ravi"

# Raw strings – backslashes NOT processed
path = r"C:\Users\name\docs"  # no need to escape \
regex = r"\d+\.\d+"           # regex patterns
import re
text = "My email is ravi@example.com and priya@test.org"

# search – find the FIRST match anywhere
m = re.search(r"\w+@\w+\.\w+", text)
print(m.group())  # "ravi@example.com"
print(m.start())  # 12
print(m.end())    # 28

# findall – ALL matches as a list
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)  # ['ravi@example.com', 'priya@test.org']

# match – only matches at the START of the string
re.match(r"\d+", "123abc")  # Match
re.match(r"\d+", "abc123")  # None

# sub – replace matches
result = re.sub(r"\w+@\w+\.\w+", "[REDACTED]", text)

# Groups
m = re.search(r"(\w+)@(\w+)\.(\w+)", text)
print(m.group(0))  # full match
print(m.group(1))  # "ravi"
print(m.group(2))  # "example"

# Named groups
m = re.search(r"(?P<user>\w+)@(?P<domain>\w+)", text)
print(m.group("user"))  # "ravi"

# Compile for reuse (faster in loops)
pattern = re.compile(r"\d+", re.IGNORECASE)
pattern.findall("abc123def456")  # ['123', '456']
# Flags: re.IGNORECASE, re.MULTILINE, re.DOTALL
Collections – List · Tuple · Set · Dict
Python's built-in collections with time complexities, internals, and when to use each.
| Operation | Time | Note |
|---|---|---|
| Access by index | O(1) | Direct pointer |
| append() | O(1) amortized | Occasional resize |
| insert(0,x) | O(n) | Shifts all elements |
| pop() | O(1) | From end |
| pop(0) | O(n) | Shifts all elements |
| x in list | O(n) | Linear search |
| sort() | O(n log n) | Timsort β stable |
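x in list is O(n), but if the list is kept sorted, the stdlib bisect module gives O(log n) lookups; a small sketch (contains is a hypothetical helper name):

```python
import bisect

sorted_ids = [3, 8, 15, 23, 42]

def contains(sorted_lst, x):
    # bisect_left finds the insertion point in O(log n);
    # the element is present iff that position already holds x.
    i = bisect.bisect_left(sorted_lst, x)
    return i < len(sorted_lst) and sorted_lst[i] == x

print(contains(sorted_ids, 15))  # True
print(contains(sorted_ids, 16))  # False
bisect.insort(sorted_ids, 16)    # insert keeping order (still O(n) to shift!)
print(sorted_ids)                # [3, 8, 15, 16, 23, 42]
```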
a = [1, 2, 3]
a.append(4)       # [1, 2, 3, 4]
a.insert(1, 99)   # [1, 99, 2, 3, 4]
a.extend([5, 6])  # add multiple
a.remove(99)      # remove the first occurrence
a.pop()           # remove + return the last
a.pop(0)          # remove + return index 0 – O(n)!
del a[1:3]        # delete a slice

# Sorting
a.sort()                  # in place, returns None
a.sort(key=lambda x: -x)  # sort by key
a.sort(key=lambda x: x % 3, reverse=True)
b = sorted(a)             # returns a NEW list
words = sorted(["bb", "a", "ccc"], key=len)  # for strings

# Shallow vs deep copy
b = a[:]      # shallow copy
b = a.copy()  # shallow copy
import copy
b = copy.deepcopy(a)  # deep copy

# Unpacking
first, *rest = [1, 2, 3, 4]  # first=1, rest=[2, 3, 4]
a, b, c = [1, 2, 3]
d = {"name": "Ravi", "age": 25}
d = dict(name="Ravi", age=25)
d = {i: i**2 for i in range(5)} # comprehension
d = dict(zip(["a","b"], [1,2])) # {'a':1,'b':2}
# Safe access
d["name"]              # "Ravi" – KeyError if missing
d.get("name")          # "Ravi" – None if missing
d.get("phone", "N/A")  # "N/A" – default
# Modify
d["email"] = "r@r.com" # add or update
d.update({"city": "MUM"}) # merge dict
d |= {"x": 1} # merge operator (3.9+)
merged = d1 | d2 # new merged dict (3.9+)
# Delete
del d["age"] # KeyError if missing
d.pop("age") # remove + return
d.pop("age", None) # safe β no error
d.popitem() # remove + return last item
# Iteration
for k in d: # keys (default)
pass
for k, v in d.items(): # key-value pairs
pass
# setdefault
d.setdefault("scores", []).append(95)
# If "scores" missing, create []; then append
# defaultdict
from collections import defaultdict
wc = defaultdict(int) # auto 0 for missing keys
wc["hello"] += 1 # no KeyError!
groups = defaultdict(list)
groups["even"].append(2)  # no KeyError!
- "How does a dict work?" → hash(key) selects a bucket, which stores the key-value pair. Collisions are resolved via open addressing in CPython.
- "Are Python dicts ordered?" → Yes, insertion-ordered since Python 3.7.
- "What makes a good dict key?" → It must be hashable (immutable). str, int, tuple are hashable; list, dict, set are not.
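Any object can serve as a dict key if it defines consistent __hash__ and __eq__; a minimal sketch (GridPoint is a hypothetical example class):

```python
class GridPoint:
    """Immutable-by-convention point usable as a dict key."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return isinstance(other, GridPoint) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        # Rule: objects that compare equal MUST hash equal
        return hash((self.x, self.y))

d = {GridPoint(1, 2): "treasure"}
print(d[GridPoint(1, 2)])  # "treasure" – a different but equal object finds it
```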
s = {1, 2, 3}
s = set([1, 2, 2, 3])  # {1, 2, 3} – deduplicates
s = set()  # ⚠️ NOT {} – that's an empty dict!
s.add(4); s.update([5, 6])
s.remove(4)    # KeyError if missing
s.discard(99)  # safe – no error

# Set math
a = {1, 2, 3, 4}; b = {3, 4, 5, 6}
a | b  # {1, 2, 3, 4, 5, 6} – union
a & b  # {3, 4} – intersection
a - b  # {1, 2} – difference
a ^ b  # {1, 2, 5, 6} – symmetric difference
a.issubset(b); a.issuperset(b); a.isdisjoint(b)

# O(1) membership – use a set, not a list!
valid = {"MUM", "DEL", "BLR"}
if "MUM" in valid:  # O(1) vs O(n) for a list
    pass

# frozenset – immutable, hashable set
fs = frozenset([1, 2, 3])
d = {fs: "value"}  # ✅ can be a dict key

# Deduplicate preserving order (3.7+)
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
unique = list(dict.fromkeys(data))  # [3, 1, 4, 5, 9, 2, 6]

from collections import Counter, deque, namedtuple, ChainMap, OrderedDict

# Counter – count occurrences
c = Counter(["a", "b", "a", "c", "a", "b"])
print(c)                 # Counter({'a': 3, 'b': 2, 'c': 1})
print(c.most_common(2))  # [('a', 3), ('b', 2)]
print(c["z"])            # 0 – not a KeyError!
c1 = Counter("abcc"); c2 = Counter("abc")
print(c1 - c2)  # Counter({'c': 1})
print(c1 + c2)  # Counter({'c': 3, 'a': 2, 'b': 2})

# deque – O(1) at BOTH ends (list.pop(0) is O(n)!)
dq = deque([1, 2, 3], maxlen=5)
dq.appendleft(0)  # add to the left, O(1)
dq.popleft()      # remove from the left, O(1)
dq.rotate(1)      # rotate right
# maxlen: auto-drops the oldest – a sliding window!

# namedtuple – tuple with field names
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y)     # attribute access
print(p[0], p[1])   # index access still works
print(p._asdict())  # {'x': 3, 'y': 4}

# ChainMap – search multiple dicts as one
defaults = {"color": "blue", "size": "md"}
overrides = {"color": "red"}
cm = ChainMap(overrides, defaults)
print(cm["color"])  # "red" – overrides first
print(cm["size"])   # "md" – from defaults
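The maxlen trick above makes deque a natural sliding window; a small sketch (moving_average is a hypothetical helper):

```python
from collections import deque

def moving_average(stream, k):
    # maxlen=k auto-drops the oldest value, so the window
    # slides in O(1) per element.
    window = deque(maxlen=k)
    for x in stream:
        window.append(x)
        if len(window) == k:
            yield sum(window) / k

print(list(moving_average([1, 2, 3, 4, 5], 3)))  # [2.0, 3.0, 4.0]
```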
# List comprehension: [expr for item in iterable if condition]
squares = [x**2 for x in range(10)]
evens = [x for x in range(20) if x % 2 == 0]
flat = [x for row in [[1, 2], [3, 4]] for x in row]  # flatten 2D
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]

# Dict comprehension
word_len = {word: len(word) for word in ["hello", "world"]}
inverted = {v: k for k, v in {"a": 1, "b": 2}.items()}

# Set comprehension
unique_lens = {len(w) for w in ["hi", "hello", "hey"]}  # {2, 5, 3}

# Generator expression – LAZY, no list built in memory
gen = (x**2 for x in range(1_000_000))  # nothing computed yet!
total = sum(x**2 for x in range(1_000_000))

# Generator vs list:
# - Generator: iterate once, large data, memory efficient
# - List: need indexing, multiple passes, need len()

# Walrus operator := (Python 3.8+) – assign and use in an expression
data = [1, 2, 3, 4, 5, 6, 7, 8]
filtered = [y for x in data if (y := x**2) > 10]

# Nested comprehension (matrix transposition)
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in matrix] for i in range(3)]

# Practical: filter lines from a file
with open("log.txt") as f:
    errors = [line.strip() for line in f if "ERROR" in line]
Control Flow
if/elif/else, loops, the match statement, itertools – all patterns with gotchas.
score = 85
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
else:
    grade = "F"

# Ternary
status = "pass" if score >= 50 else "fail"

# Chained comparisons (Python-specific!)
if 0 <= score <= 100:  # cleaner than score >= 0 and score <= 100
    print("valid")

# match statement (Python 3.10+)
command = "quit"
match command:
    case "quit":
        print("Quitting")
    case "go north" | "go south":  # OR pattern
        print("Moving")
    case str(s) if s.startswith("load"):  # guard
        print(f"Loading: {s}")
    case _:  # wildcard
        print("Unknown")

# match with destructuring
point = (0, 5)
match point:
    case (0, 0):
        print("origin")
    case (0, y):
        print(f"y-axis at {y}")
    case (x, 0):
        print(f"x-axis at {x}")
    case (x, y):
        print(f"({x},{y})")
# enumerate – index + value
fruits = ["apple", "banana", "cherry"]
for i, fruit in enumerate(fruits, start=1):
    print(f"{i}: {fruit}")

# zip – parallel iteration
names = ["Ravi", "Priya"]
scores = [95, 88]
for name, score in zip(names, scores):
    print(f"{name}: {score}")
# zip stops at the shortest – use itertools.zip_longest for unequal lengths

# for...else – else runs if NO break occurred
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(f"{n} composite")
            break
    else:
        print(f"{n} is prime")  # only when no break

# while...else – else runs when the condition becomes False
i = 0
while i < 5:
    i += 1
else:
    print("done")  # always runs unless break

# itertools – lazy combinatorial tools
import itertools
list(itertools.chain([1, 2], [3, 4]))    # [1, 2, 3, 4]
list(itertools.islice(range(100), 5))    # [0, 1, 2, 3, 4]
list(itertools.product("AB", [1, 2]))    # cartesian product
list(itertools.combinations("ABCD", 2))  # all 2-combos
list(itertools.permutations("ABC", 2))   # all 2-perms
# groupby groups CONSECUTIVE equal keys – sort first!
[(k, list(g)) for k, g in itertools.groupby(sorted("aabbc"))]

# accumulate – running sum
list(itertools.accumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]
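Since groupby only groups consecutive equal keys, the usual pattern is to sort by the same key first:

```python
import itertools

words = ["apple", "ant", "bat", "bee", "cat"]
# Sort by first letter, then group – groupby alone would miss
# equal keys that aren't adjacent.
for letter, group in itertools.groupby(sorted(words), key=lambda w: w[0]):
    print(letter, list(group))
# a ['ant', 'apple']
# b ['bat', 'bee']
# c ['cat']
```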
Functions – All Patterns
First-class functions, all argument types, the mutable default trap, closures, lambdas, functools.
# Positional, default, keyword
def greet(name, greeting="Hello", punctuation="!"):
    return f"{greeting}, {name}{punctuation}"

greet("Ravi")                   # Hello, Ravi!
greet("Ravi", "Hi")             # Hi, Ravi!
greet("Ravi", punctuation=".")  # Hello, Ravi.

# *args – variadic positional (a tuple inside)
def add_all(*numbers):
    return sum(numbers)

add_all(1, 2, 3, 4, 5)  # 15

# **kwargs – variadic keyword (a dict inside)
def create_user(**kwargs):
    return kwargs

create_user(name="Ravi", age=25)  # {'name': 'Ravi', 'age': 25}

# Full signature order: positional / *args / keyword-only / **kwargs
def full(a, b, *args, keyword_only=True, **kwargs):
    print(a, b, args, keyword_only, kwargs)

full(1, 2, 3, 4, keyword_only=False, x=10)
# a=1, b=2, args=(3, 4), keyword_only=False, kwargs={'x': 10}

# Keyword-only args (after * or *args – MUST be passed by name)
def plot(x, y, *, color="blue", linewidth=1): pass
plot(1, 2, color="red")  # ✅
plot(1, 2, "red")        # TypeError!

# Positional-only (Python 3.8+, before /)
def circle(radius, /, color="blue"): pass
circle(5)         # ✅
circle(radius=5)  # TypeError!

# Spread/unpack when calling
args = (1, 2, 3)
kwargs = {"key": "value"}
some_func(*args, **kwargs)

# ⚠️ MUTABLE DEFAULT ARGUMENT – the #1 Python gotcha
def bad(item, lst=[]):  # [] is created ONCE, at definition time!
    lst.append(item)
    return lst

bad(1)  # [1]
bad(2)  # [1, 2] – WRONG! the same list object is reused!
bad(3)  # [1, 2, 3] – keeps growing!

def good(item, lst=None):  # ✅ the correct pattern
    if lst is None:
        lst = []
    lst.append(item)
    return lst
# Lambda – anonymous one-liner
square = lambda x: x**2
add = lambda x, y: x + y

# Best use: as a key= argument
people = [("Ravi", 25), ("Priya", 22), ("Aman", 28)]
people.sort(key=lambda p: p[1])          # sort by age
oldest = max(people, key=lambda p: p[1])

# Sort by multiple keys
data = [{"name": "A", "age": 25}, {"name": "B", "age": 25, "score": 90}]
data.sort(key=lambda x: (x["age"], x.get("score", 0)))

# map – apply a function to each item (returns an iterator)
nums = [1, 2, 3, 4]
squared = list(map(lambda x: x**2, nums))
# Prefer: [x**2 for x in nums]

# filter – keep items where the function is True (returns an iterator)
evens = list(filter(lambda x: x % 2 == 0, nums))
# Prefer: [x for x in nums if x % 2 == 0]

# functools.reduce – fold
from functools import reduce
product = reduce(lambda acc, x: acc * x, nums)  # 24

# functools.partial – fix some arguments
from functools import partial
def power(base, exp):
    return base ** exp

square = partial(power, exp=2)
cube = partial(power, exp=3)
square(5)  # 25
cube(3)    # 27
from functools import lru_cache, cache, wraps

# lru_cache – memoization (Least Recently Used)
@lru_cache(maxsize=128)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)

print(fib(50))           # instant! (without the cache: ~2^50 calls)
print(fib.cache_info())  # hits, misses, currsize
fib.cache_clear()        # clear the cache

# cache (Python 3.9+) – like lru_cache(maxsize=None)
@cache
def expensive(n):
    return n * 2

# wraps – preserve function metadata in decorators
def my_decorator(func):
    @wraps(func)  # ← ALWAYS include this!
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
# Without @wraps: func.__name__ == "wrapper" (wrong)
# With @wraps: func.__name__ == "original" (correct)

# total_ordering – define < and == only, get all comparisons
from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, grade):
        self.grade = grade
    def __eq__(self, other):
        return self.grade == other.grade
    def __lt__(self, other):
        return self.grade < other.grade
# <=, >, >= are auto-generated!
OOP – Classes & Internals
Class anatomy, inheritance, MRO, properties, __slots__, dataclasses, abstract classes.
class Animal:
    # Class attribute – shared by ALL instances
    kingdom = "Animalia"
    count = 0

    def __init__(self, name: str, sound: str):
        self.name = name      # public instance attribute
        self._sound = sound   # protected (convention only, still accessible)
        self.__id = id(self)  # private (name-mangled to _Animal__id)
        Animal.count += 1

    # Instance method – receives self
    def speak(self) -> str:
        return f"{self.name} says {self._sound}"

    # Class method – receives cls, used for alternative constructors
    @classmethod
    def from_dict(cls, data: dict) -> "Animal":
        return cls(data["name"], data["sound"])

    @classmethod
    def get_count(cls) -> int:
        return cls.count

    # Static method – no self/cls, just namespace organization
    @staticmethod
    def is_valid_name(name: str) -> bool:
        return bool(name) and name.isalpha()

    # Property – getter/setter without breaking the API
    @property
    def sound(self) -> str:
        return self._sound

    @sound.setter
    def sound(self, value: str):
        if not value:
            raise ValueError("Sound cannot be empty")
        self._sound = value.upper()

    @sound.deleter
    def sound(self):
        del self._sound

    def __repr__(self): return f"Animal({self.name!r})"
    def __str__(self): return self.name
    def __del__(self): Animal.count -= 1  # called on garbage collection

# Name mangling – access the "private" attribute
dog = Animal("Rex", "woof")
print(dog._Animal__id)  # accessible, but shouldn't be used
dog.sound = "bark"      # calls the setter – stores "BARK"

# classmethod as factory (alternative constructor)
dog2 = Animal.from_dict({"name": "Buddy", "sound": "bark"})
- "Difference between @classmethod and @staticmethod?" → classmethod gets cls (can create instances, access class state). staticmethod gets nothing – it's just a namespaced function.
- "What is name mangling?" → __attr becomes _ClassName__attr. Prevents accidental override in subclasses, NOT true privacy.
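Name mangling in action: the same double-underscore name in a subclass lands in a different slot, so nothing clashes (Base/Child are illustrative names):

```python
class Base:
    def __init__(self):
        self.__token = "base"  # stored as _Base__token

class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"  # stored as _Child__token – no clash!

c = Child()
print(c._Base__token)        # "base"
print(c._Child__token)       # "child"
print("__token" in vars(c))  # False – only mangled names exist
```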
class A:
    def method(self): return "A"
class B(A):
    def method(self): return "B→" + super().method()
class C(A):
    def method(self): return "C→" + super().method()
class D(B, C):  # diamond inheritance
    pass

d = D()
print(d.method())  # "B→C→A" – MRO: D, B, C, A, object
print(D.__mro__)   # (<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)
print(D.mro())     # same, as a list

# super() follows the MRO – NOT just the direct parent!
# That's why super() in B calls C, not A, in this diamond

# isinstance and issubclass
print(isinstance(d, B))  # True
print(isinstance(d, A))  # True – A is an ancestor
print(issubclass(D, C))  # True
print(issubclass(int, object))  # True – everything is an object

# Abstract base classes
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float: pass  # must be implemented
    @abstractmethod
    def perimeter(self) -> float: pass
    def describe(self):  # concrete method – subclasses can use it!
        return f"Area={self.area()}"

class Circle(Shape):
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r**2
    def perimeter(self): return 2 * 3.14159 * self.r

# Shape() → TypeError: Can't instantiate abstract class

from dataclasses import dataclass, field
from typing import ClassVar

@dataclass
class User:
    name: str
    email: str
    age: int = 0
    # mutable defaults MUST use field(default_factory=...)
    tags: list = field(default_factory=list)
    _id: str = field(init=False, repr=False)  # not in __init__
    count: ClassVar[int] = 0  # class variable, not a field

    def __post_init__(self):  # runs after the auto-generated __init__
        self._id = f"user_{self.name.lower()}"
        User.count += 1

# frozen=True – immutable (auto __hash__ too)
@dataclass(frozen=True)
class Point:
    x: float
    y: float

# order=True – auto __lt__, __le__, __gt__, __ge__
@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 2, 0); v2 = Version(1, 3, 0)
print(v1 < v2)  # True – compares field by field!

# slots=True (Python 3.10+) – memory efficient
@dataclass(slots=True)
class FastPoint:
    x: float
    y: float
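The dataclasses module also ships helpers worth knowing: asdict, astuple, and replace – the idiomatic way to "modify" a frozen instance; a quick sketch:

```python
from dataclasses import dataclass, asdict, astuple, replace

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
print(asdict(p))   # {'x': 1.0, 'y': 2.0}
print(astuple(p))  # (1.0, 2.0)
# replace() builds a NEW instance with some fields swapped –
# the original frozen instance stays untouched.
p2 = replace(p, y=5.0)
print(p2)          # Point(x=1.0, y=5.0)
```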
class Regular:
    def __init__(self, x, y):
        self.x = x; self.y = y
    # has a per-instance __dict__ – flexible but heavy

class Slotted:
    __slots__ = ['x', 'y']  # declare the allowed attributes
    def __init__(self, x, y):
        self.x = x; self.y = y
    # NO __dict__ – faster attribute access, less memory

import sys
r = Regular(1, 2)
s = Slotted(1, 2)
print(sys.getsizeof(r) + sys.getsizeof(r.__dict__))  # instance + its separate __dict__
print(sys.getsizeof(s))  # noticeably smaller – no __dict__ at all

# Slotted objects cannot grow dynamic attributes
s.z = 3  # AttributeError!

# Use case: ML data rows, game objects, anything with millions of instances

# Inheritance with slots
class Base:
    __slots__ = ['x']
class Child(Base):
    __slots__ = ['y']  # add y; x inherited from Base
# if Child omits __slots__, it gets a __dict__ back!

Dunder / Magic Methods
Python's data model – make your objects behave like built-in types. The foundation of everything.
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    # ── REPRESENTATION
    def __repr__(self): return f"Vector({self.x},{self.y})"  # unambiguous, for devs
    def __str__(self): return f"({self.x},{self.y})"         # user-friendly
    def __format__(self, spec): return f"({self.x:{spec}},{self.y:{spec}})"

    # ── ARITHMETIC
    def __add__(self, o): return Vector(self.x+o.x, self.y+o.y)  # v1 + v2
    def __sub__(self, o): return Vector(self.x-o.x, self.y-o.y)  # v1 - v2
    def __mul__(self, s): return Vector(self.x*s, self.y*s)      # v * 3
    def __rmul__(self, s): return self.__mul__(s)                # 3 * v
    def __truediv__(self, s): return Vector(self.x/s, self.y/s)  # v / 2
    def __neg__(self): return Vector(-self.x, -self.y)           # -v
    def __abs__(self): return (self.x**2 + self.y**2)**0.5       # abs(v)
    def __pow__(self, n): return Vector(self.x**n, self.y**n)    # v**2

    # ── COMPARISON
    def __eq__(self, o): return self.x == o.x and self.y == o.y  # v1 == v2
    def __lt__(self, o): return abs(self) < abs(o)               # v1 < v2
    def __le__(self, o): return abs(self) <= abs(o)

    # ── CONTAINER PROTOCOL
    def __len__(self): return 2                          # len(v)
    def __getitem__(self, i): return (self.x, self.y)[i]  # v[0]
    def __setitem__(self, i, val):                       # v[0] = 5
        if i == 0: self.x = val
        elif i == 1: self.y = val
    def __iter__(self): return iter((self.x, self.y))            # for c in v
    def __contains__(self, val): return val in (self.x, self.y)  # x in v
    def __reversed__(self): return iter((self.y, self.x))

    # ── BOOLEAN & HASH
    def __bool__(self): return self.x != 0 or self.y != 0  # bool(v)
    def __hash__(self): return hash((self.x, self.y))      # use in set/dict

    # ── ATTRIBUTE ACCESS
    def __getattr__(self, name):  # called only if the attr is NOT found
        raise AttributeError(f"No attribute {name!r}")
    def __setattr__(self, name, val):  # called on EVERY attribute set
        super().__setattr__(name, val)

    # ── CALLABLE
    def __call__(self, scale): return Vector(self.x*scale, self.y*scale)  # v(2)

    # ── CONTEXT MANAGER
    def __enter__(self): return self  # with v as obj:
    def __exit__(self, exc_type, exc_val, tb): return False

    # ── NUMERIC PROTOCOL (augmented assignment)
    def __iadd__(self, o):  # v += other (in place if possible)
        self.x += o.x; self.y += o.y
        return self

# Test
v1 = Vector(3, 4); v2 = Vector(1, 2)
print(v1 + v2)      # (4,6)
print(2 * v1)       # (6,8) – rmul
print(abs(v1))      # 5.0
print(v1(3))        # (9,12) – callable!
print(list(v1))     # [3, 4] – iterable
s = {v1, v2}        # hashable – works in a set
print(f"{v1:.2f}")  # (3.00,4.00) – format

Decorators & Descriptors
Decorators are just syntactic sugar for higher-order functions. Understand them from first principles.
import functools, time

# @decorator is syntactic sugar for: func = decorator(func)

# Basic decorator
def timer(func):
    @functools.wraps(func)  # ALWAYS include – preserves __name__, __doc__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter()-start:.4f}s")
        return result
    return wrapper

@timer
def slow():
    time.sleep(0.1)

# Decorator with arguments – needs 3 levels
def retry(max_attempts=3, exceptions=(Exception,)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts - 1:
                        raise
                    print(f"Attempt {attempt+1} failed: {e}")
        return wrapper
    return decorator

@retry(max_attempts=3, exceptions=(ConnectionError,))
def fetch(url): pass

# Class-based decorator (for stateful decorators)
class Cache:
    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

@Cache
def fibonacci(n):
    if n < 2: return n
    return fibonacci(n-1) + fibonacci(n-2)

# Stacking decorators – applied bottom-up
@timer
@retry(max_attempts=3)
def flaky_network_call(): pass
# equivalent to: flaky_network_call = timer(retry(max_attempts=3)(flaky_network_call))

# property – the most common built-in decorator
class Circle:
    def __init__(self, radius):
        self._radius = radius
    @property
    def radius(self):
        return self._radius
    @radius.setter
    def radius(self, v):
        if v < 0:
            raise ValueError("Radius must be positive")
        self._radius = v
    @property
    def area(self):  # computed property – no setter
        return 3.14159 * self._radius ** 2
# Descriptor – reusable validation for any attribute
class PositiveNumber:
    def __set_name__(self, owner, name):
        self.name = name  # called when the owner class is created
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return getattr(obj, f"_{self.name}", 0)
    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        setattr(obj, f"_{self.name}", value)

class Product:
    price = PositiveNumber()   # descriptor instance
    weight = PositiveNumber()  # reused for a different attribute
    def __init__(self, price, weight):
        self.price = price     # triggers PositiveNumber.__set__
        self.weight = weight

p = Product(10.0, 2.5)
p.price = -5  # ValueError: price must be positive

# How property is implemented (roughly)
class property_impl:
    def __init__(self, fget, fset=None, fdel=None):
        self.fget = fget
        self.fset = fset
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.fget(obj)
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)
Iterators & Generators
The protocol powering all Python loops, comprehensions, and lazy evaluation.
# Iterable: has __iter__, which returns an iterator
# Iterator: has __iter__ AND __next__
lst = [1, 2, 3]
it = iter(lst)   # create an iterator
print(next(it))  # 1
print(next(it))  # 2
print(next(it))  # 3
print(next(it))  # StopIteration!

# A for loop is just sugar for this:
it = iter(lst)
while True:
    try:
        item = next(it)
        print(item)
    except StopIteration:
        break

# Custom iterator class
class Countdown:
    def __init__(self, start):
        self.current = start
    def __iter__(self):  # makes it iterable
        return self
    def __next__(self):  # makes it an iterator
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)  # 3, 2, 1
# Generator function – uses yield
def countdown(start):
    while start > 0:
        yield start  # pause here, hand back a value
        start -= 1   # resume from here next time
    # StopIteration is raised automatically when the function ends

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
for n in countdown(5):
    print(n)  # 5, 4, 3, 2, 1

# Infinite generator – impossible with a list!
def integers(start=0):
    while True:
        yield start
        start += 1

gen = integers()
print(next(gen))  # 0
print(next(gen))  # 1 – generates forever

# Generator pipeline – chained lazy evaluation
def read_large_file(path):
    with open(path) as f:
        for line in f:  # reads ONE line at a time
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:  # processes one line at a time
            yield line

def parse_error(lines):
    for line in lines:
        yield line.split("|")  # processes one at a time

# Memory: O(1) – only one line in memory at any time!
errors = parse_error(filter_errors(read_large_file("app.log")))

# yield from – delegate to a sub-generator
def chain(*iterables):
    for it in iterables:
        yield from it

list(chain([1, 2], [3, 4], [5]))  # [1, 2, 3, 4, 5]

# Generator send() – bidirectional
def accumulator():
    total = 0
    while True:
        value = yield total  # yield sends out, receives in
        total += value

acc = accumulator()
next(acc)            # prime the generator
print(acc.send(10))  # 10
print(acc.send(20))  # 30
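Generators also power contextlib.contextmanager: everything before the yield acts as __enter__, everything after as __exit__ (tag is an illustrative example):

```python
from contextlib import contextmanager

@contextmanager
def tag(name):
    # Code before yield runs on entering the with block
    print(f"<{name}>")
    try:
        yield name  # the value bound by "as"
    finally:
        # Code after yield runs on exit, even if an exception occurred
        print(f"</{name}>")

with tag("b") as t:
    print("bold text")
# <b>
# bold text
# </b>
```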
Memory &
Garbage Collection
Reference counting, cycle detection, weak references – how Python manages memory.
# CPython uses REFERENCE COUNTING as its primary GC
# Every object has a refcount – when it reaches 0, the object is freed immediately
import sys
a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a + the getrefcount call arg)
b = a
print(sys.getrefcount(a))  # 3 (a + b + getrefcount arg)
del b
print(sys.getrefcount(a))  # 2 (back to 2)

# Reference counting fails on circular references
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

a = Node(1)
b = Node(2)
a.next = b    # a refs b (b refcount=2)
b.next = a    # b refs a (a refcount=2)
del a; del b  # refcounts go to 1, not 0 – LEAK!
# Both objects stay in memory because they reference each other

# Python's CYCLIC GARBAGE COLLECTOR handles this
import gc
gc.collect()           # manually trigger cycle collection
print(gc.get_count())  # objects in each generation (0, 1, 2)
gc.disable()           # disable for performance-critical code

# Weak references – a reference that doesn't increase the refcount
import weakref
class Expensive:
    def __del__(self):
        print("Expensive deleted!")

obj = Expensive()
weak = weakref.ref(obj)  # doesn't prevent garbage collection
print(weak())  # Expensive object
del obj        # refcount → 0, freed immediately
print(weak())  # None – object is gone

# WeakValueDictionary – a cache that doesn't prevent GC
cache = weakref.WeakValueDictionary()
data = Expensive()
cache["key"] = data
del data  # freed! the entry vanishes – cache.get("key") now returns None

# Memory profiling
import tracemalloc
tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")
for stat in top[:5]:
    print(stat)
- "How does Python manage memory?" → Reference counting (primary) + cyclic GC (for cycles). Immediate deallocation when the refcount hits 0.
- "What is a memory leak in Python?" → Circular references the GC misses, or references kept alive unintentionally (global lists, class variables).
- "When would you use weakref?" → Caches, observer patterns – whenever you want to reference an object without keeping it alive.
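The observer case from the last bullet can be sketched with weakref.WeakSet – observers drop out automatically when nothing else keeps them alive (class and method names here are illustrative, not from the source):

```python
import weakref

class Publisher:
    def __init__(self):
        # observers are held weakly: a dead observer disappears on its own
        self._observers = weakref.WeakSet()

    def subscribe(self, observer):
        self._observers.add(observer)

    def publish(self, event):
        for obs in self._observers:
            obs.notify(event)

class Listener:
    def __init__(self):
        self.seen = []
    def notify(self, event):
        self.seen.append(event)

pub = Publisher()
listener = Listener()
pub.subscribe(listener)
pub.publish("hello")
print(listener.seen)        # ['hello']
del listener                # the publisher does not keep it alive
print(len(pub._observers))  # 0 in CPython – no explicit unsubscribe needed
```

No unsubscribe bookkeeping, and no leak if a listener forgets to detach itself.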
🔥 The GIL –
The Real Truth
The Global Interpreter Lock – why Python isn't "truly" multithreaded, and what to do about it.
# WHY the GIL exists:
# CPython's memory management (reference counting) is NOT thread-safe.
# The GIL prevents two threads from simultaneously modifying refcounts,
# which would corrupt memory. It was simpler to add one big lock than
# fine-grained locks on every object.

# CONSEQUENCE 1: Threading does NOT speed up CPU-bound code
import threading, time

def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Single-threaded
start = time.time()
count_up(10_000_000)
count_up(10_000_000)
print(f"Sequential: {time.time()-start:.2f}s")  # ~0.5s

# Multi-threaded – NOT faster! The GIL means they take turns
start = time.time()
t1 = threading.Thread(target=count_up, args=(10_000_000,))
t2 = threading.Thread(target=count_up, args=(10_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Threaded: {time.time()-start:.2f}s")  # also ~0.5s (maybe slower!)

# CONSEQUENCE 2: Threading DOES help with I/O-bound code!
# When a thread waits for I/O (network, disk, sleep),
# it RELEASES the GIL – other threads can run!
def fetch_url(url):
    time.sleep(1)  # simulates network wait (releases the GIL)

# Sequential: 3 seconds
for url in ["a", "b", "c"]:
    fetch_url(url)

# Threaded: ~1 second – all sleep simultaneously!
threads = [threading.Thread(target=fetch_url, args=(u,)) for u in ["a", "b", "c"]]
for t in threads: t.start()
for t in threads: t.join()

# SOLUTION for CPU-bound: multiprocessing (separate processes, one GIL each)
# (on platforms that spawn processes, guard this with if __name__ == "__main__":)
from multiprocessing import Pool
with Pool(4) as p:  # 4 separate processes, each with its own GIL
    results = p.map(count_up, [10_000_000] * 4)
# This IS ~4x faster on 4 cores!
| Scenario | Best Tool | Why |
|---|---|---|
| CPU-bound (math, ML, compression) | multiprocessing | Bypasses GIL with separate processes |
| I/O-bound (HTTP, DB, files) | threading or asyncio | GIL released during I/O waits |
| Many concurrent I/O tasks | asyncio | Single thread, event loop, no GIL needed |
| CPU + C extensions (numpy) | threading works! | NumPy releases GIL in C code |
- "Is Python multithreaded?" → Python HAS threads, but the GIL means only one executes Python bytecode at a time. True parallelism requires multiprocessing.
- "When is threading still useful despite the GIL?" → I/O-bound tasks – threads release the GIL while waiting for network/disk, allowing other threads to run.
- "What is the GIL and why does it exist?" → A mutex in CPython protecting the reference-counting memory model. Makes CPython thread-safe by serializing access, but at the cost of true parallelism.
- "How do you achieve true parallelism in Python?" → The multiprocessing module (separate processes), Cython with nogil, or asyncio for I/O concurrency.
- "Will the GIL ever be removed?" → Python 3.13 introduced an experimental free-threaded (no-GIL) build (PEP 703). Not the default yet, but coming.
Threading · Multiprocessing
· Async/Await
Three concurrency models – know exactly when to use each and how they work.
import threading, queue, time

# Basic thread
def worker(name, delay):
    time.sleep(delay)
    print(f"{name} done")

t = threading.Thread(target=worker, args=("T1", 1), daemon=True)
t.start()
t.join()  # wait for the thread to finish

# Thread pool – reuse threads
from concurrent.futures import ThreadPoolExecutor

def fetch(url):  # placeholder for a real network call
    time.sleep(0.1)
    return url

urls = ["url1", "url2", "url3", "url4", "url5"]
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(fetch, url) for url in urls]
    results = [f.result() for f in futures]  # blocks until done

# Or use map (like Pool.map)
with ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(fetch, urls))

# Thread safety – race condition
counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:  # acquire lock, auto-release
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 100 – correct with the lock

# Other synchronization primitives
event = threading.Event()
event.set()   # signal all waiting threads
event.wait()  # block until the event is set
event.clear()

sem = threading.Semaphore(5)  # allow 5 threads at once
with sem:
    pass  # auto acquire/release

# Thread-safe queue
q = queue.Queue(maxsize=10)
q.put("item")
item = q.get()  # blocks if empty
q.task_done()
from multiprocessing import Pool, Process, Queue, Value, Array, Manager
from concurrent.futures import ProcessPoolExecutor
import os

# NOTE: on platforms that spawn processes (Windows, macOS), run this
# code under if __name__ == "__main__": so child processes can import it.

# Process pool – the right way
def cpu_work(n):
    return sum(i**2 for i in range(n))

with Pool(processes=4) as pool:  # 4 separate processes
    results = pool.map(cpu_work, [10**6] * 4)  # parallel!

# ProcessPoolExecutor – modern API
with ProcessPoolExecutor(max_workers=4) as ex:
    futures = list(ex.map(cpu_work, [10**6] * 4))

# Shared state between processes
shared_val = Value('i', 0)   # shared int
shared_arr = Array('d', 10)  # shared double[10]

def worker(val):
    with val.get_lock():
        val.value += 1

# Manager – more flexible shared state
with Manager() as manager:
    shared_list = manager.list([])
    shared_dict = manager.dict({})

# Process communication via Queue
def producer(q):
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel

def consumer(q):
    while (item := q.get()) is not None:
        print(item)

q = Queue()
p1 = Process(target=producer, args=(q,))
p2 = Process(target=consumer, args=(q,))
p1.start(); p2.start()
p1.join(); p2.join()

# Key: processes have SEPARATE memory – no shared state by default
# Each process is a full Python interpreter with its own GIL
import asyncio, aiohttp

# Coroutine – an async function; calling it returns a coroutine object
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.json()  # await releases control to the event loop

# Run coroutines CONCURRENTLY (not parallel – single thread!)
async def main():
    async with aiohttp.ClientSession() as session:
        urls = ["https://api1.com", "https://api2.com", "https://api3.com"]
        # asyncio.gather – run all concurrently, wait for all
        results = await asyncio.gather(
            *[fetch(session, url) for url in urls]
        )
        # gather with return_exceptions – don't fail on one error
        results = await asyncio.gather(
            *[fetch(session, url) for url in urls],
            return_exceptions=True
        )

asyncio.run(main())  # entry point – Python 3.7+

# Async context manager and async iterator
class AsyncFileReader:
    async def __aenter__(self):
        self.file = open("data.txt")
        return self
    async def __aexit__(self, *args):
        self.file.close()
    def __aiter__(self):  # NOT async – returns the async iterator itself
        return self
    async def __anext__(self):
        line = self.file.readline()
        if not line:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # yield control
        return line

# asyncio.TaskGroup (Python 3.11+) – better error handling
async def main():
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(coroutine1())
        task2 = tg.create_task(coroutine2())
    # waits for both; if one fails, cancels the others

# asyncio.Queue – producer/consumer pattern
async def producer(q):
    for i in range(5):
        await q.put(i)
        await asyncio.sleep(0.1)

async def consumer(q):
    while True:
        item = await q.get()
        print(item)
        q.task_done()

# Timeout
try:
    result = await asyncio.wait_for(slow_coroutine(), timeout=5.0)
except asyncio.TimeoutError:
    print("Timed out!")

# asyncio.sleep(0) – yield control without actually sleeping
# Use in CPU loops to keep the event loop responsive
- "async/await vs threading?" → asyncio: single thread, no context-switching cost, great for many I/O tasks. threading: multiple threads with OS scheduling, simpler code for simple cases.
- "What is a coroutine?" → A function that can be paused (at await) and resumed. Not a thread.
- "Can asyncio run CPU-bound code?" → It CAN but SHOULDN'T – it blocks the event loop. Use loop.run_in_executor() with a ThreadPoolExecutor or ProcessPoolExecutor.
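The run_in_executor escape hatch from the last bullet can be sketched like this (crunch is a hypothetical blocking function; the expected result assumes n=1000):

```python
import asyncio

def crunch(n):
    # blocking, CPU-bound work that would stall the event loop
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # None = the default ThreadPoolExecutor; for CPU-bound work pass a
    # concurrent.futures.ProcessPoolExecutor instead to sidestep the GIL
    result = await loop.run_in_executor(None, crunch, 1_000)
    print(result)  # 332833500

asyncio.run(main())
```

The coroutine awaits the future, so the event loop stays free to run other tasks while the executor does the blocking work.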
Exceptions &
Error Handling
# Exception hierarchy (the important ones)
# BaseException
# ├── SystemExit          (sys.exit())
# ├── KeyboardInterrupt   (Ctrl+C)
# └── Exception           (catch-all for "normal" errors)
#     ├── ValueError      (right type, wrong value)
#     ├── TypeError       (wrong type)
#     ├── KeyError        (dict key not found)
#     ├── IndexError      (list index out of range)
#     ├── AttributeError  (attribute not found)
#     ├── NameError       (variable not defined)
#     ├── RuntimeError
#     ├── StopIteration   (iterator exhausted)
#     ├── OSError         (IOError, FileNotFoundError)
#     └── ArithmeticError (ZeroDivisionError)

# Full try/except/else/finally
try:
    result = int("abc")  # raises ValueError
except ValueError as e:
    print(f"Value error: {e}")
except (TypeError, KeyError) as e:  # multiple exceptions
    print(f"Type/Key error: {e}")
except Exception as e:  # catch-all (use sparingly)
    print(f"Unexpected: {type(e).__name__}: {e}")
    raise  # re-raise – don't swallow!
else:
    print("No exception – runs only if try succeeded")
finally:
    print("Always runs – cleanup here")

# Custom exceptions
class AppError(Exception):
    """Base exception for this app"""

class ValidationError(AppError):
    def __init__(self, field: str, message: str):
        self.field = field
        self.message = message
        super().__init__(f"{field}: {message}")

class NotFoundError(AppError):
    def __init__(self, resource: str, id: int):
        super().__init__(f"{resource} with id={id} not found")
        self.resource = resource
        self.id = id

# Raise with context
import json
try:
    data = parse_json(raw)
except json.JSONDecodeError as e:
    raise ValidationError("body", "Invalid JSON") from e
    # "from e" preserves the original traceback as __cause__

# Exception groups (Python 3.11+)
try:
    raise ExceptionGroup("multiple errors", [
        ValueError("bad value"),
        TypeError("bad type"),
    ])
except* ValueError as eg:  # except* for groups
    for e in eg.exceptions:
        print(f"ValueError: {e}")

# Context managers for cleanup (see the files section too)
import os
from contextlib import suppress
with suppress(FileNotFoundError):  # silently ignore this error
    os.remove("maybe_exists.txt")
- Never use bare except: – always catch at minimum except Exception:
- Never silently swallow exceptions without logging
- Create a custom exception hierarchy for your app
- Use raise ... from e to preserve the exception chain
- Put only the minimum code in the try block, not everything
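A minimal sketch combining those rules – minimal try block, logged failure, custom exception, chained cause (ConfigError and load_config are illustrative names):

```python
import json
import logging

logger = logging.getLogger(__name__)

class ConfigError(Exception):
    """App-specific exception – part of a custom hierarchy."""

def load_config(text: str) -> dict:
    try:
        cfg = json.loads(text)  # only the risky call sits in the try block
    except json.JSONDecodeError as e:
        logger.error("bad config: %s", e)  # never swallow silently
        raise ConfigError("config is not valid JSON") from e
    return cfg

try:
    load_config("{not json")
except ConfigError as e:
    print(e)            # config is not valid JSON
    print(e.__cause__)  # the original JSONDecodeError, preserved
```

Callers catch one app-level type; the original traceback stays attached for debugging.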
Modules, Packages
& Imports
# How import works:
# 1. Check the sys.modules cache first (already imported? return cached)
# 2. Find the module file (searches sys.path)
# 3. Compile it to bytecode
# 4. Execute the module code
# 5. Store the result in sys.modules

import sys
print(sys.path)            # list of directories Python searches
print(sys.modules.keys())  # all currently loaded modules

# Import styles
import math                  # import module
import math as m             # alias
from math import sqrt, pi    # import specific names
from math import sqrt as sq  # alias a specific name
from math import *           # ⚠️ avoid – pollutes the namespace

# Relative imports (inside packages)
from . import sibling_module  # same directory
from .. import parent_module  # parent directory
from .utils import helper     # specific name from a sibling

# if __name__ == "__main__"
# When you run:    python script.py  →  __name__ == "__main__"
# When you import: import script     →  __name__ == "script"
if __name__ == "__main__":
    main()  # only runs when the script is executed directly

# Lazy/conditional import
try:
    import ujson as json  # fast JSON
except ImportError:
    import json           # fallback

# importlib – dynamic imports
import importlib
module = importlib.import_module("math")
func = getattr(module, "sqrt")
func(16)  # 4.0

# Reload a module (dev use only)
importlib.reload(math)

# __init__.py – makes a directory a package
# mypackage/
#   __init__.py   ← can be empty or run setup code
#   utils.py
#   models/
#     __init__.py
#     user.py

# __all__ – controls what "from module import *" exports
__all__ = ["PublicClass", "public_function"]  # in the module

# Circular imports – a common problem
# a.py: from b import B
# b.py: from a import A  → ImportError (circular import)!
# Fix: import inside a function, use importlib, or restructure
File I/O &
Context Managers
# Always use 'with' – auto-closes the file even on exception
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()            # entire file as a string
    lines = content.splitlines()  # list of lines (no \n)

# File modes
# "r"  – read (default)
# "w"  – write (overwrites!)
# "a"  – append
# "x"  – exclusive create (fails if exists)
# "r+" – read+write
# "rb" – binary read

# Efficient line-by-line (lazy – O(1) memory)
with open("large_file.txt") as f:
    for line in f:  # reads one line at a time
        process(line)

# Write
with open("out.txt", "w") as f:
    f.write("Hello\n")
    f.writelines(["line1\n", "line2\n"])
    print("formatted", file=f)  # redirect print to a file

# JSON
import json
with open("data.json") as f:
    data = json.load(f)  # file → dict
with open("out.json", "w") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
json.loads('{"key":"val"}')  # string → dict
json.dumps({"key": "val"})   # dict → string

# CSV
import csv
with open("data.csv") as f:
    reader = csv.DictReader(f)  # each row as a dict
    for row in reader:
        print(row["name"])
with open("out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerow({"name": "Ravi", "age": 25})

# pathlib (modern, preferred over os.path)
from pathlib import Path
p = Path("data") / "logs" / "app.log"
print(p.exists(), p.is_file(), p.suffix)  # suffix: ".log"
p.mkdir(parents=True, exist_ok=True)
text = p.read_text(encoding="utf-8")
p.write_text("content")
for f in Path(".").glob("*.py"):
    print(f)
list(Path(".").rglob("*.py"))  # recursive
# Context manager = __enter__ + __exit__
# 'with' guarantees __exit__ runs even on exception
import time

# Class-based context manager (connect() stands in for a real DB driver)
class DatabaseConnection:
    def __init__(self, url):
        self.url = url
        self.conn = None
    def __enter__(self):
        self.conn = connect(self.url)
        return self.conn  # what 'as' binds to
    def __exit__(self, exc_type, exc_val, traceback):
        self.conn.close()
        if exc_type:
            print(f"Error: {exc_val}")
        return False  # False = don't suppress exceptions
                      # True = suppress the exception (rarely what you want)

with DatabaseConnection("sqlite:///db") as conn:
    conn.execute("SELECT * FROM users")

# @contextmanager – generator-based (simpler!)
from contextlib import contextmanager

@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield  # code inside 'with' runs here
    finally:   # always cleanup
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.3f}s")

with timer("my operation"):
    time.sleep(0.1)  # "my operation: 0.100s"

# contextlib utilities
import io, os
from contextlib import suppress, redirect_stdout, nullcontext

with suppress(FileNotFoundError):  # ignore specific errors
    os.remove("maybe.txt")

f = io.StringIO()
with redirect_stdout(f):  # capture print output
    print("captured!")
output = f.getvalue()  # "captured!\n"

# Multiple context managers
with open("in.txt") as fin, open("out.txt", "w") as fout:
    fout.write(fin.read())
Built-ins &
Standard Library
Python's batteries included – essential built-in functions and must-know stdlib modules.
# Type conversion
int("42"), float("3.14"), str(42), bool(0)
list("abc")        # ['a', 'b', 'c']
tuple([1, 2, 3])   # (1, 2, 3)
set([1, 2, 2, 3])  # {1, 2, 3}
dict(a=1, b=2)     # {'a': 1, 'b': 2}

# Numeric
abs(-5)            # 5
round(3.14159, 2)  # 3.14
round(2.5)         # 2 – banker's rounding! (rounds to even)
round(3.5)         # 4
pow(2, 10)         # 1024
pow(2, 10, 100)    # 1024 % 100 = 24 (modular exponentiation)
divmod(17, 5)      # (3, 2) – quotient and remainder

# Iterables
len([1, 2, 3])            # 3
sum([1, 2, 3])            # 6
sum([1, 2, 3], start=10)  # 16
min([3, 1, 2])            # 1
max([3, 1, 2])            # 3
min("Ravi", "Priya", key=len)  # "Ravi" (shorter)
sorted([3, 1, 2])  # [1, 2, 3]
sorted("Python")   # ['P', 'h', 'n', 'o', 't', 'y']
reversed([1, 2, 3])       # iterator
enumerate(["a", "b"], 1)  # (1, 'a'), (2, 'b')
zip([1, 2], [3, 4])       # (1, 3), (2, 4)
map(str, [1, 2, 3])       # iterator of strings
filter(None, [0, 1, False, "", 2])  # [1, 2] – drops falsy values
any([False, 0, True])  # True
all([True, 1, "yes"])  # True
all([True, 0, "yes"])  # False

# Object inspection
type(42)               # <class 'int'>
isinstance(42, int)    # True
id(42)                 # memory address
hash("hello")          # integer hash
dir([])                # list of attributes/methods
vars(obj)              # obj.__dict__
hasattr(obj, "x")      # True if obj.x exists
getattr(obj, "x", 42)  # obj.x or default 42
setattr(obj, "x", 5)   # obj.x = 5
delattr(obj, "x")      # del obj.x
callable(func)         # True if callable
repr(obj)              # obj.__repr__()

# Input/Output
input("Enter: ")  # always returns a string
print("a", "b", "c", sep="-", end="\n")
open("file.txt", "r")
print(*[1, 2, 3])  # unpack: 1 2 3
# os – operating system interface
import os
os.getcwd()      # current working directory
os.listdir(".")  # list files/dirs
os.makedirs("a/b/c", exist_ok=True)
os.remove("file.txt")
os.rename("old.txt", "new.txt")
os.environ.get("API_KEY", "default")
os.path.join("dir", "file.txt")  # platform-safe path
os.path.exists("file.txt")
os.path.basename("/path/to/file.txt")  # "file.txt"

# sys – interpreter info
import sys
sys.argv            # command-line args
sys.exit(0)         # exit the program
sys.path            # module search paths
sys.version         # Python version string
sys.getsizeof(obj)  # memory size in bytes
sys.setrecursionlimit(10000)

# datetime
from datetime import datetime, date, timedelta
now = datetime.now()
today = date.today()
fmt = now.strftime("%Y-%m-%d %H:%M:%S")
parsed = datetime.strptime("2024-01-15", "%Y-%m-%d")
tomorrow = today + timedelta(days=1)
diff = datetime(2024, 12, 31) - datetime.now()
print(diff.days)

# math
import math
math.sqrt(16)      # 4.0
math.floor(3.7)    # 3
math.ceil(3.2)     # 4
math.log(100, 10)  # 2.0
math.pi, math.e, math.inf, math.nan
math.isclose(a, b, rel_tol=1e-9)

# random
import random
random.random()          # float in [0.0, 1.0)
random.randint(1, 10)    # int in [1, 10] inclusive
random.choice(["a", "b", "c"])
random.choices(lst, k=3) # sample WITH replacement
random.sample(lst, k=3)  # sample WITHOUT replacement
random.shuffle(lst)      # in-place shuffle
random.seed(42)          # reproducible results

# copy
import copy
copy.copy(obj)      # shallow copy
copy.deepcopy(obj)  # deep copy (recursive)

# hashlib – hashing
import hashlib
h = hashlib.sha256(b"hello").hexdigest()
hashlib.md5(b"data").digest()  # raw bytes
Type Hints &
Annotations
Python 3.5+ type system – not enforced at runtime but critical for large codebases and tools like mypy.
from typing import Optional, Union, Any, Callable, TypeVar, Generic
from typing import TypedDict, Protocol, Literal, Final
from collections.abc import Iterator, Generator, Sequence, Mapping

# Basic annotations
name: str = "Ravi"
age: int = 25
scores: list[float] = [9.5, 8.0]    # Python 3.9+ – no List from typing
mapping: dict[str, int] = {"a": 1}
pair: tuple[int, str] = (1, "hello")
fixed: tuple[int, ...] = (1, 2, 3)  # variable-length tuple

# Optional – can be None (Python 3.10+: int | None)
def greet(name: str, title: str | None = None) -> str:
    return f"{title} {name}" if title else name

# Union (Python 3.10+: use X | Y instead)
def process(data: str | int | list) -> None:
    pass

# Callable
def apply(func: Callable[[int, int], int], a: int, b: int) -> int:
    return func(a, b)

# TypeVar – generic type variable
T = TypeVar("T")
def first(items: list[T]) -> T:
    return items[0]

# Generic class
class Stack(Generic[T]):
    def __init__(self):
        self._items: list[T] = []
    def push(self, item: T) -> None:
        self._items.append(item)
    def pop(self) -> T:
        return self._items.pop()

s: Stack[int] = Stack()

# TypedDict – typed dict structure
class UserDict(TypedDict):
    name: str
    age: int
    email: str | None

user: UserDict = {"name": "Ravi", "age": 25, "email": None}

# Protocol – structural subtyping (duck typing with types)
class Drawable(Protocol):
    def draw(self) -> None: ...  # just needs a draw() method

def render(obj: Drawable) -> None:
    obj.draw()

class Circle:
    def draw(self):
        print("O")

render(Circle())  # works! Circle satisfies the Drawable Protocol structurally

# Literal – specific values only
Mode = Literal["r", "w", "a", "rb", "wb"]
def open_file(path: str, mode: Mode) -> None:
    pass

# Final – can't be reassigned
MAX_SIZE: Final = 100

# TYPE_CHECKING – avoid circular imports
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from mymodule import HeavyClass  # only imported by the type checker

def process(obj: "HeavyClass") -> None:
    pass  # forward reference as a string

# Python 3.12 – new syntax
type Vector = list[float]    # type alias
type Point[T] = tuple[T, T]  # generic alias
def first[T](items: list[T]) -> T: ...  # generic function
Metaclasses &
Class Creation
The deepest Python magic – classes that create classes. Understanding this separates Python experts from everyone else.
# Classes are instances of 'type'
print(type(int))   # <class 'type'>
print(type(str))   # <class 'type'>
print(type(type))  # <class 'type'> – type is its own metaclass!

# Create a class dynamically with type()
# type(name, bases, namespace)
MyClass = type("MyClass", (object,), {
    "x": 42,
    "greet": lambda self: f"Hello, I have x={self.x}",
})
obj = MyClass()
print(obj.greet())  # "Hello, I have x=42"

# Custom metaclass – intercept INSTANCE creation
class SingletonMeta(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        # Called when creating an instance: MyClass()
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self):
        self.connection = "connected"

db1 = Database()
db2 = Database()
print(db1 is db2)  # True – Singleton!

# Metaclass __new__ – intercept class DEFINITION
class RegistryMeta(type):
    registry = {}
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if bases:  # don't register the base class itself
            RegistryMeta.registry[name] = cls
        return cls

class Plugin(metaclass=RegistryMeta): pass
class AudioPlugin(Plugin): pass
class VideoPlugin(Plugin): pass
print(RegistryMeta.registry)
# {'AudioPlugin': <class ...>, 'VideoPlugin': <class ...>}

# __init_subclass__ – modern alternative to metaclasses
class Plugin:
    _registry = {}
    def __init_subclass__(cls, plugin_type=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if plugin_type:
            Plugin._registry[plugin_type] = cls

class AudioPlugin(Plugin, plugin_type="audio"): pass
class VideoPlugin(Plugin, plugin_type="video"): pass
print(Plugin._registry)  # {'audio': AudioPlugin, 'video': VideoPlugin}

# Class creation order:
# 1. metaclass.__prepare__ (create the namespace dict)
# 2. Execute the class body in that namespace
# 3. metaclass.__new__  (create the class object)
# 4. metaclass.__init__ (initialize the class object)
- "What is a metaclass?" → A class whose instances are classes. type is the default metaclass. Used for class registration, validation, singleton patterns.
- "When would you use a metaclass?" → ORM frameworks (SQLAlchemy), plugin systems, enforcing coding conventions. For most problems, prefer __init_subclass__ or class decorators.
- "What is the difference between __new__ and __init__?" → __new__ creates the object (and returns it). __init__ initializes it (receives it as self). __new__ is called first.
Functional
Programming
Python supports functional patterns – pure functions, immutability, higher-order functions, itertools, operator.
import operator, functools, itertools

# operator module – functions for operators
operator.add(1, 2)                    # 3 (same as 1 + 2)
operator.itemgetter(1)([10, 20, 30])  # 20
operator.attrgetter("name")(user)     # user.name (user is any object)

data = [{"name": "Ravi", "age": 25}, {"name": "Priya", "age": 22}]
data.sort(key=operator.itemgetter("age"))  # sort by the age field

# functools.reduce – fold/accumulate
product = functools.reduce(operator.mul, [1, 2, 3, 4, 5])  # 120

# Higher-order functions
def compose(*funcs):
    """Compose functions: compose(f, g, h)(x) == f(g(h(x)))"""
    def composed(x):
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed

double = lambda x: x * 2
add1 = lambda x: x + 1
square = lambda x: x ** 2
pipeline = compose(double, add1, square)  # double(add1(square(x)))
print(pipeline(3))  # double(add1(9)) = double(10) = 20

# itertools – lazy combinatorial tools
list(itertools.chain([1, 2], [3, 4], [5]))             # [1, 2, 3, 4, 5]
list(itertools.chain.from_iterable([[1, 2], [3, 4]]))  # flatten one level
list(itertools.islice(range(100), 5, 15, 2))           # [5, 7, 9, 11, 13]
list(itertools.takewhile(lambda x: x < 5, [1, 2, 3, 6, 7, 2]))  # [1, 2, 3]
list(itertools.dropwhile(lambda x: x < 5, [1, 2, 3, 6, 7, 2]))  # [6, 7, 2]
list(itertools.starmap(operator.add, [(1, 2), (3, 4)]))         # [3, 7]
list(itertools.product("AB", [1, 2]))    # [('A',1), ('A',2), ('B',1), ('B',2)]
list(itertools.combinations("ABCD", 2))  # 6 pairs, no repeats
list(itertools.permutations("ABC", 2))   # 6 ordered pairs
list(itertools.combinations_with_replacement("AB", 2))  # AA, AB, BB

# groupby – groups consecutive equal elements (sort first!)
data = sorted([1, 1, 2, 2, 3, 3, 1, 1])
for key, group in itertools.groupby(data):
    print(key, list(group))

# accumulate – running total/product
list(itertools.accumulate([1, 2, 3, 4]))                # [1, 3, 6, 10]
list(itertools.accumulate([1, 2, 3, 4], operator.mul))  # [1, 2, 6, 24]

# Flatten arbitrarily nested lists
nested = [[1, [2, 3]], [4, [5, [6]]]]
def flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item
list(flatten(nested))  # [1, 2, 3, 4, 5, 6]
Performance &
Profiling
Find bottlenecks, measure everything, know the fast vs slow Python patterns.
# timeit – benchmark small snippets
import timeit
t = timeit.timeit(
    setup="data=[1,2,3,4,5]*100",
    stmt="sum(data)",
    number=100_000,
)
print(f"{t:.4f}s")

# Quick one-liner timing (Jupyter/IPython magic, not plain Python)
# %timeit sum(range(1000))

# cProfile – full-program profiling
import cProfile
cProfile.run("my_function()")

# Or as a context manager
with cProfile.Profile() as pr:
    my_function()
pr.print_stats(sort="cumulative")

# line_profiler – line-by-line (pip install line_profiler)
# @profile decorator on the function, then: kernprof -l script.py

# memory_profiler – memory usage per line
# @profile decorator, then: python -m memory_profiler script.py

# tracemalloc – built-in memory tracing
import tracemalloc
tracemalloc.start()
create_many_objects()
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")
for stat in top[:5]:
    print(stat)
# ✅ FAST vs ❌ SLOW patterns

# String building
parts = ["a", "b", "c"] * 1000
fast = "".join(parts)  # ✅ O(n)
slow = ""
for p in parts:
    slow += p          # ❌ O(n²)

# Membership test
valid = {"a", "b", "c"}
"a" in valid            # ✅ O(1) – set
"a" in ["a", "b", "c"]  # ❌ O(n) – list

# Dict access
d.get("key", default)    # ✅ one lookup
if "key" in d: d["key"]  # ❌ two lookups

# Local vs global variable lookup
import math
def fast_sqrt():
    local_sqrt = math.sqrt  # ✅ local lookup is faster
    for i in range(10000):
        local_sqrt(i)

# List comprehension vs loop
squares = [x**2 for x in range(1000)]  # ✅ ~2x faster than append in a loop

# numpy for numeric data (releases the GIL!)
import numpy as np
arr = np.array([1.0] * 1_000_000)
arr * 2  # ✅ C-speed, releases the GIL

# Generator for large sequences
total = sum(x**2 for x in range(10**7))    # ✅ O(1) memory
total = sum([x**2 for x in range(10**7)])  # ❌ O(n) memory

# Slots for many instances
class Point:
    __slots__ = ["x", "y"]  # ✅ 40-60% less memory

# lru_cache for repeated computation
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)

# bytes vs str for binary data
with open("file.bin", "rb") as f:  # ✅ read as bytes
    data = f.read()
Testing –
pytest Deep Dive
Write tests that actually catch bugs – fixtures, parametrize, mocking, and test patterns.
# install: pip install pytest pytest-cov
# run:     pytest tests/ -v --cov=src
import pytest
from unittest.mock import Mock, patch, MagicMock

# Basic test – function name starts with test_
def test_addition():
    assert 1 + 1 == 2

def test_string():
    result = "hello".upper()
    assert result == "HELLO"
    assert isinstance(result, str)

# Testing exceptions
def test_zero_division():
    with pytest.raises(ZeroDivisionError):
        1 / 0

def test_value_error_message():
    with pytest.raises(ValueError, match="must be positive"):
        validate_age(-5)

# @pytest.mark.parametrize – test multiple inputs
@pytest.mark.parametrize("a,b,expected", [
    (1, 2, 3),
    (0, 0, 0),
    (-1, 1, 0),
    (100, 200, 300),
])
def test_add(a, b, expected):
    assert add(a, b) == expected

# Fixture – setup/teardown, shared state
@pytest.fixture
def sample_user():
    return {"name": "Ravi", "email": "ravi@example.com", "age": 25}

@pytest.fixture
def db_connection():
    conn = create_test_db()
    yield conn    # runs the test, then cleanup
    conn.close()  # teardown

def test_create_user(sample_user, db_connection):
    result = create_user(db_connection, sample_user)
    assert result["name"] == "Ravi"

# Fixture scopes
# @pytest.fixture(scope="session")   – once per test session
# @pytest.fixture(scope="module")    – once per module
# @pytest.fixture(scope="class")     – once per class
# @pytest.fixture(scope="function")  – once per test (default)

# Mocking
def test_email_sent():
    with patch("myapp.email.send_email") as mock_send:
        register_user("ravi@example.com")
        mock_send.assert_called_once_with(
            to="ravi@example.com",
            subject="Welcome!",
        )

# Mock return values
def test_fetch_user():
    with patch("myapp.db.get_user") as mock_get:
        mock_get.return_value = {"name": "Ravi"}
        result = get_user_profile(user_id=1)
        assert result["name"] == "Ravi"
        mock_get.assert_called_once_with(1)

# Mock exceptions
def test_handles_db_error():
    with patch("myapp.db.get_user") as mock_get:
        mock_get.side_effect = ConnectionError("DB down")
        with pytest.raises(ServiceUnavailable):
            get_user_profile(1)

# pytest marks
# @pytest.mark.slow
# @pytest.mark.skip(reason="not ready")
# @pytest.mark.skipif(sys.platform == "win32", reason="Linux only")
# @pytest.mark.xfail(reason="known bug")

# conftest.py – shared fixtures across test files
# pytest auto-discovers and imports conftest.py

# Test coverage
# pytest --cov=myapp --cov-report=html
# Open htmlcov/index.html to see coverage by line
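Two more built-in fixtures worth knowing are tmp_path (a fresh temporary directory per test) and monkeypatch (patching that is undone automatically after each test). A small sketch – save_report is a hypothetical function under test:

```python
import os

def save_report(directory, name, content):
    # write a text file into the given pathlib directory and return its path
    path = directory / name
    path.write_text(content)
    return path

def test_save_report(tmp_path):
    # tmp_path is a pathlib.Path to a unique, auto-cleaned temp directory
    path = save_report(tmp_path, "report.txt", "ok")
    assert path.read_text() == "ok"

def test_reads_env(monkeypatch):
    # the env var change is reverted automatically after this test
    monkeypatch.setenv("API_KEY", "test-key")
    assert os.environ["API_KEY"] == "test-key"
```

These keep tests isolated without any manual cleanup code.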
Interview
Cheatsheet
The most commonly asked Python interview questions – with answers that actually impress.
"""
Q: Is Python interpreted or compiled?
A: Both. Python source → bytecode (compile step) → PVM executes the
   bytecode (interpret step). .pyc files are cached bytecode in __pycache__.

Q: What is the GIL?
A: Global Interpreter Lock – a mutex in CPython that ensures only ONE
   thread runs Python bytecode at a time. Exists because reference
   counting isn't thread-safe. Result: Python threads don't achieve
   true CPU parallelism. Fix: multiprocessing for CPU tasks, asyncio for I/O.

Q: Mutable default argument trap – explain it
A: Default arguments are evaluated ONCE at function definition, not on
   each call. So def f(lst=[]) → ALL calls share the same list.
   Fix: def f(lst=None): if lst is None: lst = []

Q: What is a decorator?
A: A function that takes a function and returns a new function (usually
   wrapping the original). @decorator is sugar for: func = decorator(func).
   @functools.wraps preserves __name__, __doc__.

Q: Difference between deep copy and shallow copy?
A: Shallow: copies the container; inner objects are still references.
   Deep: recursively copies all nested objects.
   a = [[1,2],[3,4]]; b = a.copy(); b[0].append(9) → a changed!
   With deepcopy: a is unaffected.

Q: What is a generator?
A: A function with yield. Returns a generator object. Lazy – generates
   values one at a time on demand. O(1) memory regardless of sequence
   size. Used for: large files, infinite sequences, pipelines.

Q: What is a metaclass?
A: A class whose instances are classes. type is the default. Used for:
   singletons, class registries, ORM models (SQLAlchemy), enforcing
   interface requirements. Modern alternatives: __init_subclass__,
   class decorators.

Q: How does Python's garbage collection work?
A: Primary: reference counting. When the refcount → 0, freed immediately.
   Secondary: cyclic GC (generational) for circular references.
   gc module: gc.collect(), gc.disable(), gc.get_count()

Q: What is the difference between __str__ and __repr__?
A: __repr__: unambiguous, for developers; eval(repr(x)) should ideally
   recreate x. __str__: human-readable, for end users. print() calls
   __str__; repr() calls __repr__. If only __repr__ is defined, it's
   used for both.

Q: What are *args and **kwargs?
A: *args captures extra positional args as a tuple.
   **kwargs captures extra keyword args as a dict.
   Order: positional / *args / keyword-only / **kwargs

Q: Explain list vs tuple vs set vs dict
A: list:  ordered, mutable, duplicates OK, O(1) index, O(n) search
   tuple: ordered, immutable, hashable, slightly faster than list
   set:   unordered, unique, O(1) membership test, uses a hash table
   dict:  key→value, O(1) get/set, insertion-ordered since 3.7

Q: What is monkey patching?
A: Dynamically modifying a class or module at runtime.
   import mymodule; mymodule.myfunction = new_function
   Used in tests to mock. Can cause confusion – use carefully.

Q: What is __slots__?
A: Replaces the instance __dict__ with a fixed-size array. ~40-60% less
   memory, faster attribute access. Tradeoff: can't add new attributes
   dynamically.

Q: What is the difference between is and ==?
A: == checks value equality (the __eq__ method). is checks identity
   (same object in memory, same id()). Only use 'is' for None, True, False.

Q: What is a context manager?
A: An object with __enter__ and __exit__, used with the 'with' statement.
   Guarantees cleanup (close file, release lock) even on exception.
   @contextmanager + yield = generator-based context manager.
"""

# Tricky output questions

# Q: What does this print?
x = [1, 2, 3]
y = x
x += [4]
print(x is y)  # True! += on a list calls __iadd__ (in-place)

x = (1, 2, 3)
y = x
x += (4,)
print(x is y)  # False! tuples are immutable – a new tuple is created

# Q: What prints?
def f(x=[]):
    x.append(1)
    return x
print(f())  # [1]
print(f())  # [1, 1] – mutable default trap!
print(f())  # [1, 1, 1]

# Q: Output?
class A:
    val = []  # class attribute – shared!
a = A(); b = A()
a.val.append(1)
print(b.val)  # [1] – same list!

# Q: What is the output?
print(0.1 + 0.2 == 0.3)  # False – floating point!
print(round(2.5))        # 2 – banker's rounding!
print(round(3.5))        # 4
print(bool([0]))         # True – non-empty list!
print(True + True)       # 2 – bool is an int subclass!
- CPython int cache: -5 to 256 are singleton objects
- dict preserves insertion order since Python 3.7 (an implementation detail in 3.6)
- bool is a subclass of int (True==1, False==0)
- GIL: one thread at a time for CPU, released for I/O
- Strings immutable → join() is O(n), repeated concatenation O(n²)
- empty set = set(), NOT {} (that's an empty dict)
- round(0.5) == 0, round(1.5) == 2 β banker's rounding
- // floor division: -7//2 == -4, not -3
- is vs ==: only use 'is' for None/True/False
- for…else: else runs only if loop completes WITHOUT break
- list.sort() returns None; sorted() returns new list
- *args → tuple; **kwargs → dict inside function
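A few of the gotchas above, run for real (a small self-contained sketch; `find` is a hypothetical helper used only to illustrate for…else):

```python
# for…else: the else clause runs only when the loop finishes WITHOUT break
def find(needle, haystack):
    for item in haystack:
        if item == needle:
            break
    else:
        return "not found"   # loop exhausted, no break
    return "found"

print(find(2, [1, 2, 3]))    # found
print(find(9, [1, 2, 3]))    # not found

# list.sort() mutates in place and returns None; sorted() returns a new list
nums = [3, 1, 2]
print(nums.sort())           # None
print(nums)                  # [1, 2, 3]
print(sorted([3, 1, 2]))     # [1, 2, 3]

# Banker's rounding: .5 rounds to the nearest EVEN integer
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

# Floor division rounds toward negative infinity
print(-7 // 2)               # -4, not -3
```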
| Operation | list | dict/set |
|---|---|---|
| x in container | O(n) | O(1) avg |
| Access by index/key | O(1) | O(1) avg |
| append / add | O(1) amort | O(1) avg |
| insert(0, x) | O(n) | N/A |
| pop() / pop last | O(1) | O(1) avg |
| pop(0) / pop first | O(n) | N/A |
| delete (remove(x) / del d[k]) | O(n) | O(1) avg |
| sort | O(n log n) | N/A |
| len() | O(1) | O(1) |
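The O(n) vs O(1) membership row in the table is easy to see empirically with `timeit` (exact numbers vary by machine; the sizes and repeat counts here are arbitrary):

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)

# Worst case for the list: probe an element near the end
list_time = timeit.timeit(lambda: (n - 1) in data_list, number=200)
set_time = timeit.timeit(lambda: (n - 1) in data_set, number=200)

print(f"list 'in': {list_time:.4f}s   set 'in': {set_time:.4f}s")
# The set lookup should be orders of magnitude faster
```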