Which built-in Python data types are sequences and what characterizes a sequence?
Built-in sequence types are:
- str
- list
- tuple
- range
A sequence is an ordered collection that:
- Preserves element order
- Supports indexing (s[0])
- Supports slicing (s[1:4])
- Can be iterated over
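A quick sketch of the shared sequence operations (values are illustrative):

```python
s = "hello"
lst = [10, 20, 30]

# Indexing
print(s[0])    # → 'h'
print(lst[0])  # → 10

# Slicing
print(s[1:4])   # → 'ell'
print(lst[1:])  # → [20, 30]

# Iteration preserves order
print([ch for ch in s])  # → ['h', 'e', 'l', 'l', 'o']
```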
What is the difference between a stack and a queue, and when should each be used?
Stack → LIFO (Last In, First Out)
- push, pop
- Used for recursion, undo operations
Queue → FIFO (First In, First Out)
- enqueue, dequeue
- Used for scheduling, task processing, BFS
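A minimal sketch: a plain list works as a stack, and collections.deque works as a queue:

```python
from collections import deque

# Stack (LIFO): append/pop at the end of a list are O(1)
stack = []
stack.append(1)     # push
stack.append(2)
print(stack.pop())  # → 2 (last in, first out)

# Queue (FIFO): deque's popleft() at the front is O(1)
queue = deque()
queue.append('a')        # enqueue
queue.append('b')
print(queue.popleft())   # → 'a' (first in, first out)
```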
What is a deque in Python?
A deque (double-ended queue) from collections allows:
- Fast append() and pop()
- Fast appendleft() and popleft()
Efficient for insertions/removals at both ends.
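A short sketch of the four end operations:

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)     # O(1) at the front
d.append(4)         # O(1) at the back
print(d)            # → deque([1, 2, 3, 4])
print(d.popleft())  # → 1
print(d.pop())      # → 4
```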
Which operations are less efficient in a deque compared to a list?
Random access (d[i]) is slower in a deque.
Lists allow O(1) indexing due to contiguous memory.
Deques are optimized for fast end operations, not middle access.
What is the difference between an iterable and an iterator?
Iterable:
- An object you can loop over
- Implements __iter__()
Iterator:
- Created using iter()
- Produces items with next()
- Keeps state
- Raises StopIteration when finished
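A minimal sketch of the relationship:

```python
nums = [1, 2, 3]   # iterable (implements __iter__)
it = iter(nums)    # iterator (implements __next__, keeps state)
print(next(it))    # → 1
print(next(it))    # → 2
print(next(it))    # → 3
# A further next(it) would raise StopIteration
```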
What can be used as a context manager and why?
Any object implementing:
- __enter__()
- __exit__()
Used with "with" to:
- Automatically manage resources
- Ensure cleanup (files, DB connections)
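A minimal file example (written to a temp path so it is self-contained):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "demo.txt")

# "with" calls __enter__ on entry and __exit__ on exit,
# so the file is closed even if an exception occurs inside the block.
with open(path, "w") as f:
    f.write("hello")

print(f.closed)  # → True (closed automatically)
```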
What are magic methods and when are they used?
Magic methods start and end with __ (e.g., __init__, __str__).
They define behavior for:
- Object creation
- Arithmetic
- Comparison
- Printing
They enable operator overloading.
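A minimal operator-overloading sketch (the Vector class is illustrative):

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):          # enables v1 + v2
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):           # enables v1 == v2
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):                # printable representation
        return f"Vector({self.x}, {self.y})"

print(Vector(1, 2) + Vector(3, 4))  # → Vector(4, 6)
```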
How can you mark a function as private in Python?
You cannot make it truly private.
Convention:
- _function() → protected
- __function() → name mangling (class only)
Privacy is by convention, not enforced.
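A short sketch showing that name mangling only renames, it does not hide:

```python
class Account:
    def __init__(self):
        self._hint = "protected by convention"
        self.__secret = "name-mangled"   # stored as _Account__secret

a = Account()
print(a._hint)             # works — just a convention
print(a._Account__secret)  # → "name-mangled" (mangling, not privacy)
# a.__secret would raise AttributeError
```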
What does the zip() function do?
zip() combines multiple iterables element-wise.
Returns an iterator of tuples.
Example:
zip([1,2], ['a','b']) → (1,'a'), (2,'b')
What does functools.wraps do?
Used in decorators.
It preserves:
- Original function name
- Docstring
- Metadata
Without it, the wrapper replaces function metadata.
What are generator expressions?
A compact way to create generators:
(x*x for x in range(5))
They:
- Produce values lazily
- Use less memory
- Return generator objects
What is a closure in Python?
A closure is a function that:
- Remembers variables from its enclosing scope
- Even after the outer function has finished
Used for function factories and data encapsulation.
What is a dataclass and its advantages?
A @dataclass automatically generates:
- __init__
- __repr__
- __eq__
Advantages:
- Less boilerplate
- Cleaner code
- Designed for storing data
What does it mean that Python is strongly typed and dynamically typed?
Strongly typed:
- No implicit type coercion (e.g., "3" + 3 fails)
Dynamically typed:
- Type is determined at runtime
- No need to declare variable types
What is Python’s naming convention (PEP 8)?
- variables/functions: lowercase_with_underscores
- Classes: CamelCase
- Constants: UPPERCASE
Name three built-in data types in Python.
Examples:
- int
- float
(others: list, dict, set, tuple, bool)
What is a module in Python?
A module is a .py file containing:
- Functions
- Classes
- Variables
Used to organize and reuse code.
Where does Python look for modules?
Python searches:
- Current directory
- PYTHONPATH
- Standard library directories
- Installed packages (site-packages)
Stored in sys.path.
What does the finally clause do in try/except?
The finally block:
- Always executes
- Runs whether an exception occurs or not
Used for cleanup operations.
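A small sketch showing finally running on both the success and error paths:

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
    finally:
        print("cleanup runs either way")

print(divide(10, 2))  # → 5.0  (finally still ran)
print(divide(10, 0))  # → None (finally ran despite the exception)
```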
Can default parameters appear before non-default ones?
No.
Default parameters must come after required parameters.
Correct:
def f(a, b=3):
What is the difference between positional and keyword arguments?
Positional:
- Order matters
Keyword:
- Passed using parameter names
- Order does not matter
How do you accept an arbitrary number of arguments?
*args → variable positional arguments (tuple)
**kwargs → variable keyword arguments (dict)
How can you unpack collections into function arguments?
Use:
* for sequences
** for dictionaries
func(*my_list)
func(**my_dict)
How do you sort a list by the last letter of each string?
Use sorted() with key:
sorted(strings, key=lambda s: s[-1])
What is the difference between append() and extend()?
append(x):
- Adds one element
- List grows by one
extend(iterable):
- Adds multiple elements
- Merges another iterable
What does list slicing [x:y:z] mean?
x → start index
y → stop index (exclusive)
z → step size
[1:10:2] → every second element
What does my_list[1:] return for a list of 20 elements?
It returns:
- All elements starting from index 1
- Excludes the first element
- Total length = 19
What are namedtuple, Counter, and OrderedDict?
namedtuple:
- Tuple with named fields
Counter:
- Counts element frequency
OrderedDict:
- Dictionary preserving insertion order (older Python versions)
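A quick sketch of all three in action:

```python
from collections import namedtuple, Counter, OrderedDict

Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x)  # → 3 (access by name instead of index)

c = Counter('abracadabra')
print(c['a'])            # → 5
print(c.most_common(1))  # → [('a', 5)]

od = OrderedDict([('first', 1), ('second', 2)])
print(list(od))  # → ['first', 'second'] (insertion order kept)
```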
How do you specify a variable type?
Using type hints:
x: int = 3
Type hints improve readability and static analysis.
How do you specify a function return type?
Using ->
def func() -> int:
    return 3
What are decorators?
Decorators:
- Wrap functions
- Add functionality
- Use @syntax
They modify behavior without changing original code.
Compare list, tuple, set, and dict (2 characteristics each).
list:
- Ordered
- Mutable
tuple:
- Immutable
set:
- Unordered
- Unique elements
dict:
- Key-value pairs
Name two ways to format strings in Python.
1. f-strings:
f"Hello {name}"
2. str.format():
"Hello {}".format(name)
What do the positions in string[x:y:z] stand for?
x -> start position (inclusive)
y -> end position (exclusive)
z -> step size
Name two different ways to format strings in Python.
1. f-strings: f"text {variable}"
2. .format() method: "text {}".format(variable)
Write an f-string that prints "Alice is 25 years old" using variables name = "Alice" and age = 25.
print(f"{name} is {age} years old.")
Write code using .format() to print "Hello, Bob!" using the variable name = "Bob".
print("Hello, {}!".format(name))
What do %s, %d, and %f mean in % formatting?
%s = string
%d = integer (decimal)
%f = float
Example: "Name: %s, Age: %d" % (name, age)
How do you format a float to 2 decimal places using f-strings?
price = 19.99
print(f"{price:.2f}") # Output: 19.99
What's the difference between append() and extend()?
append(x) — adds ONE element (even if it's a list)
extend(iterable) — adds MULTIPLE elements from an iterable
my_list = [1, 2]
my_list.append([3, 4]) # → [1, 2, [3, 4]] ← nested list!
my_list.extend([3, 4]) # → [1, 2, 3, 4] ← flattened
What's the difference between pop() and remove()?
pop(i) — removes by INDEX (default: last), RETURNS the value
remove(x) — removes by VALUE (first occurrence), returns None
my_list = ['a', 'b', 'c']
my_list.pop(1) # Removes index 1 → returns 'b'
my_list.remove('a') # Removes value 'a' → returns None
Which list methods modify the list in-place (don't return a new list)?
reverse() — reverses in-place
sort() — sorts in-place
append(), extend(), insert(), remove(), pop(), clear()
⚠️ Common mistake: my_list.sort() returns None, NOT the sorted list!
What does "shallow copy" mean for list.copy()?
Copies the list structure, but NOT nested objects (e.g., inner lists)
Changes to nested objects affect BOTH lists
list1 = [[1, 2], [3, 4]]
list2 = list1.copy()
list1[0][0] = 99 # ← Changes list2 too!
# list1 = [[99, 2], [3, 4]]
# list2 = [[99, 2], [3, 4]] ← affected!
What do index() and count() return?
index(x) — index of FIRST occurrence (error if not found)
count(x) — number of occurrences (0 if not found)
my_list = ['a', 'b', 'a', 'c']
my_list.index('a') # → 0 (first occurrence)
my_list.count('a') # → 2 (appears twice)
What does insert(i, x) do? What if index is out of bounds?
Inserts element x BEFORE index i. If i is out of bounds, it is clamped to the start/end (no error).
my_list.insert(0, 'z') # → ['z', 'a', 'b', 'c']
my_list.insert(100, 'end') # → [..., 'c', 'end'] ← at end
Write the syntax for list, dict, and set comprehensions.
# List comprehension
[expression for item in iterable if condition]
# Dict comprehension
{key: value for item in iterable if condition}
# Set comprehension
{expression for item in iterable if condition}
Write a list comprehension to get squares of even numbers from [1, 2, 3, 4, 5].
numbers = [1, 2, 3, 4, 5]
result = [x**2 for x in numbers if x % 2 == 0]
# Result: [4, 16]
What's the difference between {x for x in nums} and [x for x in nums]?
{x for x in nums} — set comprehension → removes duplicates, unordered
[x for x in nums] — list comprehension → keeps duplicates, ordered
nums = [1, 2, 2, 3]
list_comp = [x for x in nums]  # → [1, 2, 2, 3]
set_comp = {x for x in nums}   # → {1, 2, 3}
What's the difference between a module and a package in Python?
Module: a single .py file containing Python code (e.g., math_utils.py)
Package: a directory containing modules plus an __init__.py file, e.g.:
mypackage/
    __init__.py
    math_utils.py
What is the purpose of __init__.py in a package?
Marks a directory as a package (traditional packages)
Can be empty OR contain initialization code
Namespace packages (Python 3.3+) don't require __init__.py
Note: Without __init__.py, the directory is a namespace package (more advanced)
Briefly describe namedtuple, Counter, OrderedDict, and deque from the collections module.
namedtuple — tuple with named fields (access by name: point.x)
Counter — counts occurrences of elements in an iterable
OrderedDict — dict that preserves insertion order (+ ordering methods)
deque — double-ended queue, fast append/pop from both ends
Easy memory trick:
namedtuple = group variables together
Counter = count collection items
OrderedDict = preserve insertion order
deque = efficient stack/queue
Explain when try, except, else, and finally blocks execute.
try — code that might raise an exception
except — runs if an exception occurs in try
else — runs ONLY if NO exception occurred
finally — ALWAYS runs (cleanup code)
Order: try → except (if error) → else (if no error) → finally (always)
Show two ways to catch multiple exception types.
# Method 1: Separate blocks
try:
    ...
except ValueError:
    ...
except TypeError:
    ...

# Method 2: Single block with a tuple
try:
    ...
except (ValueError, TypeError):
    ...
How do you manually raise an exception in Python?
Use the raise keyword:
raise ValueError("Custom error message")

# Or re-raise the current exception:
try:
    ...
except SomeError:
    # do something
    raise  # re-raises the same exception
Where must default parameters appear in a function definition?
Default parameters MUST come AFTER non-default parameters.
# ✅ Correct
def func(a, b, c=5, d=10):
    pass

# ❌ Wrong
def func(a, c=5, b):  # SyntaxError!
    pass
What are *args and **kwargs? What data types do they create?
*args — collects extra positional arguments into a tuple
**kwargs — collects extra keyword arguments into a dict

def example(*args, **kwargs):
    print(type(args))    # → <class 'tuple'>
    print(type(kwargs))  # → <class 'dict'>

example(1, 2, 3, a=4, b=5)
# args = (1, 2, 3)
# kwargs = {'a': 4, 'b': 5}
How do you unpack a list and a dict as function arguments?
*my_list — unpacks a list/tuple as positional arguments
**my_dict — unpacks a dict as keyword arguments

def func(a, b, c):
    return a + b + c

values = [1, 2, 3]
func(*values)   # Same as: func(1, 2, 3)

params = {"a": 1, "b": 2, "c": 3}
func(**params)  # Same as: func(a=1, b=2, c=3)
When calling a function, what's the rule about positional and keyword arguments?
Positional arguments MUST come BEFORE keyword arguments.
func(1, 2, c=3)    # ✅ Correct
func(1, b=2, c=3)  # ✅ Correct
func(a=1, 2, 3)    # ❌ WRONG! keyword before positional (SyntaxError)
What is the syntax of a lambda function? What are its limitations?
lambda parameters: expression
Limitations:
Only one expression (no multiple statements, no loops)
Implicitly returns the expression result
No return keyword needed/allowed
lambda x: x ** 2 is the same as def f(x): return x ** 2
Sort the list [("Alice", 25), ("Bob", 20)] by age using a lambda.
students = [("Alice", 25), ("Bob", 20)]
sorted_students = sorted(students, key=lambda x: x[1])
# Result: [('Bob', 20), ('Alice', 25)]
Pattern:
sorted(iterable, key=lambda x: x[index/attribute])
What's the difference between map() and filter() with lambdas?
map(func, iterable) — transforms each element, returns all
filter(func, iterable) — keeps only elements where func returns True

nums = [1, 2, 3, 4, 5]
# map: transform all elements
list(map(lambda x: x ** 2, nums))
# → [1, 4, 9, 16, 25] (all 5 transformed)
# filter: keep only some elements
list(filter(lambda x: x % 2 == 0, nums))
# → [2, 4] (only even numbers kept)
What does it mean that functions are "first-class objects" in Python?
Functions can be:
Assigned to variables
Passed as arguments to other functions
Returned from functions
Stored in data structures (lists, dicts)

def greet(name):
    return f"Hi, {name}"

# Assign to variable
my_func = greet

# Pass as argument
def call_func(func, arg):
    return func(arg)

call_func(greet, "Alice")  # → "Hi, Alice"
Write a function apply_twice(func, value) that applies a function to a value twice.
def apply_twice(func, value):
    return func(func(value))

def square(x):
    return x ** 2

apply_twice(square, 2)  # → 16
# First: square(2) = 4
# Second: square(4) = 16
What's the difference between an iterable and an iterator? What methods must each implement?
Iterable:
Can be looped over (lists, strings, etc.)
Implements __iter__() → returns an iterator
Can be iterated multiple times
Iterator:
Represents a stream of data
Implements __iter__() (returns self) AND __next__()
Raises StopIteration when exhausted
One-time use — exhausted after iteration
Explain what happens behind the scenes when you write for item in my_list.
Python calls iter(my_list) → gets an iterator by calling __iter__()
In each iteration, Python calls next(iterator) → gets the next value via __next__()
When __next__() raises StopIteration, the loop terminates

# for item in my_list:
#     print(item)
# Equivalent to:
iterator = iter(my_list)
while True:
    try:
        item = next(iterator)
        print(item)
    except StopIteration:
        break
What are magic/dunder methods in Python? Give 4 examples.
Methods that start and end with double underscores (__). They allow you to customize how Python's built-in operations work with your custom classes.
4 Examples:
__str__ — called by str() and print()
__init__ — constructor, called when creating an object
__eq__ — called by the == operator
__len__ — called by len()
(Other valid answers: __repr__, __add__, __lt__, __iter__, __next__, __call__, __enter__, __exit__)
Which dunder method is called for each operation?
len(obj)        # → obj.__len__()
str(obj)        # → obj.__str__()
obj1 == obj2    # → obj1.__eq__(obj2)
obj1 + obj2     # → obj1.__add__(obj2)
obj1 < obj2     # → obj1.__lt__(obj2)
for x in obj:   # → obj.__iter__()
next(iterator)  # → iterator.__next__()
obj()           # → obj.__call__()
What's the difference between __str__ and __repr__?
__str__ — human-readable, friendly string (for print(), str())
__repr__ — unambiguous, developer-focused representation (for repr(), REPL)

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return f"Point at ({self.x}, {self.y})"
    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

p = Point(3, 4)
print(p)  # → "Point at (3, 4)" (uses __str__)
repr(p)   # → "Point(x=3, y=4)" (uses __repr__)
Given this basic decorator, modify wrapper to accept arbitrary arguments and keyword arguments:
def decorator(func):
    def wrapper(*args, **kwargs):       # ← Add *args, **kwargs
        print('Execution started')
        result = func(*args, **kwargs)  # ← Pass them to func
        print('Execution completed')
        return result
    return wrapper
What is a decorator in Python? How is @decorator syntax used?
A decorator is a function that takes a function and returns a modified version of it.

@decorator
def my_func():
    ...
# The @ syntax is equivalent to:
my_func = decorator(my_func)

Pattern: Outer function returns inner wrapper function (closure).
What are the two ways to implement decorators?
Function-based (most common):
def decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

Class-based:
class Decorator:
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)
What does functools.wraps do and why is it needed?
Preserves the original function's metadata (__name__, __doc__, etc.) when decorating.

from functools import wraps

def decorator(func):
    @wraps(func)  # ← Preserves func.__name__, func.__doc__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

Without it: the decorated function has __name__ == "wrapper" instead of the original name.
Describe closures in Python. (bullet points)
A closure is when an inner function has access to variables from its outer function
The inner function "remembers" the outer variables even after the outer function returns
Inner functions can be nested inside outer functions
This is the mechanism behind function-based decorators
def outer(x):
    def inner(y):
        return x + y  # inner accesses x from outer
    return inner

add_5 = outer(5)
print(add_5(3))  # → 8 (remembers x=5)
How are closures related to decorators?
Decorators work because of closures:
The decorator's wrapper function is an inner function
It retains access to func (the decorated function) from the outer scope
When wrapper is called later, it can still call func

def decorator(func):                  # Outer function
    def wrapper(*args, **kwargs):     # Inner function
        return func(*args, **kwargs)  # Uses func (closure!)
    return wrapper
What methods must a class implement to be used as a context manager?
__enter__(self) — called when entering the with block, returns the value for the as variable
__exit__(self, exc_type, exc_val, exc_tb) — called when exiting, handles cleanup

class MyContext:
    def __enter__(self):
        # Setup code
        return self  # or any value
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Cleanup code (always runs!)
        return False  # propagate exceptions
How do you create a function-based context manager using @contextmanager?
Use the @contextmanager decorator with yield:

from contextlib import contextmanager

@contextmanager
def my_context():
    # Setup code (before yield)
    print("Entering")
    try:
        yield  # or yield value for 'as'
        # Code block runs here
    finally:
        # Cleanup code (after yield)
        print("Exiting")

# Usage:
with my_context():
    print("Inside")
What's the difference between a list comprehension and a generator expression?
List comprehension ([]) — creates the entire list in memory immediately:
[x**2 for x in range(10)]  # → [0, 1, 4, 9, ...]
Generator expression (()) — creates values on demand (lazy):
(x**2 for x in range(10))  # → generator object
Benefit: Generators are memory-efficient for large datasets.
What does "lazy evaluation" mean in the context of generators?
Values are produced on demand, not all at once.
The generator doesn't compute values until you ask for them (next() or a loop)
Memory efficient — doesn't store all values
Can represent infinite sequences
# Only computes values as needed:
gen = (x**2 for x in range(1_000_000))
next(gen) # Only computes first value
How would you re-implement enumerate() and zip() using generators?
# enumerate
def my_enumerate(iterable, start=0):
    index = start
    for item in iterable:
        yield (index, item)
        index += 1

# zip (stops at the shortest iterable)
def my_zip(*iterables):
    iterators = [iter(it) for it in iterables]
    while True:
        try:
            values = [next(it) for it in iterators]
        except StopIteration:
            return
        yield tuple(values)
Show the syntax for type hints on variables and functions.
# Variable type hints
name: str = "Alice"
age: int = 25

# Function parameter and return type hints
def greet(name: str, age: int) -> str:
    return f"Hello, {name}! Age: {age}"

# Function with no return value
def print_data(data: list) -> None:
    print(data)
Does Python enforce type hints at runtime?
No! Type hints are optional annotations for documentation and static analysis only.
def add(a: int, b: int) -> int:
    return a + b

# This runs without error, even with wrong types:
add("hello", "world")  # → "helloworld"

To check types: use a static type checker like mypy, not Python itself.
What does the @dataclass decorator do? What methods does it auto-generate?
The @dataclass decorator automatically generates boilerplate methods for classes that mainly store data.
Auto-generated methods:
__init__() — constructor from field annotations
__repr__() — string representation
__eq__() — equality comparison

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

# No need to write __init__, __repr__, __eq__ manually!
p = Point(3, 4)
print(p)  # Point(x=3, y=4)
Convert this regular class to a dataclass:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Becomes:
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# That's it! __init__, __repr__, __eq__ are auto-generated
Explain why NumPy is faster than Python lists. (Give 3 reasons)
Contiguous memory layout — values stored next to each other → CPU cache friendly
SIMD vectorized operations — CPU can process multiple values in one instruction
Written in C — compiled C code under the hood, not interpreted Python
# Example: NumPy is ~100x faster
import numpy as np
arr = np.arange(1_000_000)
result = arr * 2 # Uses SIMD, contiguous memory, C code
What are the downsides/limitations of NumPy arrays?
Homogeneous types — all elements must be same type (mixing types forces coercion)
Fixed size — adding/removing elements is costly (creates new array)
Integer overflow — fixed-size integers can overflow (unlike Python's arbitrary precision)
# Overflow example:
arr = np.array([127], dtype=np.int8)
arr[0] += 1
print(arr) # → [-128] ← Overflow!
Name the main NumPy data types and how to check an array's type.
Integers: int8, int16, int32, int64 (signed), uint8, uint16, uint32, uint64 (unsigned)
Floats: float16, float32, float64
Others: bool, object (avoid — loses performance!)
Check type:
arr.dtype  # → int64, float64, etc.
arr = np.array([1, 2, 3], dtype=np.int8)
print(arr.dtype)  # → int8
What's the difference between np.zeros(), np.ones(), np.full(), and np.eye()?
np.zeros(shape) — array filled with 0
np.ones(shape) — array filled with 1
np.full(shape, value) — array filled with any value
np.eye(n) — n×n identity matrix (diagonal 1s, rest 0s)
np.zeros(3)    # → [0. 0. 0.]
np.ones(3)     # → [1. 1. 1.]
np.full(3, 7)  # → [7 7 7]
np.eye(3)      # → [[1. 0. 0.]
               #    [0. 1. 0.]
               #    [0. 0. 1.]]
What are the 4 ways to index NumPy arrays?
Basic indexing: arr[1], arr[1, 2]
Slicing: arr[1:3], arr[:, 1]
Boolean indexing: arr[arr > 5] (filter by condition)
Fancy indexing: arr[[0, 2, 4]] (select specific indices)
What are the two uses of np.where()?
1. Find indices where condition is True:
indices = np.where(arr > 10)
arr[indices] # Get those values
2. Conditional replacement (like ternary operator):
# Replace values > 10 with 100, else keep original
result = np.where(arr > 10, 100, arr)
Explain NumPy's broadcasting rules. Are shapes (3, 1) and (1, 4) compatible? What about (3, 2) and (3, 3)?
3 Rules:
If different ndim, prepend 1s to smaller shape
Compatible if: same size OR one is 1 in each dimension
Result shape = max of each dimension
(3, 1) + (1, 4):
Dim 0: 3 vs 1 → compatible ✓
Dim 1: 1 vs 4 → compatible ✓
Result: (3, 4) ✅
(3, 2) + (3, 3):
Dim 0: 3 vs 3 → compatible ✓
Dim 1: 2 vs 3 → incompatible ❌ (neither is 1)
Fails! ❌
What's the difference between a view and a copy? Which operations create each?
View — references same data, changes affect original
Copy — independent data, changes don't affect original
Create VIEWS:
Slicing: arr[1:4]
Reshape: arr.reshape(2, 3)
Transpose: arr.T
Create COPIES:
.copy(): arr.copy()
Fancy indexing: arr[[0, 2, 4]]
Boolean indexing: arr[arr > 5]
arr = np.array([1, 2, 3, 4])
view = arr[1:3]
view[0] = 999
print(arr) # → [1 999 3 4] ← Original changed!
copy = arr.copy()
copy[0] = 111
print(arr) # → [1 999 3 4] ← Original unchanged
What are np.nan, np.inf, and how do you check for them?
np.nan — Not a Number (missing/undefined values)
np.inf — positive infinity
-np.inf — negative infinity (the np.NINF alias was removed in NumPy 2.0)
np.pi — π (3.14159...)
np.e — Euler's number (2.71828...)
Checking:
np.isnan(arr)     # Check for nan
np.isinf(arr)     # Check for infinity
np.isfinite(arr)  # Check for finite (not nan, not inf)
⚠️ Important:
np.nan == np.nan  # → False! Use np.isnan() instead
What do reshape(-1), expand_dims(), squeeze(), and flatten() do?
reshape(-1) — flatten to 1D OR auto-calculate one dimension
arr.reshape(-1)     # → 1D
arr.reshape(-1, 3)  # → auto-calc rows for 3 columns
expand_dims(arr, axis) — add a dimension of size 1
arr.shape: (3,) → np.expand_dims(arr, 0) → (1, 3)
squeeze() — remove dimensions of size 1
arr.shape: (1, 3, 1) → squeeze() → (3,)
flatten() — convert to 1D (always a copy)
arr_2d.flatten() → 1D array (copy, not view)
Explain vstack(), hstack(), and column_stack() with examples.
vstack() — vertical stack (stack as rows)
np.vstack([[1,2,3], [4,5,6]])
# → [[1 2 3]
#    [4 5 6]]
hstack() — horizontal stack (side by side)
np.hstack([[1,2,3], [4,5,6]])
# → [1 2 3 4 5 6]
column_stack() — stack 1D arrays as columns
np.column_stack([[1,2,3], [4,5,6]])
# → [[1 4]
#    [2 5]
#    [3 6]]
What do .reduce(), .accumulate(), and .outer() do for ufuncs? Is np.vectorize() fast?
.reduce() — apply the operation across the array → single value
np.add.reduce([1,2,3,4])  # → 10 (sum all)
.accumulate() — cumulative operation → intermediate results
np.add.accumulate([1,2,3,4])  # → [1,3,6,10]
.outer() — apply to all pairs from two arrays
np.multiply.outer([1,2], [10,20])
# → [[10,20], [20,40]]
np.vectorize() — ⚠️ NOT fast! Just a convenience wrapper (essentially a for loop), no performance benefit
What are the 4 ways to create a pd.Series? What happens to dict keys?
import pandas as pd
# 1. From list (default integer index)
pd.Series([10, 20, 30])
# 2. From dict (keys become index!)
pd.Series({'a': 10, 'b': 20})
# 3. From scalar (broadcast to all index positions)
pd.Series(5, index=['a', 'b', 'c'])
# 4. From NumPy array (view by default!)
arr = np.array([1, 2, 3])
pd.Series(arr) # view — changes to arr affect Series
pd.Series(arr, copy=True) # copy — independent
What are the 4 main ways to create a DataFrame?
# 1. Dict of lists (most common)
pd.DataFrame({'col1': [1,2,3], 'col2': [4,5,6]})
# 2. Dict of Series (Series index → row labels)
pd.DataFrame({'col1': pd.Series([1,2], index=['a','b'])})
# 3. List of dicts (each dict = one row, missing keys → NaN)
pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3}])
# 4. 2D NumPy array (specify column names explicitly)
pd.DataFrame(np.array([[1,2],[3,4]]), columns=['A','B'])
What is pd.Index? Name 3 key properties.
pd.Index is the label system for rows and columns in Pandas.
3 key properties:
Immutable — cannot change individual elements (ensures safe sharing)
Ordered multi-set — maintains order, allows duplicate labels
Built on NumPy — backed by np.ndarray, supports NumPy-like operations
idx = pd.Index(['a', 'b', 'c'])
idx[0] = 'z'  # ❌ TypeError! Immutable
idx.values    # → numpy array underneath
What's the difference between .loc[] and .iloc[] in pandas?
.loc[] — label-based (use index labels and column names)
.iloc[] — integer-based (use 0-based positions)
Memory trick: loc = label, iloc = integer
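A small sketch (the DataFrame and its labels are illustrative). Note one subtlety: .loc slices include the endpoint, .iloc slices don't:

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 35]}, index=['a', 'b', 'c'])

print(df.loc['b', 'age'])  # → 30 (by label)
print(df.iloc[1, 0])       # → 30 (by position)
print(df.loc['a':'b'])     # label slice — INCLUSIVE of 'b' (2 rows)
print(df.iloc[0:2])        # position slice — exclusive of index 2 (2 rows)
```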
How do you filter a DataFrame? How do you combine conditions?
Use a boolean mask: df[df['age'] > 30] (column names illustrative)
Combine conditions with & (and), | (or), ~ (not) — wrap each condition in parentheses:
df[(df['age'] > 30) & (df['city'] == 'NY')]
⚠️ Use & and |, not Python's and/or (those raise an error on a Series).
What do all() and any() do in Pandas? What does axis control?
all() — True if ALL values meet the condition
any() — True if ANY value meets the condition
axis=0 (default) — check column-wise → result per column
axis=1 — check row-wise → result per row
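A quick sketch of the axis behavior (data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True], 'B': [True, False]})

print(df.all())         # axis=0: per column → A: True, B: False
print(df.all(axis=1))   # axis=1: per row → row 0: True, row 1: False
print(df.any().all())   # → True (every column has at least one True)
```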
What's the difference between unique(), nunique(), and value_counts() in pandas?
unique() — returns array of unique values (in order of appearance)
nunique() — returns the count of unique values (integer)
value_counts() — returns the frequency of each value (sorted by count)
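A short sketch on an illustrative Series:

```python
import pandas as pd

s = pd.Series(['b', 'a', 'b', 'c'])
print(s.unique())        # → ['b' 'a' 'c'] (order of appearance)
print(s.nunique())       # → 3
print(s.value_counts())  # b → 2, a → 1, c → 1 (sorted by count)
```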
Show how to use replace() in pandas with scalar, list, and dict. How do you drop rows vs columns?
# replace() with scalar, list, and dict (df and column names illustrative)
df.replace(0, 100)                # scalar → scalar
df.replace([1, 2], 0)             # list → scalar
df.replace({1: 'one', 2: 'two'})  # dict: per-value mapping
# Drop rows vs columns
df.drop([0, 1], axis=0)    # drop rows by index label
df.drop('col', axis=1)     # drop a column
df.drop(columns=['col'])   # equivalent, more explicit
What does apply() do? What's the difference between axis=0 and axis=1?
Applies a function to each column or row of a DataFrame.
axis=0 (default) — function receives each column → result per column
axis=1 — function receives each row → result per row
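A minimal sketch (data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

print(df.apply(sum))          # axis=0: per column → A: 3, B: 30
print(df.apply(sum, axis=1))  # axis=1: per row → 11, 22
```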
Name 4 string methods available via .str accessor in Pandas.
.str.upper() / .str.lower() — change case
.str.split(sep) — split into list (use expand=True for columns)
.str.slice(start, stop) or .str[start:stop] — slice characters
.str.strip() — remove whitespace
.str.contains(pattern) — boolean mask if contains pattern
.str.replace(old, new) — replace substring
Name 5 ways to handle missing values in Pandas.
dropna() — drop rows/columns with missing values
df.dropna()          # drop rows with any NaN
df.dropna(thresh=2)  # keep rows with ≥2 non-NaN
fillna(value) — fill with a single value
df.fillna(0)
fillna(dict) — fill each column differently
df.fillna({'A': 0, 'B': 'unknown'})
Forward fill — use previous value
df.ffill()
Backward fill — use next value
df.bfill()
interpolate() — estimate from surrounding values (linear interpolation)
df.interpolate()
None vs np.nan vs pd.NA:
None → Python null, object dtype
np.nan → float, numeric columns
pd.NA → nullable integers/strings (Int64, string)
What's the difference between dtype='category' and pd.CategoricalDtype(ordered=True) in pandas? What does get_dummies() do?
dtype='category' — nominal categories (no order)
s.astype('category')  # colors, cities, etc.
pd.CategoricalDtype(ordered=True) — ordinal categories (has order)
dtype = pd.CategoricalDtype(['S','M','L','XL'], ordered=True)
s.astype(dtype)  # now S < M < L < XL, comparisons work!
pd.get_dummies() — one-hot encoding (each category → binary column)
pd.get_dummies(df['color'])
# red → [1,0,0], blue → [0,1,0], green → [0,0,1]
.cat — accessor to manage categories (add, remove, rename, reorder)
Name the 5 key parameters of pd.read_csv().
header — which row holds the column names (header=None → no header)
names — custom column names (names=['a','b','c'])
index_col — column to use as the row index (index_col='id')
usecols — load only these columns (usecols=['name','age'])
delimiter — separator character (delimiter=';')

df = pd.read_csv('data.csv',
                 header=0,
                 names=['ID', 'Name', 'Age'],
                 index_col='ID',
                 delimiter=',')

# Write back (no index!)
df.to_csv('output.csv', index=False)
Explain the split-apply-combine pattern in pandas. How do you apply multiple aggregations?
Split — divide DataFrame into groups
Apply — run function on each group
Combine — merge results back together
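A minimal sketch of the pattern with an invented toy DataFrame; passing a list to .agg() applies multiple aggregations per group:

```python
import pandas as pd

# Toy data (invented for illustration)
df = pd.DataFrame({
    "team":  ["A", "A", "B", "B"],
    "score": [10, 20, 30, 50],
})

# Split by team, apply several aggregations, combine into one table
out = df.groupby("team")["score"].agg(["mean", "max"])
print(out.loc["A", "mean"], out.loc["B", "max"])  # 15.0 50
```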
What do set_index() and reset_index() do in pandas? When does a MultiIndex appear?
set_index(col) — move column(s) to become the row index
reset_index() — move index back to regular columns
MultiIndex — multiple levels of row labels (e.g., from groupby with multiple columns)
What does pivot() do in pandas? Given this table, write the result:
df.pivot(index='name', columns='subject', values='score')
pivot() reshapes long → wide format.
index → row labels
columns → unique values become column names
values → fills the table
Result:
⚠️ Fails with duplicates → use pivot_table() instead (aggregates duplicates)
Reverse: melt() goes wide → long
long_df = wide_df.melt(id_vars='student',   # was row labels before
                       var_name='subject',  # was column names before
                       value_name='score')  # was cell values before
What does melt() do in pandas? What columns does the result always have?
melt() reshapes wide → long format (opposite of pivot()).
Result always has:
id_vars columns (unchanged)
variable column (former column names)
value column (former cell values)
df.melt(id_vars=['name'], var_name='subject', value_name='score')
Given this wide pandas DataFrame, write the result of df.melt(id_vars='name'):
id_vars='name' → name column stays
variable → old column names (math, english)
value → old cell values
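The answer above can be checked with a toy DataFrame (values invented for illustration):

```python
import pandas as pd

wide = pd.DataFrame({
    "name":    ["Ann", "Bob"],
    "math":    [90, 70],
    "english": [80, 85],
})

long = wide.melt(id_vars="name")   # name stays; math/english get stacked
print(long.shape)                  # (4, 3): one row per (name, subject) pair
print(list(long["variable"]))      # ['math', 'math', 'english', 'english']
```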
What's the difference between concat() and merge() in pandas? Explain the 4 join types.
concat() — stacks DataFrames (no key matching needed)
merge() — joins on matching key column (like SQL JOIN)
how='inner' — only matching rows (default)
how='left' — all left rows + matches from right
how='right' — all right rows + matches from left
how='outer' — ALL rows from both, NaN where no match
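A toy sketch of the join types (invented data): only id 2 appears in both frames, so inner keeps one row while outer keeps all three ids:

```python
import pandas as pd

left  = pd.DataFrame({"id": [1, 2], "x": ["a", "b"]})
right = pd.DataFrame({"id": [2, 3], "y": ["c", "d"]})

inner = pd.merge(left, right, on="id", how="inner")  # only id 2
outer = pd.merge(left, right, on="id", how="outer")  # ids 1, 2, 3 with NaN gaps
print(len(inner), len(outer))  # 1 3
```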
What are the 4 differences between primary and secondary biological databases? Give examples of each.
Data: Primary = raw/unprocessed; Secondary = curated/organized
Release: Primary = frequent; Secondary = infrequent
Funding: Primary = institutional; Secondary = project-based
Examples: Primary = GenBank, UniProt, PDB; Secondary = PFAM, CATH/SCOP, PROSITE
Memory trick:
Primary = Plain raw data, Pouring in constantly
Secondary = Sorted, Selectively updated
Name the 3 primary biological databases, what they contain, and their approximate sizes.
GenBank — DNA/RNA sequences — ~10⁹ sequences, ~10¹² residues
UniProtKB — protein sequences — SwissProt: ~5×10⁵ (curated) + TrEMBL: ~10⁸ (uncurated)
PDB — protein 3D structures — ~200,000 structures
UniProt key distinction:
SwissProt = manually curated, small, high quality
TrEMBL = auto-annotated, huge, lower quality
Describe the GenBank flat file format. What are the mandatory fields? How does a record start/end?
Format rules:
Fixed 80 columns wide
Keywords in cols 1–10, sub-keywords in cols 3–4, values in cols 13–80
Mandatory fields:
LOCUS — name, length, type, date
ACCESSION — unique stable ID
VERSION — accession + version number
ORIGIN — the actual sequence
Optional with sub-records:
REFERENCE → AUTHORS, TITLE, JOURNAL
SOURCE → ORGANISM
FEATURES → gene, CDS, etc.
Record boundaries:
Starts with: LOCUS
Ends with: //
What does the VERSION field track? Can it uniquely identify a version of an entry?
VERSION increments only when the sequence changes
Annotation changes (references, features, organism) do NOT change the version
Therefore: NO — VERSION cannot uniquely identify a specific state of an entry, because the same version number can have different annotations at different points in time.
NM_000518.5 → sequence unchanged
→ but annotations may differ over time!
Name the 4 main E-utilities and describe the standard pipeline.
esearch — search database → returns list of IDs
efetch — download records by ID
einfo — info about databases and searchable fields
elink — find related records across databases
Standard pipeline:
esearch → get IDs → efetch → download records
Name the 6 steps of SwissProt manual curation.
Sequence curation — verify and clean the sequence
Sequence analysis — run tools, identify domains/features
Literature curation — extract data from publications
Family-based curation — propagate annotations from related proteins
Evidence attribution — tag each fact with its evidence type
Quality assurance — second review + automated checks
Key point: Every annotation has an evidence tag (experimental / by similarity / predicted) — this is what makes SwissProt trustworthy! 🎯
Compare X-ray, NMR, and Cryo-EM for protein structure determination.
X-ray Crystallography (most common in PDB):
Protein must form crystals → X-rays diffract → 3D electron density map
Gives one single structure
Quality: resolution (Å, lower = better; <2Å excellent) + R-value (lower = better fit, ~0.20 good)
NMR Spectroscopy:
Protein in solution (no crystals needed)
Measures distances between atoms via magnetic field
Gives an ensemble of 10–30 similar structures (not one!) — spread shows flexibility
Limited to small proteins (<50 kDa)
Cryo-EM:
Protein flash-frozen in ice (no crystals needed)
Electron beam + 2D images from many angles → 3D reconstruction
Works for large complexes (ribosomes, viruses, membrane proteins)
Nobel Prize 2017
What does CATH stand for? Describe each level.
C — Class — secondary structure content (α, β, αβ)
A — Architecture — overall 3D shape of secondary structures
T — Topology — shape + connectivity between elements
H — Homologous superfamily — common evolutionary ancestor
Direction: Broad (Class) → Specific (Homologous)
Memory: Cats Are Totally Homologous 🐱
What is PFAM and what is a seed-MSA?
PFAM — secondary database of protein families/domains derived from UniProt.
Seed-MSA (Seed Multiple Sequence Alignment):
A small, manually curated alignment of representative sequences for a family
Used to build an HMM profile (Hidden Markov Model)
HMM then searches all of UniProt to find all family members
Process:
Seed sequences → Seed alignment → HMM profile → Search UniProt → Full family
Each PFAM entry has: seed alignment, full alignment, HMM profile, PDB links.
Name the 4 main BioPython modules and their purpose.
Bio.Entrez — access NCBI databases (esearch, efetch)
Bio.SeqIO — read/write sequence files (GenBank, FASTA)
Bio.Seq — work with sequences (complement, transcribe, translate)
Bio.PDB — parse and analyze PDB structure files
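A pure-Python sketch of what Bio.Seq does under the hood. The helper functions and the tiny codon table here are illustrative stand-ins, not the real API (which is Seq('...').complement(), .transcribe(), .translate()):

```python
# Illustrative stand-ins for Bio.Seq operations (not the real API)
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CODON_TABLE = {"ATG": "M", "TTT": "F", "AAA": "K", "TAA": "*"}  # tiny subset

def complement(dna: str) -> str:
    # Base-pair each nucleotide: A<->T, G<->C
    return dna.translate(COMPLEMENT)

def transcribe(dna: str) -> str:
    # Coding-strand DNA -> mRNA: just T -> U
    return dna.replace("T", "U")

def translate(dna: str) -> str:
    # Read the sequence in codon triplets; unknown codons become 'X' here
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

print(complement("ATGC"))      # TACG
print(transcribe("ATGC"))      # AUGC
print(translate("ATGTTTAAA"))  # MFK
```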
Give 2 reasons why NoSQL databases were developed.
Horizontal scaling — SQL databases scale vertically (bigger server), which is expensive. NoSQL scales horizontally (more servers), which is cheaper and more flexible.
Semi-structured data — SQL requires a fixed schema (same columns every row). NoSQL handles flexible, nested, or varying data structures naturally (e.g. JSON documents with different fields per record).
Name the 4 NoSQL categories with an example each and their use case.
Key-Value (Redis) — key → value — caching, sessions
Wide Column (Cassandra) — variable columns per row — IoT, time-series, logs
Document (MongoDB) — JSON/YAML documents — profiles, content
Graph (Neo4j) — nodes + relationships — social networks, recommendations
Memory trick: Koalas Watch Dark Grey movies
What does CAP stand for? What does the theorem state?
C — Consistency: All nodes see same data after any write
A — Availability: System always responds within acceptable time
P — Partition Tolerance: System works even if network between nodes fails
Theorem: In a distributed system, you can only fully satisfy 2 out of 3 simultaneously.
CP — sacrifices Availability (MongoDB, Redis)
AP — sacrifices Consistency (Cassandra, DynamoDB)
CA — sacrifices Partition tolerance (traditional SQL on a single machine)
In practice, Partition Tolerance is non-negotiable for distributed systems — you always have network failures. So the real choice is always CP vs AP! 🎯
What does BASE stand for? How does it differ from ACID?
B — Basically Available: System always responds, even with stale data
S — Soft State: System state may change over time as nodes sync (no instant consistency required)
E — Eventually Consistent: All nodes will converge to the same value given enough time without new writes
ACID vs BASE:
Always consistent vs eventually consistent
Strict transactions vs flexible updates
Hard to scale vs easy to scale
PostgreSQL, MySQL vs Cassandra, MongoDB
What's the difference between pessimistic (ACID) and optimistic (BASE) concurrency?
Pessimistic (ACID/SQL):
Assumes conflicts will happen → locks data before accessing
Other users must wait until lock is released
Safe but creates bottlenecks at scale
Optimistic (BASE/NoSQL):
Assumes conflicts are rare → no locks, everyone reads/writes freely
If conflict detected → resolve after the fact (e.g. last write wins)
Fast and scalable, trades strict safety for performance
Pessimistic: 🔒 Lock → Read/Write → Unlock → next person
Optimistic: Read/Write freely → detect conflict → resolve
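The optimistic flow can be sketched as a version check before each write (all class and method names here are invented for illustration):

```python
class Conflict(Exception):
    pass

class Store:
    """Single record with a version counter (optimistic concurrency)."""
    def __init__(self):
        self.value, self.version = None, 0

    def read(self):
        return self.value, self.version

    def write(self, value, expected_version):
        # Reject the write if someone else wrote since our read
        if self.version != expected_version:
            raise Conflict("stale read, retry")
        self.value, self.version = value, self.version + 1

store = Store()
_, v = store.read()
store.write("A", v)          # ok: version 0 -> 1
try:
    store.write("B", v)      # still holds stale version 0 -> conflict
except Conflict:
    _, v = store.read()      # re-read the current state...
    store.write("B", v)      # ...and retry
print(store.value, store.version)  # B 2
```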
What's the difference between horizontal and vertical scaling? Which does NoSQL use?
Vertical (scale up): More RAM/CPU on one machine — limited by hardware ceiling, expensive, single point of failure
Horizontal (scale out): More machines added — theoretically unlimited, cheap, no single point of failure
NoSQL is designed for horizontal scaling because:
No JOINs across tables → data can live on different servers
Flexible schema → easy to partition/shard data
Eventual consistency → nodes work independently
Vertical: [💻 BIG]
Horizontal: [💻][💻][💻][💻][💻] ← NoSQL ✅
What is MVCC and how does conflict resolution work?
MVCC (Multiversion Concurrency Control) — instead of locking, each write creates a new version of the data. Old versions remain readable.
Benefit: Readers never block writers, writers never block readers.
Conflict resolution:
Both users read version v2
User A writes → creates v3 ✅
User B tries to write v2 → system detects v3 already exists → conflict!
User B must re-read v3 and retry
What are vector clocks and what are they used for?
Vector clock = list of (node_id, counter) pairs, one counter per node in the system.
Purpose: Track causality between events across distributed nodes (wall-clock time is unreliable).
Rules:
Each write increments your own counter
When receiving a message, take the max of each counter + increment your own
Conflict detection:
If clock A ≤ clock B on all positions → A happened before B
If neither is ≤ the other → concurrent writes → conflict!
A:[2,1,0] vs B:[1,2,1] → conflict!
A:[1,0,0] vs B:[2,1,0] → A happened before B
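The two comparison rules above, sketched for fixed-size clocks (plain lists of counters, one slot per node):

```python
def happened_before(a, b):
    # a happened before b if a <= b on every position (and they differ)
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    # Neither clock dominates the other -> concurrent writes -> conflict
    return a != b and not happened_before(a, b) and not happened_before(b, a)

print(happened_before([1, 0, 0], [2, 1, 0]))  # True: A happened before B
print(concurrent([2, 1, 0], [1, 2, 1]))       # True: conflict!
```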
What are the components of the property graph model?
Nodes — unique ID, labels (type), properties (key-value)
Edges — unique ID, label, direction (→), properties (key-value)
Node: (id:1, label:Person, name:"Alice", age:25)
Edge: (id:101, from:1, to:2, label:FRIENDS_WITH, since:2020)
Name 4 graph representations and their main trade-off.
Adjacency Matrix — grid, O(1) lookup, wastes O(n²) memory for sparse graphs
Incidence Matrix — nodes×edges grid, good for edge analysis, rarely practical
Edge List — just a list of pairs, minimal memory, slow neighbor lookup
Adjacency List — node → list of neighbors, best balance for sparse graphs
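The trade-off is easy to see by building the same tiny directed graph (edges invented) in three of the representations:

```python
edges = [(0, 1), (0, 2), (1, 2)]   # edge list: minimal memory, slow lookups

n = 3
# Adjacency matrix: O(1) edge test, but O(n^2) memory even when sparse
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1

# Adjacency list: node -> neighbors, the usual choice for sparse graphs
adj = {u: [] for u in range(n)}
for u, v in edges:
    adj[u].append(v)

print(matrix[0][1])  # 1 (edge 0 -> 1 exists)
print(adj[0])        # [1, 2]
```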
Write the Gremlin syntax for: get all vertices, filter by property, follow edges, access property.
g.V() — get all vertices
g.V().has('name', 'Alice') — filter by property
g.V().out('FRIENDS_WITH') — follow outgoing edges
g.V().values('age') — access a property's values
Compare Cypher, Gremlin, and SPARQL.
Cypher — declarative pattern matching for property graphs (Neo4j)
Gremlin — imperative, step-by-step traversal language (Apache TinkerPop)
SPARQL — declarative query language over RDF triples (W3C standard)
What does CRUD stand for? Name the corresponding SQL commands.
C — Create → INSERT
R — Read → SELECT
U — Update → UPDATE
D — Delete → DELETE
CRUD = the minimum set of access functions any data system must provide. 🎯
What is REST? Describe the 4 HTTP verbs and their CRUD mapping.
REST = verb (action) applied to noun (URL resource)
GET — Retrieve — Read
POST — Create new — Create
PUT — Replace/update — Update
DELETE — Remove — Delete
What do map() and reduce() do?
map(func, list) — apply function to every element (returns a lazy iterator in Python 3)
list(map(lambda x: x**2, [1,2,3]))  # → [1, 4, 9]
reduce(func, list) — aggregate all elements into one value
from functools import reduce
reduce(lambda acc, x: acc + x, [1,2,3,4,5])  # → 15
Together: Map transforms, reduce aggregates:
total = reduce(lambda a,b: a+b, map(lambda x: x*0.9, prices))