You can see an example of a simple decorator below. We want the decorator to be able to wrap a function
with arbitrary arguments and arbitrary keyword arguments. What do we need to do to achieve that? Explain
your answer.
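Define the inner function as wrapper(*args, **kwargs)
Inside the wrapper, call the wrapped function as func(*args, **kwargs)
Using *args and **kwargs allows the wrapper to accept any number of positional and keyword arguments and to pass them on unchanged to the wrapped function.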
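The answer above can be sketched as follows. The decorator name log_call and the two example functions are hypothetical and chosen only for illustration; the original example from the question is not reproduced here.

```python
import functools


def log_call(func):
    """Hypothetical decorator that records the name of every wrapped call."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Accept any positional and keyword arguments ...
        wrapper.calls.append(func.__name__)
        # ... and forward them unchanged to the wrapped function.
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_call
def add(a, b=0):
    return a + b


@log_call
def greet(name, *, greeting="Hello"):
    return f"{greeting}, {name}!"


result_add = add(2, b=3)                      # positional + keyword argument
result_greet = greet("Ada", greeting="Hi")    # keyword-only argument
print(result_add, result_greet, add.calls)
```

The same wrapper works for both functions although their signatures differ, because it never names the arguments itself.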
Describe the difference between a Python list and a NumPy array. What problem can arise when we use
a NumPy array as opposed to a Python list?
Python list: can hold elements of different types
NumPy array: holds elements of a single data type
Problem 1: data type restrictions, mixed values are coerced to one common type
Problem 2: a NumPy array has a fixed size, so adding or removing elements requires creating a new array.
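Both problems can be seen in a short sketch. The variable names are illustrative; the example assumes only that NumPy is installed.

```python
import numpy as np

# A Python list keeps each element's own type and grows in place.
mixed_list = [1, "two", 3.0]
mixed_list.append(4)

# A NumPy array has one dtype: mixed input is coerced to a common type.
as_float = np.array([1, 2.5])          # the int 1 becomes the float 1.0
as_text = np.array([1, "two", 3.0])    # everything becomes a string

# A NumPy array has a fixed size: np.append returns a NEW array
# and leaves the original untouched.
int_array = np.array([1, 2, 3])
bigger = np.append(int_array, 4)

print(as_float.dtype, as_text.dtype.kind, int_array.shape, bigger.shape)
```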
Describe broadcasting in NumPy. What conditions have to be satisfied for broadcasting to take place?
Broadcasting allows element-wise operations between arrays of different shapes without explicitly replicating data.
Rules:
If two arrays have the same shape, no broadcasting is needed.
If two arrays have different shapes, NumPy aligns their dimensions from the right.
If a dimension is missing or has size 1, it is stretched to match the other array.
If two aligned dimensions differ and neither is 1, broadcasting fails.
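The rules above can be checked with a few small arrays. This is a minimal sketch; the array names and shapes are chosen only for illustration.

```python
import numpy as np

a = np.ones((3, 4))
row = np.arange(4)                  # shape (4,): treated as (1, 4), stretched to (3, 4)
col = np.arange(3).reshape(3, 1)    # shape (3, 1): stretched to (3, 4)

sum_row = a + row                   # (3, 4) + (4,)   -> (3, 4)
sum_col = a + col                   # (3, 4) + (3, 1) -> (3, 4)
outer = col + row                   # (3, 1) + (4,)   -> (3, 4)

# Shapes (3, 4) and (3,): the trailing dimensions are 4 and 3,
# neither is 1, so broadcasting fails with a ValueError.
try:
    a + np.arange(3)
    failed = False
except ValueError:
    failed = True

print(sum_row.shape, sum_col.shape, outer.shape, failed)
```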
Describe five ways of handling missing values with Pandas (specifying the exact methods is not necessary).
Removing missing values
Filling missing values with a default or statistical value
Forward or backward filling
Using interpolation to estimate missing values
Using machine learning to predict missing values
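The first four strategies map onto standard Pandas methods, as in the sketch below. The example series is made up for illustration; predicting missing values with a machine learning model needs an additional library and is left out here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()                 # remove missing values
filled_const = s.fillna(0.0)         # fill with a default value
filled_mean = s.fillna(s.mean())     # fill with a statistic (mean of 1, 3, 5 is 3.0)
forward = s.ffill()                  # forward fill: repeat the last valid value
backward = s.bfill()                 # backward fill: use the next valid value
interpolated = s.interpolate()       # linear interpolation by default

print(interpolated.tolist())
```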
Describe two ways of handling/encoding categorical data with Pandas.
One-hot encoding: converts each category into its own binary column
Label encoding: assigns a unique integer to each category
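Both encodings can be sketched with Pandas alone. The column name colour and its values are invented for this example; pd.factorize is used here as one possible way to obtain integer labels.

```python
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["colour"], prefix="colour")

# Label encoding: one integer per category, numbered in order of first appearance.
codes, categories = pd.factorize(df["colour"])

print(list(one_hot.columns))
print(codes.tolist(), list(categories))
```

Integer labels imply an order that the categories may not have, which is one reason one-hot encoding is often preferred for unordered data.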
Name a primary database and the type of data it holds for each of the three major cases of biomolecular
data.
GenBank (NCBI) - DNA/RNA sequences
UniProt - protein sequences
PDB - 3D structures of proteins, nucleic acids, and their complexes
Describe three aspects in which primary and secondary databases differ.
Type of data stored: primary databases store raw data, whereas secondary databases store curated, processed, and analysed data
Data processing and annotation: primary databases undergo minimal processing, whereas secondary databases are highly curated
Data reliability and standardisation: primary data may contain errors, whereas secondary data is validated and reviewed
What is the goal of the CATH database and into which units is the data organized, i.e. what does CATH
stand for?
Class
Architecture
Topology
Homologous Superfamily
The goal is to help researchers understand protein function, identify structural similarities among proteins, and classify protein structures in a hierarchy.
List two categories of NoSQL databases.
Document-oriented databases
Key-value stores
The CAP theorem describes central aspects of distributed systems. What is the meaning of the letters
(term and explanation) and what does the theorem say about them (one sentence per bullet point)?
Consistency - every node in the system always returns the most up-to-date data
Availability - every request always receives a response
Partition tolerance - the system continues functioning even if nodes are unreachable
Theorem
A distributed system can only guarantee two of three properties at the same time
If the network is partitioned, the system must choose between consistency and availability
If the system prioritises consistency and availability, it cannot tolerate network partitions
In principle, there are two strategies for scaling a database system. Name the two strategies and briefly
describe their main features.
Vertical scaling (scale up): increasing the power of a single server by upgrading its hardware
Horizontal scaling (scale out): distributing data across multiple servers
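One common way to distribute data across servers is to route each key to a shard by hashing it. The sketch below is a toy illustration under that assumption: the class ShardedStore is hypothetical, and plain dictionaries stand in for separate servers.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'servers'."""

    def __init__(self, n_shards):
        # Each dict stands in for one server (an assumption of this sketch).
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, key):
        # A stable hash ensures the same key is always routed to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(n_shards=3)
for i in range(100):
    store.put(f"user:{i}", i)

print([len(shard) for shard in store.shards])
```

With simple modulo routing, changing the number of shards moves most keys to a different shard, so real systems often use more elaborate schemes.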