Question catalogue

Buffl

0. Advanced Datavisualisation

by Carina S.

Which of the vanilla Python data types are sequences? (Python)

string
list
tuple
range

Stack & queue + key operations + when to use which one (Python)

Stack: LIFO (Last in first out)
- Elements are added and removed from the same end ("top" of the stack)
- Most recently added element is the first one to be removed
- key operations: Push, Pop, Peek or Top
- used in situations where you need to keep track of state or history: recursive algorithms, tracking function calls, undo / redo functionality, memory management, backtracking algorithms
Queue: FIFO (First in first out)
- Elements are added at one end and removed from the other end
- key operations: Enqueue, Dequeue, Front, Rear
- where tasks need to be processed in the order they arrive, e.g. printing, task scheduling, handling requests in networking
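A minimal sketch of both structures, assuming a plain list as the stack and collections.deque as the queue (the variable names are made up for illustration):

```python
from collections import deque

# Stack (LIFO): a plain list pushes and pops at its end in O(1).
stack = []
stack.append("a")      # push
stack.append("b")      # push
top = stack[-1]        # peek: look at the top without removing it
last_in = stack.pop()  # pop: removes "b", the most recently added element

# Queue (FIFO): deque removes from the left in O(1); list.pop(0) would be O(n).
queue = deque()
queue.append("job1")        # enqueue at the rear
queue.append("job2")
first_in = queue.popleft()  # dequeue from the front: removes "job1"
```

A list works well as a stack, but a deque is the safer default for a queue because removing from the front of a list shifts all remaining elements.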

What is a deque? (Python)

double ended queue
combines features of both stacks and queues, allowing elements to be added or removed efficiently from both ends
key operations: append(x), appendleft(x), pop(), popleft()
in python, the "collections" module provides a "deque" class that implements a double-ended queue

Which operations does a deque perform less efficiently than a regular list? (Python)

Accessing elements by index (especially close to the center) is more efficient in a list than in a deque
Reason: a list stores its elements in one contiguous array, so any index is a direct lookup; a deque is a linked structure, so reaching an element in the middle means walking there from one of the ends
Random access to elements by index: list O(1), deque O(n) in the middle (access at both ends stays O(1))

What are iterables in python? (Python)

an object that is capable of returning its elements one by one and implements an __iter__ method

an iterable is supposed to return an iterator
(when we use an iterable in a for…in loop, an iterator is automatically created and is used to produce values until exhaustion)

What are iterators in python? (Python)

an iterator is an object that represents a stream of data and is required to implement an __iter__ method and a __next__ method

The __iter__ method should return the iterator object itself to allow using an iterator with a for…in loop
iterators are lazy —> value is only calculated when needed / we do not precompute all the values at once
an iterator SHOULD NOT change the corresponding iterable

How do we create iterables and iterators in python? (Python)

iterables: create a class and define the dunder function __iter__
iterator: create a class and define the dunder functions __iter__ and __next__
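As an illustration, here is a small sketch with two hypothetical classes, Countdown (the iterable) and CountdownIterator (the iterator); the names are not from the lecture:

```python
class Countdown:
    """Iterable: hands out a fresh iterator on every __iter__ call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps the iteration state and produces values lazily."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals exhaustion to the for loop
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)   # [3, 2, 1]
second_pass = list(countdown)  # the iterable can be traversed again
```

Keeping the state in a separate iterator class is what allows the same iterable to be looped over several times.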

Is an iterable an iterator and vice versa? (Python)

iterator is an iterable, but iterable is not an iterator

What can be used as a context manager and why do we use them? (Python)

we have class-based and function-based context managers
- class-based: classes that have a __enter__ (first executed) and __exit__ (second executed) methods
- function-based: using a function that yields and a @contextmanager decorator
use them to work with files, time benchmarking, working with the file system, when you don't want to update gradients in pytorch, interactions with external resources, …
use them when you establish a connection to something (like a database or file) and want to make sure that this connection gets properly closed in all cases
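Both variants can be sketched as follows. Timer and logged are hypothetical names chosen for this example, and the events list only exists to make the order of execution visible:

```python
import time
from contextlib import contextmanager


class Timer:
    """Class-based context manager that measures elapsed wall-clock time."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not swallow exceptions


events = []


@contextmanager
def logged(name):
    """Function-based variant: code before yield runs on entry, finally on exit."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")


with Timer() as timer:
    sum(range(1000))

try:
    with logged("job"):
        raise ValueError("boom")
except ValueError:
    pass  # the exit step has already run at this point
```

The second with block raises an error on purpose: the exit entry is still recorded, which is the cleanup guarantee described above.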

Magic methods and when to use them? (Python)

methods that start and end with a double underscore (also called dunder methods) (e.g. __init__, __str__, __add__)
they are called automatically by the Python interpreter in response to certain language-specific operations
implementing such methods allows us to use standard Python functions and operators with instances of our custom classes
use when you want to define the behaviour of objects in response to certain actions, such as when to define the string representation of an object

How can you mark a function as private? (Python)

you can’t, there is no such thing as fully private functions or variables
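But: by convention, attribute names starting with a single underscore are meant for internal use; this is not enforced by the interpreter. Names starting with a double underscore are additionally name-mangled inside a class (__x becomes _ClassName__x), which makes accidental access harder but still not impossible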
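A short sketch of how Python treats leading underscores, using a hypothetical Account class:

```python
class Account:
    def __init__(self):
        self._balance = 10  # single underscore: "internal" by convention only
        self.__pin = 1234   # double underscore: stored as _Account__pin


account = Account()
internal = account._balance                 # still accessible from outside
mangled = account._Account__pin             # reachable via the mangled name
has_plain_name = hasattr(account, "__pin")  # the unmangled name does not exist
```

Neither form gives real privacy; both only signal intent to other developers.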

What does the zip function do? (Python)

zip wraps two (or more) iterables with a lazy generator that yields tuples containing pairs of next values from each iterable
(If our iterables contain different numbers of items: stops when the shortest ends)
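A small example with two made-up lists of different length shows both the laziness and the truncation:

```python
names = ["Annie", "Jake", "Paul"]
ages = [31, 25]  # one item shorter than names

pairs = zip(names, ages)  # lazy: no tuples have been built yet
first = next(pairs)       # ("Annie", 31)
rest = list(pairs)        # [("Jake", 25)]; "Paul" has no partner and is dropped
```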

What does functools.wraps do? (Python)

helps us to keep the original names and docstrings of wrapped functions, since a decorated function otherwise loses this information (its __name__ and __doc__ are replaced by those of the wrapper)
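The effect is easiest to see by comparing two otherwise identical decorators; plain and preserving are hypothetical names used only in this sketch:

```python
from functools import wraps


def plain(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    @wraps(func)  # copies __name__, __doc__ and other metadata onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain
def add(a, b):
    """Add two numbers."""
    return a + b


@preserving
def mul(a, b):
    """Multiply two numbers."""
    return a * b
```

After decoration, add reports the name wrapper and has no docstring, while mul keeps its own name and docstring.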

What are generator expressions in python? (Python)

a generator is a function that returns a generator iterator (use the yield keyword)
a generator expression combines lazy evaluation of generators with the simplicity of list comprehensions (lazy meaning values are only computed if / when we want it)

How are generator expressions different from list comprehensions? (Python)

generator: lazy evaluation = loading only necessary parts needed right now into the memory -> low memory usage, slower when accessing multiple elements
list comprehension: loading everything into memory even if it is not needed -> faster when accessing multiple elements, expensive in terms of memory usage
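The memory difference can be observed with sys.getsizeof, which reports the size of the container object itself. This is a rough sketch, not a benchmark:

```python
import sys

squares_list = [n * n for n in range(10_000)]  # all values stored at once
squares_gen = (n * n for n in range(10_000))   # values produced on demand

list_size = sys.getsizeof(squares_list)  # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)    # small, independent of the range

total = sum(squares_gen)      # consumes the generator
leftover = list(squares_gen)  # empty: a generator can only be consumed once
```

The last line points at the main practical difference besides memory: a list can be traversed repeatedly, a generator expression only once.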

What are closures and why do they exist? (Python)

a closure is a function that keeps access to the variables of its enclosing scope, even after that scope has finished executing
closures help us to hide / protect the data, to generate functions at runtime, to create function-based decorators
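A minimal sketch of a closure that hides its state, with make_counter as a hypothetical factory function:

```python
def make_counter():
    count = 0  # lives in the enclosing scope, not reachable by name from outside

    def increment():
        nonlocal count  # rebind the captured variable instead of creating a local
        count += 1
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter()  # gets its own, independent environment
```

Each call to make_counter generates a new function at runtime, and each of these functions carries its own count.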

What is a dataclass and what are its advantages? (Python)

Dataclass is a decorator for automatically generating magic methods such as __init__ and __repr__ to user defined classes

Compare three concepts and their pros / cons (dataclass, class, named tuple) (Python)

dataclass:
- pros: concise and readable; automatic method generation; default values and support for type annotations
- cons: limited customization (less control / flexibility); mutable by default, immutability requires additional effort (frozen=True)
class:
- pros: maximum customization; supports class inheritance; allows use of the @property decorator
- cons: boilerplate code; more verbose; manual, error-prone implementation
named tuple:
- pros: immutable by default; concise; named attributes improve code readability
- cons: limited customization; collections.namedtuple has no type annotations (typing.NamedTuple adds them) and supports default values only via its defaults parameter

Use Data Classes When: You want a balance between conciseness and automatic method generation, especially for simple data storage purposes with some customization.
Use Regular Classes When: You need maximum customization, inheritance, or when dealing with complex class hierarchies. Also suitable when explicit control over mutability is required.
Use Namedtuples When: You want a concise and immutable representation of data, particularly when immutability is crucial and customization is not a high priority.
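The mutability differences can be checked directly. The class names below (PointData, FrozenPoint, PointTuple) are made up for this sketch:

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class PointData:
    x: float
    y: float = 0.0  # default value


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float = 0.0


PointTuple = namedtuple("PointTuple", ["x", "y"])

p = PointData(1.0)
p.x = 5.0  # a regular dataclass instance can be modified

t = PointTuple(1.0, 2.0)
try:
    t.x = 5.0  # namedtuple fields cannot be reassigned
    tuple_mutable = True
except AttributeError:
    tuple_mutable = False

try:
    FrozenPoint(1.0).x = 5.0  # frozen dataclasses reject assignment
    frozen_mutable = True
except FrozenInstanceError:
    frozen_mutable = False
```

The generated __eq__ and __repr__ come for free in both dataclass variants, which is the boilerplate a regular class would have to write by hand.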

What is a strongly typed and dynamic language? What does it mean? (Python)

Python is a strongly typed dynamic language
strongly typed = every value has a definite type and the interpreter does not silently convert between unrelated types (e.g. adding a str and an int raises a TypeError instead of guessing)
dynamic = types belong to values, not to variable names, and are checked at runtime, which means a name can be bound to values of different types during the execution of the program (e.g. reassigning a variable with a different value)
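Both properties can be seen in a few lines; the variable names are arbitrary:

```python
value = 42           # the name is bound to an int
value = "forty-two"  # dynamic: the same name may later refer to a str

try:
    result = "1" + 1  # strong: no implicit conversion between str and int
except TypeError:
    result = None

explicit = int("1") + 1  # conversions have to be requested explicitly
```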

Briefly explain the naming scheme of variables, functions and classes in python. (Python)

variables & functions: lowercase_and_underscores
- do not start with a digit and do not use python keywords (e.g. True, while, import, in)
classes: CamelCase

Name 3 built-in datatypes in python (Python)

int, float, str (also: bool, list, tuple, dict)

What is a module? (Python)

A Python module is a file containing Python code that can be imported and having a ’.py’ extension.
The name of the module is the name of the file without extension.
Python modules give us a mechanism to solve problems of code persistency, library organisation and naming conflicts (namespaces)
A module can be imported and used in other Python programs to provide access to its defined attributes and functionalities.

Where does python look for a module / package? (Python)

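Python looks for a module or package in the directories listed in sys.path, in order
sys.path contains the directory of the running script (or the current working directory in an interactive session), the directories from the PYTHONPATH environment variable, and the installation-dependent defaults (standard library, site-packages)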

What does the finally clause mean in the context of exceptions? (Python)

a block of code that will always be executed as the last task before the try statement completes even when prior code threw an error
-> useful for cleaning up and closing objects
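A small sketch with a hypothetical divide function; the log list only records the order in which the blocks run:

```python
log = []


def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        log.append("error")
        return None
    finally:
        log.append("cleanup")  # runs on success and on error, even after return


ok = divide(6, 3)
failed = divide(1, 0)
```

In both calls the cleanup entry is written, even though the function has already reached a return statement.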

Can you define a function like this?

def my_pow(x=10, power):
    pass

(Python)

no, parameters with default values have to appear after the parameters that don't have default values (in the parameter list); otherwise Python raises a SyntaxError
correct function ->
def my_pow(power, x=10):
    pass

What is the difference between keyword and positional arguments? (Python)

keyword arguments:
- allow us to pass arguments in any order when calling a function -> using parameter names
- must be placed after positional arguments at function call
positional arguments:
- are passed to a function based on their position or order in the function call: the order in which the arguments are passed matters
- keep them in front of the keyword arguments

How do you create functions that accept arbitrary numbers of arguments? (Python)

the *-operator allows us to create functions that accept arbitrary number of arguments
—> splat operator (star) + name, e.g. def function_name (*args)

How can you unpack collections and pass their values as arguments to a function? (Python)

you can use the *-operator to unpack collections (e.g. lists and tuples) and pass their values as arguments to a function.
my_values = [2021, 12, 31]
def my_date(year, month, day):  # the triple arrives already split into year, month, day
    pass
my_date(*my_values)  # unpacking
-> in this example, *my_values unpacks the values from the list, and they are passed as separate arguments to the function my_date

How would you sort the values of a list by their third character? (Python)

you can use the sorted function with a custom key function that extracts the third character from each element in the list
strings = ["Annie", "Jake", "Paul"]
sorted(strings, key=lambda x: x[2])  # -> ['Jake', 'Annie', 'Paul']

What is the difference between append and extend for lists? (Python)

append: add single element / value to the end of a list
extend: add multiple values at the end of a list at once
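The difference shows most clearly when the same list is passed to both methods:

```python
appended = [1, 2]
appended.append([3, 4])  # the list itself becomes one new element

extended = [1, 2]
extended.extend([3, 4])  # each value of the iterable is added separately
```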

What does x, y, z do when accessing a list [x:y:z]. Give all eight possibilities and explain what they do. (Python)

list[x:y:z] is used for slicing / extracting a subsequence from a list, while x = start index, y = stop index and z = step size
list[:] - returns a copy of the entire list
list[x] - returns a single element, not a slice: accessing the element at the index x
list[x:] - returns a slice starting from index x to the end of the list
list[:y] - returns a slice from the beginning of the list up to (but not including) index y
list[::z] - returns a slice of the list with a step size of z including every z-th element
list[x:y] - returns a slice starting from index x and up to (but not including) index y
list[x::z] - returns a slice starting from index x with a step size of z
list[:y:z] - returns a slice from the beginning of the list up to (but not including) y with a step size of z
list[x:y:z] - returns a slice starting from index x up to (but not including) y with a step size of z

If we have a list of 20 elements, what does my_list[1:10:2] do? (Python)

start at index x = 1 (second element of the list) and take every second element (z = 2) from there on, up to and including index 9 (the stop index y = 10 is excluded)
returns a sublist including elements at indices 1, 3, 5, 7, 9
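This can be verified with a list whose values equal their indices; the negative step is an extra case added here for illustration:

```python
my_list = list(range(20))  # 0 to 19, so each value equals its index

every_second = my_list[1:10:2]  # indices 1, 3, 5, 7, 9
shallow_copy = my_list[:]       # new list object with the same elements
reversed_list = my_list[::-1]   # a negative step walks backwards
tail = my_list[17:]             # from index 17 to the end
```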

Briefly explain the function of namedtuple, counter, and OrderedDict (Python)

(all from the collections module)
namedtuple:
- a complex data type that allows to group variables together under one name
- if we care about grouping attributes but don't really care about modeling behaviour, we can use namedtuple instead of defining a new class
Counter:
- counts the occurrences of elements in a collection (e.g., list, tuple, string) and stores them as dictionary keys with their counts as values.
OrderedDict:
- maintains the order of items (key-value pairs) based on the order they were inserted.
- when comparing two objects of the OrderedDict type, not only the items are tested for equality, but also the insertion order
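All three in one short sketch, with made-up example data:

```python
from collections import Counter, OrderedDict, namedtuple

# namedtuple: group attributes under one name without writing a class
Student = namedtuple("Student", ["name", "grade"])
annie = Student("Annie", 1.3)

# Counter: count occurrences of the elements of an iterable
letters = Counter("banana")

# OrderedDict: equality also depends on insertion order
first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])
```

Converted to regular dicts, first and second compare equal; as OrderedDicts they do not, because their insertion order differs.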

How can you specify the type of a variable at the same time as its value? (Python)

using type hints: they provide information about the expected type of a variable and can be used for documentation and static analysis tools.
You can specify the type of a variable using the colon ”:” followed by the type after the variable's name, e.g. var_name: int = 4 or my_list: list[int] = [1, 2, 3]

How can you specify the return data type of a function? (Python)

You can specify the return data type of a function using a type hint after the arrow (->) in the function definition. Type hints provide information about the expected type of the value returned by the function.
e.g. def my_func() -> datatype: pass
(type hints are optional and not enforced at runtime —> serve as a form of documentation)

What are decorators? (Python)

Decorators allow us to create specialized reusable chunks of code that can be applied to other functions to modify or extend their behavior. They allow you to wrap another function and perform actions before, after, or around its execution.
using @-notation
we can apply several decorators to one function
They help us to avoid writing boilerplate code. When used correctly, they make reasoning about code easier. They can be used for authentication / authorization, benchmarking, logging, …

Compare list, tuple, set and dict. Name 2 characteristics for each one (Python)

list:
- A Python list is a mutable, heterogeneous, ordered sequence of elements
- can have duplicates
tuple:
- A Python tuple is an immutable, heterogeneous, ordered sequence of elements
set:
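- A Python set is a mutable, heterogeneous, unordered collection of distinct elements (not a sequence: no indexing). A set can contain only hashable elements
- no duplicates
dict:
- A Python dictionary is a mutable, heterogeneous mapping of key‐value pairs; keys must be unique and hashable, and insertion order is preserved (guaranteed since Python 3.7)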

Name two different ways to format strings (there are three ways) (Python)

printf-like formatting via the %-operator
- not advised using it
- e.g. ’%s is approx %.2f’ % (’Pi’, 3.14) —> ’Pi is approx 3.14’
str.format
- use the ”format” method to insert values into placeholders within a string.
- allows us to substitute values by index (or named fields)
- e.g. ’{1}, {2}, {0}’.format(’first’, ’second’, ’third’) —> ’second, third, first’
- can be used to specify width and alignment of our substitutions
f-strings:
- a string literal prefixed with f allows us to embed expressions directly inside it using curly braces { }
- e.g. n = 3; f'{n} bla bla' -> '3 bla bla'

Why is numpy faster than normal python lists and under which circumstances? (NumPy)

NumPy: efficiently manipulates large arrays and matrices of numerical data
- implemented in Python and C
- contiguous blocks of memory, which allows for fast and efficient access to elements
- memory efficient and uses vectorization operations —> that means we dont need to explicitly loop over elements and get results of our computations faster
- NumPy arrays are homogeneous (only 1 data type) and store data in sequential chunks of memory —> that means several values can be copied from RAM to the CPU cache at once
Python lists:
- hold references to the values they hold (instead of containing the values themselves, they contain locations in memory where the data is stored)
- That gives us heterogeneity, but that also means we have no guarantee that all (or, at least, enough) of our values are copied to the cache in one operation.
- implemented as a collection of pointers to objects, which can lead to memory fragmentation and slower performance
Numpy not faster if: non-numerical data, small data sets

Provided several pairs of np arrays, determine if a specific operation (like addition) can be broadcasted for each pair. Explain why. (NumPy)

To determine if a specific operation (e.g., addition) can be broadcasted for each pair of NumPy arrays, you need to consider the dimensions of the arrays.
we can perform pairwise operations on arrays of different shapes, as long as arrays are compatible in every dimension
Broadcasting allows us to use a smaller array several times together with a larger array according to the following rules:
- arrays are compatible in a dimension if they have the same size in that dimension OR if one of them has size 1 there
- if the arrays do not have the same number of dimensions, prepend (add to beginning / as prefix) 1 to the shape of the smaller one until they do
- a smaller array acts as if it was copied along those dimensions where its size is 1
Examples:
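- (N x M) with a vector of length M (or length 1): possible; a vector of length N only works after reshaping it to (N x 1)
- (N x M) with (1 x M) or (N x 1): possible; (N x M) with (M x N) or (M x P): not broadcastable in general (these shapes belong to matrix multiplication, not to element-wise operations)
- (N x M) +- (N x M): possible (identical shapes)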
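The rules above can be sketched in plain Python without NumPy. broadcast_shape is a hypothetical helper written for this card, not a NumPy function:

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible."""
    result = []
    # Compare dimensions from the right; missing leading dimensions count as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            return None  # sizes differ and neither of them is 1
    return tuple(reversed(result))


row_vector = broadcast_shape((3, 4), (4,))     # (3, 4)
wrong_length = broadcast_shape((3, 4), (3,))   # None
column = broadcast_shape((3, 4), (3, 1))       # (3, 4)
transposed = broadcast_shape((3, 4), (4, 3))   # None
```

Filling the shorter shape with 1 from the left corresponds to the "prepend 1" rule, and the comparison per dimension corresponds to the compatibility rule.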

What are the pros and cons of np arrays compared to vanilla python lists? (NumPy)

Pros:
- implemented in Python and C (faster)
- contiguous blocks of memory, which allows for fast and efficient access to elements
- memory efficient and uses vectorization operations —> that means we dont need to explicitly loop over elements and get results of our computations faster
- NumPy arrays are homogeneous (only 1 data type) and store data in sequential chunks of memory —> that means several values can be copied from RAM to the CPU cache at once
Cons:
- only faster than normal python lists for numerical data and sufficiently large data sets
- lacks flexibility of python lists
- loses efficiency with object datatype
- fixed size: when one element is added, the whole array has to be re-created (existing values can still be changed in place)
- type restrictions: all elements in a np array must be of the same type

When do we want to use copies, when do we want to use views? (NumPy)

copies: new array object containing a copy of the data (deep copy) —> change of value doesn’t change original values
views: new array object that refers to the same data —> change of value leads to change of original value in original object
use copies: when independence and isolation of data are crucial, when you want to modify the data without affecting the original array
use views: for memory efficiency, especially when working with large datasets or performance-critical tasks, e.g. to access specific portions of array without duplication

What does a double index for a column mean? (NumPy)

fancy indexing, e. g.:
- we can combine simple indices with fancy indexing: numbers[2, [2, 0]] -> returns from the third row the third and the first number, in that order (is same as [numbers[2, 2], numbers[2, 0]])
- we can combine slicing with fancy indexing too: numbers[1:, [1, 0]] —> everything after first row (2nd, etc) and then the numbers in column 2 and then numbers in column 1 (flipping) —> (1,1), (1,0), (2,1), (2,0) (in a 3x3 matrix)

What are the performance gains using apply / np.vectorize, and why? (NumPy)

both are meant to apply a function element-wise to arrays or Series without writing the loop yourself
np.vectorize wraps a Python function so that it accepts NumPy arrays as input and performs computations on them element-wise (it then behaves like a ufunc)
The vectorize function is provided primarily for convenience, not for performance (-> for speed we don't actually want to use np.vectorize!). The implementation is essentially a for-loop.
The apply function is also inefficient -> it uses loops, converts rows into Series, etc.
-> real performance gains come from the built-in vectorized NumPy / pandas operations, which run in compiled code

What are different approaches to create numpy arrays? (NumPy)

use np.array to create arrays from lists, tuples, ranges and other sequences
- e.g. np.array([x, y, z]), np.array((x, y, z))
use the np.fromiter function to create NumPy arrays from iterables
- e.g. np.fromiter({3, 2, 1, 2, 3}, dtype=np.int16)
pre-built: np.zeros(), np.ones(), np.eye(), np.full(), np.random.random(), np.random.randint()

How can you check which type of data a np array contains? (NumPy)

np_array.dtype

How can you access elements of a 2d np array? (NumPy)

by indices:
- e.g. arr_2d[1, 1] —> second value in second row, arr_2d[1] —> second row
using slicing:
- e.g. arr_2d[:2, 1:] —> 1st and 2nd row, and everything after the 1st column (2nd,…)
using boolean indexing:
- e.g. numbers = np.array([…]), booleans = (numbers % 2 == 0) -> numbers[booleans] returns only the even values
by passing an array of indices to access multiple elements at once:
- e.g. numbers[[0, 2, 1],[1, 2, 0]] is the same as [numbers[0, 1], numbers[2, 2], numbers[1, 0]]
combine simple indices or slicing with fancy indexing:
- e.g. numbers[2, [2, 0]] -> returns from the third row the third and the first number (in that order)
- e.g. numbers[1:, [1, 0]] —> everything after first row (2nd, etc) and then the numbers in column 2 and then numbers in column 1

How can you find elements fulfilling a specific condition in a np array? (NumPy)

we can get indices or values of elements satisfying a specific condition via the np.where function:
- e.g. indices = np.where(arr > 15) —> arr[indices] (to get values) or indices (to get indices)

Name 3 datatypes of numpy? (NumPy)

int8, int16, int32, int64 (signed integers, positive and negative)
uint8, uint16, uint32, uint64 (unsigned integers, only non negative values)
float16, float32, float64
bool_ (called bool8 in older NumPy versions), unicode string, byte string, object, complex64, complex128

How can you create a pandas dataframe? (Pandas)

create dataframe with pd.DataFrame(): e.g. students_df = pd.DataFrame(arr, columns=())
We can create a DataFrame using
- a dictionary of Series
- a structured or 2D NumPy array
- an iterable of iterables
- a dictionary of iterables
- an iterable of dictionaries
- explicit column names
- (from a file, e.g. using read_csv())

What are the advantages of pandas? (Pandas)

simplify working with tabular data
combine high performance of NumPy with flexibility of spreadsheets and RDBs
enable flexible handling of missing data
provide routines to reshape, split and aggregate, and select subsets of data
implement merging and other relational operations
exchange data between in-memory data and files: CSV, txt, Microsoft Excel, SQL DBs, JSON, etc.
efficient data visualization
integration with NumPy, or rather: builds on top of NumPy

Define the pd.Index (Pandas)

an Index is an immutable array or an immutable ordered multi-set built on top of a NumPy array
- It also provides immutability, but be careful —> set copy=True to not make a shallow copy
can be of any type
allows to perform set-like operations (reindexing) (e.g. pd_index.intersection, .union, .difference), attributes from numpy arrays (e.g. pd_index.size, .shape, .ndim, .dtype), or some other useful methods (e.g. pd_index.is_unique, .has_duplicates, .is_monotonic_increasing, .insert(), .delete(), .copy(), .max(), .argmax(), .value_counts(), .drop_duplicates())

Can a pd.Series object contain heterogenous data? (Pandas)

yes, pandas' automatic type inference accepts mixed values, but they are then stored with the generic object dtype

Give examples for how to create a pd.Series (Pandas)

can be created from a scalar, list, sequence, iterable, NumPy array (behaves like a view so might want to use copy=True), dictionary (keys become index values)

What is the difference between head / tail and nlargest / nsmallest? (Pandas)

df.head(n): return the first n rows (per default equal to 5)
df.tail(n): returns the last n rows
nlargest(n, columns) / nsmallest(n, columns): return the n rows with the largest / smallest values in the given columns, ordered by those values (descending / ascending)

Are the data types of pd.DataFrame limited to the np.array data types? (Pandas)

no, since pandas extends the capabilities of NumPy by also introducing additional data types
e.g. pd.StringDtype, pd.CategoricalDtype

What exactly do we gain by using the pd.StringDtype data type? (Pandas)

avoid mixing strings with non‐strings when using object
explicitly setting string data type simplifies understanding code

How can you apply a function to a dataframe? (Pandas)

use apply() method to perform computations on values, rows, or columns
apply() allows you to apply a function along an axis of the DataFrame: axis=0 or axis='index' (default) passes each column to the function, axis=1 or axis='columns' passes each row
df.apply(function, axis) while axis is optional
can also use lambda functions or other callable objects
(keep in mind that apply lacks efficiency)

Name a few methods that you can apply on strings? (Pandas)

via the .str accessor of a Series: str.len(), str.upper(), str.lower(), str.startswith(), str.split(), str.slice(), …

What is the difference between None, np.nan and pd.NA? (Pandas)

None: universal null type, not the same as 0
np.nan: "not a number" constant (older NumPy versions also offered the aliases np.NaN and np.NAN); a floating point value used to store None-like values / invalid numbers
pd.NA: specific missing data indicator introduced in pandas to address some limitations of np.nan, e.g. better support for data types including integers and booleans

Why would you use pandas visualize? -> not in Lecture? (Pandas)

fast exploration of data with quick and easy visualization
more convenient than calling matplotlib directly (pandas plotting is a wrapper around matplotlib)
integration with DataFrame / Series
Various plot types
Automatic handling of Labels and Legends
Integration with matplotlib

Name five ways to deal with missing values in Pandas? (Pandas)

drop missing values, rows, and columns using df.dropna(); can use parameters
- thresh for threshold (minimum number of non-missing values required to keep a row / column)
- axis='columns' because default is rows
- how='all' when you want to remove them only when all values are missing
- subset=[…] to check only this subset of columns when removing rows
fill missing values using:
- filling with a single value (using fillna() method)
- forward propagation (using fillna(method='ffill'), or ffill() in newer pandas versions, where the method parameter is deprecated) -> missing values will not be replaced if there is no last valid value
- backward propagation (using fillna(method='bfill'), or bfill() in newer pandas versions) -> missing values will not be replaced if there is no next valid value
- filling with a dictionary / Series (using fillna({dictionary}))
- interpolation (using interpolate() method) (uses linear interpolation by default. We can change the technique by passing a corresponding value for the method parameter)

Types of plots and when to use them -> not in Lecture (Plots)

line plot: relationship between a continuous independent variable and continuous dependent variable
histogram: distributions
boxplot: compare distributions (e.g. percentiles, median, outliers)
violin plot: like box plot + data density
scatterplot: relationship between two continuous variables (to identify patterns or correlations)
one continuous and one discrete variable —> use bar plots, box plots or violin plots by category

What is the difference between primary and secondary databases? What are examples for each of them? (Databases)

primary db:
- unprocessed raw data from experimental and observational studies including annotations
- minimally processed and regularly updated, often institutional funding
- e.g. DNA sequences, Protein Sequences and Structures
- GenBank, Uniprot, PDB
secondary db:
- collection of topic-centric and highly processed structured subsets of primary data
- more specialized, curated, serve specific topics, may have lower release frequency, often project funding based
- e.g. defines structural hierarchies, patterns, interactions
- SCOP, CATH, PROSITE, PFAM

What is the CAP theorem? (Databases)

Consistency, availability and partition tolerance cannot all be completely satisfied at the same time -> only 2 of 3; availability and partition tolerance is the important combination
C: consistency: All replicating nodes of a database system have the same state after a transaction; Read access to any node returns the same results
A: availability: acceptable response time
P: partition tolerance: if a node or a connection fails, the system remains responsive

Why do we use the CAP theorem? (Databases)

Systems can provide at most 2 of these 3 guarantees
for understanding trade-off involved in designing distributed systems and for choosing the right architecture for a given problem

BASE principle (Databases)

Basically Available
Soft state
Eventually consistent
(prioritizes availability and responsiveness over strong consistency -> emphasizes relaxing consistency)

CRUD principle (Databases)

represents a minimum (fundamental) set of access functions that can be performed on data
Create, Read, Update, Delete
(SQL: insert, select, update, delete)
(HTTP: POST, GET, PUT, DELETE)

What is a document based model? (Databases)

A document-based model stores data in a semi-structured or structured document format (e.g. JSON); each document typically contains key-value pairs, arrays or nested structures

Why shouldn't we store biological databases in SQL databases, and why is it better to use NoSQL? (Databases)

SQL: fixed schema, Tables
NoSQL:
- more flexible and scalable storage since Biological data often has a dynamic and evolving schema
- Biological data often semi-structured —> SQL expects strict tabular structure while NoSQL can handle semi-structured data better
- NoSQL can handle variability better, since Biological data can vary greatly (e.g. metadata, different lengths, etc.) —> store in JSON format for nested and flexible data structures

Question about order of magnitude of primary databases (Databases)

10^12 residues GenBank, 10^9 sequences
10^13 residues WGS etc., 10^9 sequences
568k sequences in Swiss-Prot, 205.5mio amino acids
10^9 sequences in TrEMBL, slightly under 10^12 amino acids
PDB: 200k in total, 172k X-ray, 13.9k NMR, 14.1k Electronmicroscopy

Structure of flat file format (Databases)

fixed number of columns
structure is modelled via indentation and column number
a line is called a record and is typed (variable number of lines)
typing happens via keywords in the first columns
subkeyword are indented
no keyword: continuation of the previous line

How is an HTTP request structured? (Databases)

composed of request line + headers (meta data in key / value format) + optional body
request line: application of a verb (GET / PUT / POST / DELETE) to a noun (the resource URI), followed by the protocol version

What is a Graph database? (Databases)

Based on graph or tree structures to connect elements
Property Graph as data model:
- Nodes to reflect items (entities)
- Edges to reflect relations
very suitable for traversing, fraud detection, regulatory networks, semantic web

4 NoSQL types (Databases)

Key / Value / Tuple Stores
Wide Column Stores / Column Family Systems
Document Stores
Graph Databases

SwissProt entry compared to UniProt entry (Databases)

SwissProt:
- extensively reviewed and annotated by experts
- curated, expert-reviewed and highly reliable section of UniProt
UniProt:
- includes TrEMBL which contains automatically annotated entries
- includes both SwissProt and TrEMBL, so it combines the accuracy of the curated part with a much broader resource

What is the primary database for Protein structures? What is a secondary database for protein structures? (Databases)

primary for protein structures: PDB
secondary for protein structures: SCOP

Name one primary database and explain what is stored in it (Databases)

GenBank: DNA / RNA sequences
Uniprot: Protein sequences
PDB: Protein structures

Name one secondary database and explain what is stored in it (Databases)

SCOP / CATH: structural classification and hierarchies of proteins
PFAM: Protein sequences, focus on protein families and domains
PROSITE: sequence patterns (motifs) and profiles describing protein families, domains and functional sites

Explain the CATH principle and for what each letter stands for. (Databases)

C: class
A: architecture
T: topology
H: homologous superfamily
(Sort protein structures based on a single domain considering C, A, T, H)

Given: json file; Exercise: write different commands to count or find certain document using MongoDB language (Databases)

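find: db.collection_name.find({criteria}, {fields_to_show})
count: db.collection_name.count({criteria}) (deprecated in newer MongoDB versions in favour of db.collection_name.countDocuments({criteria}))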

Explain the Property Graph Model (Databases)

Nodes: to reflect items/entities
Edges: to reflect relations
Property: key-value pair to describe attributes / characteristics associated with nodes or edges

Information

Last changed
2 years ago
