How does the gene density of bacteria differ from that of higher eukaryotes?
Gene density in prokaryotes (~90%) >>> gene density in eukaryotes (~1-2%)
In bacteria, the gene density is about 1 gene per 1,000-1,400 bases; in higher eukaryotes, it is about 1 gene per 100,000 bases.
Bacteria therefore have a much higher gene density than higher eukaryotes.
How is the sequence logo obtained with the EM algorithm?
start with initial guesses for region and size (e.g. region of a binding site is already known from prior experiments)
1) expectation step:
position-wise composition of the site is used to estimate the probability of finding the site at any position of the seqs
these probabilities are used in turn to provide new information as to the expected base distribution for each column
2) maximization step: new counts of bases for each position in the site found in E-step are substituted for the previous set
E- and M-steps repeated until convergence
This is done, e.g., by MEME (Multiple EM for Motif Elicitation)
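The E- and M-steps above can be sketched in a few lines of Python. This is a minimal illustration, not MEME itself: it assumes exactly one site per sequence, a uniform background, and a fixed number of iterations instead of a convergence test. The names em_motif, estimate_pwm, site_probabilities and the pseudocount parameter are made up for this sketch.

```python
import math

BASES = "ACGT"

def estimate_pwm(seqs, weights, width, pseudocount):
    """M-step: weighted base counts per motif column, normalised to probabilities."""
    pwm = [{b: pseudocount for b in BASES} for _ in range(width)]
    for seq, site_probs in zip(seqs, weights):
        for start, p in enumerate(site_probs):
            for k in range(width):
                pwm[k][seq[start + k]] += p
    for column in pwm:
        total = sum(column.values())
        for b in BASES:
            column[b] /= total
    return pwm

def site_probabilities(seq, pwm):
    """E-step: probability that the site starts at each position of seq.

    Assumes one site per sequence and a uniform background, so the
    background term cancels when normalising.
    """
    width = len(pwm)
    scores = [
        math.prod(pwm[k][seq[s + k]] for k in range(width))
        for s in range(len(seq) - width + 1)
    ]
    total = sum(scores)
    return [x / total for x in scores]

def em_motif(seqs, width, init_starts, iterations=20, pseudocount=0.1):
    """Alternate E- and M-steps, starting from guessed site positions."""
    weights = []
    for seq, s0 in zip(seqs, init_starts):
        w = [0.0] * (len(seq) - width + 1)
        w[s0] = 1.0  # initial guess: the site starts at s0
        weights.append(w)
    pwm = estimate_pwm(seqs, weights, width, pseudocount)
    for _ in range(iterations):
        weights = [site_probabilities(seq, pwm) for seq in seqs]  # E-step
        pwm = estimate_pwm(seqs, weights, width, pseudocount)     # M-step
    return pwm, weights
```

The returned column distributions are what a sequence logo would then display; the per-sequence weights show where the site is most likely located.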
What does the height of the stacks in a sequence logo indicate?
measure of conservation of the base at the position
information content / entropy in bits
GREY:
can be corrected for the background frequencies of the bases
data might include pseudocounts to overcome effects of missing data
the maximum value for DNA bases is 2 bits (log2(4))
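A small sketch of how the stack heights could be computed: the information content of a column is log2(4) minus its entropy, and each letter's height is its frequency times that value. The function names and the pseudocount parameter are assumptions for this example; the small-sample correction and the background-frequency correction mentioned above are left out.

```python
import math
from collections import Counter

def column_information(column, alphabet="ACGT", pseudocount=0.0):
    """Information content in bits of one alignment column (max log2(4) = 2 for DNA)."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(alphabet)
    entropy = 0.0
    for base in alphabet:
        p = (counts.get(base, 0) + pseudocount) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy

def logo_heights(sites, alphabet="ACGT", pseudocount=0.0):
    """Per-position letter heights: base frequency times column information."""
    heights = []
    for column in zip(*sites):
        info = column_information(column, alphabet, pseudocount)
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        heights.append(
            {b: (counts.get(b, 0) + pseudocount) / total * info for b in alphabet}
        )
    return heights
```

A fully conserved column reaches the maximum of 2 bits, while a column with all four bases at equal frequency has height 0.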
Why is it important to know pseudogenes as well?
pseudogenes: Nonfunctional sequences of genomic DNA that are originally derived from functional genes, but exhibit such degenerative features as premature stop codons and frameshift mutations that prevent their expression
might interfere with experiments
PCR and hybridization experiments
transcribed pseudogenes
interference with disease diagnostics and treatment
molecular record of dynamics and evolution of genomes
rate of nucleotide substitutions
rate of DNA loss
improvement of gene prediction and annotation efforts
What do "multiplicity" and "co-operativity" mean in the context of miRNA-target interactions?
multiplicity: one miRNA can target more than one gene
co-operativity: one gene can be controlled by more than one miRNA
How does the positive predictive value change when the target is very similar to the informant?
Gene prediction for D. melanogaster:
too diverged → number of mismatches low because most of the sequence cannot be aligned
too close → number of mismatches low because the sequence is unchanged
for D. melanogaster, the best accuracy is obtained with D. ananassae as informant (~1 substitution per synonymous site)
for human, mouse would be a good informant (~0.6 substitutions per synonymous site)
How can adding an alternative exon to a transcript lead to a truncated protein product?
the exon contains an alternative stop codon
the alternative exon leads to a frame shift → a formerly out-of-frame stop codon located nearer to the start comes into frame
Name a possible origin of operons.
???
Role of horizontal gene transfer: advantage of transferring complete sets of genes and thereby conferring a defined phenotype on the recipient
possibly originating from thermophilic bacteria
Gene duplication and fusion: Gene duplication may have produced several copies of a gene that subsequently specialized in different ways. These duplicates may then have become arranged close to one another so as to be regulated under the control of a single promoter, which led to the formation of an operon. This could have happened through genetic recombination or other chromosomal rearrangements.
Horizontal gene transfer: Bacteria can exchange genes via horizontal gene transfer. If several functionally related genes are transferred together, they could assemble into an operon in the new host, allowing coordinated regulation. This assembly could be favored by shared regulatory elements and promoters.
Selection pressure: Strong selection pressure may have favored the formation of operons. Organisms able to regulate several genes jointly and efficiently may have had an advantage in certain environments, leading to natural selection for such genetic structures.
Regulatory efficiency: The proximity of functionally related genes can increase regulatory efficiency. If genes involved in the same metabolic pathway lie close together, they can be controlled more easily and more quickly by shared regulatory proteins. This may have led to selection for the formation of operons.
Co-transcriptional advantages: The simultaneous transcription of genes encoding cooperating proteins could facilitate the assembly of protein complexes and ensure that all necessary components are present in the right proportions. This could provide an evolutionary advantage and promote the formation of operons.
How does enlarging the window affect the positive predictive value of an ORF? / How does increasing the frameshift of the ORF length affect the accuracy of the prediction?
[presumably referring to GeneMark???]
higher sensitivity
lower specificity
What are pseudogenes? What are the two main classes distinguished?
non-functional genes derived from functional genes; they carry degenerative features (missense/nonsense mutations) that prevent expression
Classes:
conventional
processed pseudogenes
Explain the Ka/Ks ratio. What does the value say about conservation, what conclusions can be made about the selection pressure?
• Ka: number of nonsynonymous substitutions (per nonsynonymous site)
• Ks: number of synonymous substitutions (per synonymous site)
• the higher Ka/Ks, the lower the conservation
• Ka/Ks = 1 ⇒ no selection pressure (neutral evolution)
• Ka/Ks > 1 ⇒ positive selection pressure (positive selection)
• Ka/Ks < 1 ⇒ negative selection pressure (purifying selection)
for pseudogenes Ka/Ks = 1 expected
The majority of human genes undergo “purifying selection”: the evolutionary process disfavors nucleotide mutations that cause detrimental amino acid substitutions and thus keeps the protein as it is
experimentally observed < 1: Ka/Ks is underestimated because pseudogenes were compared with present-day genes and not with the ancestral functional gene that gave rise to the processed pseudogene
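The interpretation rules above, written as a tiny helper. This is only a sketch: the function name and the tolerance around 1 are assumptions made for the example, and real analyses would estimate Ka and Ks from an alignment with a dedicated method and test significance statistically.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Return (Ka/Ks, interpretation) following the rules listed above.

    tolerance is a hypothetical cut-off for treating the ratio as 'about 1'.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "neutral (no selection pressure)"
    if ratio > 1.0:
        return ratio, "positive selection"
    return ratio, "purifying selection"
```

For example, a gene with Ka = 0.1 and Ks = 0.5 gives a ratio of 0.2, i.e. purifying selection, which is what is expected for most functional human genes.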
What are the three strategies for gene prediction? Give an example for each.
Content based:
Examples (ORFs, codon usage, repeat periodicity, compositional complexity)
Site based:
Examples (splice sites, TF binding sites, consensus sequences, polyadenylation signals, start/stop codons)
Comparative:
Examples (inference based on homology, protein sequence similarity; limitation: the modular structure of proteins usually precludes finding the complete gene)
Assign the following programs to the correct term:
ORPHEUS
Sift
MiRScan
Genescan
REPuter
TFFM
Kraken
Annovar
Augustus+
What properties does a strong promoter have?
DNA sequence that facilitates a high rate of transcription
efficiently binds to the RNA polymerase and promotes robust transcription initiation
strong promoter has a high affinity for the RNA polymerase, allowing efficient binding and initiation of transcription
presence of specific sequence motifs within the promoter region
Efficient binding of RNA polymerase: A strong promoter has sequences with a high affinity for RNA polymerase, which facilitates binding of the enzyme and makes transcription initiation more efficient.
Strong activator binding sites: Strong promoters may contain binding sites for transcriptional activators that promote transcription by recruiting RNA polymerase or increasing its activity.
Minimal repressor binding sites: So as not to inhibit transcription, strong promoters often have few or no binding sites for repressors that could block transcription.
Lack of secondary structures: The DNA sequence of a strong promoter has few secondary structures (such as hairpins) that could interfere with RNA polymerase binding and the transcription process.
What is a sigma factor? What is it needed for in transcription? Which sigma factor occurs most frequently?
Subunits of the bacterial RNA polymerase that are required to bind the RNA polymerase to specific promoter sequences on the DNA and to start transcription
Promoter recognition: Sigma factors recognize and bind specific promoter sequences on the DNA.
Initiation of transcription: After the sigma factor has bound the RNA polymerase to the promoter, it helps unwind the DNA double helix so that the single strand becomes accessible as a template for RNA synthesis. This allows transcription to begin.
Interchangeability: Different sigma factors can regulate different genes depending on environmental conditions or growth phases. Bacteria have several sigma factors, each of which recognizes specific promoters and thereby controls the expression of different sets of genes.
Regulation of gene expression: Sigma factors enable bacteria to respond quickly to environmental changes. For example, a particular sigma factor can be activated when the cell is exposed to stress conditions, leading to the expression of genes involved in the stress response.
Specificity and function: Each sigma factor has a specific role and recognizes a particular group of promoters:
Sigma-70 (σ70): The primary sigma factor, responsible for transcribing most housekeeping genes under normal growth conditions.
dissociable subunit of the RNA polymerase holoenzyme needed for transcription initiation from promoter elements
enables specific binding of promoter region
There are multiple interchangeable sigma factors, each of which recognizes a distinct set of promoters (promoters of house-keeping/heat-shock genes)
σ70 (primary sigma factor) is expressed under normal conditions in E. coli
Sequencing methods
Name three differences between the whole-genome shotgun and the clone-by-clone method.
Which sequencing method is used rather for prokaryotic genomes and which for eukaryotic genomes? Explain exactly why this is the case.
Historically, clone-by-clone was more commonly used for eukaryotic genomes, as it helps to overcome the challenges posed by highly repetitive and complex regions in eukaryotic genomes
WGS particularly suitable for organisms with smaller genomes and less complex genomic structures
Approaches can be combined in a hybrid shotgun-sequencing approach
Name four alternative splicing variants.
How can one find out whether alternative splicing has taken place?
AS can be verified by analysing RNA isoforms
using RT-PCR with primers that flank the alternatively spliced region → different length of PCR product
using microarrays (high-throughput approach) with exon-exon junction probes
Describe the procedure / two effects of alternative splicing. What are the consequences if the protein product becomes larger as a result?
Splicing procedure:
5 critical bases: 5’ splice site / donor splice site (GU), branch point (A), 3’ splice site / acceptor splice site (AG)
Types of AS:
constitutive AS: gene is always spliced the same way
regulated AS: different forms are generated under different conditions
Roles of AS:
Addition of new protein parts
multiple effects: ← consequences when larger/different
alter protein binding properties
alter intracellular localization
alter extracellular localization
alter enzymatic or signaling activities
alter protein stability
…
Example: a transcription factor without its activation part (encoded by one exon) acts as a repressor
Influence RNA function
AS alters 5’ or 3’ UTR regions → affects subcellular localization and/or RNA stability
Coordinated Regulation of Biological Events
neuron development (DSCAM)
Channel activity associated with hearing
Muscle contraction
Describe a method for analysing alternative splicing by bioinformatic means. Pay particular attention to the data required.
Alignment of ESTs (expressed sequence tags) against DNA (/ pre-mRNA?) sequence
Insertions and deletions in the ESTs relative to the [?pre-]mRNA are identified as potential alternative splices
Alternative splices are detected when two splices are mutually exclusive
Requires ESTs, which are cDNA sequences derived from mRNA using reverse transcriptase
What is the advantage of genes being combined into an operon? / Advantage of operons?
Definition of operon:
Multigene bacterial operons have one promoter and one transcriptional stop. The transcript holds more than one gene with multiple translational starts and stops.
Reason:
Coordinated gene expression: By organizing genes in an operon, all genes needed for a particular biochemical function or metabolic pathway can be expressed simultaneously and in a coordinated pattern. This ensures that all required proteins are available in the right amounts.
Efficiency of gene regulation: An operon allows several genes to be regulated by a single regulatory region (such as a promoter and an operator). This saves energy and resources, since only one regulatory protein is needed to control the expression of several genes.
Rapid adaptation to environmental changes: Bacteria can respond quickly to changes in their environment by adjusting the expression of whole gene clusters in an operon. For example, a bacterium that enters a lactose-containing environment can quickly activate the genes of the lac operon to utilize the lactose.
Synchronized shutdown: Just like simultaneous activation, an operon allows gene expression to be switched off simultaneously. This is useful when the gene products are no longer needed, which saves energy and resources.
Compact genetic organization: Operons contribute to the compact organization of the genome, which is especially important in prokaryotic cells with their limited space.
Name four prediction approaches and explain why each makes prediction possible. / Name 4 things about operon prediction.
prokaryotic gene prediction methods:
EcoParse: HMMs for gene prediction with different models for the intergenic region depending on operon or non-operon genes: “long intergenic region” and “short”. (p. 52)
might show different distribution of base frequencies as regulatory elements are missing for genes in an operon (e.g. no RBS in (-20)...(-1) region of start codon of the second gene)
ORPHEUS: Tool based on intrinsic and extrinsic information
DPS match → use as seed ORF and refine start and stop of ORF → derive codon usage → derive RBS weight matrix → full set of predicted genes
detects genes and RBS → can derive: operon or not
GeneMark:
fifth-order markov model
uses intrinsic information about frequency of hexamers in each of the frames and background
GeneMark.hmm:
HMM with states for start codons, typical/atypical (e.g. horizontal gene transfer/Class III gene) gene and stop codon for +/- strand
GLIMMER:
interpolated markov models
detects patterns present in known gene sequences
TESTCODE:
every third base tends to be the same much more often than expected at random in coding regions (AA composition bias + codon bias)
Which two classes of information are used in gene prediction? Also name two subclasses of each.
intrinsic information
conserved splice signals
hexamer composition of exons/introns
reading frame consistency of exons
exon/intron length distribution
promoter and polyA signals
isochore differences
extrinsic information
EST
cDNA
protein-genome alignments
What is the Kozak sequence?
Sequence motif marking the translation initiation site in most eukaryotic mRNA transcripts
ribosomal binding site
(region around the start codon)
5'-(gcc)gccRccAUGG-3'
(eukaryotic equivalent to Shine-Dalgarno)
Sketch the architecture of GenScan.
Architecture:
Generalized HMM (GHMM)
models both strands at the same time; from intergenic state model can enter states for + strand genes or - strand genes
states:
N: intergenic region
P: promoter (sensor for TATA)
F: five-prime UTR
then either a single-exon gene or the model for a multiple-exon gene
single-exon genes are modeled by a single state (Esngl)
multiple exons:
state for initial exon models region from translational start to donor splice site Einit
3 states for different phases of introns (Ik for k: 0: between codons, 1: after first base, 2: after second base)
3 states for exons between introns also for keeping the phase information Ek
terminal exon Eterm
T: three-prime UTR
A: poly-A signal (sensor for the polyadenylation signal)
reverts to N
Write out GeneMarkS-T completely. In particular, address whether and how the number of transcripts has an effect.
GeneMarkS (GeneMark.hmm Eukaryotic Self-training):
parallel unsupervised training and prediction
based on eukaryotic GeneMark.hmm; architecture:
Generalized HMM
models single exon genes and multiple exons genes
models strands at the same time
initial start site
for single exon genes only one single exon gene state
for multiple exon genes:
initial exon
donor site
intron
acceptor site
internal exon (goes back to donor site)
or terminal exon
stop site state
intergenic region state
Procedure:
all parameters of the model with reduced architecture are initialized
reduced architecture:
donor/acceptor only with the two canonical dinucleotides
initiation/termination site: canonical start/stop codons
sequences emitted by non-site states: uniform length distributions
non-coding: zero-order Markov model, parameters estimated based on nucleotide frequencies in the genome
coding: different approaches, e.g. [pre?]trained on long ORFs
GeneMark run to get coding and non-coding labels
subset of uniformly labeled fragments are used to reestimate parameters
repeat until convergence
RNA structure prediction
Given the following formula: explain the individual steps and sketch them.
How could the above formula be improved?
Base pair maximization: Recursive definition of the best score for a subsequence i,j → four possibilities:
1: i,j are a base pair, added on to a structure for i+1…j-1, add +1
2: i is unpaired, added on to a structure for i+1…j
3: j is unpaired, added on to a structure for i…j-1
4: i,j are paired, but not to each other: the structure for i..j adds together substructures for two sub-sequences, i..k and k+1..j (bifurcation)
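The four cases can be written as a short dynamic program. The sketch below only returns the maximum number of base pairs (no traceback). Allowing G:U wobble pairs and the optional min_loop parameter (minimum number of unpaired bases enclosed by a pair, 0 by default to match the plain recursion) are assumptions of this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=0):
    """Maximum number of base pairs for seq via base pair maximization."""
    n = len(seq)
    if n == 0:
        return 0
    S = [[0] * n for _ in range(n)]  # S[i][j]: best score for subsequence i..j
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # cases 2 and 3: i unpaired, or j unpaired
            best = max(S[i + 1][j], S[i][j - 1])
            # case 1: i and j pair with each other, +1 on top of i+1..j-1
            if (seq[i], seq[j]) in PAIRS and j - i > min_loop:
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # case 4: bifurcation into i..k and k+1..j
            for k in range(i + 1, j - 1):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]
```

The triple loop makes the running time cubic in the sequence length, which is the usual cost of this recursion.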
Improvement:
It is more plausible that an RNA adopts a globally minimum energy structure, not the structure with the maximum number of base pairs → predict overall free energy
Additionally use thermodynamic information
negative stacking energy for matches
positive destabilizing energies for loops (size-dependent)
What are covariance models? What is their goal?
Describes both the secondary structure and the primary sequence consensus of an RNA
Can be applied to several RNA analysis problems
consensus secondary structure prediction
multiple sequence alignment
database similarity searching
Covariance models are constructed automatically
from existing RNA sequence alignments
even from initially unaligned example sequences
Iterative training procedure
Optimal algorithm for RNA secondary structure prediction based on pairwise covariations in multiple alignments
Statistical model that captures the patterns of covariation that can be obtained from an MSA. Covarying bases tend to coevolve, as this ensures that the base pair is maintained and the RNA structure is conserved. RNA structure prediction can be improved by giving positions with greater covariation more weight.
A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases.
What is Rfam?
Rfam is a database of ncRNA families represented by multiple sequence alignments and profile SCFGs
Draw a structure that cannot be predicted
Pseudoknots
pseudoknots violate the recursive definition of the optimal score S(i,j)
Repetitive sequences
Name all classes of interspersed repeats.
“Complex repeats” (interspersed repeats)
Constitute ~45% of the human genome
Derived from biologically active ‘transposable elements’ (TEs)
Involve RNA intermediates (retroelements) or DNA intermediates (DNA transposons)
Retroelements: reproduce via reverse transcription followed by integration into the host DNA
long-terminal repeat transposons (LTR)
long interspersed elements (LINEs); these encode a reverse transcriptase
short interspersed elements (SINEs); these include Alu repeats
DIRS-like elements
Penelope-like elements (PLEs)
DNA transposons: capable of integrating themselves to, and excising themselves from, the host genome, thus taking advantage of the host replication through this ‘cut-and-paste’ mechanism.
DNA transposons constitute 3% of the human genome
—————————————————
Transposable Elements (TEs):
Class I TEs (Retrotransposons): These elements transpose through an RNA intermediate. They are transcribed into RNA, which is then reverse transcribed back into DNA and inserted into a new location in the genome. Subtypes include Long Interspersed Nuclear Elements (LINEs) and Short Interspersed Nuclear Elements (SINEs).
LINEs: Autonomous elements capable of independent transposition.
SINEs: Non-autonomous elements that rely on the enzymatic machinery of LINEs for their transposition.
Class II TEs (DNA Transposons): These elements transpose directly through a DNA intermediate. They cut and paste themselves into new locations within the genome.
Processed Pseudogenes: These are non-functional sequences derived from functional genes through retrotransposition. They are created when mRNA transcripts are reverse transcribed and inserted back into the genome without their regulatory elements.
Other Retrotransposons: This category includes elements like endogenous retroviruses (ERVs), which are remnants of ancient viral infections that have integrated into the host genome.
—————————————————————
Interspersed repeats:
Retroelements:
LINEs (Long Interspersed Nuclear Elements) [autonomous]
SINEs (Short Interspersed Nuclear Elements) [nonautonomous]
LTRs (Long Terminal Repeat Retrotransposons)
DNA-Transposons
Tandemly repeated DNA:
Microsatellites
Minisatellites
Cryptically simple repeats
Low complexity repeats
Satellite and telomeric repeats
Segmental Duplications
TIRs (Terminal inverted repeats)
PLEs (Penelope-like elements)
Name two properties of interspersed repeats.
Involve RNA intermediates (Retroelements) or DNA intermediates (DNA transposons)
Mobility:
conservative transposition
replicative transposition
retrotransposition
Which three other classes of repetitive sequences exist besides interspersed repeats? What are the differences between interspersed repeats and these classes?
Tandemly repeated DNA (simple sequence repeats without interruption)
one to a dozen base pairs
may be formed by replication slippage
a dozen to 500 base pairs
Segmental duplications
nearly identical copies ranging in size from 1 to >200 kb
originate from duplicative transpositions
Pseudogenes
derived from functional genes but with deleterious mutation
SNPs
What is meant by SNPs?
Single nucleotide polymorphisms (SNPs)
occurs when a single nucleotide replaces one of the other three nucleotide letters. SNPs found in a coding seq are of great interest as they are more likely to alter function of a protein.
most common type of genetic variation in humans.
account for 90% of the variation between individuals.
Which two classes of SNPs are distinguished and what is the difference between them? / Which types exist; describe them.
Synonymous vs Nonsynonymous
Synonymous:
not causing a change in the amino acid
Non-synonymous:
A nonsynonymous or missense variant is a single base change in a coding region that causes an amino acid change in the corresponding protein
Transition vs Transversion
transition: changes a purine to another purine (A ↔ G), or a pyrimidine to another pyrimidine (C ↔ T)
transversion: change from purine (A/G) to pyrimidine (T/C) or vice versa.
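The transition/transversion rule can be checked mechanically. A minimal sketch for DNA alleles; the function name is chosen for this example only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, no substitution")
    # same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Of the 12 possible single-base changes, 4 are transitions and 8 are transversions.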
Why can SNPs in coding and non-coding regions lead to diseases?
SNPs may be informative with respect to disease:
Functional variation. A SNP associated with a nonsynonymous substitution in a coding region will change the amino acid sequence of a protein.
Regulatory variation. A SNP in a noncoding region can influence gene expression.
Association. SNPs can be used in whole-genome association studies. SNP frequency is compared between affected and control populations.
miRNA
Name three differences between plant and animal miRNAs.
Number of miRNA genes present:
Plants: 100-200 genes
Animals: 100-500
Location within genome:
Plants: predominantly intergenic regions
Animals: intergenic regions, introns
Presence of miRNA clusters:
Plants: uncommon
Animals: common
miRNA biosynthesis:
Plants: Dicer-like
Animals: Drosha, Dicer
Mechanism of repression:
Plants: mRNA-cleavage (methylation?)
Animals: Translational repression
Location of miRNA-binding motifs:
Plants: predominantly in the ORF
Animals: predominantly in the 3’-UTR
Number of miRNA-binding sites within target sites:
Plants: Generally one
Animals: Generally multiple
Function of known target genes:
Plants: Regulatory genes - crucial for development, enzymes
Animals: Regulatory genes - crucial for development, structural proteins, enzymes
Explain the workflow of the TargetScan algorithm.
What data are needed for the computation?
What disadvantages do they have?
TargetScan:
Prediction of miRNA target genes
thermodynamics-based modeling of RNA:RNA duplex interactions
comparative sequence analysis
Input:
miRNA that is conserved in multiple organisms
a set of orthologous 3‘ UTR sequences from these organisms
Structures, energies, and scoring for predicted RNA-duplexes
search the UTRs in the first organism for segments of perfect Watson-Crick complementarity to bases 2–8 of the miRNA: “miRNA seed” and “seed matches”
extend each seed match with additional base pairs to the miRNA as far as possible in each direction, allowing G:U pairs, but stopping at mismatches
optimize basepairing of the remaining 3‘ portion of the miRNA to the 35 bases of the UTR immediately 5‘ of each seed match using the RNAfold program
assign a folding free energy G to each such miRNA:target site interaction
assign a Z score to each UTR
sort the UTRs in this organism by Z score and assign a rank Ri to each
predict as targets those genes for which both Zi ≥ ZC and Ri ≤ RC for an orthologous UTR sequence in each organism, where ZC and RC are pre-chosen Z score and rank cut-offs
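Only the first step of the list, the search for seed matches, is sketched here: a UTR segment with perfect Watson-Crick complementarity to miRNA bases 2-8 is the reverse complement of that seed. Seed extension, RNAfold energies, Z scores and ranks are not covered. The function names and the example sequences in the usage note are illustrative assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """UTR sequence (5'->3') perfectly complementary to miRNA bases 2-8."""
    seed = mirna[1:8]  # bases 2..8 in 1-based numbering
    return "".join(COMPLEMENT[b] for b in reversed(seed))

def find_seed_matches(mirna, utr):
    """Start positions (0-based) of all seed matches in the UTR."""
    site = seed_site(mirna)
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)  # allow overlapping hits
    return hits
```

For an example miRNA UGAGGUAGUAGGUUGUAUAGUU, seed_site gives CUACCUC, and find_seed_matches reports every position in a UTR where that heptamer occurs.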
Disadvantages:
Incompleteness of orthologous gene annotations
Some targets may not meet the stringent seed matching, Z score, or rank criteria
Some target sites may lie outside the 3‘ UTR (plants)
Some targets may not be conserved in the complete set of organisms
⇒ The actual number of target genes regulated by each miRNA is likely to be substantially higher
Which protein complex is involved in converting pre-miRNA into miRNA in animals?
DICER
micro-RNA:
Family of 21-25 nucleotide small RNAs
Function: altering the expression levels of a diverse repertoire of genes in a sequence-dependent manner:
at the transcriptional or post-transcriptional level
regulate many aspects of development and physiology
RNA-Seq
Name two advantages and two disadvantages each of hybridization-based and sequencing-based methods, e.g. microarray vs. RNA-seq
Hybridization-based methods (microarrays):
Advantages:
Relatively low cost
Well established in clinical use
Disadvantages:
Analysis only of pre-defined sequences
Dynamic range limited by scanner
high background noise
cross-hybridization possible
Sequencing-based methods (RNA-seq):
Advantages:
identification of alternative splice variants / novel transcripts
high sensitivity
Disadvantages:
relatively high cost
high computational effort
prone to contamination
Briefly describe the RNA-seq procedure
Identifies the full set of transcripts, including large and small RNAs, novel transcripts from unannotated genes, rare transcripts, splicing isoforms and gene-fusion transcripts
Reveals the complex landscape and dynamics of the transcriptome from yeast to human at an unprecedented level of sensitivity and accuracy
Base-pair-level resolution and a much higher dynamic range of expression levels
Overview of the experimental steps in an RNA sequencing (RNA-seq) protocol
RNA extraction → target enrichment → cDNA → library prep → sequencing → Transcriptome/genome mapping → data analysis
Experimental design: number of replicates, depth of sequencing
Parameters: alignment rate, desired power, significance level, log-fold change
RNA-seq workflow
Quality control
Alignment of reads to reference genome
Transcriptome assembly
Differential expression
Given a representation of exons and junction reads, draw the gene expression curve
Comparison of prokaryotes and eukaryotes
Name three differences between prokaryotic and eukaryotic genomes.
Size:
prokaryotes: between 1s and 10s of Mb
eukaryotes: between 1s and 1,000s of Mb
Topology:
prokaryotes: mostly circular
eukaryotes: mostly linear
Gene number:
prokaryotes: most <10,000
eukaryotes: often >10,000
Pseudogenes:
prokaryotes: few
eukaryotes: many
Complexity:
prokaryotes: low
eukaryotes: high
Horizontal gene transfer:
prokaryotes: frequent
eukaryotes: rare
Intergenic regions:
prokaryotes: short (<100kb)
eukaryotes: long (often >100kb)
Genome duplication:
prokaryotes: none
eukaryotes: frequent (especially in plants)
Gene duplication:
prokaryotes: rare
eukaryotes: frequent
Repeated sequences:
prokaryotes: minor components
eukaryotes: major components
What is the Shine-Dalgarno sequence?
ribosomal binding site of bacterial mRNA
What are ORFs?
Sequences from a start codon to a stop codon
may be a gene
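A minimal ORF scan following the definition above. It assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, looks only at the three forward frames, and reports the first start codon of each ORF; a real gene finder would also scan the reverse strand. The function name and the min_codons parameter are made up for this sketch.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=1):
    """Return (start, end) pairs (0-based, end exclusive, stop codon included)."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

Raising min_codons filters out short ORFs, which are more likely to occur by chance than to be genes.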
What are ESTs?
Expressed Sequence Tags (ESTs) are short sub-sequences of cDNA (complementary DNA) sequences that are generated from mRNA transcripts
Why study repeats?
Repeats are believed to play significant roles in genome evolution and disease
Mobile elements (transposons and retrotransposons) may contain coding regions that are hard to distinguish from other types of genes
Repeats often induce many local alignments, complicating sequence assembly, comparisons between genomes and analysis of large-scale duplications and rearrangements
What is the K-mer approach to find repeats?
Sequences are scanned for overrepresented strings of a certain length
Since repeats and transposons in particular are not exactly the same, some mismatches must be allowed when oligo frequencies are calculated.
Challenge: to determine optimal size of an oligo (k-mer) and the number of mismatches allowed
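A small sketch of the idea: count all k-mers, keep the overrepresented ones, and optionally recount a candidate while tolerating mismatches. The function names and the simple count threshold are assumptions for this example; choosing k and the number of allowed mismatches remains the open problem noted above.

```python
from collections import Counter

def overrepresented_kmers(seq, k, min_count):
    """Exact k-mers that occur at least min_count times (overlaps counted)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: c for kmer, c in counts.items() if c >= min_count}

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_with_mismatches(seq, kmer, max_mismatches):
    """Occurrences of kmer in seq allowing up to max_mismatches substitutions."""
    k = len(kmer)
    return sum(
        hamming(seq[i:i + k], kmer) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )
```

In "ACGTACGAACGT", the 4-mer ACGT occurs twice exactly but three times when one mismatch is allowed, because the diverged copy ACGA is then counted as well.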
What does REPuter do?
Determines all exact repetitive substrings in complete genomes
What is RepeatFinder?
A clustering method for repeat analysis in DNA sequences
First identify all exact repeats in the input sequence (REPuter)
Then define repeat classes by merging and extending these short exact matches
What is RepeatMasker?
Best known program
Uses precompiled representative sequence libraries to find homologous copies of known repeat families
What is the density of SNPs?
about 1 every 100 to 300 bases
What is the goal of Polyphen?
Structural consequences of the respective non-synonymous mutations in proteins
Goal: to obtain a lower limit estimate for the quantity of non-synonymous SNPs that might have phenotypic effects
How many SNPs are associated with multifactorial human disorders?
Name 4 impacts of amino acid variants
Folding
Interaction sites
Solubility
Stability
What does Gibbs sampler do?
Stochastic implementation of expectation maximization motif finding
Takes a weighted sample of subsequences (not ALL sequences)
What is the Exon Chaining Problem?
Given a set of putative exons, find a maximum set of non-overlapping putative exons
Input: a set of weighted intervals (putative exons)
Output: A maximum chain of intervals from this set
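With weighted intervals as input, the problem can be solved by dynamic programming over the intervals sorted by end position (weighted interval scheduling). The sketch below assumes inclusive coordinates, so two exons overlap if they share a position; the tuple layout and the function name are choices made for this example.

```python
from bisect import bisect_left

def chain_exons(exons):
    """exons: list of (start, end, weight) with inclusive coordinates.

    Returns (best_total_weight, chain) for a maximum-weight set of
    non-overlapping intervals.
    """
    ordered = sorted(exons, key=lambda e: e[1])  # sort by end position
    ends = [e[1] for e in ordered]
    # best[i]: optimal (weight, chain) using only the first i sorted intervals
    best = [(0, [])]
    for i, (start, end, weight) in enumerate(ordered):
        # p = number of earlier intervals that end strictly before this start
        p = bisect_left(ends, start, 0, i)
        take_weight = best[p][0] + weight
        if take_weight > best[i][0]:
            best.append((take_weight, best[p][1] + [(start, end, weight)]))
        else:
            best.append(best[i])  # skipping this interval is at least as good
    return best[-1]
```

Sorting dominates the cost, so the chain is found in O(n log n) time for n putative exons.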