what is a phenotype?
a phenotype is an individual observable trait, such as height, eye color, or blood type. the genetic contribution to the phenotype is called the genotype. some traits are largely determined by the genotype, while other traits are largely determined by environmental factors.
what is a genome?
the entirety of the DNA in a living cell.
for practical reasons we differentiate between the nuclear, the mitochondrial, and the plastid genomes.
DNA is typically organized in chromosomes, either circular (bacteria) or linear (eukaryotes) molecules
there can be extrachromosomal DNA, e.g. in the form of plasmids. in the case of linear chromosomes, we distinguish telomeres (chromosome ends) and the centromere
how does DNA sequencing work?
extract DNA
prepare library
make copies
sequencing
analysis of data
what is known about C. reinhardtii?
genome comprises 14,271 putative protein-coding genes
an estimated 20% function in metabolism
the network comprises 1080 genes, associated with 2190 reactions and 1068 unique metabolites
iRC1080 accounts for the activity of 32% of the estimated genes with metabolic functions
what steps can be taken to identify a gene?
why should we do genome sequencing projects?
access to the genome sequence of related organisms/species
catalogues of:
the encoded protein-coding and RNA genes
transposable elements
structural variation
genetic diversity within populations
metabolic pathways > natural compounds
assessing the metabolic capacities of species
understanding the link between genotype and phenotype
reconstruct evolutionary events at both the organismal and the molecular level
how can you do a gene prediction?
which technologies do we use to sequence DNA?
1st generation (1977)
Sanger method: Sequencing by synthesis
Maxam-Gilbert method: chemical sequencing
2nd generation (“next generation”; 2005)
454 - pyrosequencing
SOLiD - sequencing by ligation
Ion Torrent - ion semiconductor
PacBio - Single Molecule Real-Time sequencing, 1000 bp
3rd generation (2015)
PacBio - SMRT, Sequel system, very long reads
Nanopore - ion current detection, very long reads
how does the genome assembly strategy work?
what does the genome annotation workflow look like?
what does transcription do to the genome?
transcription mobilizes information encoded in the genome
what types of RNA exist?
Ribosomal (rRNA)
responsible for protein synthesis
60% of a ribosome; up to 95% of total RNA in a cell
Messenger (mRNA)
Translated into protein in ribosome
3-4% of total RNA in a cell
Micro (miRNA)
short (about 22 nt) non-coding RNA involved in expression regulation
Transfer (tRNA)
brings a specific amino acid to the ribosome for protein synthesis
Others (lncRNA, shRNA, siRNA, snoRNA, ...)
what is a transcriptome assembly?
build a new or improved profile of transcribed regions (“gene models”) of an uncharacterized genome
rapid access to (protein-coding) genes without bothering with genome assembly and gene prediction
what does metatranscriptomics do?
transcriptome analysis of a community of different species (e.g., gut bacteria, hot springs, soil)
gain insights on the functioning and activity rather than just who is present
what is differential gene expression (DGE)?
quantitative evaluation and comparison of transcript levels, usually between different groups
vast majority of RNA-Seq is for DGE
“under which conditions are genes expressed?”
why do you study RNA-protein interactions?
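a minimal sketch of the comparison behind DGE, assuming already normalized read counts for two groups. the gene names, the counts, and the helper log2_fold_change are hypothetical and only illustrate the idea; a real analysis adds proper normalization and statistical testing across replicates.

```python
import math
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean expression (treated vs. control).

    The pseudocount (an assumption of this sketch) avoids division
    by zero for genes without any reads in one group.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized counts: gene -> (control replicates, treated replicates)
counts = {
    "geneA": ([100, 110, 90], [400, 420, 380]),
    "geneB": ([50, 55, 45], [50, 52, 48]),
}

for gene, (control, treated) in counts.items():
    # Positive values: higher in treated; values near 0: no change
    print(gene, round(log2_fold_change(control, treated), 2))
```

a positive value means the transcript is more abundant in the treated group, a negative value means it is less abundant, and a value near zero means no change.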
gain insights into regulatory networks controlling gene expression
what is the general workflow of RNA-Seq?
how do you go from RNA to sequence data?
how does Illumina sequencing work?
synthesis in cycles
nucleotides carry an attached fluorescent marker and a reversible terminator
a picture is taken in each cycle, then the marker and terminator are removed
what information is typically in a FASTQ file?
why is it problematic that you have a huge amount of copies of a sequence?
errors during PCR amplification mean that the copies are not 100% identical. errors at an early stage of the PCR in particular can mimic heterozygous positions
not every molecule in a pool of millions of sequences will incorporate a base in each cycle. with an increasing number of cycles, the length heterogeneity of the already sequenced fraction increases and the sequencing gets out of phase
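the phasing problem can be sketched with a very simple model. it assumes that each molecule independently fails to incorporate a base with a fixed probability per cycle and never catches up; the value 0.005 is a hypothetical number chosen only for illustration, not a measured rate.

```python
def in_phase_fraction(cycles, p_fail):
    """Fraction of molecules that incorporated a base in every cycle so far.

    Assumes independent failures with constant probability p_fail per cycle.
    """
    return (1.0 - p_fail) ** cycles


# Hypothetical per-cycle failure probability of 0.5%
for cycles in (50, 150, 300):
    print(cycles, round(in_phase_fraction(cycles, p_fail=0.005), 3))
```

even a small per-cycle failure rate compounds over many cycles, which is one reason why the signal gets noisier towards the end of a read.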
how does cycle sequencing work?
size separation via electrophoresis and detection of fluorescent markers
detecting
base calling
check quality parameters for the Phred score:
peak spacing - how evenly are the peaks distributed?
uncalled/called ratio - clear peaks?
peak resolution - how deep is the valley?
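the Phred score itself is defined as Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. a small sketch of the conversion follows; the function names are made up for this example, and the ASCII offset of 33 is an assumption that matches the common Sanger-style FASTQ encoding.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for an error probability P."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores.

    offset=33 is assumed here (Sanger-style encoding).
    """
    return [ord(ch) - offset for ch in qual]


print(phred_to_error_prob(20))       # Q20: about 1 error in 100 bases
print(phred_to_error_prob(30))       # Q30: about 1 error in 1000 bases
print(decode_quality_string("I5!"))  # [40, 20, 0]
```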
how does PacBio single molecule real-time sequencing (SMRT) work?
sequencing by synthesis
terminator-free technology
fluorescently labeled phosphate chain
uses an immobilized (fixed) DNA polymerase
read length - 20 kbp
individual reads have a substantial sequencing error rate
optimal for repeat resolution, general genome assembly and scaffolding
what are the key points of SMRT library preparation?
library of overlapping inserts
hairpin adaptors create a circular molecule
adaptors contain binding site for DNA polymerase
sequencing results in a long sequencing read
can generate multiple subreads from one template
combine subreads to create circular consensus reads
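why combining subreads helps can be shown with a toy majority vote. this is a strong simplification and not the actual consensus algorithm: it assumes the subreads are already aligned, have equal length, and contain only substitution errors, whereas real subreads also contain insertions and deletions. the sequences are invented.

```python
from collections import Counter


def consensus(subreads):
    """Column-wise majority vote over equal-length, pre-aligned subreads."""
    if not subreads:
        raise ValueError("need at least one subread")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("this toy example expects subreads of equal length")
    # zip(*subreads) yields one tuple of bases per alignment column
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Hypothetical subreads from one circular template, each pass with random errors
subreads = [
    "ACGTTGCA",
    "ACGATGCA",  # error at position 3
    "ACGTTGCA",
    "ACCTTGCA",  # error at position 2
    "ACGTTGCA",
]
print(consensus(subreads))  # ACGTTGCA
```

because the errors of individual passes fall at different positions, they are outvoted by the correct bases from the other passes.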
How are HiFi reads generated?
how does nanopore sequencing work?
what are the key points of Oxford Nanopore single-molecule sequencing?
senses DNA by measuring changes to the ion flow
multiple bases at a time (k-mers)
characteristic current signature is converted to nucleotide sequences
no theoretical upper limit to sequencing read length, practical limit only in delivering DNA to the pore intact
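the k-mer idea can be sketched as follows: as the strand moves through the pore, the signal at each step depends on a window of several bases, so neighbouring windows overlap by k-1 bases. the value k = 3 and the sequence are chosen only for illustration; the window size of a real pore is not stated here.

```python
def kmers(sequence, k):
    """All overlapping windows of length k, in order along the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']

# Number of distinct k-mers a basecaller has to tell apart for 4 bases
print(4 ** 3)  # 64
```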
how do the sequencing errors compare between the sequencing technologies?
map Illumina reads with bowtie2 and PacBio and ONT reads with minimap2, then use samtools to compare the alignments to the reference
Illumina reads align almost perfectly, with a per-read median of 99.3% correct. Indels almost never occur.
PacBio reads have a per-read median of 89.2% correct. the most frequent error type is insertions (7.45% median), with mismatches at a median of only 1.5% of the read
ONT reads have a per-read median of 92.4% correct, with deletions (9%) and mismatches (4.5%) both at a relatively high median per read
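how such per-read percentages could be derived from alignments is sketched below. this is only an illustration, not the procedure behind the numbers above: it assumes each alignment is available as an extended CIGAR string that uses '=' for matches and 'X' for mismatches, and it uses the number of alignment columns as the denominator, which is an assumption. the CIGAR strings are invented.

```python
import re
from statistics import median


def error_profile(cigar):
    """Percent of alignment columns per operation (=, X, I, D).

    Other CIGAR operations (e.g. clipping) are ignored in this sketch.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in re.findall(r"(\d+)([=XID])", cigar):
        counts[op] += int(length)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no =, X, I or D operations found")
    return {op: 100.0 * n / total for op, n in counts.items()}


profile = error_profile("90=2X5I3D")
print(profile)  # {'=': 90.0, 'X': 2.0, 'I': 5.0, 'D': 3.0}

# Per-read median of the correct fraction over several hypothetical reads
cigars = ["90=2X5I3D", "95=1X2I2D", "80=5X10I5D"]
median_correct = median(error_profile(c)["="] for c in cigars)
print(median_correct)  # 90.0
```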
what is the first step of data analysis?
pre-processing