Consider the following genetic cross of Drosophila.
Females heterozygous for grey body (Gg) and normal wings (Aa) are crossed with males with black body (gg) and crippled wings (aa). This results in the following offspring:
236 grey body, normal wings
253 black body, crippled wings
50 grey body, crippled wings
61 black body, normal wings
Which statement is correct?
A) The two studied genes are linked with a distance of 50 cM
B) The two studied genes are unlinked
C) There are more recombinant offspring than non-recombinant offspring
D) The two studied genes are linked with a distance of ~18.5 cM
E) The results are what one would expect from Mendel's third law (the law of independent assortment)
D
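The ~18.5 cM in option D follows from the offspring counts. In a test cross the two rarest classes are the recombinants, and for short distances a recombination frequency of 1% corresponds to about 1 cM. Under independent assortment all four classes would instead appear in roughly equal numbers (~150 each of 600). A minimal sketch of the calculation (function name is illustrative):

```python
def recombination_frequency(parental, recombinant):
    """Recombination frequency in percent (~ map distance in cM for short distances)."""
    total = sum(parental) + sum(recombinant)
    return 100 * sum(recombinant) / total

# Offspring counts from the test cross above
parental = [236, 253]     # grey/normal and black/crippled (parental combinations)
recombinant = [50, 61]    # grey/crippled and black/normal (recombinant combinations)

rf = recombination_frequency(parental, recombinant)
print(f"{rf:.1f} cM")     # 111 recombinants out of 600 offspring
```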
A) Larger DNA fragments can be cloned when using BACs compared to using plasmids, also because BACs are maintained at a lower copy number per E. coli cell
B) BACs were chosen for the HGP as they had especially high DNA yields
C) The average insert size of a BAC is ~1000 kbp
D) BACs were used as the main cloning system as they allowed the inserts to be sequenced completely from both ends in just two sequencing reactions
E) A physical map of human BACs was mainly done to link the physical map to the cytogenetic map
A
If sequencing errors are fully unbiased (= random) and the quality score at 1x coverage is 5, what is the theoretically expected quality score at 2x coverage?
10
11
20
100
Cannot be calculated
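Under the simplified model in the question (fully random, independent errors), a consensus base at 2x coverage is wrong only if both reads are wrong, so the error probabilities multiply and the Phred scores add: Q5 + Q5 = Q10. A minimal sketch of that calculation; the function names are illustrative, and real consensus callers model errors in more detail than this:

```python
import math

def phred_to_error(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def consensus_phred(q_single, coverage):
    # Assumption: errors are independent, so the consensus is wrong only if
    # every covering read is wrong; the per-read error probabilities multiply.
    p_consensus = phred_to_error(q_single) ** coverage
    return error_to_phred(p_consensus)

print(round(consensus_phred(5, 2), 6))
```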
1% error rate: quality score 20
In genomics, a quality score of 20 for a 1% error rate refers to the Phred quality score (Q score) used in DNA sequencing. Here's why:
Phred scores are logarithmic measures of base call accuracy in sequencing.
The formula for Phred scores is $Q = -10 \log_{10}(P)$, where $P$ is the probability of an incorrect base call.
A quality score of 20 (Q20) means $P = 10^{-20/10} = 10^{-2} = 0.01$. This means there is a 1% probability of error per base call (hence, 99% accuracy).
In sequencing, a Q20 score means 99% base call accuracy or 1% error rate.
This is a widely accepted benchmark in sequencing, where:
Q30 = 99.9% accuracy (0.1% error rate)
Q40 = 99.99% accuracy (0.01% error rate)
In short, a Phred score of 20 corresponds to a 1% base-call error rate in sequencing.
Which statements regarding Genome Assemblies are correct?
1. Long, repetitive sequences in the genome are a major problem for genome assemblies.
2. Several assembly versions of the human genome exist.
3. An N50 contig length of 100,000 is better than an N50 contig length of 50,000.
4. An N90 contig length of 50,000 is better than an N50 contig length of 50,000.
5. Scaffolds are an ordered set of contigs and can contain gaps.
Answer choices:
( ) Only 1, 3, and 5 are correct
( ) Only 2, 3, and 4 are correct
( ) Only 2, 3, 4, and 5 are correct
( ) Only 1 and 2 are correct
( ) All are correct
All are correct
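Nx is the contig length L such that contigs of length L or longer together contain at least x% of the total assembly. Because Nx can only stay the same or decrease as x increases, an N90 of 50,000 means at least 90% of the assembly sits in contigs of 50,000 bp or longer, which is better than reaching that length only at N50 (statement 4). A minimal sketch with hypothetical contig lengths:

```python
def nx(contig_lengths, x=50):
    """Nx: the length L such that contigs >= L cover at least x% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)   # longest contigs first
    threshold = sum(lengths) * x / 100
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    raise ValueError("empty assembly")

# Hypothetical toy assembly (contig lengths in bp), total 300,000 bp
contigs = [100_000, 80_000, 50_000, 40_000, 20_000, 10_000]
print(nx(contigs, 50), nx(contigs, 90))
```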
A genome has been sequenced at 3x coverage using whole genome shotgun sequencing. Assuming a Poisson distribution ($P(k) = \frac{c^k e^{-c}}{k!}$, where $c$ is the coverage), what percentage of the genome is not covered by any read?
~50%
~33%
~15%
~10%
~5%
~5%
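Under the Lander-Waterman model, reads land uniformly at random, so the number of reads covering a base is Poisson-distributed with mean c (the coverage). The fraction of bases covered by zero reads is P(0) = e^(-c); for c = 3 that is about 5%, so roughly 95% of the genome is covered. A minimal sketch:

```python
import math

def poisson_pk(k, coverage):
    """Probability that a base is covered by exactly k reads: c^k * e^-c / k!."""
    return coverage ** k * math.exp(-coverage) / math.factorial(k)

c = 3
uncovered = poisson_pk(0, c)   # equals e^-3
print(f"uncovered: {uncovered:.1%}")
print(f"covered:   {1 - uncovered:.1%}")
```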
1) In the last fifteen years, sequencing costs have dropped faster than computer chip costs
2) 1 million bp of DNA can be sequenced today for less than 0.1% of the price 15 years ago
3) A human genome can routinely be sequenced today for less than 100 dollars
4) The major drop in sequencing costs in the last fifteen years has been caused by so-called next-generation sequencing
5) The major drop in sequencing costs in the last fifteen years has been caused by automation of capillary sequencing
Only 2 is correct
Only 4 is correct
Only 1, 2 and 4 are correct
Only 1, 2 and 5 are correct
Only 1, 2, 3 and 4 are correct
Only 1, 2 and 4 are correct
human genomics: The 2018 (only one option)
GOAL human genomics 1990
map variation tumor
Illumina cluster process (NOT correct):
clusters are generated by a synthesis process
Illumina sequencing: 3, 4
Illumina data: base calling refers to the conversion of image data into nucleotide sequences
Which statement regarding these Next-Gen Sequencing technologies is correct?
The BGI sequencers are based on Complete Genomics technology and have the advantage of longer read lengths.
The BGI sequencers use emulsion amplification
Ion Torrent provides the cheapest per base sequencing currently available
Ion Torrent uses emulsion PCR that amplifies DNA on beads
BGI Sequencers detect protons to sequence DNA
PacBio sequencing
1, 3, 4: the fluorescent group is cleaved off together with the pyrophosphate; a single fluorescently labeled nucleotide is incorporated into the DNA;
the accuracy of the DNA sequence is increased by sequencing circular DNA templates
Which statements regarding Oxford Nanopore sequencing are correct?
1) An advantage of the technology is the highest accuracy (= highest Phred score) per base sequenced.
2) The nanopores used are small holes in a lipid bilayer that are cut with a laser.
3) Oxford Nanopore can generate sequence reads over 100 kbp.
4) Oxford Nanopore sequencers are the current standard for de novo genome sequencing.
5) The MinION Oxford Nanopore sequencers are about the size of a candy bar.
None is correct
Only 1 and 5 are correct
Only 3 and 4 are correct
Only 3 and 5 are correct
Only 2, 3 and 5 are correct
SDS NOT correct: SPs make VP 50 % of human
de novo: 4, 5 -> the T2T consortium sequenced an effectively haploid human cell line to avoid diploid complications; T2T recently produced a gapless assembly of a female human genome
What is meant by exome sequencing?
( ) The sequencing of formalin-fixed DNA
( ) The sequencing of genomic DNA after enriching for known exons
( ) The sequencing of mRNAs, respectively cDNAs
( ) The sequencing without (ex) reference to a genome (ome)
( ) The sequencing of patients versus controls
The sequencing of genomic DNA after enriching for known exons
Sequencing Technologies
Which statements regarding the scale of current genome sequencing are correct?
( ) UK Biobank has just released 500,000 whole genomes sequenced at ~30x coverage with PacBio and/or Oxford Nanopore.
( ) gnomAD is an effort to re-sequence genomes worldwide under new ethical standards.
( ) The PanCancer Analysis of Whole Genomes Consortium just analyzed >2800 whole genomes sequenced solely by Oxford Nanopore.
( ) The 1000 Genomes project is completed and has sequenced over 100,000 US citizens.
( ) The Vertebrate Genomes Project plans to sequence 70,000 vertebrate genomes de novo in the coming 10 years using long-read sequencing technologies.
1,5
2, 3 (scoring matrices and alignments): one set of gap penalties; the gap opening penalty is often higher than the gap extension penalty; the scoring matrix determines the best alignment
k = 5: 8
Mapping short reads: the algorithm uses heuristics to speed up the alignment process
Gene annotations (correct): the sequence contains an open reading frame
Which statements regarding the number of (protein-coding) genes in the human genome are true?
1) The number of protein-coding genes has been continuously corrected upwards in the last 15 years
2) In the vertebrates the number of genes correlates well with genome size
3) There are several bacteria known that have more genes than humans
4) Daphnia (water flea) is estimated to have more genes than humans
5) The chicken has ~twice as many protein coding genes as humans
4
Human protein-coding genes: 1, 2, 4:
~1% (protein-coding), 5' UTRs are on average shorter than 3' UTRs, protein-coding genes have more than 3 exons on average
Genes and Gene Expression
4.1 Which statements regarding splicing in human protein-coding genes are true?
1. All genes with more than one exon can get spliced.
2. The splice acceptor site is located in the exon.
3. Alternative splicing affects most genes.
4. Alternative splicing affects, by definition, the open reading frame.
5. When aligning the cDNA sequence of an mRNA to the genome, the introns are not aligned.
( ) Only 2, 4, and 5 are correct
( ) Only 1 and 3 are correct
( ) Only 1, 2, 3, and 4 are correct
Only 1, 3, and 5 are correct
Which statements regarding non-coding RNAs (ncRNAs) are correct?
1,3,5
~20,000 annotated
Most ncRNAs are expressed at lower levels than protein-coding RNAs
Most annotated ncRNAs are lincRNAs
Which statements regarding human pseudogenes are correct?
1) In order to be annotated as a pseudogene, it needs to be experimentally shown to have no function
2) Most of the pseudogenes originated via non-allelic homologous recombination
3) Pseudogenes are usually homologs of functional genes
4) The disruption of the ORF is in most cases the decisive hallmark of a pseudogene
5) There are more pseudogenes than protein-coding genes annotated in the human genome
A) Only 3 and 4 are true
B) Only 1, 3 and 4 are true
C) Only 1 and 3 are true
D) Only 2, 4 and 5 are true
E) Only 1 and 4 are true
Repeated sequences (NOT true):
exaptation refers specifically to how integration of transposable elements leads to changes in gene regulation
Which statements regarding quantitative RNA-seq analysis are correct?
Demultiplexing is the process in which reads are separated based on their barcodes that label different RNA-seq libraries
Mapping of reads is always done before gene count normalization
Normalization is the process that corrects for different sequencing depths among libraries
BLAT is not used for mapping since it is too slow
Read counts per transcript are more difficult to estimate than read counts per gene
1, 2 and 3
1, 3 and 4
2, 4 and 5
1, 3, 4 and 5
All are correct
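Normalization for sequencing depth can be illustrated with its simplest form, counts per million (CPM): each gene's count is divided by the library's total read count and scaled to one million. A minimal sketch with hypothetical counts, where library B was sequenced twice as deep as library A; tools such as DESeq2 (median-of-ratios) or edgeR (TMM) use more robust methods that are not implemented here:

```python
def cpm(counts):
    """Counts per million: scale each gene's count by the library's total reads."""
    total = sum(counts.values())
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}

# Hypothetical libraries: B has twice the sequencing depth of A
lib_a = {"geneX": 100, "geneY": 900}
lib_b = {"geneX": 200, "geneY": 1800}

# After normalization geneX has the same value in both libraries
print(cpm(lib_a)["geneX"], cpm(lib_b)["geneX"])
```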
TEs: The full-length Alu-element reverse transcriptase
Vertebrates (NOT correct): the largest vertebrate genome is 1000-fold larger than the human genome
Which statements regarding RNA in mammalian or human cells are correct?
10-30 pg is the average RNA content of a cell
mRNA makes up just 1-5% of the total RNA in a cell
Silica membranes, as sold e.g. by Qiagen, are used to separate mRNA from rRNA
An Agilent Bioanalyzer is an instrument to purify RNA
The so-called "RIN value" is a measure of RNA quality
Only 1 is correct
Only 2 and 4 are correct
Which statements regarding Transcriptomics are correct?
1) Additional Transcriptome annotation (=qualitative transcriptomics) is usually not needed when performing RNA-seq experiments in humans or mice
2) Long reads are more crucial for Transcriptome annotation (=qualitative transcriptomics) than for quantifying transcript levels
3) PacBio or Oxford Nanopore sequencing are currently the best choices for Transcriptome annotation (=qualitative transcriptomics)
4) PacBio or Oxford Nanopore sequencing are currently the best choices for quantifying expression levels (=quantitative transcriptomics)
5) Transcriptome assembly using RNA-seq with Illumina sequencing is generally done by cloning cDNA into plasmids followed by shotgun sequencing of individual plasmids
A) Only 2 is correct
B) Only 3 and 4 are correct
C) Only 1, 2 and 3 are correct
D) Only 5 is correct
E) All are correct
Only 1, 2 and 3 are correct
Gene expression analysis (FALSE statement):
To quantify expression on a microarray, unlabeled RNA from the sample is bound to the labeled probe
Genes and Gene Expression
Which statements regarding single-cell RNA-seq (scRNA-seq) are correct?
2, 5: UMIs reduce noise caused by PCR amplification; the Human Cell Atlas is an international project that aims to characterize cell types
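How UMIs remove amplification noise can be sketched as follows, assuming reads have already been assigned a cell barcode, a UMI and a gene (the tuple layout and names are hypothetical): reads sharing the same cell, gene and UMI are collapsed into one original molecule, so PCR duplicates are counted once. Production pipelines additionally merge UMIs that differ only by sequencing errors; this exact-match version does not.

```python
from collections import defaultdict

def count_umis(reads):
    """Count molecules per (cell, gene) by collapsing reads that share a UMI."""
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        molecules[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

# Hypothetical reads: (cell barcode, UMI, gene). Three reads of geneA in cell1
# carry the same UMI, i.e. they are PCR copies of one original molecule.
reads = [
    ("cell1", "AACG", "geneA"),
    ("cell1", "AACG", "geneA"),
    ("cell1", "AACG", "geneA"),
    ("cell1", "TTGC", "geneA"),
    ("cell2", "AACG", "geneA"),
]
print(count_umis(reads))
```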
n the standard error of the mean is expected to be:
A) 10
B) 10000
C) 100
D) 1000
E) 1
Which of the following statements is NOT true for the standard error of the mean (S.E.M.)?
( ) The S.E.M. cannot be a negative number.
( ) The standard error of the mean is usually estimated as the standard deviation divided by the square root of the number of samples.
( ) The S.E.M. is an estimate of how far the sample mean might differ from the population mean.
( ) The S.E.M. is the standard deviation of the sampling distribution of the mean.
( ) The S.E.M. increases as the sample size increases.
The S.E.M. increases as the sample size increases.
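Since S.E.M. = sd / sqrt(n), for a fixed standard deviation the S.E.M. decreases as the sample size grows (quadrupling n halves it), and it can never be negative. A quick check with the standard deviation fixed at an arbitrary 1.0:

```python
import math

def sem(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

for n in (4, 16, 64, 256):
    print(n, sem(1.0, n))   # each 4-fold increase in n halves the S.E.M.
```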
In a paper it is written that “… Gene TBP53 is 2.3-fold higher expressed in patients (mean=8.8, sd= 0.56, N=5) than in controls (mean=7.7, sd=0.37, N=5), which is significant (t-test,p=0.005).” Which statements are correct?
1) Mean expression levels are given in log2 space
2) The standard error of the mean for patients is 0.56
3) The standard error of the mean for patients is 0.56/square root(5) = 0.25
4) Less than 0.5% of the patients have lower expression levels than controls
5) If they tested 10,000 genes on a microarray and this is the gene with the most significant p-value, it is very likely a false positive result
1, 3, 5:
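The numbers behind statements 1, 3 and 5 can be checked from the quoted summary statistics. Read as log2 values, the rounded means differ by 1.1, which corresponds to roughly a 2.1-fold change. For statement 5, with 10,000 tests and no true effects, about 10,000 x 0.005 = 50 genes would be expected to reach p <= 0.005 by chance alone, so a top hit at that level is plausibly a false positive. A minimal sketch, assuming independent tests:

```python
import math

# Summary statistics quoted in the paper (interpreted as log2 expression values)
patients = {"mean": 8.8, "sd": 0.56, "n": 5}
controls = {"mean": 7.7, "sd": 0.37, "n": 5}

sem_patients = patients["sd"] / math.sqrt(patients["n"])   # statement 3: sd / sqrt(N)
fold_change = 2 ** (patients["mean"] - controls["mean"])   # log2 difference -> linear fold
expected_false_positives = 10_000 * 0.005                  # statement 5: tests x p-value

print(round(sem_patients, 2), round(fold_change, 2), expected_false_positives)
```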
Which statements regarding the False Discovery Rate (FDR) are true?
1) If the FDR is 5% for 1000 discoveries (e.g. differentially expressed genes), there is at most a 5% chance that at least one of them is a false positive.
2) If the FDR is 5% for 1000 discoveries (e.g. differentially expressed genes), it is very likely that there is at least one false positive among them.
3) If the FDR is 5% for 1000 discoveries (e.g. differentially expressed genes), one expects fifty false positives among them.
4) The Benjamini-Hochberg procedure is a common way to calculate the FDR.
5) The Bonferroni correction is a common way to calculate the FDR.
Only 1 and 5 are true
Only 3 and 4 are true
Only 1, 2 and 4 are true
Only 2, 3 and 4 are true
Only 2, 3 and 5 are true
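A minimal sketch of the Benjamini-Hochberg step-up procedure: p-values are ranked, each is multiplied by m / rank, and a running minimum from the largest p-value downward keeps the adjusted values monotone. Bonferroni, by contrast, controls the family-wise error rate (the probability of at least one false positive), not the FDR. The p-values below are hypothetical:

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    A test is called a discovery at FDR level alpha if its adjusted value <= alpha.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
discoveries = [i for i, q in enumerate(qvals) if q <= 0.05]
print(qvals, discoveries)

# At an FDR of 5%, about 5% of 1000 discoveries are expected to be false positives.
print(0.05 * 1000)
```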
value correct 2,3,5
publication bias etc
Which statement regarding unsupervised learning methods for gene expression is FALSE?
A) An advantage of PCA versus t-SNE is that the distance between clusters can be better interpreted
B) In a Principal Component Analysis (PCA) the first principal component (PC1) always explains more or equal variation than the second principal component (PC2)
C) An advantage of tSNE versus PCA is that local cluster structures are better visible
D) Hierarchical clustering of gene expression can be done with different measures of similarity, such as the Pearson correlation coefficient or the Euclidean distance
E) PCA, t-SNE and UMAP identify clusters and assign p-values to cluster differences
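Statement B holds by construction: principal components are the eigenvectors of the covariance matrix sorted by eigenvalue (explained variance), so PC1 never explains less variance than PC2. A minimal 2-D sketch using the closed-form eigenvalues of a 2x2 sample covariance matrix, with toy data and pure Python rather than a library such as scikit-learn:

```python
import math

def pca_2d_variances(points):
    """Variance explained by PC1 and PC2 for 2-D data: the eigenvalues of the
    2x2 sample covariance matrix, largest first."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace + disc, half_trace - disc

# Toy data lying almost on a line, so PC1 should capture nearly all variance
points = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0), (5.0, 9.9)]
pc1, pc2 = pca_2d_variances(points)
print(pc1 >= pc2, pc1 / (pc1 + pc2))
```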
The size of a cluster in a t-SNE visualization is proportional to the variance in that cluster
Which statement regarding the Gene Ontology is correct?
( ) The Gene Ontology is a computational method to detect enriched pathways in differentially expressed genes.
( ) The Gene Ontology only covers mammalian gene annotations.
( ) The minimal data structure is a directed acyclic graph, i.e., each child annotation can have more than one parent annotation.
( ) The Gene Ontology consortium was founded to facilitate gene expression analyses.
( ) For each major model organism, there exists one specific Gene Ontology domain.
The minimal data structure is a directed acyclic graph, i.e., each child annotation can have more than one parent annotation.
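Why a directed acyclic graph is not a tree can be shown with a small sketch using hypothetical term names (not real GO identifiers): one child term has two parents, and a traversal collects all of its ancestors. Under GO's true-path rule, an annotation to a term implies annotation to all of these ancestors.

```python
# Hypothetical toy fragment of a GO-like hierarchy: child -> list of parents
parents = {
    "term_C": ["term_A", "term_B"],   # a child with two parents (not possible in a tree)
    "term_A": ["root"],
    "term_B": ["root"],
    "root": [],
}

def ancestors(term, graph):
    """All terms reachable by following parent links from `term`."""
    seen = set()
    stack = list(graph[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(graph[t])
    return seen

print(sorted(ancestors("term_C", parents)))
```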