What is the difference between proximate and ultimate causes in biology?
Proximate causes: Explain how a trait works mechanistically
Ultimate causes: Explain why a trait evolved and persists in terms of reproductive success and evolutionary history
(Intro)
What are the key properties of evolution relevant for evolutionary genetics?
Genetic variation is the raw material for evolution
Evolution acts at the population level through inheritance with modification
Evolution is not goal-directed and not equivalent to natural selection
Evolution is ongoing and observable today
What are the core ingredients of Darwinian evolution?
Heritable variation
Overproduction of offspring with limited resources
Differences in reproductive success linked to trait variation
Why are germline mutations central to evolutionary genetics, but somatic mutations are not?
Germline mutations are heritable and can be passed to future generations, whereas somatic mutations affect only the individual and do not contribute to evolutionary change.
(L1 - Mutation)
Why does DNA sequence variation alone not directly determine evolutionary outcomes?
Selection acts on phenotypes, not directly on DNA. Phenotypes also depend on the environment, and only the heritable (genetic) component of phenotypic variation is transmitted to offspring and contributes to evolutionary change.
What is the infinite sites model and when is it justified?
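It assumes there are so many mutable sites that every new mutation occurs at a site that has never mutated before, so there are no recurrent or back mutations. It is justified when the per-site mutation rate is low relative to population timescales (e.g. per-site θ ≪ 1), so that repeated hits at the same site are rare.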
Why do mutation rates differ between autosomes and sex chromosomes?
Because male and female germ lines undergo different numbers of cell divisions, causing male-biased mutation rates that affect chromosomes differently depending on how much time they spend in each sex.
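A minimal sketch of this argument, assuming an XY system in which an autosome spends half its time in each sex, the X two-thirds of its time in females, and the Y all of its time in males. The name `relative_mutation_rates` and the parameter `alpha` (male-to-female germline mutation-rate ratio) are illustrative, not from the lecture.

```python
def relative_mutation_rates(alpha):
    """Return (X/A, Y/A) mutation-rate ratios for a male:female germline
    mutation-rate ratio alpha (female rate set to 1)."""
    mu_f, mu_m = 1.0, alpha
    autosome = (mu_m + mu_f) / 2      # half of the time in each sex
    x_chrom = (mu_m + 2 * mu_f) / 3   # two-thirds of the time in females
    y_chrom = mu_m                    # always transmitted through males
    return x_chrom / autosome, y_chrom / autosome

for alpha in (1, 2, 5):
    x_a, y_a = relative_mutation_rates(alpha)
    print(f"alpha={alpha}: X/A={x_a:.3f}, Y/A={y_a:.3f}")
```

With alpha = 1 all ratios are 1; as alpha grows, X/A falls toward 2/3 and Y/A rises toward 2, so comparing X, Y and autosomal divergence is one way to estimate alpha.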
What is the Hardy–Weinberg equilibrium (HWE)?
Hardy–Weinberg equilibrium is a null model stating that in an ideal population, genotype frequencies are determined solely by allele frequencies and remain constant across generations.
(L2 - HWE)
What assumptions are required for Hardy–Weinberg equilibrium?
infinitely large population
random mating
no selection
no mutation
no migration
no recombination (single locus)
no segregation distortion
What are the Hardy–Weinberg genotype frequencies for a bi-allelic locus?
For alleles A and a with frequencies p and q (p + q = 1): AA = p^2, Aa = 2pq, aa = q^2, which sum to (p + q)^2 = 1.
Why do allele frequencies remain constant under Hardy–Weinberg equilibrium?
Because Mendelian segregation and random mating do not change allele frequencies; inheritance alone does not reduce or bias allelic variation.
Why does recessiveness not cause an allele to disappear under HWE?
Because allele transmission is independent of dominance; recessive alleles are transmitted as frequently as dominant ones when no selection acts.
Why is Hardy–Weinberg equilibrium useful in population genetics?
It allows genotype frequencies to be inferred from allele frequencies and provides a baseline to detect evolutionary forces when populations deviate from HWE.
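One common way to use HWE as a baseline is a chi-square goodness-of-fit test on genotype counts. The sketch below assumes a single bi-allelic, polymorphic locus and uses toy counts; it also reports F = 1 − H_obs/H_exp, where positive values indicate a heterozygote deficiency. The function name is hypothetical, and 3.841 is the 5% critical value of chi-square with 1 degree of freedom (3 genotype classes − 1 − 1 estimated allele frequency).

```python
def hwe_check(n_AA, n_Aa, n_aa):
    """Chi-square test against Hardy-Weinberg proportions for one
    bi-allelic, polymorphic locus; also returns F = 1 - H_obs / H_exp."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1 - (n_Aa / n) / (2 * p * q)     # F > 0: heterozygote deficiency
    return chi2, f

chi2, f = hwe_check(40, 20, 40)          # toy counts with few heterozygotes
print(chi2, f, chi2 > 3.841)             # 3.841: 5% critical value, 1 df
```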
What does a heterozygote deficiency indicate relative to Hardy–Weinberg expectations?
It suggests non-random mating, inbreeding, population structure (Wahlund effect), or selection acting on genotypes.
How do assortative mating and inbreeding affect Hardy–Weinberg equilibrium?
Both increase homozygosity and reduce heterozygosity without changing allele frequencies; inbreeding affects the entire genome, assortative mating affects specific loci.
Why must population structure be considered before testing for Hardy–Weinberg deviations?
Pooling genetically distinct subpopulations can create apparent heterozygote deficiency even when each subpopulation is individually in Hardy–Weinberg equilibrium.
What is genetic drift?
Genetic drift is the random change of allele frequencies across generations due to chance sampling of alleles in finite populations, independent of fitness.
(L3 - Genetic Drift)
How does genetic drift differ from natural selection?
Selection changes allele frequencies due to fitness differences, whereas genetic drift changes allele frequencies due to random chance (“bad luck”), even when alleles are selectively neutral.
Why is genetic drift stronger in small populations?
Because random sampling variance is larger when fewer gene copies are drawn each generation, leading to larger fluctuations in allele frequencies.
What is the Wright–Fisher model and why is it important?
The Wright–Fisher model describes allele frequency change under random sampling in finite populations and serves as the null model for genetic drift in population genetics.
What happens to allele frequencies under genetic drift in the long run?
In the absence of mutation, every allele will eventually either fix (frequency = 1) or be lost (frequency = 0).
What is the probability that a neutral allele fixes under genetic drift?
The probability of fixation of a neutral allele equals its current allele frequency p.
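A small Wright–Fisher simulation can check that a neutral allele fixes with probability equal to its starting frequency. This is a sketch assuming 2N = 20 gene copies (N = 10 diploids), independent binomial resampling of gene copies each generation and no mutation; names and parameter values are illustrative.

```python
import random

def wright_fisher_fixed(p0, n_copies, rng):
    """Run neutral Wright-Fisher drift until loss or fixation.
    Returns True if the allele fixes."""
    count = round(p0 * n_copies)
    while 0 < count < n_copies:
        p = count / n_copies
        # each gene copy of the next generation picks its parental
        # allele independently: binomial sampling with probability p
        count = sum(rng.random() < p for _ in range(n_copies))
    return count == n_copies

rng = random.Random(1)
reps = 1000
fixed = sum(wright_fisher_fixed(0.25, 20, rng) for _ in range(reps))
print(f"fraction fixed: {fixed / reps:.3f} (theory: 0.25)")
```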
How does genetic drift affect heterozygosity within populations?
Genetic drift reduces heterozygosity over time; in a population of N diploid individuals, expected heterozygosity declines by a factor of (1 − 1/(2N)) per generation, i.e. H_t = H_0(1 − 1/(2N))^t.
How fast does genetic drift erode genetic variation?
The half-life of heterozygosity is 2N ln 2 ≈ 1.39N generations, i.e. proportional to population size, meaning drift acts slowly in large populations and rapidly in small ones.
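The decay formula and its half-life can be checked numerically. A minimal sketch assuming an ideal Wright–Fisher population of N diploids; the function names are hypothetical.

```python
import math

def heterozygosity(h0, n_diploid, t):
    """Expected heterozygosity after t generations of pure drift."""
    return h0 * (1 - 1 / (2 * n_diploid)) ** t

def half_life(n_diploid):
    """Generations until expected heterozygosity is halved (exact form)."""
    return math.log(0.5) / math.log1p(-1 / (2 * n_diploid))

for n in (10, 100, 1000):
    # exact half-life next to the 2N ln 2 (about 1.39N) approximation
    print(n, round(half_life(n), 1), round(2 * n * math.log(2), 1))
```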
Why does genetic drift increase variation among populations but decrease variation within populations?
Drift causes populations to diverge randomly in allele frequencies, increasing among-population variance, while within each population alleles are lost or fixed, reducing genetic diversity.
What is effective population size (Ne)?
The effective population size is the size of an idealized population that would experience genetic drift at the same rate as the real population, often much smaller than the census size Nc.
(L4 - Neutral Theory)
Why is Ne usually smaller than the census population size Nc?
Because factors such as unequal sex ratios, variance in reproductive success, population size fluctuations, overlapping generations, and population structure increase drift and reduce the number of individuals effectively contributing genes to the next generation.
How do population bottlenecks affect effective population size?
Bottlenecks strongly reduce Ne because Ne is approximately given by the harmonic mean of population sizes over time, which is dominated by the smallest population sizes.
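The harmonic mean can be computed directly. In this sketch the generation sizes are hypothetical, with a single bottleneck generation of 10 individuals.

```python
def harmonic_mean_ne(sizes):
    """Long-term Ne over generations with the given population sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

sizes = [1000, 10, 1000, 1000]            # one bottleneck generation
print(harmonic_mean_ne(sizes), sum(sizes) / len(sizes))
```

The harmonic mean (about 38.8) is far below the arithmetic mean (752.5): a single small generation dominates.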
How does unequal sex ratio affect Ne?
Unequal contributions of males and females reduce Ne; when few individuals of one sex reproduce, genetic drift is greatly intensified despite a large census size.
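A commonly used quantitative form of this statement is Wright's formula for separate sexes, Ne = 4·Nm·Nf/(Nm + Nf), which assumes otherwise ideal reproduction. The sketch and its example numbers are illustrative.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size when n_males and n_females breed (Wright's formula)."""
    return 4 * n_males * n_females / (n_males + n_females)

print(ne_sex_ratio(500, 500))   # even sex ratio: Ne equals the 1000 breeders
print(ne_sex_ratio(10, 990))    # e.g. harem breeding: same census, tiny Ne
```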
Why can the Wright–Fisher model still be used in natural populations?
Even when its assumptions are violated, the Wright–Fisher model remains applicable if the census size Nc is replaced by an appropriately defined effective population size Ne.
Why is mutation alone a weak force for changing allele frequencies?
Because mutation rates are very low, allele frequencies change extremely slowly under mutation alone, often requiring hundreds of thousands to millions of generations for noticeable change.
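Assuming one-way mutation A → a at rate u per generation with no back mutation, p_t = p_0(1 − u)^t. This sketch (hypothetical function name) computes how long it takes just to halve p.

```python
import math

def generations_to_halve(u):
    """Generations for p to halve under one-way mutation A -> a at rate u,
    using p_t = p_0 * (1 - u)**t (back mutation ignored)."""
    return math.log(0.5) / math.log1p(-u)

for u in (1e-4, 1e-5, 1e-6):
    print(f"u={u:g}: ~{generations_to_halve(u):,.0f} generations")
```

With u = 10^-5, halving p takes roughly 69,000 generations.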
What is the Neutral Theory of Evolution?
The Neutral Theory states that most molecular genetic variation and divergence result from the interaction of mutation and genetic drift, with selection mainly removing strongly deleterious mutations.
What is mutation–drift equilibrium?
Mutation–drift equilibrium is the balance point where mutation introduces new genetic variation at the same rate that genetic drift removes it, leading to a stable expected level of heterozygosity.
What is θ (theta) in population genetics?
θ = 4Neμ (diploids; μ = mutation rate per generation, per site or per locus) is the population mutation parameter that determines expected heterozygosity, nucleotide diversity, and the number of differences between two randomly sampled sequences.
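Two standard neutral expectations follow from θ. Under the infinite-alleles model, equilibrium heterozygosity is H = θ/(1 + θ). Under the infinite-sites model, the expected number of differences between two sequences (π) equals θ. The sketch below uses hypothetical parameter values.

```python
def theta(ne, u):
    """Population mutation parameter for a diploid population."""
    return 4 * ne * u

def expected_heterozygosity_infinite_alleles(th):
    """Mutation-drift equilibrium heterozygosity, infinite-alleles model."""
    return th / (1 + th)

th = theta(10_000, 1e-5)
print(th, expected_heterozygosity_infinite_alleles(th))
```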
Why does the Neutral Theory predict a molecular clock?
Because the substitution rate equals the number of new neutral mutations per generation (2Nμ in a diploid population of size N) times the fixation probability of each (1/(2N)). Population size cancels, so the neutral substitution rate equals the mutation rate μ.
What is the main purpose of coalescent theory?
To infer past evolutionary processes from present-day genetic samples by modeling genealogies backward in time.
(L5 - Coalescent)
Why does coalescent theory trace gene copies rather than individuals?
Because recombination causes different loci to have different genealogies; the coalescent applies to single, non-recombining loci.
How does effective population size Ne affect coalescence times?
Larger Ne leads to longer waiting times to coalescence, while smaller Ne causes faster coalescence and reduced genetic diversity.
Why do diploid autosomal genes have longer coalescence times than haploid genes?
Because there are twice as many gene copies, reducing the chance that two lineages share a parent in the previous generation.
How does sample size n affect the depth of a coalescent tree?
Increasing n mainly adds short terminal branches but has little effect on the total depth (TMRCA).
Why is most coalescent time spent waiting for the last two lineages to coalesce?
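Because with k lineages the coalescence rate is proportional to the number of pairs, k(k − 1)/2, it drops as lineages merge. The final wait with two lineages (expected 2N generations in a diploid population) makes up at least half of the expected total tree height, 4N(1 − 1/n) generations.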
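Under the standard (Kingman) coalescent, with time measured in units of 2N generations, k lineages coalesce at rate k(k − 1)/2, so E[T_k] = 2/(k(k − 1)). The sketch below (function names hypothetical) computes the expected tree height and the share contributed by the last two lineages; it also shows why adding samples barely deepens the tree.

```python
def expected_times(n):
    """Expected waiting time E[T_k] while k lineages remain (k = n..2),
    in units of 2N generations."""
    return {k: 2 / (k * (k - 1)) for k in range(n, 1, -1)}

def expected_tmrca(n):
    """Expected tree height; telescopes to 2 * (1 - 1/n)."""
    return sum(expected_times(n).values())

for n in (2, 10, 100):
    share_last = expected_times(n)[2] / expected_tmrca(n)
    print(n, round(expected_tmrca(n), 3), round(share_last, 3))
```

Going from n = 10 to n = 100 raises the expected TMRCA only from 1.8 to 1.98 units, while the final two-lineage wait alone is 1 unit.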
How does population growth affect coalescent genealogies?
Population growth produces long external branches and short internal branches, reflecting recent expansion and compressed ancestral history.
How does a population bottleneck affect genealogies and variation?
Bottlenecks reduce Ne, accelerate coalescence, shorten trees, and strongly reduce genetic diversity.
Why is sequencing more loci usually more informative than sequencing more individuals?
Because different loci have independent genealogies, whereas additional individuals mainly add short terminal branches.
Why is time rescaled by N in the continuous-time coalescent?
Rescaling removes explicit population-size dependence, simplifying analysis and simulation of genealogies.
What does it mean that coalescent waiting times are exponentially distributed?
Coalescent events occur randomly in time with no memory, meaning the chance of coalescence does not depend on how long one has already waited.
Why can mutations be added after generating the genealogy?
Under neutrality, mutations do not affect reproduction, allowing genealogy (descent) and mutation (state) to be treated independently.
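The independence of genealogy and mutation suggests a two-step simulation: first draw memoryless exponential coalescence times (rate k(k − 1)/2, time in units of 2N generations), then scatter mutations on the finished tree as a Poisson number with mean (θ/2) × total branch length. This is a minimal sketch for one neutral, non-recombining locus under infinite sites, assuming θ = 4Nμ; the helper names are hypothetical. The simulated mean number of segregating sites should approach Watterson's expectation θ·Σ_{i=1}^{n−1} 1/i.

```python
import random

def poisson(lam, rng):
    """Poisson draw (the random module has none): count unit-rate
    exponential arrivals that occur before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_segregating_sites(n, theta, rng):
    """One neutral genealogy for n samples, then mutations on it."""
    total_length = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2                      # number of pairs
        total_length += k * rng.expovariate(rate)   # k branches share the wait
    # step 2: mutations depend only on branch length, not on the topology
    return poisson(theta / 2 * total_length, rng)

rng = random.Random(42)
n, theta, reps = 10, 5.0, 5000
mean_s = sum(simulate_segregating_sites(n, theta, rng) for _ in range(reps)) / reps
expected_s = theta * sum(1 / i for i in range(1, n))
print(round(mean_s, 2), round(expected_s, 2))
```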
How does mutation rate affect observed genetic variation under a fixed genealogy?
Higher mutation rates increase the number of observed differences but do not change the underlying genealogy.
What does θ = 2Neμ summarize biologically (haploid convention; θ = 4Neμ for diploids)?
It captures how mutation and drift together determine expected genetic diversity in a population.
Why does adding more samples yield diminishing returns for detecting genetic variation?
Because most branch length lies deep in the tree; additional samples mainly add short tips with few new mutations.
Why is population structure important in evolutionary genetics?
Because non-random mating due to spatial structure affects allele frequencies, heterozygosity, and can confound inference about selection or demography.
(L6 - Population Subdivision)
What is the Wahlund effect?
A reduction in heterozygosity caused by pooling subpopulations with different allele frequencies, even if each subpopulation is in Hardy–Weinberg equilibrium.
Why can heterozygosity be reduced even if all subpopulations are in HWE?
Because allele frequencies differ among subpopulations, causing fewer heterozygotes than expected under a single panmictic population.
What do Hs and Ht represent conceptually?
Hs is the average expected heterozygosity within subpopulations, while Ht is the expected heterozygosity of the pooled population.
What does FST measure?
The relative reduction in heterozygosity due to population subdivision, quantifying genetic differentiation among populations.
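A minimal sketch of F_ST = (H_T − H_S)/H_T for one bi-allelic locus, assuming equally weighted subpopulations and expected (2pq) heterozygosities. The gap H_T − H_S is also the heterozygote deficit seen when the subpopulations are pooled (Wahlund effect). The function name is hypothetical.

```python
def fst(subpop_freqs, weights=None):
    """F_ST = (H_T - H_S) / H_T for a bi-allelic locus, given allele
    frequency p in each subpopulation (equal weights by default).
    Assumes the pooled population is polymorphic (H_T > 0)."""
    k = len(subpop_freqs)
    w = weights or [1 / k] * k
    h_s = sum(wi * 2 * p * (1 - p) for wi, p in zip(w, subpop_freqs))
    p_bar = sum(wi * p for wi, p in zip(w, subpop_freqs))
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t

print(fst([0.1, 0.9]))   # strongly differentiated subpopulations
print(fst([0.5, 0.5]))   # identical allele frequencies
```

Two subpopulations at p = 0.1 and 0.9 give H_S = 0.18, H_T = 0.5 and F_ST = 0.64.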
When is FST = 0?
When allele frequencies are identical among populations, even if there is inbreeding within populations.
When does FST approach 1?
When populations are fixed for different alleles and there is no shared genetic variation.
Why does FST depend not only on allele-frequency differences but also on overall diversity?
Because FST is standardized by total heterozygosity HT, so the same allele-frequency difference can yield different FST values depending on diversity.
How does genetic drift affect FST over time in isolated populations?
Genetic drift increases FST over time as allele frequencies diverge and heterozygosity is lost.
Why can low FST values be ambiguous?
Because low FST can result from either high migration rates or short divergence times; FST alone cannot distinguish these scenarios.
What does FIS measure?
Deviations from Hardy–Weinberg equilibrium within subpopulations, often interpreted as inbreeding or assortative mating.
How can FIS = 0 but FST > 0?
When subpopulations are each in Hardy–Weinberg equilibrium but differ in allele frequencies, indicating population structure without local inbreeding.
How does migration affect population differentiation at equilibrium?
Migration counteracts drift by homogenizing allele frequencies; even very low migration rates can strongly reduce FST.
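A common quantitative form of this claim is the equilibrium approximation for Wright's island model, F_ST ≈ 1/(1 + 4Ne·m). It assumes many equal-sized demes, small m and neutrality, and is added here only for illustration.

```python
def fst_island(ne_m):
    """Equilibrium F_ST in Wright's island model, given Ne*m."""
    return 1 / (1 + 4 * ne_m)

for ne_m in (0.1, 1, 10):
    print(f"Ne*m = {ne_m}: F_ST ~ {fst_island(ne_m):.3f}")
```

One effective migrant per generation (Ne·m = 1) already caps F_ST at about 0.2.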
Why is the parameter Ne * m (effective migrants) difficult to interpret biologically?
Because it combines population size and migration into a single value and assumes equilibrium conditions that are often violated in nature.
In the basic selection model, at which life stage does selection act and what does that imply for HWE?
Selection acts from zygotes → adults via differential viability. Adults can deviate from HWE, but random mating restores zygotes to HWE each generation.
(L7 - Selection)
How is mean fitness \bar{w} computed and what is its role?
\bar{w} = p^2 w11 + 2pq w12 + q^2 w22. It is the normalization factor that converts “after selection” genotype frequencies back into frequencies summing to 1, e.g. p′ = (p^2 w11 + pq w12)/\bar{w}.
Why does only relative fitness matter for allele-frequency change?
Scaling all genotype fitness values by the same constant changes population size but not allele-frequency dynamics; only relative fitness differences affect the response to selection.
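One generation of the basic viability-selection model, sketched with hypothetical fitness values. Multiplying all three fitnesses by the same constant leaves p′ unchanged, which is the point about relative fitness.

```python
def select(p, w11, w12, w22):
    """One generation of viability selection with random mating.
    Returns (p_next, w_bar) for allele A1 at frequency p."""
    q = 1 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    p_next = (p * p * w11 + p * q * w12) / w_bar   # A1 copies after selection
    return p_next, w_bar

print(select(0.5, 1.0, 1.0, 0.5))
print(select(0.5, 2.0, 2.0, 1.0))   # all fitnesses doubled: same p_next
```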
What is the definition of the selection coefficient s in this lecture’s convention?
Relative fitness can be expressed as w=1−s, where s measures the fitness reduction relative to the best genotype (which is scaled to 1).
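For a dominant advantageous allele A (w_AA = w_Aa = 1, w_aa = 1 − s), what is the recursion for allele frequency after selection?
p′ = (p^2 + pq)/\bar{w} = p/(1 − sq^2), with \bar{w} = 1 − sq^2.
What is the change in allele frequency Δp in the dominant advantageous case and what does it imply?
Δp = p′ − p = spq^2/(1 − sq^2).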
Implications: selection is weak when q is small (because of q^2) and also weak when p is tiny (because of the p term).
Why does a dominant beneficial allele approach fixation so slowly once it is common (parameter reasoning)?
Because Δp ∝ q^2: when the deleterious allele a is rare, it hides in heterozygotes, and selection only “sees” it in aa homozygotes.
What is the general selection model fitness scheme using dominance h?
For alleles A1 (advantageous) and A2 (deleterious): w11 = 1, w12 = 1 − hs, w22 = 1 − s.
h controls how much the heterozygote is affected.
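What is the “most important” allele-frequency change equation given for the general dominance model (interpret, don’t memorize derivation)?
Δp = spq[ph + q(1 − h)]/\bar{w}, with \bar{w} = 1 − 2pqhs − q^2 s and p the frequency of A1. The pq factor makes selection weak whenever either allele is rare; h sets how strongly selection acts through heterozygotes, and hence whether A1 spreads efficiently while rare (small h) or only once it is common (large h).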
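How do special cases of h change the dynamics in the context of selection?
h = 0: A1 is dominant (A2 recessive). A1 spreads quickly while rare, but the last A2 copies hide in heterozygotes, so fixation is slow (Δp ∝ q^2).
h = 1/2: additive (codominant) effects. Δp = spq/(2\bar{w}), so neither tail of the trajectory is especially slow.
h = 1: A1 is recessive. A rare A1 occurs almost only in heterozygotes, which gain nothing, so its initial increase is very slow (Δp ∝ p^2); once common it proceeds quickly.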
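What is the weak-selection approximation mentioned and what does it tell you?
For s ≪ 1, \bar{w} ≈ 1, so Δp ≈ spq[ph + q(1 − h)]: allele frequencies change by only about s·pq per generation.
So changes are tiny and drift can dominate.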
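Iterating the general-dominance recursion shows how h affects the time for A1 to go from rare to common. In this sketch s = 0.05 and the 0.01 → 0.99 window are arbitrary choices, the function names are hypothetical, and the deterministic model ignores drift.

```python
def next_p(p, s, h):
    """A1 frequency after selection with w11 = 1, w12 = 1 - h*s, w22 = 1 - s."""
    q = 1 - p
    w_bar = 1 - 2 * p * q * h * s - q * q * s
    return p + s * p * q * (p * h + q * (1 - h)) / w_bar

def generations(p_start, p_end, s, h):
    """Generations for A1 to rise from p_start to at least p_end."""
    p, t = p_start, 0
    while p < p_end:
        p, t = next_p(p, s, h), t + 1
    return t

for h, label in ((0.0, "A1 dominant"), (0.5, "additive"), (1.0, "A1 recessive")):
    print(label, generations(0.01, 0.99, s=0.05, h=h))
```

The additive case is quickest over this window because neither tail of the trajectory relies on selection acting through homozygotes only.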
When does selection maintain genetic variation instead of removing it?
Under heterozygote advantage (overdominance), where w12>w11 and w12>w22, producing a stable polymorphic equilibrium.
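Under heterozygote advantage with fitness w11=1−s, w12=1, w22=1−t, what are the equilibrium allele frequencies?
p* = t/(s + t) for A1 and q* = s/(s + t) for A2: the allele whose homozygote pays the smaller fitness cost ends up more common.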
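Why is the overdominance equilibrium stable (exam logic)?
If p rises above p*, A1 occurs more often in A1A1 homozygotes (fitness 1 − s), so its average (marginal) fitness drops below that of A2 and p falls; if p drops below p*, the reverse happens. Deviations in either direction are pushed back toward p*.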
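Iterating the overdominance model from both sides of the equilibrium is a quick numerical check of stability. In this sketch s = 0.1 and t = 0.3 are arbitrary, giving p* = t/(s + t) = 0.75; the function name is hypothetical.

```python
def overdominance_step(p, s, t):
    """One generation with w11 = 1 - s, w12 = 1, w22 = 1 - t."""
    q = 1 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * p * (1 - s) + p * q) / w_bar

s, t = 0.1, 0.3
results = []
for p0 in (0.05, 0.95):          # start below and above the equilibrium
    p = p0
    for _ in range(500):
        p = overdominance_step(p, s, t)
    results.append(p)
print(results, "expected", t / (s + t))
```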
Why is the fate of a novel mutation mainly determined by drift, even if it is beneficial?
Because when a mutation is very rare, stochastic loss due to Mendelian segregation dominates; selection only acts effectively once the allele has survived the first few generations.
(L8 - Selection Drift)
What is the fixation probability of a neutral mutation entering a diploid population?
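p_fix = 1/(2N): a new mutation starts as a single copy among 2N gene copies, and a neutral allele fixes with probability equal to its frequency. This is very small in large populations, meaning most neutral mutations are lost.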
What is the approximate fixation probability of a beneficial mutation with selection coefficient s?
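p_fix ≈ 2s (Haldane's approximation for a new mutation with small heterozygous advantage s in a large population).
Even strongly advantageous mutations usually fail to fix: with s = 0.01, about 98% of new copies are lost.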
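The ≈ 2s result can be reproduced with a branching-process approximation, a technique swapped in here rather than necessarily the lecture's derivation. Each copy of the new allele leaves a Poisson(1 + s) number of copies, and the survival probability π solves 1 − π = e^{−(1+s)π}. The sketch (hypothetical function name, assumes s > 0) solves this by bisection.

```python
import math

def survival_probability(s, iterations=200):
    """Probability that a new beneficial copy escapes early loss,
    for a branching process with Poisson(1 + s) offspring (s > 0)."""
    lam = 1 + s

    def f(x):
        # positive between 0 and the root, negative above it
        return -math.expm1(-lam * x) - x

    lo, hi = 1e-9, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for s in (0.001, 0.01, 0.05):
    print(s, round(survival_probability(s), 5), 2 * s)
```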
Why is selection “not omnipotent” for new beneficial mutations?
Because even mutations with substantial fitness advantages are likely to be lost early by chance; selection can only act on mutations that escape initial stochastic loss.
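How does effective population size modify fixation probability?
For a new beneficial mutation (initial frequency 1/(2N)), p_fix ≈ 2s·Ne/N: the smaller Ne is relative to the census size N, the stronger drift is and the less likely selection carries the allele to fixation.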
When does population size start to matter for allele dynamics?
Population size matters once alleles are common enough that frequency changes (rather than copy number changes) dominate, i.e. when drift acts on allele frequencies.
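What is Kimura’s general fixation probability for a beneficial allele with initial frequency p?
u(p) = (1 − e^{−cNe s p})/(1 − e^{−cNe s}), where c depends on ploidy and the fitness scheme (c = 4 for diploids with fitnesses 1, 1 + s, 1 + 2s). As s → 0, u(p) → p (the neutral case).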
What does the parameter cNes represent biologically?
It measures the relative strength of selection compared to drift; selection matters only when fitness differences are large relative to random genetic drift.
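Under what condition does selection dominate drift?
When |cNe s| ≫ 1 (roughly |Ne s| ≫ 1, i.e. |s| ≫ 1/Ne); when |Ne s| ≪ 1, the allele behaves as effectively neutral.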
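What is the nearly neutral regime?
The range |Ne s| ≈ 1 (|s| on the order of 1/Ne), where drift and selection are of similar strength: slightly deleterious mutations can still fix and slightly beneficial ones are often lost.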
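Kimura's formula makes the three regimes concrete. This sketch assumes c = 4 (diploid, additive fitnesses 1, 1 + s, 1 + 2s) and Ne = N; the function name is hypothetical, and the deleterious branch is algebraically rewritten only to avoid floating-point overflow.

```python
import math

def kimura_pfix(p, s, ne, c=4.0):
    """u(p) = (1 - exp(-c*Ne*s*p)) / (1 - exp(-c*Ne*s))."""
    a = c * ne * s
    if a == 0:
        return p                                  # neutral limit
    if a > 0:
        return math.expm1(-a * p) / math.expm1(-a)
    b = -a                                        # deleterious allele
    return math.exp(-b * (1 - p)) * math.expm1(-b * p) / math.expm1(-b)

n = 10_000
p_new = 1 / (2 * n)                               # a single new mutant copy
for ne_s in (-10, -1, -0.1, 0, 0.1, 1, 10):
    u = kimura_pfix(p_new, ne_s / n, n)
    print(f"Ne*s = {ne_s:>5}: pfix relative to neutral = {u / p_new:.3g}")
```

For |Ne·s| well below 1 the ratio stays close to 1 (effectively neutral); for Ne·s = 10 it is about 4Ne·s = 40, i.e. p_fix ≈ 2s; for Ne·s = −10 fixation is practically impossible.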
Why does the same mutation behave differently in small vs large populations?
Because the threshold at which selection overcomes drift depends on Ne; small populations require much larger fitness effects for selection to be effective.
Why do small populations accumulate deleterious mutations more easily?
Because weakly deleterious mutations fall into the nearly neutral range and can drift to fixation before selection removes them.
How does the nearly neutral theory extend the neutral theory?
It includes mutations with small fitness effects whose fate depends on population size, explaining why diversity does not scale linearly with Ne.
Why does the neutral theory predict less heterozygosity variation than observed?
Because it assumes all mutations are strictly neutral; allowing nearly neutral mutations weakens the dependence of heterozygosity on population size.
What is Wright’s key insight about the role of drift in adaptation?
Genetic drift can allow populations to cross fitness valleys by chance, enabling selection to reach higher adaptive peaks that would be inaccessible by selection alone.
What does Mendel’s law of independent assortment predict for a dihybrid test cross?
If loci assort independently, the four offspring genotype combinations occur in a 1 : 1 : 1 : 1 ratio.
(Merril - Recombination)
What inheritance pattern indicates complete linkage between two loci?
Only parental haplotypes are observed (1 : 1 : 0 : 0 ratio); no recombinant genotypes occur.
What observation led to the discovery of recombination?
Many crosses showed offspring ratios intermediate between independent assortment and complete linkage, implying crossing-over between loci on the same chromosome.
What is recombination frequency and how is it interpreted?
It is the proportion of recombinant offspring; it reflects the genetic distance between loci (more recombination → larger distance).
What is a centimorgan (cM)?
A genetic map unit equal to 1% recombinant offspring (1 recombinant per 100 progeny).
Why do recombination frequencies underestimate map distance (the number of crossovers) when markers are far apart?
Because double crossovers can restore parental haplotypes and go undetected unless markers are densely spaced.
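Mapping functions convert recombination fractions into map distances that account for undetected multiple crossovers. The sketch uses Haldane's mapping function, which assumes crossovers occur independently (no interference); other mapping functions exist (e.g. Kosambi's), so treat this as an illustration with hypothetical function names.

```python
import math

def haldane_distance_cm(r):
    """Map distance in cM from recombination fraction r (Haldane)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)     # -1/2 ln(1 - 2r) Morgans, x 100

def haldane_r(d_cm):
    """Inverse: expected recombination fraction for a map distance in cM."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))

for r in (0.01, 0.1, 0.3, 0.45):
    print(r, round(haldane_distance_cm(r), 1))
```

r = 0.01 corresponds to about 1.01 cM, but r = 0.3 already corresponds to roughly 46 cM, and r approaches 0.5 only as distance goes to infinity.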
Why can recombination rates differ between sexes and species?
Because crossing-over is regulated biologically and can be sex-limited (e.g. absent in male Drosophila) or species-specific.
Why do small chromosomes often have higher recombination rates?
Because each chromosome arm requires at least one obligate crossover during meiosis, inflating recombination per unit length.
What is the four-gamete test used for?
To infer the presence of recombination by detecting all four possible haplotypes between two bi-allelic loci.
Under the infinite-sites assumption, what does observing four haplotypes imply?
That at least one recombination event must have occurred (mutation alone can produce at most three haplotypes).
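The four-gamete test can be applied to every pair of sites in a sample of haplotypes. A minimal sketch, assuming infinite sites and haplotypes coded as equal-length strings of '0'/'1'; the function name and toy data are hypothetical.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return site pairs (i, j) at which all four two-locus haplotypes
    occur; under infinite sites each implies >= 1 recombination event."""
    n_sites = len(haplotypes[0])
    hits = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            hits.append((i, j))
    return hits

print(four_gamete_violations(["000", "011", "100"]))         # mutation alone
print(four_gamete_violations(["000", "011", "100", "111"]))  # needs recombination
```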
Why does recombination generate continuous phenotypic variation from discrete loci?
Because recombination reshuffles alleles across multiple loci, producing many new genotype combinations each generation.
How many gamete (haplotype) combinations are possible for n freely recombining diallelic loci?
2^n haplotypes; the number grows exponentially even for small n.
Why does recombination speed up adaptation?
It allows beneficial mutations at different loci to be combined into the same genome instead of competing on separate haplotypes.
What is Hill–Robertson interference?
Reduced efficiency of selection when linked loci interfere, causing beneficial mutations to be lost due to linkage with deleterious alleles.
How does recombination counteract Muller’s ratchet?
By recreating mutation-free or low-mutation haplotypes, preventing irreversible accumulation of deleterious mutations in finite populations.
What is linkage disequilibrium (LD)?
LD is the non-random association of alleles at different loci (statistical association), and it is not the same as physical linkage—it can even occur between loci on different chromosomes.
(Merril - LD)
For two biallelic loci with haplotypes AB, Ab, aB, ab, how do you compute allele frequencies from haplotype frequencies?
Add the relevant haplotypes, e.g. p_A = p_AB + p_Ab and p_B = p_AB + p_aB (with p_a = 1 − p_A and p_b = 1 − p_B).
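What is the coefficient of LD D (formula), and what does D=0 mean?
D = p_AB − p_A·p_B (equivalently D = p_AB·p_ab − p_Ab·p_aB). D = 0 means linkage equilibrium: every haplotype frequency equals the product of its allele frequencies, so alleles at the two loci are statistically independent.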
What does the sign of D tell you (coupling vs repulsion)?
D>0 means excess of coupling haplotypes (AB and ab) relative to expectation; D<0 means excess of repulsion haplotypes (Ab and aB). Sign depends on labeling order, so magnitude matters most.
Why can’t you compare raw D values across loci or populations?
Because the maximum possible |D| depends on allele frequencies, so the same D can mean “weak” or “strong” LD depending on pA,pB.
What is D′ and when is it especially informative?
D′ = D/D_max scales D by its maximum possible value given the allele frequencies (D_max = min(p_A p_b, p_a p_B) if D > 0; min(p_A p_B, p_a p_b) if D < 0), so |D′| ranges from 0 to 1; |D′| = 1 implies maximum LD and typically a missing haplotype. Immediately after a new mutation, |D′| = 1 with nearby loci on the same chromosome.
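Computing D and |D′| from haplotype frequencies makes the allele-frequency dependence explicit. A minimal sketch with hypothetical frequencies; it reports |D′| because the sign of D depends on allele labeling.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """D and |D'| from two-locus haplotype frequencies (summing to 1)."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    p_a, p_b = 1 - p_A, 1 - p_B
    d = p_AB - p_A * p_B                 # = p_AB*p_ab - p_Ab*p_aB
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime

print(ld_stats(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium
print(ld_stats(0.4, 0.1, 0.1, 0.4))      # partial LD
print(ld_stats(0.1, 0.0, 0.5, 0.4))      # Ab missing: new mutation A on a B haplotype
```

In the last case D is only 0.04, yet |D′| = 1 because one haplotype is absent.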
How does recombination rate c affect LD over time in a large randomly mating population?
LD decays geometrically: D_t = (1 − c)^t D_0.
Higher c → faster LD decay; c ranges from 0 (no recombination) to 0.5 (free recombination).
What is the key “timescale intuition” for LD decay with recombination?
Roughly 1/c generations are needed for D to drop to about 37% of its initial value (exponential-decay intuition). Small c means LD persists a long time.
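The geometric decay and the 1/c rule of thumb can be compared directly; c = 0.01 is an arbitrary example and the function name is hypothetical.

```python
import math

def ld_after(d0, c, t):
    """D after t generations of random mating with recombination rate c."""
    return d0 * (1 - c) ** t

c = 0.01
t = round(1 / c)
print(ld_after(1.0, c, t), math.exp(-1))   # fraction of D left after 1/c generations
```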
How can population structure or admixture create LD even if each subpopulation has none?
Mixing populations with different allele frequencies generates LD (“two-locus Wahlund effect”); the LD magnitude depends on differences in allele frequencies between populations.
How does a selective sweep affect LD around the selected site?
A sweep increases LD and reduces local variation near the beneficial allele; recombination during the sweep can “rescue” variation further away, so LD and diversity change with recombination distance from the selected site.
What defines a quantitative trait?
A quantitative trait shows continuous variation, is influenced by many loci (polygenic or oligogenic), and is affected by the environment, leading to approximately normal phenotypic distributions.
(Merril - Quantitative Genetics)
How is an individual’s phenotypic value decomposed in quantitative genetics?
P=G+E
where G is the genotypic value and E is the environmental deviation (mean E=0 in the population).
How is the genotypic value G further decomposed, and which part matters most for evolution?
G = A + D + I, where A is the additive genetic component, D the dominance deviation, and I the interaction (epistatic) deviation.
Additive effects A (breeding value) are most important because they are transmitted from parents to offspring and drive evolutionary change.
How is phenotypic variance partitioned, and which component is “visible to selection”?
V_P = V_G + V_E = V_A + V_D + V_I + V_E.
Only the additive genetic variance V_A contributes directly to evolutionary response to selection.
What is broad-sense vs narrow-sense heritability, and why does the distinction matter?
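Broad-sense heritability H^2 = V_G/V_P is the fraction of phenotypic variance due to all genetic effects; narrow-sense heritability h^2 = V_A/V_P is the fraction due to additive effects (V_P = phenotypic variance). In sexually reproducing populations, h^2 determines the evolutionary response, because dominance and interaction effects are broken up by segregation and recombination rather than transmitted intact to offspring.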
What does heritability actually measure (and what does it not measure)?
Heritability measures the fraction of phenotypic variance due to additive genetic variation in a population; it does not measure how “genetic” an individual trait value is.
How can heritability be estimated empirically?
From the slope of a parent–offspring regression (offspring phenotype regressed on mid-parent phenotype); the slope equals h^2.
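A simulation under an additive (infinitesimal) model with uncorrelated parental and offspring environments can confirm that the mid-parent slope recovers h^2. The sketch assumes V_A = 0.4 and V_E = 0.6 (so h^2 = 0.4), no dominance or epistasis, and Mendelian sampling variance V_A/2; all names and values are hypothetical.

```python
import random

def simulate_families(n_families, v_a, v_e, rng):
    """Additive model: returns (mid-parent, offspring) phenotype pairs."""
    sd_a, sd_e = v_a ** 0.5, v_e ** 0.5
    pairs = []
    for _ in range(n_families):
        a_m, a_f = rng.gauss(0, sd_a), rng.gauss(0, sd_a)   # breeding values
        p_m, p_f = a_m + rng.gauss(0, sd_e), a_f + rng.gauss(0, sd_e)
        # offspring breeding value: parental mean + Mendelian sampling
        a_o = (a_m + a_f) / 2 + rng.gauss(0, (v_a / 2) ** 0.5)
        pairs.append(((p_m + p_f) / 2, a_o + rng.gauss(0, sd_e)))
    return pairs

def slope(pairs):
    """Ordinary least-squares slope of y on x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

rng = random.Random(7)
b = slope(simulate_families(20_000, v_a=0.4, v_e=0.6, rng=rng))
print(round(b, 3), "true h^2 =", 0.4 / (0.4 + 0.6))
```

Regressing on a single parent instead would give a slope of about h^2/2.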
Why must parent and offspring environments be uncorrelated when estimating heritability?
Shared environments inflate resemblance between relatives and lead to overestimation of genetic effects.
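What is the selection differential S?
The difference between the mean phenotype of the individuals selected as parents and the mean phenotype of the whole population before selection.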
What is the Breeder’s equation and how is it interpreted?
R = h^2*S
The response to selection R (change in mean phenotype across generations) is proportional to heritability and selection strength.
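A worked example of the Breeder's equation with hypothetical numbers: population mean 100, selected parents averaging 110 (S = 10), and h^2 = 0.4.

```python
def response_to_selection(h2, pop_mean, selected_mean):
    """Breeder's equation R = h^2 * S, with S = selected_mean - pop_mean."""
    s = selected_mean - pop_mean
    return h2 * s

r = response_to_selection(h2=0.4, pop_mean=100.0, selected_mean=110.0)
print(r, 100.0 + r)   # response and predicted offspring mean
```

Only 4 of the 10 units of selection differential carry over to the next generation's mean.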
Why do fitness-related traits often have low heritability?
Because selection depletes additive genetic variance in fitness, and because fitness is strongly influenced by environmental variation.
Why can heritability differ between populations or environments?
Because heritability depends on allele frequencies and environmental variance; it is population- and context-specific, not a fixed species property.
What is the (relaxed) Biological Species Concept used in this course?
Species are groups of interbreeding populations that are normally reproductively isolated from other such groups; gene flow persists within species but is reduced or absent between species.
(Merril - Genetics of Speciation)
Why is defining species inherently difficult?
Because speciation is a process, not an instantaneous event; closely related species may look identical (cryptic species), while distant species may converge morphologically.
What are the two main categories of reproductive barriers?
Prezygotic barriers: act before fertilisation (geographic, ecological, behavioural, mechanical).
Postzygotic barriers: act after fertilisation (hybrid inviability, sterility, reduced hybrid fitness).
Why can postzygotic isolation evolve even though sterility or inviability seems maladaptive?
Because incompatible allele combinations can arise independently in isolated populations and only cause problems when combined in hybrids (Dobzhansky–Muller incompatibilities).
What is allopatric speciation and why is it considered “easy”?
Speciation via geographic isolation; gene flow is absent, so drift and/or selection can accumulate differences without being broken down by recombination.
Why is speciation with gene flow considered difficult?
Because recombination breaks down genetic associations between alleles under divergent selection and alleles causing reproductive isolation, preventing stable isolation from evolving.
What is Felsenstein’s key insight about speciation with gene flow?
That recombination opposes the build-up of linkage disequilibrium between ecological traits and mating traits, creating a fundamental constraint on speciation.
What is the difference between the two-allele and one-allele models of speciation?
Two-allele model: different alleles are fixed in different populations → recombination breaks associations.
One-allele model: the same allele spreads in both populations and causes assortative mating → recombination cannot break the association.
Why is speciation easier under a one-allele mechanism?
Because recombination cannot disrupt the association between the mating trait and mating preference when the same allele causes isolation in both populations.
What is a “magic trait”?
A trait that is under divergent ecological selection and simultaneously affects assortative mating, automatically coupling adaptation and reproductive isolation.
Why does physical linkage (or inversions) facilitate speciation with gene flow?
Tight linkage reduces recombination, preserving associations between alleles involved in ecological divergence and mating isolation.
What is the Dobzhansky–Muller model and what does it explain?
It explains how hybrid inviability or sterility can evolve without being directly selected for: incompatible alleles evolve separately in isolated populations and only cause problems when combined in hybrids.
Why did molecular evolution historically focus on protein sequences before DNA sequences?
Because methods to determine amino-acid sequences of proteins were developed in the 1950s, while DNA sequencing only became available in the 1970s. Early molecular evolution theory was therefore built mainly on protein data.
(Parsch - Protein Evolution)
Why are protein sequences often easier to align and compare across distant species than DNA sequences?
Proteins are generally more conserved (stronger functional constraint), so alignment is easier across deep evolutionary time. DNA is more complex because it includes coding vs non-coding regions and synonymous vs non-synonymous sites in coding regions.
What is “D” (observed amino-acid divergence) and how is it calculated?
D is the observed proportion of amino-acid positions that differ between two aligned protein sequences. Example: if 2 out of 10 comparable amino-acid sites differ, then D=2/10 = 0.2.
What is “K” (corrected amino-acid divergence) and why do we compute it from D?
K is the expected (corrected) number/proportion of amino-acid substitutions per site between two sequences, accounting for the possibility of multiple substitutions at the same site (“multiple hits”). A simple correction is
K=−ln(1−D)
where D is the observed proportion of differing sites.
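Applying the correction to the worked example above (2 of 10 sites differ) and to larger divergences shows how the gap between D and K widens. A minimal sketch; the function name is hypothetical, and K is undefined at D = 1.

```python
import math

def poisson_corrected(n_diff, n_sites):
    """Observed D and multiple-hit corrected K = -ln(1 - D)."""
    d = n_diff / n_sites
    return d, -math.log(1 - d)

for n_diff in (2, 5, 8):
    d, k = poisson_corrected(n_diff, 10)
    print(f"D = {d:.1f} -> K = {k:.3f}")
```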
Why does the “multiple-hit problem” make D underestimate true evolutionary change?
Because a site may have changed more than once over time; the final amino acid may differ by only one step from the ancestral state (or even match again), so counting observed differences misses hidden substitutions. This becomes more important as sequences become more divergent.
What is the molecular clock hypothesis (in protein evolution)?
The molecular clock hypothesis states that for many proteins, amino-acid substitutions accumulate at an approximately constant rate over long time periods, so protein sequence divergence can be used as a rough measure of time since divergence.
Why do different proteins show different “clock rates” (different substitution rates)?
Because proteins differ in functional constraints and in how many amino-acid changes are effectively neutral; proteins with fewer constraints can accumulate substitutions faster than proteins where changes are strongly deleterious.
What is the relative rate test and what result supports a molecular clock?
The relative rate test compares two closely related species (A and B) to an outgroup (C). If a molecular clock holds, species A and B should be equally divergent from C (i.e., the distance A–C should be about the same as B–C). A consistent difference suggests rate variation.
What is a DNA “transition” and what is a DNA “transversion”?
A transition is a nucleotide change within the same structural class: purine↔purine (A↔G) or pyrimidine↔pyrimidine (C↔T). A transversion is a change between classes: purine↔pyrimidine (A/G ↔ C/T). Transitions are typically more common; the transition/transversion ratio κ is often ≈ 2.
(Parsch - DNA Evolution)
What does d mean in DNA sequence comparison, and how is it computed?
d is the observed proportion of nucleotide sites that differ between two aligned DNA sequences (e.g., 3 differences out of 10 sites → d=0.3).
What does k mean (corrected DNA divergence), and why isn’t d enough?
k is the corrected number of substitutions per site. It accounts for “multiple hits” and for the fact that, with only 4 nucleotides, sites can change and later match again by chance. A standard correction is the Jukes–Cantor formula:
k = −(3/4) ln(1 − (4/3)d)
where d is the observed fraction of differing sites.
Why does the observed difference d “saturate” at 0.75 for DNA sequences?
With four nucleotides, two random sequences are expected to match at ~25% of sites by chance, so the maximum expected fraction of differences is ~75% (d=0.75); beyond that, extra substitutions are hidden.
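The Jukes–Cantor correction k = −(3/4) ln(1 − (4/3)d) can be sketched in Python. This sketch assumes that all nucleotide substitutions are equally likely, so it ignores the transition bias mentioned earlier. The function names are illustrative. The logarithm diverges as d approaches 0.75, which is exactly the saturation point described above, so the sketch rejects d ≥ 0.75.

```python
import math


def observed_d(a: str, b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor_k(d: float) -> float:
    """Corrected divergence k = -(3/4) ln(1 - (4/3) d); undefined at saturation."""
    if not 0.0 <= d < 0.75:
        raise ValueError("d must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


# The text's example: 3 differences out of 10 sites.
d = observed_d("ACGTACGTAC", "ACTTACATAG")
print(d)                            # 0.3
print(round(jukes_cantor_k(d), 4))  # 0.3831
```

The corrected k is larger than d, and the gap widens as d grows, reflecting the hidden substitutions.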
What are synonymous vs nonsynonymous substitutions in coding DNA?
A synonymous (silent) substitution changes a codon but not the amino acid. A nonsynonymous (replacement) substitution changes the amino acid. Typically, many 1st/2nd codon positions are nondegenerate (more likely nonsynonymous), while 3rd positions are often 2-fold or 4-fold degenerate (more likely synonymous).
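Classifying point mutations and codon positions can be sketched in Python. The sketch assumes the standard genetic code (NCBI translation table 1) and uppercase DNA codons. `substitution_type` and `degeneracy` are illustrative helpers, and stop codons are treated as just another "amino acid."

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def mutate(codon: str, pos: int, base: str) -> str:
    """Return the codon with position pos (0, 1 or 2) replaced by base."""
    return codon[:pos] + base + codon[pos + 1:]


def substitution_type(codon: str, pos: int, base: str) -> str:
    """'synonymous' if the point mutation keeps the amino acid, else 'nonsynonymous'."""
    return "synonymous" if CODE[mutate(codon, pos, base)] == CODE[codon] else "nonsynonymous"


def degeneracy(codon: str, pos: int) -> str:
    """Classify a position by how many of the 4 bases encode the same amino acid."""
    same = sum(CODE[mutate(codon, pos, b)] == CODE[codon] for b in BASES)
    return {1: "nondegenerate", 2: "2-fold", 3: "3-fold", 4: "4-fold"}[same]


print(substitution_type("GCT", 2, "A"))          # synonymous (GCA is also Ala)
print(substitution_type("GCT", 0, "A"))          # nonsynonymous (ACT is Thr)
print([degeneracy("GCT", p) for p in range(3)])  # ['nondegenerate', 'nondegenerate', '4-fold']
```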
What does “4-fold degenerate site” mean, and why is it important for molecular evolution?
A 4-fold degenerate site is a codon position (often the 3rd base) where any nucleotide change does not change the amino acid. These sites tend to evolve fastest, often similarly to introns/pseudogenes, suggesting weak constraint. Nondegenerate sites evolve slowest due to strong constraint.
What is a pseudogene, and why is it useful as a baseline?
A pseudogene is a duplicated gene that has lost function. Because it is (approximately) under no functional constraint, it accumulates changes mostly according to the organism’s mutation rate, making it a useful “neutral” comparison for constraint in functional regions.
What is codon bias, and what are two competing explanations for it?
Codon bias is the unequal usage of synonymous codons for the same amino acid. Two explanations:
Selection for faster/more efficient translation (preferred codons match abundant tRNAs; strongest in highly expressed genes).
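Mutation bias affecting nucleotide composition at degenerate sites.
Evidence favoring selection includes higher GC content at 3rd codon positions in highly biased genes than in neighboring introns. If mutation bias alone were responsible, introns experiencing the same mutational pressure should show similar GC content.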
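The comparison of third-position GC content with nearby introns can be sketched as follows. The sketch assumes an in-frame coding sequence. The gene and intron strings are hypothetical, and the function names are illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (e.g. a neighbouring intron)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(coding_seq: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(coding_seq[2::3])  # every third base, starting at position 3


# Hypothetical gene and intron from the same region.
print(gc3("ATGGCTAAGTTC"))          # 0.75
print(gc_fraction("GTAAGTCTTTAA"))  # 0.25
```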
Why is selection on synonymous codon usage often hard to detect, and where is it easiest to detect?
Selection on codon usage is typically very weak (often around Ne*s ≈ 1, where Ne is the effective population size and s is the selection coefficient), so drift can obscure it. Evidence is clearest in species with large Ne (bacteria, yeast, Drosophila); in mammals it remains unclear.
Why can “synonymous” mutations still matter biologically?
Synonymous mutations can affect splicing (e.g., by creating or disrupting splice sites or exon splicing enhancers) and can influence RNA structure, so “silent” does not always mean “neutral.”
What is molecular phylogenetics?
Molecular phylogenetics infers evolutionary relationships among organisms or genes using molecular sequence data (DNA or proteins) and statistical or algorithmic tree-building methods.
(Parsch - Molecular Phylogenetics)
In a phylogenetic tree, which aspects carry evolutionary meaning?
The topology (branching order) always carries meaning. Branch lengths may represent either evolutionary time or amount of change, depending on the method. The visual layout is arbitrary.
What is an OTU (Operational Taxonomic Unit)?
An OTU is a terminal node in a phylogenetic tree, representing a species, population, or gene sequence included in the analysis.
What is the difference between monophyletic and paraphyletic groups?
Monophyletic (clade): ancestor + all descendants.
Paraphyletic: ancestor + some, but not all, descendants (not accepted as a valid group in cladistics).
Why can gene trees differ from species trees?
Because genes have their own histories; processes like gene duplication, gene loss, and incomplete lineage sorting can cause gene trees to differ from the species tree.
What are orthologs and paralogs, and why does this matter for phylogenetics?
Orthologs arise by speciation and reflect species relationships.
Paralogs arise by gene duplication and can mislead species phylogenies if mistaken for orthologs.
What is the core idea of distance-based phylogenetic methods?
Distance-based methods convert sequence alignments into a matrix of pairwise evolutionary distances and build trees that best reflect those distances.
What are the assumptions, pros, and cons of UPGMA?
Assumption: constant evolutionary rate (molecular clock).
Pros: very fast, simple, produces ultrametric trees.
Cons: incorrect if rates differ among lineages; often unrealistic biologically.
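A compact UPGMA sketch in Python follows. It assumes a symmetric distance matrix and breaks ties arbitrarily. The example matrix is hypothetical and deliberately ultrametric, so UPGMA recovers it exactly. On clock-violating data the same code still runs but can return a wrong topology, which is the weakness noted above.

```python
from itertools import combinations


def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Cluster OTUs by UPGMA and return a rooted Newick string with branch lengths."""
    # Active clusters: integer key -> (newick subtree, number of OTUs, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(range(len(names)), 2)}
    next_key = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters (ties: first pair in sorted order).
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (tree_i, n_i, h_i), (tree_j, n_j, h_j) = clusters.pop(i), clusters.pop(j)
        # Clock assumption: the new node sits at half the distance between the pair.
        height = d[frozenset((i, j))] / 2
        tree = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        # Distance to the new cluster = OTU-weighted average of the old distances.
        for k in clusters:
            d[frozenset((next_key, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_key] = (tree, n_i + n_j, height)
        next_key += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(names, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Every tip ends up the same total distance (3) from the root, which is the ultrametric property UPGMA always imposes.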
What are the pros and cons of Neighbor-Joining (NJ)?
Pros: fast, does not assume a molecular clock, often produces good topologies.
Cons: uses only pairwise distances (loses site-specific information), no explicit evolutionary model beyond distance correction.
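The following is one iteration of neighbor joining, sketched in Python with the usual Q-criterion Q(i,j) = (n − 2)·d(i,j) − Σₖ d(i,k) − Σₖ d(j,k). The full method repeats this step on the reduced matrix until three nodes remain. The five-taxon matrix is a hypothetical example, and `nj_step` is an illustrative name.

```python
def nj_step(names: list[str], dist: list[list[float]]):
    """Pick the pair to join; return (name_i, name_j, branch_i, branch_j, new_distances)."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least 3 OTUs")
    r = [sum(row) for row in dist]  # total distance from each OTU to all others
    # Q-criterion: penalizes pairs whose members are far from everything else.
    _, i, j = min(
        ((n - 2) * dist[a][b] - r[a] - r[b], a, b)
        for a in range(n)
        for b in range(a + 1, n)
    )
    # Branch lengths from the new internal node u to i and j (no clock assumed).
    branch_i = dist[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    # Distances from u to every remaining OTU, used in the next iteration.
    new_dist = {
        names[k]: (dist[i][k] + dist[j][k] - dist[i][j]) / 2
        for k in range(n)
        if k not in (i, j)
    }
    return names[i], names[j], branch_i, branch_j, new_dist


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_step(names, dist))  # ('a', 'b', 2.0, 3.0, {'c': 7.0, 'd': 7.0, 'e': 6.0})
```

The unequal branch lengths (2 vs 3) show that NJ does not force a clock. The step still sees only the pairwise distances, never the individual sites.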
What is the principle behind maximum parsimony, and what are its weaknesses?
Principle: choose the tree requiring the fewest evolutionary changes.
Weaknesses: sensitive to homoplasy and long-branch attraction; lacks an explicit evolutionary model.
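Fitch's algorithm is one standard way to count the minimum number of changes a given tree requires at each site. The sketch below scores two fixed four-taxon trees on a hypothetical alignment. It does not search tree space. Because it simply counts changes without an explicit model, it cannot correct for multiple hits on long branches.

```python
def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch small parsimony for one site on a rooted binary tree.

    tree is a leaf name (str) or a pair (left, right) of subtrees; states maps
    each leaf to its character state. Returns (root state set, minimum changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def parsimony_score(tree, alignment: dict[str, str]) -> int:
    """Total number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[col] for leaf, seq in alignment.items()})[1]
        for col in range(length)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTA"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 6
```

Maximum parsimony would prefer the first tree, since it requires fewer changes.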
What are the pros and cons of maximum likelihood (ML) methods?
Pros: statistically rigorous, uses explicit evolutionary models, handles multiple substitutions and rate variation well.
Cons: computationally expensive, results depend on model choice.
What is bootstrapping in phylogenetics, and how should it be interpreted?
Bootstrapping resamples alignment sites and rebuilds trees repeatedly; the bootstrap value of a node is the percentage of replicates supporting that node. It reflects data support, not the probability that the tree is true.
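The following is a sketch of nonparametric bootstrapping. A toy four-taxon "tree builder" (`quartet_split`, which picks the split minimizing summed within-pair p-distances) stands in for a real phylogenetic method. Taxa, sequences, function names and the replicate count are hypothetical. The point is the procedure itself: resample columns with replacement, rebuild the tree, and count how often the node reappears.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def quartet_split(aln: dict[str, str]) -> frozenset[str]:
    """Toy builder for 4 taxa: the split XY|ZW minimizing d(X,Y) + d(Z,W).

    Returns the pair containing the alphabetically first taxon (ties: first found).
    """
    first, *others = sorted(aln)
    best_score, best_pair = None, None
    for partner in others:
        z, w = [t for t in others if t != partner]
        score = p_distance(aln[first], aln[partner]) + p_distance(aln[z], aln[w])
        if best_score is None or score < best_score:
            best_score, best_pair = score, frozenset((first, partner))
    return best_pair


def resample_columns(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def bootstrap_support(aln, pair: frozenset[str], n_reps: int = 1000, seed: int = 1) -> float:
    """Percentage of bootstrap replicates whose tree groups `pair` together."""
    rng = random.Random(seed)
    hits = sum(quartet_split(resample_columns(aln, rng)) == pair for _ in range(n_reps))
    return 100 * hits / n_reps


aln = {"A": "AAAAGGGG", "B": "AAAAGGGT", "C": "TTTTGGGG", "D": "TTTTGGGC"}
print(bootstrap_support(aln, frozenset({"A", "B"})))
```

A high value means that most resampled data sets recover the same grouping. As the text notes, it measures data support, not the probability that the grouping is true.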
What kinds of evolutionary questions can molecular phylogenetics answer using domesticated dogs as an example?
It can address (1) dog relationships to wild canids, (2) when domestication started, (3) whether domestication happened once or multiple times, and (4) relationships among modern dog breeds.
(Parsch - The Domesticated Dog)
What fossil evidence in the lecture supports an early history of domesticated dogs?
Dog-like remains are reported from (a) central Europe (Germany) ~14,000 years ago and (b) Israel ~12,000 years ago, including dog-like skeletons buried with humans.
What is the key phylogenetic conclusion about dogs and their closest wild relatives?
Large-scale sequence studies (intron and exon sequences from many species) indicate that domesticated dogs are most closely related to wolves, consistent with morphology.
According to the lecture, what do the genetic results suggest about how many domestication events occurred and when domestication began?
The lecture reports multiple major dog clades, which suggests multiple domestication events, and places the start of domestication around ~15,000 years ago, with an upper possibility of up to ~40,000 years.
What does the lecture conclude about the origin of American dogs (independent domestication vs introduction)?
It concludes ancient American dogs were derived from previously domesticated Eurasian dogs and were not domesticated independently in the Americas.
Why does the lecture use microsatellites (instead of slowly evolving markers) to study relationships among dog breeds, and what was the main result?
Microsatellites evolve faster and are highly variable within species; in the lecture, typing 96 microsatellite loci across 85 breeds revealed four major dog clades, with Asian breeds in the oldest clades, supporting an Asian origin signal.
What does the Neutral Theory of Molecular Evolution state?
The neutral theory states that most molecular polymorphism within species and most molecular divergence between species are caused by neutral mutations evolving under mutation–drift equilibrium, rather than by positive selection.
(Parsch - Testing Neutral Theory)
What does the neutral theory not claim about evolution?
It does not claim that all mutations are neutral; deleterious mutations are often removed by purifying selection, and adaptive mutations may occur but are rare relative to neutral ones.
What are the two key quantitative predictions of the neutral theory?
Molecular divergence between species:
K = 2μt, where K is divergence, μ the neutral mutation rate, and t the time since divergence.
Genetic diversity within species:
θ = 4Neμ, where Ne is the effective population size.
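The two predictions can be written as small functions. Inverting K = 2μt also gives the molecular-clock estimate of divergence time. This is a minimal sketch with hypothetical parameter values. θ = 4Neμ assumes a diploid population at mutation–drift equilibrium, and μ and t must be expressed in the same time unit.

```python
def neutral_divergence(mu: float, t: float) -> float:
    """K = 2 * mu * t: both lineages accumulate neutral substitutions since the split."""
    return 2 * mu * t


def divergence_time(k: float, mu: float) -> float:
    """Invert the clock: t = K / (2 * mu)."""
    return k / (2 * mu)


def neutral_diversity(ne: float, mu: float) -> float:
    """theta = 4 * Ne * mu for a diploid population at mutation-drift equilibrium."""
    return 4 * ne * mu


# Hypothetical values; mu and t must use the same time unit (years or generations).
mu_per_year = 1e-9
print(neutral_divergence(mu_per_year, 5e6))
print(divergence_time(0.01, mu_per_year))
print(neutral_diversity(10_000, 2.5e-8))
```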
What does Tajima’s D test, and what data does it require?
Tajima’s D tests whether the frequency spectrum of segregating sites matches neutral expectations, using only within-species polymorphism data. It compares two estimates of genetic diversity: one based on the number of segregating sites and one based on pairwise differences.
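The statistic can be computed as sketched below, using the standard normalizing constants for Tajima's D. The sketch assumes the infinite sites model: each polymorphic column counts as one segregating site, even if it carries more than two bases. The sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D from aligned sequences of one species.

    Compares pi (mean pairwise differences) with Watterson's estimate S / a1.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: average number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Normalizing constants for the variance of (pi - S / a1)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


sample = ["AAAA", "AAAT", "AATT", "ATTT"]      # includes an intermediate-frequency variant
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]  # only rare (singleton) variants
print(round(tajimas_d(sample), 3))
print(tajimas_d(singletons) < 0)  # True: excess of rare variants
```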
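How is Tajima’s D interpreted biologically?
D ≈ 0 is consistent with neutral evolution in a population of constant size. D < 0 indicates an excess of rare variants, as expected after a recent selective sweep or a population expansion. D > 0 indicates an excess of intermediate-frequency variants, as expected under balancing selection, after a population bottleneck, or with population subdivision.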
What does the HKA test (Hudson–Kreitman–Aguadé) test, and what data does it require?
The HKA test compares the ratio of polymorphism to divergence across two or more loci, using both within-species polymorphism and between-species divergence data, to test whether loci evolve neutrally.
What neutral expectation underlies the HKA test?
Under neutrality, polymorphism (θ) and divergence (K) are proportional because both depend on the mutation rate; therefore, the ratio θ/K should be similar across loci evolving neutrally.
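The raw comparison that the HKA test formalizes can be sketched as follows. The sketch assumes Watterson's estimator (θ from the number of segregating sites S in n sequences) as the polymorphism measure, with θ and K counted over the same sites. The loci are hypothetical. The actual HKA test fits a neutral model jointly across loci and uses a goodness-of-fit statistic, which this sketch does not do.

```python
def watterson_theta(segregating_sites: int, n: int) -> float:
    """Watterson's estimate: S divided by a1 = sum of 1/i for i = 1 .. n-1."""
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites / a1


def poly_div_ratio(segregating_sites: int, n: int, divergence: float) -> float:
    """theta / K for one locus; theta and K must be measured over the same sites."""
    return watterson_theta(segregating_sites, n) / divergence


# Hypothetical loci with equal divergence but very different polymorphism.
loci = {"locus1": (20, 10, 30), "locus2": (4, 10, 30)}  # (S, n, fixed differences)
for name, (s, n, k) in loci.items():
    print(name, round(poly_div_ratio(s, n, k), 3))
```

Under neutrality the two ratios should be similar. A five-fold difference like this one is the kind of pattern that would lead the formal test to reject neutrality.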
Why is it often difficult to interpret why the HKA test rejects neutrality?
Because a deviation can result from different causes (e.g. recent positive selection at one locus, background selection, or demographic effects), and the test only compares ratios, not mechanisms.
What does the McDonald–Kreitman (MK) test compare, and what assumption does it make?
The MK test compares polymorphism vs divergence at synonymous (assumed neutral) and nonsynonymous (amino-acid changing) sites in a coding gene, assuming synonymous substitutions are neutral.
How are MK test outcomes interpreted biologically?
Excess nonsynonymous divergence → evidence for positive selection
Excess nonsynonymous polymorphism → balancing selection or weak purifying selection (slightly deleterious mutations persist but do not fix)
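The following is a sketch of how an MK table is often summarized and tested. It uses a neutrality index NI = (Pn/Ps)/(Dn/Ds) and a two-sided Fisher's exact test on the 2×2 table. These are common choices but not necessarily the ones used in the lecture, and the counts are hypothetical. NI < 1 corresponds to excess nonsynonymous divergence, and NI > 1 to excess nonsynonymous polymorphism.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """Return (neutrality index, p-value) for MK counts (ds, dn and ps must be > 0).

    dn/ds: fixed nonsynonymous/synonymous differences; pn/ps: polymorphisms.
    """
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts with an excess of nonsynonymous fixed differences.
ni, p = mk_test(dn=10, ds=10, pn=2, ps=20)
print(round(ni, 2), p < 0.05)  # 0.1 True
```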
What is a selective sweep (genetic hitchhiking), and how does it affect polymorphism and divergence?
A selective sweep occurs when a beneficial mutation rises to fixation and drags linked neutral variants with it. This reduces polymorphism in the surrounding region, over a wider stretch where recombination is low, while leaving divergence largely unchanged.
What is background selection, and how does it differ from a selective sweep?
Background selection is the removal of neutral variants linked to deleterious mutations under purifying selection, effectively reducing the local effective population size. Unlike selective sweeps, background selection is compatible with the neutral theory and does not require positive selection.
To which primate group do humans belong, and when did the human lineage split from our closest relatives?
Humans belong to the Great Apes. Molecular evidence indicates that the hominin lineage split from chimpanzees and bonobos about 5–6 million years ago.
(Parsch - Human Evolution)
What are hominins, and what are the oldest well-accepted hominin fossils?
Hominins are species more closely related to humans than to chimpanzees. The oldest well-accepted hominin fossils are about 4.4 million years old (Australopithecus anamensis and A. afarensis) from East Africa.
What are the three major adaptive trends in human evolution?
Bipedality (upright walking)
Brain enlargement (≈400 cm³ → ≈1400 cm³)
Complex social and cultural behavior, including language
These traits evolved with major anatomical and energetic costs.
When and where did the genus Homo first appear, and which species first left Africa?
The genus Homo appeared about 2.5 million years ago (Homo habilis). Homo erectus was the first hominin to leave Africa, colonizing Asia and Europe around 1.8 million years ago.
What are the three main models for the origin of modern humans?
Candelabra model: independent evolution without gene flow (little support).
Multiregional model: continuous gene flow among regions.
Out-of-Africa (replacement) model: modern humans evolved in Africa and spread globally.
Which model for modern human origins is best supported by genetic evidence, and why?
The Out-of-Africa model is best supported. Genetic data show African populations are most diverse, all lineages trace back to Africa, and non-African populations show bottleneck signatures.
What do ancient DNA studies reveal about interbreeding with Neanderthals and Denisovans?
Modern Europeans and Asians carry ~1–3% Neanderthal DNA, and Oceanian populations carry ~4–6% Denisovan DNA, indicating limited interbreeding after humans left Africa.
What is “mitochondrial Eve,” and what does she represent?
“Mitochondrial Eve” is the most recent common ancestor of all human mitochondrial DNA, who lived in Africa about 200,000 years ago. She was not the only woman alive and is not the ancestor of all nuclear genes.
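How is the time to the most recent common ancestor (TMRCA) estimated from DNA data?
By applying a molecular clock: the observed divergence between sequences is divided by the rate at which mutations accumulate. Because mutations accumulate independently along both lineages descending from the common ancestor, TMRCA ≈ K/(2μ), where K is the (corrected) divergence per site and μ the mutation rate per site per unit time. This is the same logic as K = 2μt.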
How can DNA polymorphism be used to estimate the effective population size of humans?
Under the neutral theory, nucleotide diversity π satisfies
π≈4Neμ
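Rearranging gives Ne = π/(4μ). With human nuclear data (π ≈ 0.001), this gives an effective population size of about 12,500, the value obtained with μ ≈ 2×10⁻⁸ per site per generation. This is far smaller than the census size.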
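Rearranging π ≈ 4Neμ gives Ne = π/(4μ), which a one-line sketch can evaluate. The mutation rate μ = 2×10⁻⁸ per site per generation is an assumed value, not stated in the lecture. It is the rate that reproduces the ~12,500 figure from π ≈ 0.001.

```python
def effective_population_size(pi: float, mu: float) -> float:
    """Ne = pi / (4 * mu), rearranged from pi ≈ 4 * Ne * mu (neutral, diploid)."""
    return pi / (4 * mu)


# pi from the text; mu = 2e-8 per site per generation is an assumed value.
print(round(effective_population_size(0.001, 2e-8)))  # 12500
```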
Why is human effective population size so small compared to census population size?
Because humans historically lived in small populations, and effective population size is dominated by past bottlenecks and small population sizes (harmonic mean effect).
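The harmonic mean effect can be illustrated with a hypothetical size history. Long-term Ne is approximated here as the harmonic mean of per-generation sizes, which is the standard idealization. A single small generation pulls the result far below the arithmetic mean.

```python
def long_term_ne(sizes: list[float]) -> float:
    """Harmonic mean of per-generation population sizes (long-term Ne)."""
    return len(sizes) / sum(1 / n for n in sizes)


# Hypothetical history: four generations of 10,000 and one bottleneck of 100.
history = [10_000, 10_000, 100, 10_000, 10_000]
print(round(long_term_ne(history), 1))  # 480.8
print(sum(history) / len(history))      # 8020.0 (arithmetic mean, for contrast)
```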
What does Wright’s F_ST tell us about genetic differentiation among human populations?
Human F_ST values are low (≈0.05–0.15), meaning 85–95% of genetic variation exists within populations, not between them. This supports high gene flow and/or recent common ancestry.
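F_ST can be sketched from heterozygosities for a single biallelic locus as F_ST = (H_T − H_S)/H_T. This sketch assumes equal-sized populations and hypothetical allele frequencies. The within-population share of variation quoted in the text corresponds to 1 − F_ST in this formulation.

```python
def fst_biallelic(freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal-sized populations.

    freqs: frequency of one allele in each population.
    """
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        raise ValueError("locus is monomorphic overall; F_ST is undefined")
    return (h_t - h_s) / h_t


fst = fst_biallelic([0.4, 0.6])  # hypothetical allele frequencies in two populations
print(round(fst, 3), round(1 - fst, 3))  # 0.04 0.96
```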
What genomic pattern is expected after recent positive selection in humans?
Recent positive selection produces long haplotypes at high frequency around the selected allele, because the beneficial mutation rises to high frequency faster than recombination can break down linkage.
(Parsch - Selection in Humans)
Why can strong haplotype structure be used as evidence for recent positive selection?
Under neutrality, recombination breaks down haplotypes over time. Long, frequent haplotypes therefore indicate that an allele rose to high frequency rapidly, consistent with recent positive selection.
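One way to quantify this is extended haplotype homozygosity (EHH), a standard statistic not named in the lecture. It measures the probability that two randomly chosen chromosomes carrying the core allele are identical from the core out to a given distance. Slow decay of EHH around a high-frequency allele is the signature described above. The haplotypes below are hypothetical, and the caller is assumed to pass only chromosomes carrying the same core allele.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, end: int) -> float:
    """Extended haplotype homozygosity from SNP `core` out to SNP `end`.

    haplotypes: equal-length strings (one character per SNP) that all carry the
    same allele at `core`. Returns the probability that two randomly chosen
    haplotypes are identical over the whole interval between core and end.
    """
    lo, hi = sorted((core, end))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(len(haplotypes), 2)


# Hypothetical haplotypes carrying allele 1 at the core SNP (index 0).
carriers = ["1000", "1000", "1000", "1001"]
print([ehh(carriers, 0, x) for x in range(4)])  # [1.0, 1.0, 1.0, 0.5]
```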
What does the lactase persistence (LP) example illustrate about human adaptation?
Lactase persistence shows positive selection driven by culture (milk consumption), where the ability to digest lactose as an adult rose to high frequency in populations practicing dairy farming.
Why is lactase persistence an example of convergent evolution in humans?
Different mutations near the lactase gene independently rose to high frequency in European and African pastoralist populations, producing the same phenotype (adult milk digestion) via different genetic changes.
What does the AMY1 (amylase) gene example demonstrate about copy-number variation (CNV)?
It shows that gene copy number can be under positive selection: populations with high-starch diets have higher AMY1 copy number, increasing salivary amylase production and starch digestion efficiency.
What evidence supports positive selection on a chromosome 17 inversion (H2) in Europeans?
The H2 inversion is unusually frequent in Europeans (~20%), and women carrying H2 have slightly higher reproductive success, suggesting positive selection despite the underlying mechanism being unknown.
How does the prion-disease (PRNP) example illustrate balancing selection in humans?
Heterozygotes at codon 129 are resistant to prion disease, while homozygotes are susceptible. In populations exposed to prion transmission (e.g. cannibalism), heterozygotes are strongly over-represented, indicating balancing selection.
What does the FGFR2 (Apert syndrome) example illustrate about selection acting at different biological levels?
It shows evolutionary conflict: a mutation is positively selected in male germ-line cells (spermatogonia divide faster) even though it reduces organismal fitness by causing a severe disease in offspring.
What does the term “genomics” mean today, and how has its meaning changed over time?
Genomics originally referred to determining the complete DNA sequence of an organism. Today, it is a broad field encompassing several subdisciplines, including functional genomics (gene function and expression), comparative genomics (genome comparison across species), and evolutionary genomics (how genomes change over time).
(Parsch - Genomics 1)
What is the “C-value,” and what is the C-value paradox?
The C-value is the amount of DNA in a haploid genome. The C-value paradox refers to the observation that genome size varies widely among organisms and does not correlate well with organismal complexity (e.g. some simple organisms have much larger genomes than humans).
What are the two main strategies for sequencing whole genomes, and how do they differ conceptually?
Clone-by-clone sequencing: the genome is first mapped into large overlapping fragments (e.g. BACs), which are then individually sequenced.
Shotgun sequencing: the entire genome is randomly fragmented into small pieces, sequenced, and computationally assembled.
Clone-by-clone simplifies assembly but requires prior mapping; shotgun sequencing is faster but computationally demanding.
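The computational assembly step can be illustrated with a toy greedy overlap merger. This is a deliberately simplified stand-in for real assemblers, which build overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The reads are hypothetical error-free fragments, and `min_len` is an illustrative minimum overlap.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: contigs stay separate (a gap)
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]  # hypothetical shotgun reads
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```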
What are the three main approaches to identifying genes in a newly sequenced genome, and what is the key limitation they share?
De novo (ab initio) prediction using sequence features (ORFs, splice sites).
Comparative prediction using homology to known genes in other species.
Experimental identification by sequencing expressed mRNA (cDNA).
All approaches can miss genes (e.g. short, fast-evolving, or lowly expressed genes).
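A toy de novo scan for open reading frames is sketched below. It assumes an ATG start, the three standard stop codons, the forward strand only and no introns. `find_orfs` and `min_codons` are hypothetical names. The length cutoff also illustrates the shared limitation above: genuine short genes fall below it and are missed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Forward-strand ORFs from ATG to the first in-frame stop, as (start, end), end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:  # codons before the stop
                            orfs.append((i, j + 3))
                        i = j  # continue scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: nothing more in this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17)]
```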
What makes up most of the human genome if only ~2% codes for proteins?
Most of the human genome consists of non-coding DNA, including repetitive DNA (often in centromeres and telomeres), transposable elements (which make up about half of the genome), and pseudogenes (non-functional copies of genes).
What is functional genomics, and what question does transcriptomics address?
Functional genomics studies how genes function and interact at the genome-wide level. Transcriptomics specifically asks when, where, and how strongly genes are expressed, typically by comparing gene expression between different conditions (e.g. healthy vs diseased).
(Parsch - Genomics 2)
What is the conceptual difference between microarrays and RNA-seq for measuring gene expression?
Microarrays measure gene expression by hybridization to predefined DNA probes, giving relative expression between samples, whereas RNA-seq sequences cDNA fragments directly, allowing quantitative expression measurement without prior knowledge of genes.
What is reverse genetics, and how does it differ from classical (forward) genetics?
Reverse genetics starts with a known gene and asks what phenotype results when that gene is disrupted. Forward genetics starts with a phenotype and seeks the underlying gene.
What is a key general result from large-scale gene knockout studies?
Many genes are not essential under standard conditions; for example, most yeast and many mouse genes can be knocked out without causing lethality, leading to the distinction between essential and dispensable genes.
Why is comparative genomics powerful for understanding gene function and evolution?
Genes conserved across distant species are likely to perform essential functions, while genes that are not conserved or are pseudogenized can explain phenotypic differences between species (e.g. pathogen lifestyles or species-specific traits).
What is evolutionary developmental biology (Evo-Devo), and why is development important for evolution?
Evo-Devo studies how changes in developmental processes lead to evolutionary differences in morphology. Development is important because large differences in adult form often arise from differences in how organisms develop, not from entirely new genes.
(Parsch - Evolutionary Developmental Biology)
What are Hox genes, and what do homeotic mutations show about development?
Hox genes are developmental genes that control the identity of body segments along the anterior–posterior axis. Homeotic mutations (e.g. legs instead of antennae, extra wings) show that single genes can control major aspects of body plan organization.
What is the homeobox, and what does it tell us about Hox gene function?
The homeobox is a conserved 180-bp DNA sequence found in Hox genes that encodes a DNA-binding homeodomain. This shows that Hox genes function as transcription factors, regulating the expression of many other genes during development.
What does the conserved order and expression of Hox genes across animals imply?
The correspondence between gene order in Hox clusters and their anterior–posterior expression domains (in flies and vertebrates) implies a deeply conserved developmental program shared by all animals.
What does the eyeless / Pax6 example demonstrate about the evolution of complex traits?
The eyeless gene in flies and its vertebrate homolog Pax6 control eye development in very different organisms. This shows that the same developmental gene can underlie very different morphological structures, likely inherited from a simple ancestral light-sensing system.
Why are cis-regulatory changes often proposed as a major driver of morphological evolution?
Cis-regulatory changes affect when and where a single gene is expressed, usually without disrupting other functions. In contrast, changes in protein sequence or trans-acting factors often have pleiotropic effects. Thus, cis-regulatory evolution can produce new traits with fewer harmful side effects.
What are the main mechanisms of sex determination, and what is the key difference between them?
Sex can be determined by (1) environment (genetically identical sexes; environment decides sex, e.g. temperature in some reptiles), (2) haplodiploidy (females diploid from fertilized eggs; males haploid from unfertilized eggs, e.g. honeybees), (3) male heterogamety (XY) where males are XY and females XX (e.g. humans, Drosophila), or (4) female heterogamety (ZW) where females are ZW and males ZZ (e.g. birds, butterflies).
(Parsch - Sex Chromosome Evolution)
How do sex chromosomes (XY or ZW) originate from autosomes, and why does the Y (or W) degenerate?
Sex chromosomes start as an autosome pair where one chromosome gains a sex-determining locus. Sexually antagonistic mutations (beneficial in one sex but harmful in the other) then accumulate near that locus, favoring suppressed recombination between the pair. Over time, lack of recombination causes the sex-specific chromosome (Y or W) to degenerate.
In Drosophila, what does “Y degeneration” look like in terms of gene content?
The Drosophila Y chromosome is highly degenerated and has ~12 protein-coding genes (required for male fertility), while the X has >2,000 genes (many essential). The X contains ~16% of genes in the genome, whereas the Y has <0.1%.
What is “demasculinization” vs “feminization” of the X chromosome in Drosophila?
Male-biased genes (higher expression in males) are under-represented on the X (demasculinization), while female-biased genes are over-represented on the X (feminization). One noted exception is the brain, where male-biased genes can be over-represented on the X.
What is the “Fast-X effect,” and why is the X expected to show faster adaptive evolution?
The Fast-X effect is an increased rate of adaptive evolution on the X because recessive beneficial mutations are exposed to selection in hemizygous males (males have only one X), making it easier for such beneficial recessive alleles to fix.
What is the “Large-X effect,” and what does it imply for speciation?
The Large-X effect is the observation that the X chromosome is enriched for loci causing hybrid incompatibilities (e.g. hybrid male sterility), implying the X plays a disproportionately large role in the evolution of postzygotic reproductive isolation (at least in Drosophila).
What is X-chromosome dosage compensation in Drosophila males, and what is its purpose?
In male somatic tissues, expression of the single X is up-regulated ~2-fold to match the expression of autosomes and of the two X copies in females. This is mediated by an RNA–protein complex, the Dosage Compensation Complex (DCC), which binds at many sites along the X.
What is meiotic sex chromosome inactivation (MSCI) in Drosophila, and where is it observed?
In the male germline (testes), X-linked expression is suppressed (analogous to MSCI in mammals). This suppression is tissue-specific (testes but not other tissues) and can be shown using testis-specific reporter genes comparing autosomal vs X-linked copies.