what is functional annotation good for?
description of gene function
restricted vocabulary
hierarchically organized
directed acyclic graph
3 sub-ontologies
evidence codes
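A minimal sketch of what "hierarchically organized" and "directed acyclic graph" mean in practice: a term can have more than one parent, so collecting all ancestors of a term requires a graph traversal rather than a single walk up a tree. The term names and the `PARENTS` table are made up for illustration and are not real GO identifiers.

```python
# Toy ontology: each term maps to its direct parent terms.
# All term names are illustrative placeholders, not real GO entries.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "nucleotide binding": ["binding"],
    "kinase": ["catalysis"],
    # Two parents: this is what makes the structure a DAG and not a tree.
    "nucleotide-dependent kinase": ["kinase", "nucleotide binding"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("nucleotide-dependent kinase")))
```

A gene annotated with a specific term is implicitly annotated with all of that term's ancestors, which is why the traversal matters when comparing annotations.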
What is GO good for?
we can identify the taxonomy
what is KO (KEGG Orthology)?
a database of molecular functions represented in terms of functional orthologs
what does genome annotation in KEGG contain?
two unique aspects: KO assignment and KEGG mapping
KO assignment: molecular functions are stored in the KO (KEGG Orthology) database, which contains orthologs of experimentally characterized genes/proteins
Genome annotation in KEGG means assigning KO identifiers (K numbers) to individual genes in the genome, rather than giving text descriptions of functions
KEGG mapping: cellular and organism-level functions are stored in the Pathway, Brite and Module databases in terms of molecular networks, which are all created as networks of K number nodes
The KO assignment procedure converts a gene set in the genome to a K number set and leads to automatic reconstruction of KEGG pathways and other networks by the process called KEGG mapping, enabling interpretation of high-level functions
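The two steps described here (gene set to K number set, then K number set to pathways) can be sketched with plain dictionaries and sets. Everything in this example is hypothetical: the gene names, the `K_toy` identifiers and the pathway definitions are placeholders, and the functions `ko_set` and `map_to_pathways` are illustrative helpers, not part of any KEGG tool.

```python
# Hypothetical gene-to-KO assignments; all identifiers are toy placeholders.
gene_to_ko = {
    "gene_001": "K_toy1",
    "gene_002": "K_toy2",
    "gene_003": None,      # no ortholog group could be assigned
    "gene_004": "K_toy2",  # several genes may share one K number
}

# Hypothetical pathways, each defined as a set of K number nodes.
pathways = {
    "pathway_X": {"K_toy1", "K_toy2", "K_toy3"},
    "pathway_Y": {"K_toy4"},
}


def ko_set(assignments):
    """Step 1 (KO assignment): convert the gene set to a K number set."""
    return {ko for ko in assignments.values() if ko is not None}


def map_to_pathways(kos, pathway_defs):
    """Step 2 (mapping): for each pathway, list the nodes found and the fraction covered."""
    result = {}
    for name, nodes in pathway_defs.items():
        found = nodes & kos
        if found:
            result[name] = (sorted(found), len(found) / len(nodes))
    return result


genome_kos = ko_set(gene_to_ko)
print(map_to_pathways(genome_kos, pathways))
```

The coverage fraction is only one possible summary; whether a partially covered pathway counts as present is an interpretation step that this sketch does not attempt.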
what is KEGG?
a curated resource devoted to proteins and their function
comprising many different databases, of which the KEGG Orthology database is the most basal annotation layer
the KO database assigns orthologs of experimentally characterized genes the same KO number, i.e. it treats them as functional equivalents
the pathway database integrates genes into higher-order functional units
a pathway is a network of KO numbers
assigning a KO number to a protein resembles
a functional annotation transfer
an integration into corresponding pathway
what are the options for the functional annotation of genes in silico?
significant sequence similarity (sequence conservation)
orthology relationships - descendants of the same gene in the last common ancestral species
conserved genomic position (positional homologs/orthologs)
similar/identical domain architectures
similar/identical 3D structures
agreeing (conserved) expression pattern
Interaction partners present / conservation of interaction networks
what is sequence homology?
homology is only about evolutionary relationships
sequence similarity is not part of the concept of homology. It measures the extent to which two sequences agree
homology is, to a first approximation, a yes/no state
similarity can be measured in percent
homology is not about function
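To make "similarity can be measured in percent" concrete, here is a minimal sketch that computes percent identity for two sequences that are already aligned. The function name `percent_identity` and the example sequences are made up, and the choice to ignore gap columns in the denominator is an assumption; other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap columns that carry identical characters."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gap columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned sequences: 8 comparable columns, 6 of them identical.
print(percent_identity("MKT-AYIAK", "MKTLAHIAR"))  # 75.0
```

The result is a graded number, whereas homology itself stays a yes/no statement about shared ancestry.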
how can similarity and homology be approached?
1. Sequence space is tremendous
2. Sequences are made up from a finite alphabet
3. From 1 and 2 it follows that sequences that are not related are no more similar than expected by chance. The extent of similarity by chance follows a random distribution
4. From 1 to 3 it does not follow that sequences that are homologous must share significant sequence similarity. Remember: sequence similarity decays with time, whereas the relatedness of sequences does not change
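The claim that chance similarity follows a random distribution can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the notes: all 20 amino acids are equally frequent, sequences are compared position by position without alignment, and the helper names are invented for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_sequence(length, rng):
    """Draw a random protein sequence with uniform residue frequencies (a simplification)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def identity_fraction(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chance_identities(n_pairs=2000, length=200, seed=0):
    """Identity fractions for many pairs of unrelated random sequences."""
    rng = random.Random(seed)
    return [
        identity_fraction(random_sequence(length, rng), random_sequence(length, rng))
        for _ in range(n_pairs)
    ]


values = chance_identities()
mean = sum(values) / len(values)
print(f"mean chance identity: {mean:.3f}, maximum observed: {max(values):.3f}")
```

With a uniform 20-letter alphabet the expected identity between unrelated sequences is 1/20. A similarity is called significant only when it lies far outside this background distribution, and old homologs may have decayed back into it.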
how do homologs emerge?
gene duplication
how can you do an ortholog prediction?
the reciprocal best BLAST hit approach
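The reciprocal best hit idea can be sketched as follows: protein a from species A and protein b from species B are called orthologs if b is the best hit of a in B and a is the best hit of b in A. The score tables below are hypothetical stand-ins for the output of two BLAST searches (A against B, and B against A), and the function names are invented for this example.

```python
# Hypothetical similarity scores; in practice these would come from
# BLAST searches run in both directions.
hits_a_to_b = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B2": 180.0},
    "A3": {"B1": 120.0},
}
hits_b_to_a = {
    "B1": {"A1": 245.0, "A3": 118.0},
    "B2": {"A2": 175.0, "A1": 88.0},
}


def best_hit(hits):
    """For each query, keep only the highest-scoring target."""
    return {query: max(targets, key=targets.get) for query, targets in hits.items() if targets}


def reciprocal_best_hits(forward, backward):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_fwd = best_hit(forward)
    best_bwd = best_hit(backward)
    return sorted((a, b) for a, b in best_fwd.items() if best_bwd.get(b) == a)


print(reciprocal_best_hits(hits_a_to_b, hits_b_to_a))
```

Here A3 finds B1 as its best hit, but B1 prefers A1, so A3 is not paired. This is the typical behaviour when A3 is a paralog of A1.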
what is the goal of a phylogenetic profile?
instead of comparing proteins pair-wise, we want to get an overview about as many proteins as possible in the clade of interest
> reduce the impact of chance and noise
which species harbour an ortholog to your gene of interest?
what does the ortholog “look like”?
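A phylogenetic profile in its simplest form is a presence/absence matrix with proteins as rows and species as columns. The sketch below builds such a matrix from hypothetical ortholog search results; the protein and species names are placeholders, and the function `phylogenetic_profile` is an illustrative helper.

```python
# Hypothetical ortholog search results: protein -> species with a detected ortholog.
orthologs_found = {
    "protein_1": {"species_A", "species_B", "species_C"},
    "protein_2": {"species_A", "species_C"},
    "protein_3": {"species_B"},
}

# Column order of the matrix (the taxa in the clade of interest).
species = ["species_A", "species_B", "species_C", "species_D"]


def phylogenetic_profile(found, taxa):
    """Build one presence (1) / absence (0) row per protein."""
    return {
        protein: [1 if taxon in present else 0 for taxon in taxa]
        for protein, present in found.items()
    }


profiles = phylogenetic_profile(orthologs_found, species)
for protein, row in profiles.items():
    print(protein, row)
```

Richer profiles could store a property of each ortholog in place of the 1, for example its domain architecture, to capture what the ortholog "looks like"; that extension is not shown here.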
what do we struggle with in evolutionary research?
False positives
contamination
limited specificity
False negatives
data quality
limited sensitivity
misleading interpretation of true positives
lineage-specific change of function
size of the matrix - how many taxa do we need
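The terms sensitivity and specificity used above have standard definitions in terms of true/false positives and negatives. The sketch below applies them to hypothetical counts from an ortholog search evaluated against a trusted reference; the numbers are invented for illustration only.

```python
def sensitivity(tp, fn):
    """Fraction of real orthologs that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-orthologs that were correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical evaluation counts (not real data).
tp, fn, tn, fp = 80, 20, 890, 10

print(f"sensitivity: {sensitivity(tp, fn):.2f}")
print(f"specificity: {specificity(tn, fp):.2f}")
```

False negatives lower sensitivity, false positives lower specificity, which is how the two problem classes in the list map onto the two measures.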