ak23

by BlackF3d4

Metaproteomics is an umbrella term for experimental approaches to study all proteins in microbial communities and microbiomes from environmental sources. An example is the human gut, which is estimated to host about 10¹³–10¹⁴ microbial cells from thousands of different bacterial strains, all of which share large portions of their genomes, resulting in a very large search space of potentially millions of protein entries in the FASTA file. Describe one computational problem you foresee when analyzing a mass spectrometry-based metaproteomics experiment that can substantially reduce the number of confidently (<1% FDR) identified peptides or proteins.

  1. Base problem: a large number of proteins (and thus peptides) are shared across the thousands of strains. Minor sequence variations will result in very similar peptides.

  2. Option 1: Scoring candidate peptides may fail because many candidates differ only by minor sequence variations (e.g. permutations), so several candidates explain the same spectrum almost equally well and the best match becomes ambiguous.

  3. Option 2: FDR estimation may fail because decoys are generated by, e.g., reversing the target proteins. Given the very large search space, the decoy peptides may be too similar to target peptides to tell the two apart. The target and decoy score distributions may then overlap substantially, so that no reasonable FDR estimate can be made.

  4. Option 3: Protein inference will be a problem because a large number of peptides are shared across proteins from different strains. We may only identify very large protein groups and be unable to pinpoint the particular protein that is present.

  5. Other options are also possible.
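
The shared-peptide issue behind the base problem and Option 3 can be illustrated with a small toy example. The sketch below is only an illustration under stated assumptions: the three protein sequences and their names (strainA_P1, strainB_P1, strainC_P1) are made up, digestion uses a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and proteins are grouped purely by having identical peptide sets. Real protein inference tools use more elaborate rules.

```python
import re
from collections import defaultdict

# Hypothetical toy sequences: strain B carries an identical copy of the
# strain A protein, strain C differs by a single S->T substitution.
PROTEINS = {
    "strainA_P1": "MSTLEAGKVLDNPEWRAAGITSVKLLPQDENR",
    "strainB_P1": "MSTLEAGKVLDNPEWRAAGITSVKLLPQDENR",
    "strainC_P1": "MSTLEAGKVLDNPEWRAAGITTVKLLPQDENR",
}


def tryptic_digest(sequence, min_length=5):
    """Simplified in-silico trypsin digest.

    Cleaves after K or R unless the next residue is P; missed cleavages
    are not considered. Returns the set of peptides of at least
    min_length residues.
    """
    fragments = re.split(r"(?<=[KR])(?!P)", sequence)
    return {f for f in fragments if len(f) >= min_length}


def peptide_to_proteins(proteins, min_length=5):
    """Map each peptide to the set of proteins that contain it."""
    index = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            index[peptide].add(name)
    return index


def indistinguishable_groups(proteins, min_length=5):
    """Group proteins whose peptide sets are identical.

    Such proteins cannot be told apart by peptide evidence alone.
    """
    by_peptide_set = defaultdict(list)
    for name, sequence in proteins.items():
        key = frozenset(tryptic_digest(sequence, min_length))
        by_peptide_set[key].append(name)
    return sorted(sorted(group) for group in by_peptide_set.values())


index = peptide_to_proteins(PROTEINS)
shared = sorted(p for p, owners in index.items() if len(owners) > 1)
unique = sorted(p for p, owners in index.items() if len(owners) == 1)

print("shared peptides:", shared)
print("unique peptides:", unique)
print("protein groups: ", indistinguishable_groups(PROTEINS))
```

In this toy case most peptides map to more than one protein, and only the peptide covering the substitution site can single out a strain. With thousands of strains, the fraction of such discriminating peptides is expected to shrink further, which is why large, unresolved protein groups are a plausible outcome.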

