Monday, June 18, 2018

MICROBE 2018 recap - Bioinformatics - Comparison of Metaproteomics Tools

So Dr. Pratik Jagtap is from my current stomping grounds...Minnesota. He's a research assistant professor at the University of Minnesota, Minnesota Supercomputing Institute (MSI). His work focuses on tool development for proteomic analysis specifically for Galaxy-P and he has an impressive array of publications in the field. His latest 2018 offering:



Stalking his Twitter account, I found it chock full of great links to studies and research being conducted in the area of proteomics, so if that's a field of interest to you, I recommend you follow his account.

In this particular presentation he was investigating the latest metaproteomic software offerings and evaluating their performance with an oral microbiome dataset courtesy of Rudney and colleagues 2015. He wasn't specific as to which Rudney 2015 article, but I'm guessing it's this one - which appears to have links to fastq datasets from an oral microcosm study. Or it could be this publication from 2015, on which Rudney is an author (though not first or last), which talks about an oral microbiome dataset used specifically for Galaxy-P. I cannot be sure because the article is paywalled.

Comparison studies in bioinformatics are always informative because they set down metrics we can use to interrogate these programs: which ones are best for our experimental design, which are potentially flawed, and what the pros and cons of each are, so that we can justify their use or exclusion in our studies.



For those just joining us in the wild west of 'omics' analysis - metaproteomics seeks to understand the functional role of expressed (or unexpressed, I would hedge) proteins - basically ALL proteins within a community-level system. So we're not talking about just 'one' organism's proteins, we are talking about all organisms within your sample/system - not only in terms of identification, classification and characterization but also of interactions with/between other proteins in the system. That's a HUGE umbrella to cast, right? Here are some publications to help you frame up considerations in metaproteomic study design and analysis:

First some primers:

And now some current considerations, challenges and perspectives:

The following programs were evaluated: (i) MEGAN6 (ii) EggNOG Mapper (and associated database) (iii) Unipept (and associated Tutorial) (iv) Metaproteome Analyzer and (v) MetaGOmics - all of which were released or published between 2016 and 2018.

So what did they find?

  • MEGAN's output was generally the least similar to that of the other tools, even when using the same database and the same ontology.
  • There was a modest correlation in fold changes between MetaGOmics and Unipept, but this wasn't seen as very good considering the exact same input files were used in both analyses.
  • All methods return distinct lists of GO terms.
  • The differences logged could be a result of the algorithm in the specific program or the database that was used with the program.
  • We really need more benchmark datasets to compare these tools.
  • We need to use ontologies that are similar across microbiomes.
  • When asked "So, which is the best to use?" he responded that, as far as databases go, EggNOG Mapper seemed to perform the best: genome annotation is used first and the rest of the database is based on sequence homology, which is a good approach. But again, more benchmark datasets need to be available to really evaluate which method is the 'best' to use.

Welcome to the wild wild omics-west, right? So many tools, so little correspondence between them, and so few vetted 'best practices' currently available for these types of large scale, highly complex analyses.
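
To make the findings above a bit more concrete, here is a minimal Python sketch of how you might quantify agreement between tools yourself: a Jaccard overlap of the GO term lists each tool returns, plus a Pearson correlation of fold changes over the GO terms two tools share. To be clear, this is my own illustration and not the method used in the talk. The tool names are just labels, the fold-change values are invented, and the function names (`jaccard`, `pearson`, `compare`) are hypothetical helpers.

```python
from itertools import combinations
from math import sqrt

# Hypothetical per-tool results: GO term -> log2 fold change.
# These numbers are invented for illustration; they are NOT data from the talk.
results = {
    "MetaGOmics": {"GO:0006096": 1.2, "GO:0006412": -0.4, "GO:0008152": 0.9},
    "Unipept": {"GO:0006096": 0.8, "GO:0006412": -0.1, "GO:0015031": 0.3},
    "MEGAN": {"GO:0006096": -0.2, "GO:0016310": 1.5},
}


def jaccard(a, b):
    """Overlap of two GO term collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined
    (fewer than two points, or zero variance in either list)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def compare(results):
    """For every pair of tools, report GO term overlap and the
    fold-change correlation over the terms both tools returned."""
    rows = []
    for t1, t2 in combinations(sorted(results), 2):
        shared = sorted(set(results[t1]) & set(results[t2]))
        r = pearson(
            [results[t1][go] for go in shared],
            [results[t2][go] for go in shared],
        )
        rows.append((t1, t2, jaccard(results[t1], results[t2]), len(shared), r))
    return rows


for t1, t2, overlap, n_shared, r in compare(results):
    r_text = "n/a" if r is None else f"{r:.2f}"
    print(f"{t1} vs {t2}: Jaccard={overlap:.2f}, shared terms={n_shared}, r={r_text}")
```

With real output you would want many more shared terms before reading anything into the correlation, and you would still need to keep the database and ontology version fixed across tools, which is exactly the point the talk was making.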

Many investigators want a catch-all solution they can use as their go-to, but oftentimes the reality is that it will depend...seriously, words that will be engraved on tombstones!


It depends on: 
  • Your questions 
  • Your hypothesis
  • Your study design
  • Your community
  • The success in the sequencing of your community (any type of sequencing you do, with what technology...)
  • Your data quality control measures
  • The underlying database you select
  • The list can go on...
...these are all issues that cannot be resolved by a bioinformatician unfamiliar with your system. Many of these tools are built with specific systems or use-cases in mind, and it's up to us, the biological experts on our systems, to evaluate whether these tools are applicable to our system and can answer our questions.

An interesting talk. Check out Galaxy-P and its publications if you want to know more about their platform for metaproteomics analysis.


