Tuesday, July 3, 2018

Microbiome abundance or relative abundance...that is the question?!

Really great talk by Shyamal Peddada about metrics for measuring and comparing abundance/relative abundance within and between microbiome samples.

Worth a watch!

For those TLDW (too long didn't watch)...



(1) You should watch his actual talk. It is only 36 min, though there are also good answers to questions during the Q/A part at the end.

(2) Highlights:

  • Why measuring absolute abundances is challenging:
    • Think of comparing animal groups in 2 forests: comparing raw counts between the two forests isn't enough because the counts don't tell you the size of each forest. So we use relative abundance instead, though Peddada's group is working on this problem (using OTU absolute abundance data).
  • Features of datasets:
    • Unequal library sizes
    • Relative abundances are non-negative and sum to 1, which means they lie inside a simplex (compositional data)
  • Because your data lie in a simplex, you cannot simply use standard methods: ANOVA and Kruskal-Wallis are not designed for compositional data and may not be directly applicable.
  • The dangers of the black box:
    • Users are often unclear on which parameters are being tested, or what the 'true' null hypothesis is, in their 'favorite' method. OTU tables get pushed through and p-values are obtained without anyone knowing exactly what is being tested on the data.
  • Other methods:
    • Dirichlet-multinomial distribution (relative abundances)
      • Because of the modeling, all taxa have to be negatively correlated, an artifact of the sum constraint on the random variables.
      • Not biologically reasonable
      • Mosimann (Biometrika, 1962) - really a model for independence
    • DESeq2 (abundances BUT not exact)
    • edgeR (relative abundance)
    • Metagen.Seq2 (abundance)
      • controls for library size in some sense
      • I think he's actually talking about metagenomeSeq (Nature Methods paper)
    • ANOVA/Kruskal-Wallis/T-test etc (abundance/relative abundance)
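To make the simplex point concrete, here is a minimal Python sketch. It is my own toy example, not something from the talk, and the three taxa and their counts are hypothetical. Only taxon A changes in absolute terms, yet B and C appear to drop in relative terms. The ratio between B and C stays the same, and a shallower library leaves the proportions untouched.

```python
def to_relative(counts):
    """Convert counts to proportions that sum to 1 (the 'closure' step)."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical absolute abundances for taxa A, B, C in two ecosystems.
# Only taxon A differs between them (100 vs 400).
eco1_abs = [100, 100, 100]
eco2_abs = [400, 100, 100]

rel1 = to_relative(eco1_abs)  # 1/3, 1/3, 1/3
rel2 = to_relative(eco2_abs)  # 2/3, 1/6, 1/6

# B looks "decreased" in relative terms although its absolute abundance is unchanged.
print("relative abundance of B:", rel1[1], "->", rel2[1])

# Ratios between taxa do not depend on the (unknown) total, so they carry
# the same information in relative and absolute data.
ratio_ab_abs = eco2_abs[0] / eco2_abs[1]
ratio_ab_rel = rel2[0] / rel2[1]
ratio_bc_eco1 = rel1[1] / rel1[2]
ratio_bc_eco2 = rel2[1] / rel2[2]
print("A/B from absolute counts:", ratio_ab_abs, "from relative:", ratio_ab_rel)
print("B/C in ecosystem 1:", ratio_bc_eco1, "in ecosystem 2:", ratio_bc_eco2)

# Unequal library sizes: the same ecosystem sequenced ten times shallower
# gives different counts but the same proportions.
eco2_shallow = [c // 10 for c in eco2_abs]  # 40, 10, 10
rel2_shallow = to_relative(eco2_shallow)
```

This is the intuition behind working with ratios between taxa: the total cancels out, so a ratio of relative abundances equals the ratio of the underlying abundances.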
"What null hypothesis are you testing!?"
  • Lots of zeros - Why do we see zeros? There are several types that we can try to account for:
    • Structural zeros - the taxon is absent
    • Outliers
    • Sampling zeros - caused by sampling depth or library size
  • ANCOM
    • You work with log ratios, so log-transform your data to go from the simplex to Euclidean space for each specimen.
    • Not good for small sample sizes
    • You need to know that at least 2 taxa will not change between samples/conditions/ecosystems etc
    • Lemma: relative abundance data can be used to make inferences about abundance (17:32 in talk)
      • He goes through a very straightforward example to understand this.
    • Process:
      • ID types of zeros and deal with them
      • Test for equality of abundance between 2 ecosystems relative to each remaining taxon
      • Apply multiple testing correction
      • Count the number of null hypotheses (Wi) rejected in the previous step
      • Repeat above steps for all taxa
      • Using empirical distribution of (W) declare significance of a taxon
    • Simulation Study from Weiss et al., 2017 using data from Caporaso et al., 2011
      • Shows performance of T-tests on abundance and relative abundance, metagen.seq2, edgeR, DESeq2 as compared to ANCOM for False Discovery Rate (FDR) and Power.
    • Better control of FDR
    • Can be extended for testing patterns among different ecosystems
    • Can be generalized to covariate adjusted analysis, repeated measurement analysis
    • R Code: contact at sdp47@pitt.edu
    • Python: Available in QIIME2
  • More than 2 ecosystems?
    • You could use a global test, BUT it's not very useful because rejecting the null only tells you that at least one ecosystem is significantly different.
    • We are more interested in directionality - increase or decrease between all pairs of ecosystems.
    • ANCOM steps are modified by applying the mdFDR (mixed directional FDR) method
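The ANCOM process listed above can be sketched roughly in Python. This is my own simplified illustration under stated assumptions, not Peddada's R code and not the QIIME2 implementation: zeros are handled with a plain pseudocount, the two-group test is a Mann-Whitney test using a normal approximation, the correction is Benjamini-Hochberg, and the final step (using the empirical distribution of W to pick a cutoff) is left out. All function names and the simulated counts are made up for the example.

```python
import math
import random
from statistics import NormalDist

def rank_sum_pvalue(a, b):
    """Two-sided Mann-Whitney test, normal approximation, no tie correction."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):              # assign average ranks to tied values
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_a - n_a * (n_a + 1) / 2
    z = (u - n_a * n_b / 2) / math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def benjamini_hochberg(pvals, alpha=0.05):
    """Return True for each p-value whose null is rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for k in order[:cutoff]:
        rejected[k] = True
    return rejected

def ancom_like_w(group_a, group_b, pseudocount=1.0, alpha=0.05):
    """For each taxon i, count how many log-ratios log(x_i / x_j) differ between groups."""
    n_taxa = len(group_a[0])
    # Assumption: a pseudocount stands in for the proper treatment of zeros.
    log_a = [[math.log(c + pseudocount) for c in s] for s in group_a]
    log_b = [[math.log(c + pseudocount) for c in s] for s in group_b]
    w = []
    for i in range(n_taxa):
        pvals = []
        for j in range(n_taxa):
            if j != i:
                pvals.append(rank_sum_pvalue([s[i] - s[j] for s in log_a],
                                             [s[i] - s[j] for s in log_b]))
        w.append(sum(benjamini_hochberg(pvals, alpha)))
    return w

# Simulated (hypothetical) counts: 6 taxa, 20 samples per ecosystem,
# unequal library sizes, and only taxon 0 is 8x more abundant in ecosystem B.
rng = random.Random(0)
base = [200, 150, 300, 100, 250, 180]

def simulate(n_samples, fold_taxon0):
    samples = []
    for _ in range(n_samples):
        depth = rng.uniform(1.0, 3.0)   # sample-specific sequencing depth
        samples.append([
            round(mean * (fold_taxon0 if t == 0 else 1.0) * depth
                  * math.exp(rng.gauss(0, 0.2)))
            for t, mean in enumerate(base)
        ])
    return samples

group_a = simulate(20, 1.0)
group_b = simulate(20, 8.0)
w = ancom_like_w(group_a, group_b)
print("W per taxon:", w)
```

In this toy setup taxon 0 is rejected against every other taxon, while each unchanged taxon is still rejected at least once (against taxon 0). That is why the real method looks at the distribution of W across taxa to decide which ones to call significant, and why it needs some taxa that do not change.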
(3) Other Links
Final thought...discussed during the question/answer session - Data size

Yes, as data size increases, so does computation time; we all know this. His response:
"Biologists spend years collecting data [samples],
I should get a few days to analyze the data"
Touché...!
