Tuesday, June 19, 2018

MICROBE 2018 recap - Bioinformatics - Microbial Genomes Atlas (MiGA)

So I don't typically tweet during meetings. It's hard enough to listen and take notes without also trying to be witty and post accurately on Twitter at the same time; it's just not one of my gifts. My Twitter account in general comes and goes, much like this blog, with my posts limited to what I find interesting in the field or what I'm learning in it. So, like this blog, my Twitter goes dark here and there and lights up when I'm inspired to share. I'm impressed I have as many followers as I do. Thank you, faithful followers!

BUT, during this talk I saw something amusing enough to tweet, given that we were sitting in a bioinformatics software talk: an "Out of Memory" message. So I was amused for sure.

So I do have my moments of in-meeting tweeting.

During this talk we were updated on the Microbial Genomes Atlas (MiGA).



Luis M Rodriguez-R:
Expanding the Catalogued Diversity of Archaea and Bacteria

Luis is a postdoctoral researcher in Kostas Konstantinidis' lab at Georgia Tech, where he works on developing new bioinformatics tools for studying microbial communities.

The motivation behind MiGA is that we are recovering genome after genome from metagenomic surveys of microbial communities. Some of these datasets contain 16S rRNA genes (the traditional marker for classifying and analyzing the microbial players within a system) and some do not. Additionally, 16S rRNA has its limits on taxonomic resolution, sometimes only getting you as far as the genus level, and if you want to know more about the predicted organism you have to rely on annotated genomes, which may or may not already exist. Enter MiGA. MiGA is a genomic data management and processing tool. It features indexing based on medoid clustering...

medoid clustering: a medoid is the representative object of a cluster within a data set, chosen so that its average dissimilarity to all other objects in the cluster is minimal. It is similar to a mean or centroid, with the restriction that the medoid must be an actual member of the data set. You use a medoid when a centroid or mean cannot be defined (e.g., for nodes in a graph, or for objects where all you have is a matrix of pairwise dissimilarities).
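To make the definition concrete, here is a minimal Python sketch (my own illustration, not code from MiGA) that picks the medoid from a pairwise dissimilarity matrix. The function name `medoid_index` and the toy matrix values are hypothetical.

```python
from typing import Sequence


def medoid_index(dissimilarity: Sequence[Sequence[float]]) -> int:
    """Return the index of the object whose average dissimilarity to all
    other objects is smallest (the medoid)."""
    n = len(dissimilarity)
    if n == 0:
        raise ValueError("cannot take the medoid of an empty data set")
    # Row sums rank objects the same way as the average dissimilarity to the
    # other n - 1 objects, as long as the diagonal (self-dissimilarity) is 0.
    totals = [sum(row) for row in dissimilarity]
    return min(range(n), key=totals.__getitem__)


# Hypothetical symmetric dissimilarity matrix for four objects.
D = [
    [0.0, 1.0, 2.0, 6.0],
    [1.0, 0.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 4.0],
    [6.0, 3.0, 4.0, 0.0],
]
print(medoid_index(D))  # 1
```

Note that nothing here needs coordinates or an average "point"; the matrix alone is enough, which is exactly why medoids suit similarity measures like ANI.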

In MiGA, this clustering is performed over matrices of average nucleotide identity and average amino acid identity (ANI/AAI) and is guided by heuristic approximations that speed things up.
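Here is a rough sketch of what medoid clustering over an ANI matrix could look like. It assumes ANI values (in percent) are converted to a dissimilarity as 100 - ANI, and it uses a plain alternating k-medoids loop. It does not reproduce MiGA's heuristic approximations, and the function names and ANI values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def ani_to_distance(ani: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumption: turn a percent-identity ANI matrix into a dissimilarity (100 - ANI)."""
    return [[100.0 - v for v in row] for row in ani]


def k_medoids(dist: Sequence[Sequence[float]], medoids: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids starting from the given seed medoids.
    Returns (medoids, labels), where labels[i] is the cluster index of object i."""
    n = len(dist)
    medoids = list(medoids)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each genome joins the cluster of its nearest medoid.
        labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: in each cluster, the new medoid is the member with the
        # smallest total dissimilarity to the other members.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, labels


# Hypothetical ANI (%) matrix: genomes 0-2 are close relatives, as are 3-4.
ANI = [
    [100.0, 98.5, 97.9, 80.1, 79.8],
    [98.5, 100.0, 98.2, 80.4, 80.0],
    [97.9, 98.2, 100.0, 79.9, 80.2],
    [80.1, 80.4, 79.9, 100.0, 96.7],
    [79.8, 80.0, 80.2, 96.7, 100.0],
]
result = k_medoids(ani_to_distance(ANI), [0, 3])
print(result)  # ([1, 3], [0, 0, 0, 1, 1])
```

Starting from genomes 0 and 3 as seeds, the loop settles on genomes 1 and 3 as medoids and splits the five genomes into the two groups you would expect by eye. Because a medoid is always an actual genome, each cluster gets a real representative genome rather than an abstract average.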

They recently integrated a tool called FastANI (bioRxiv paper), which allowed them to explore the distribution of ANI values across >90,000 microbial genomes (finished and unfinished). What they found was a 'valley' in the 83-95% ANI range, meaning relatively few genome pairs have ANI values in that range. This indicated that "...genetic discontinuity, previously observed in smaller sets of genomes [and] indicative of discrete species, is maintained at larger scale regardless of taxonomic diversity or historical sequencing trends."
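As a toy illustration of what a 'valley' means here, this sketch bins pairwise ANI values and reports the fraction that falls between 83% and 95%. The values below are invented, not taken from the FastANI study, and treating the range as 83 <= ANI < 95 is my own assumption about the boundaries.

```python
from collections import Counter
from typing import Dict, Iterable


def ani_histogram(values: Iterable[float], bin_width: float = 1.0) -> Dict[float, int]:
    """Count ANI values (%) in fixed-width bins keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def fraction_in_range(values: Iterable[float], low: float = 83.0, high: float = 95.0) -> float:
    """Fraction of pairwise ANI values with low <= ANI < high (boundaries assumed)."""
    vals = list(values)
    if not vals:
        raise ValueError("no ANI values given")
    inside = sum(1 for v in vals if low <= v < high)
    return inside / len(vals)


# Hypothetical pairwise ANI values: most pairs are either clearly different
# (< 83%) or clearly the same species (> 95%).
pairwise_ani = [78.2, 79.5, 80.1, 81.0, 84.3, 96.1, 97.4, 98.0, 98.8, 99.2]
print(ani_histogram(pairwise_ani))
print(fraction_in_range(pairwise_ani))  # 0.1
```

On real data you would feed in the ANI values for all genome pairs; a low fraction in that middle window is what shows up as the valley in the distribution.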

Together with RefSeq and RefSoil (Github Project, ISME RefSoil paper 2017, original bioRxiv paper on RefSoil 2016, RefSoil Database on FigShare and Iowa State Digital Repository Publication for RefSoil), this resource allows a rapid search for close relatives of any complete or draft genome and provides a predictive framework for inferring whether your genome is known or potentially novel.
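A bare-bones version of the "known or potentially novel" question might look like this: find the closest reference genome by ANI and compare it against a species cutoff. The 95% cutoff is an assumption based on the widely used ANI species boundary (which lines up with the upper edge of the valley mentioned above). The reference names and ANI values are hypothetical, and this is not MiGA's actual classification scoring.

```python
from typing import Dict, Optional, Tuple

# Assumption: the commonly used ~95% ANI species boundary.
SPECIES_ANI_THRESHOLD = 95.0


def classify_query(ani_to_refs: Dict[str, float],
                   threshold: float = SPECIES_ANI_THRESHOLD) -> Tuple[str, Optional[str], float]:
    """Return (status, closest_reference, best_ani) for a query genome, given its
    ANI (%) to each reference genome."""
    if not ani_to_refs:
        return "potentially novel", None, 0.0
    # Closest relative = reference with the highest ANI to the query.
    best_ref = max(ani_to_refs, key=ani_to_refs.__getitem__)
    best_ani = ani_to_refs[best_ref]
    status = "known species" if best_ani >= threshold else "potentially novel"
    return status, best_ref, best_ani


# Hypothetical hits for two draft genomes against a reference collection.
print(classify_query({"ref_A": 78.4, "ref_B": 96.3, "ref_C": 88.0}))
# ('known species', 'ref_B', 96.3)
print(classify_query({"ref_A": 78.4, "ref_C": 88.0}))
# ('potentially novel', 'ref_C', 88.0)
```

A single hard cutoff is the simplest possible rule; a real framework would also need to handle genome completeness and uncertainty near the boundary.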
