Monday, June 18, 2018

MICROBE 2018 recap - Bioinformatics - Gi-Scanner

So half, if not more, of my life is dedicated to using (and breaking) bioinformatic software. So of course, when I heard there was a whole poster talk session on new bioinformatic tools, I grabbed my computer, staked out a sweet spot where I could charge everything up, and waited for the black-box buttons to be pushed...

Dr. Sophie Shaw (right) and me
Before I jump in... a shout-out to my lovely fellow bioinformatician, whom I haven't seen in 4 years (we met at the Workshop for Genomics in 2014) yet whom I prolifically follow on Twitter and Instagram. We found each other at Daniel McDonald's poster presentation for the American Gut Project... awww, Dr. McDonald, bringing together globally floating bioinformaticians since 2018.

Alright - back to the session: First up -

YoungJae Hur, Seoul National University:
Gi-Scanner: An algorithm to predict genomic islands by comparative genomics

So, the first thing I typically do when someone touts a new piece of software is Google/Google Scholar it. I want to find the software page (GitHub, SourceForge, a lab webpage). I like going to the source, looking at the documentation, and making my own assessments. Alas, Gi-Scanner is not published or available on a website yet. I was a bit confused during the presentation about what exactly we were looking at. It was either a piece of software to identify genomic islands OR a forthcoming database of V. cholerae genomic islands, identified by their software, with the software itself possibly not available for public use. I'd like to think it's the former: that the software will be readily available soon and that the database will also be published/available for those interested in V. cholerae in particular.




Jumping back into the session:

For those of you new to the world of 'genomic islands' (GIs): these are basically clusters of genes that are mobile via horizontal gene transfer. Yes, exceptions exist, but this is for general understanding. GIs play roles in the transfer of virulence genes, antibiotic resistance genes, or genes that help a microbe adapt to a new environment and/or become more competitive. Detecting them is important for understanding the 'mobilome' (how's that for jargon!). The mobilome is the genomic complement that is transferable between microbes, sometimes conveying 'super powers' across species and even genus boundaries. By characterizing genomic islands and understanding the agents of DNA movement, we can perhaps devise strategies to interrupt their movement, or their functionality once they are inserted into new microbial hosts.

Check out Frost's paper in Nature Reviews Microbiology from 2005; it's a good primer on this.

Currently identified genomic islands for V. cholerae, Chun et al., 2009
http://www.pnas.org/content/106/36/15442.short

Identifying islands by a comparative approach. Che et al., 2014
http://www.mdpi.com/2076-0817/3/1/36/htm
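
For readers new to the comparative idea, here is a minimal sketch of its simplest form. It flags runs of consecutive genes in a query genome that are missing from most related genomes (i.e., strain-specific gene clusters). This is NOT Gi-Scanner's algorithm, which isn't public. The function name `candidate_islands`, the gene-ID inputs, and both thresholds (`max_shared_fraction`, `min_run`) are hypothetical. The sketch also assumes orthology has already been resolved, so that shared genes carry the same ID across genomes.

```python
from typing import Dict, List, Set, Tuple


def candidate_islands(
    query_genes: List[str],
    other_genomes: Dict[str, Set[str]],
    max_shared_fraction: float = 0.2,
    min_run: int = 3,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of runs of at least `min_run`
    consecutive query genes found in at most `max_shared_fraction` of the
    other genomes (hypothetical thresholds, for illustration only)."""
    n_others = len(other_genomes)
    if n_others == 0:
        return []
    # Mark each query gene as "rare" if few comparison genomes carry it.
    rare = [
        sum(gene in genes for genes in other_genomes.values()) / n_others
        <= max_shared_fraction
        for gene in query_genes
    ]
    islands: List[Tuple[int, int]] = []
    start = None
    # A trailing False acts as a sentinel so a run at the very end is closed too.
    for i, is_rare in enumerate(rare + [False]):
        if is_rare and start is None:
            start = i
        elif not is_rare and start is not None:
            if i - start >= min_run:
                islands.append((start, i - 1))
            start = None
    return islands


query = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
others = {
    "strainA": {"g1", "g2", "g6", "g7", "g8"},
    "strainB": {"g1", "g2", "g6", "g7"},
    "strainC": {"g1", "g2", "g6", "g7"},
}
print(candidate_islands(query, others))  # [(2, 4)]
```

Here g3 to g5 are absent from every comparison strain, so they come back as one candidate island. Published GI tools weigh considerably more evidence than shared-gene presence/absence, so treat this as an illustration of the concept rather than a predictor.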

So, what was I able to take away from the talk?

  • They made a database for V. cholerae to predict genomic islands (see figure above on currently identified islands).
  • V. cholerae has over 200 serogroups - check out the Shapiro Lab's 2017 paper in Microbial Genomics.
  • In their method, they aligned whole genomes using a 'comparative approach'. This was a bit ambiguous to me, and someone asked specifically what 'comparative approach' meant. The answer pertained to creating a pairwise ortholog matrix using UCLUST/USEARCH, where the 'top hit' was used as the 'ortholog' gene. This response was again confusing, but given the limited time they had to move on, so hopefully this will be described more fully in a publication.
  • I will not post a picture of his results slide, given it's unpublished work; instead, I'll post the numbers he mentioned:
    • 758 genomic islands
    • 388 GIs are strain specific
    • 69 GIs found out of 74 manually curated GIs
    • 685 GIs found out of 721 GIs on IslandViewer
    • 549 novel GIs found.
  • They used parsimony to help identify GIs
  • Everything was programmed in Java and Python
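
Here is a rough sketch of how I understood the 'top hit as ortholog' answer. It assumes the UCLUST/USEARCH hits have already been parsed into (query gene, target genome, target gene, percent identity) tuples. The function name, the tuple layout, and the 50% identity cutoff are my own assumptions for illustration, not details from the talk.

```python
from typing import Dict, List, Optional, Tuple

# (query_gene, target_genome, target_gene, percent_identity); layout is an assumption
Hit = Tuple[str, str, str, float]


def top_hit_ortholog_matrix(
    hits: List[Hit], genomes: List[str], min_identity: float = 50.0
) -> Dict[str, Dict[str, Optional[str]]]:
    """For each query gene and genome, keep the single best hit as the 'ortholog'."""
    wanted = set(genomes)
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for query, genome, target, ident in hits:
        if ident < min_identity or genome not in wanted:
            continue
        key = (query, genome)
        # Strict '>' means the first hit seen wins on ties.
        if key not in best or ident > best[key][1]:
            best[key] = (target, ident)
    # Every query gene gets a row; None marks "no acceptable hit" (absence).
    matrix: Dict[str, Dict[str, Optional[str]]] = {
        q: {g: None for g in genomes} for q in sorted({h[0] for h in hits})
    }
    for (query, genome), (target, _) in best.items():
        matrix[query][genome] = target
    return matrix


hits = [
    ("vc_0001", "strainA", "A_0010", 98.5),
    ("vc_0001", "strainA", "A_0450", 61.0),  # weaker second hit, discarded
    ("vc_0001", "strainB", "B_0012", 97.0),
    ("vc_0002", "strainA", "A_0011", 42.0),  # below the identity cutoff
]
matrix = top_hit_ortholog_matrix(hits, ["strainA", "strainB"])
for gene, row in matrix.items():
    print(gene, row)
# vc_0001 {'strainA': 'A_0010', 'strainB': 'B_0012'}
# vc_0002 {'strainA': None, 'strainB': None}
```

Anything that happens to score highest becomes the 'ortholog', even a paralog. That is exactly the kind of top-hit dependency that makes me a little nervous.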

OK... so, lingering questions:
  1. How are they defining 'novel'?
  2. In conversations with other bioinformaticians, we've discussed the dangers of top-hit dependency in some algorithms when assigning microbe or gene hits. While perhaps more difficult, wouldn't it be more prudent to characterize the hit as a summary of the top 100 hits, i.e., which gene or microbial species/genus is found most often?
  3. Their results in general were hard to follow. The numbers I posted above, which he gave in the presentation, differ from what I read in the PDF attached to the session. That PDF states they found "94 predicted GIs. Out of 74 predicted in the previous study based on manual curation [they recovered] 72 successfully. The 94 GIs were compared to those predicted by IslandViewer..." However, no results of that comparison were given in either the PDF or the presentation, so I hope this is clarified in a future publication.
  4. Is this method useful outside of the world of V. cholerae or Vibrio in general? 
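
To make question 2 concrete, here is a small sketch contrasting a top-hit assignment with a majority vote over the top N hits (N=100, as suggested above). The labels, scores, and function names are hypothetical. Scores are assumed to be 'higher is better' (e.g., percent identity).

```python
from collections import Counter
from typing import List, Tuple

Hit = Tuple[str, float]  # (label, score); higher score = better hit (assumption)


def top_hit_label(hits: List[Hit]) -> str:
    """Label of the single best-scoring hit (the first one wins on ties)."""
    if not hits:
        raise ValueError("no hits")
    return max(hits, key=lambda h: h[1])[0]


def majority_label(hits: List[Hit], n: int = 100) -> Tuple[str, float]:
    """Most frequent label among the n best hits, plus the fraction of them supporting it."""
    if not hits:
        raise ValueError("no hits")
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:n]
    # Counter.most_common breaks count ties by first appearance, i.e. by best score here.
    label, count = Counter(name for name, _ in ranked).most_common(1)[0]
    return label, count / len(ranked)


hits = [("Vibrio mimicus", 99.0)] + [("Vibrio cholerae", 97.0)] * 4 + [("Vibrio mimicus", 90.0)]
print(top_hit_label(hits))        # Vibrio mimicus
print(majority_label(hits, n=5))  # ('Vibrio cholerae', 0.8)
```

Returning the vote fraction alongside the label also gives a crude confidence measure. A label backed by 80% of the top hits says more than one backed by a single best hit.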

So, while this is interesting (I am a V. cholerae fan-girl), I'm not convinced of the utility of this tool outside the world of V. cholerae, especially since we already have so many published tools for GI detection. I'm hoping that the creators of Gi-Scanner will be able to do some comparisons with these tools. It sounds like they have, but the results of those comparisons are fuzzy at the moment.

And it's a bit concerning to me when I hear that so many novel GIs (n=549 by their results) were detected. What indication do we have that these islands are indeed novel? Is it a definition issue within the algorithm? And what is the path forward here, i.e., the workflow to validate the predictions in silico and in vivo?

Lots of questions... but hearing about new software is always exciting!
