The site of an old uranium mill near the town of Rifle, Colorado, is a scenic spot that doubles as our sampling site. Here, we study the microbes that inhabit an aquifer adjacent to the Colorado River. What started more than a decade ago as an effort to understand the microbial and chemical factors that control uranium mobility, with a focus on bioremediation, has turned this remote aquifer into a model system for studying microbial ecology.
A panoramic view of the Rifle sampling site adjacent to the Colorado River. (Credit: Lawrence Berkeley National Laboratory, Roy Kaltschmidt)
Our initial plan for studying the Rifle aquifer was to reconstruct genomes for a large proportion of the microbial community to enable a holistic understanding of what the microbes “are up to”. One of the challenges in assessing an ecosystem and characterizing its microbial community as a whole is that “rare” microbes are not abundant enough for their genomes to be recovered. In most cases, a single sample is nothing more than a partial snapshot. To overcome this limitation, we conducted experiments over the course of seven years (2007-2014) and collected thirty-three samples across fifteen different geochemical conditions. Our hypothesis was that this sampling diversity would translate into recoverable microbial diversity.
Sampling and monitoring the aquifer with our array of wells and sensors. (Credit: Lawrence Berkeley National Laboratory, Roy Kaltschmidt)
We sequenced these samples to an unprecedented depth, generating a total of 4.6 billion reads, and conducted all of our bioinformatics analyses in-house. Our initial estimates suggested that we could recover up to 200 genomes from our assemblies. For a complex ecosystem like sediments (in contrast to an environment like the human gut or the open ocean), we considered this a “big win”. Little did we know that this would be a significant underestimate.
In total, we ended up recovering 2,540 near-complete genomes (from a total of >17,000 genome bins). We manually curated every single genome using a combination of tetranucleotide frequency, genome-specific profiles of GC content and coverage, and visual inspection of taxonomic profiles. This all-consuming process took the two of us (Jill and me) a total of four weeks (at a rate of ~100 genomes a day).
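For readers unfamiliar with the sequence signals mentioned above, here is a minimal sketch of how GC content and tetranucleotide frequency can be computed for a single scaffold. This is not the curation workflow we actually used; the function names are hypothetical, and the sketch simplifies things by counting tetranucleotides on one strand only (binning tools commonly merge each tetranucleotide with its reverse complement).

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each of the 256 tetranucleotides.

    Simplification (assumption): single strand only, and windows
    containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    scaffold = "ACGTACGTGGCCAATT"  # toy sequence, not real data
    print(f"GC content: {gc_content(scaffold):.2f}")
    freqs = tetranucleotide_frequencies(scaffold)
    print(f"ACGT frequency: {freqs['ACGT']:.3f}")
```

Scaffolds from the same genome tend to share similar GC content, coverage, and tetranucleotide profiles, so a scaffold whose values sit far from the rest of its bin is a candidate for closer inspection.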
To address the problem of the same genome appearing in multiple samples, we dereplicated our genomes using an average nucleotide identity threshold of 98%. This reduced the number of genomes for downstream analyses from 2,540 to 1,297. Now, you may be wondering whether these were really the exact same genomes showing up in multiple samples. Well, it turns out that there are significant, observable strain variations in the genomes that can be tracked across space and time (described here in another paper of ours: Anantharaman et al., 2016). All we really did here was collapse this strain-level resolution to focus on analyses of microbial metabolism instead.
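To make the idea of dereplication concrete, here is a minimal sketch of one common approach: greedy clustering against a 98% average nucleotide identity (ANI) cutoff. It is an illustration under stated assumptions, not the procedure from the paper. It assumes that pairwise ANI values have already been computed by some other tool and that genomes arrive sorted from best to worst quality; the function name `dereplicate` and the input layout are hypothetical.

```python
from typing import Dict, FrozenSet, List


def dereplicate(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 0.98,
) -> Dict[str, List[str]]:
    """Greedy dereplication.

    genomes: genome names, assumed sorted best quality first.
    ani: pairwise identity keyed by frozenset({a, b}), as a fraction;
         missing pairs are treated as unrelated (identity 0.0).
    Returns a mapping of representative genome -> cluster members.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for representative in clusters:
            pair = frozenset((genome, representative))
            if ani.get(pair, 0.0) >= threshold:
                # Close enough to an existing representative: join it.
                clusters[representative].append(genome)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


if __name__ == "__main__":
    # Toy values, not real data.
    names = ["A", "B", "C", "D"]
    pairwise = {
        frozenset(("A", "B")): 0.99,
        frozenset(("A", "C")): 0.90,
        frozenset(("C", "D")): 0.985,
    }
    for rep, members in dereplicate(names, pairwise).items():
        print(rep, members)
```

Because each genome is compared only against cluster representatives, the result depends on the input order, which is why sorting by genome quality first is a sensible (assumed) convention.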
Now for the big question: what did we learn? For one, the terrestrial subsurface hosts an extraordinarily complex and diverse microbial world. We recovered near-complete genomes from about 80% of all bacterial phyla at this single location. Included amongst these are 47 previously unknown phylum-level groups. Tracking the abundance of these organisms across all thirty-three samples revealed the dynamism of the microbial communities, which we hypothesize to be a function of metabolic versatility. This again reinforces the importance of conducting multiple experiments, as a single time-point “snapshot” may not be informative.
Further, we studied microbial carbon, nitrogen, sulfur and hydrogen metabolism by searching the microbial genomes for the genetic machinery associated with specific chemical transformations. This led us to an important conclusion: the vast majority of microbes in the subsurface cannot perform sequential chemical/redox transformations on their own. Rather, they likely depend on “metabolic handoffs” (the transfer of a chemical species to another microbe, analogous to the idea that one microbe’s waste is another’s food). This is particularly important for developing accurate, predictive, planetary-scale models of climate dynamics and biogeochemical cycles. Because microbes are primary players in modulating atmospheric greenhouse gas concentrations, it is imperative that predictive models adopt an accurate characterization of microbial metabolism. By providing an accurate analysis of microbial abundance and metabolism at the genome level for an unprecedented fraction of the subsurface community (up to 36%), we provide a framework for developing future trait-based models that can capture microbial function and its associated versatility.
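As a rough illustration of what searching for sequential transformations can look like, the sketch below checks which steps of a multi-step pathway each genome encodes. Denitrification is used here purely as an example of a sequential redox pathway, with commonly used marker genes for each step; the genome names and gene sets are made up, and this is not the annotation pipeline from the paper.

```python
from typing import Dict, List, Set, Tuple

# Example pathway (assumption for illustration): denitrification,
# with commonly used marker genes for each step.
DENITRIFICATION: List[Tuple[str, Set[str]]] = [
    ("nitrate -> nitrite", {"narG", "napA"}),
    ("nitrite -> nitric oxide", {"nirK", "nirS"}),
    ("nitric oxide -> nitrous oxide", {"norB"}),
    ("nitrous oxide -> dinitrogen", {"nosZ"}),
]


def steps_encoded(genes: Set[str], pathway: List[Tuple[str, Set[str]]]) -> List[bool]:
    """For each pathway step, True if the genome has any marker gene for it."""
    return [bool(genes & markers) for _, markers in pathway]


def summarize(
    genomes: Dict[str, Set[str]],
    pathway: List[Tuple[str, Set[str]]],
) -> Dict[str, List[str]]:
    """Group genomes by whether they encode all, some, or no pathway steps."""
    summary: Dict[str, List[str]] = {"complete": [], "partial": [], "none": []}
    for name, genes in genomes.items():
        flags = steps_encoded(genes, pathway)
        if all(flags):
            summary["complete"].append(name)
        elif any(flags):
            summary["partial"].append(name)
        else:
            summary["none"].append(name)
    return summary


if __name__ == "__main__":
    # Hypothetical genomes and gene calls, not real data.
    toy_genomes = {
        "genome_1": {"narG"},
        "genome_2": {"nirK", "norB"},
        "genome_3": {"narG", "nirS", "norB", "nosZ"},
        "genome_4": {"rpoB"},
    }
    print(summarize(toy_genomes, DENITRIFICATION))
```

In a community dominated by handoffs, most genomes would land in the "partial" group, with different organisms covering different steps of the same pathway.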
Through the comprehensive study of a single aquifer (that includes many studies prior to this), we have discovered as many major bacterial phylum-level lineages as were previously known through traditional methods! Already, we observe that many of the organisms detected in our work are turning up in other systems. These genomes are packed to the brim with novel proteins for which we foresee numerous potential applications in biotechnology.
In support of open science and other microbiologists who may have different perspectives and ideas on the utility of our data (including the treasure-trove of novel genomes and proteins), we have made all of our data publicly available (raw reads, scaffolds, genes, proteins, and taxonomy) through NCBI Genbank and ggKbase.
Banfield lab group. Their support was crucial to the success of this project.
Original article: K. Anantharaman, C. T. Brown, L. A. Hug, I. Sharon, C. J. Castelle, A. J. Probst, B. C. Thomas, A. Singh, M. J. Wilkins, U. Karaoz, E. L. Brodie, K. H. Williams, S. S. Hubbard, and J. F. Banfield (2016). Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nature Communications 7, ncomms13219. doi: 10.1038/ncomms13219.