Session: Leveraging FAIR Data to Discover New Connections in Ecology
Harmonizing microbial community data for broader biodiversity and ecological meta-analyses
Monday, August 2, 2021
Link To Share This Presentation: https://cdmcd.co/dEyGRA
Jeffrey L. Blanchard, Biology, University of Massachusetts, Amherst, Amherst, MA, Emiley Eloe-Fadrosh, Metagenomics Program Head, Joint Genome Institute, Berkeley, CA, Janet K. Jansson, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, Margaret O'Brien, Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, Jorge L. Mazza Rodrigues, Land, Air, and Water Resources, University of California Davis, Davis, CA and Lee Stanish, Institute of Arctic and Alpine Research, University of Colorado, Boulder, Boulder, CO
Background/Question/Methods
High-throughput DNA sequencing now makes it possible to capture genetic information on thousands of soil microorganisms simultaneously. Genomic data is often stored in international nucleotide sequence databases, but the metadata associated with these projects is frequently sparse and rarely integrated into or cross-referenced with ecology-oriented databases. Environmental sequence data and derived results, such as species abundance tables and traits related to biogeochemical cycling, are seldom included in (or even linked to) Long Term Ecological Research (LTER) or Environmental Data Initiative (EDI) data sets, and when they are added, it is often in an ad hoc manner not suitable for inclusion in broader biodiversity studies. This limits collaboration between molecular microbial ecologists and population and systems ecologists. The specific aims of our NSF LTER Synthesis working group, Ecological Metagenome-derived Reference Genomes and Traits (EMERGENT), are to advance efforts to harmonize metagenomic sequence data sets to enable ecological research into microbial taxa and their functional traits, streamline their use in syntheses with climate-related research using EDI data sets, and enable future metagenomic studies that leverage the LTER environmental data.
Results/Conclusions
To address these challenges, we have identified metagenomic data sets from LTER sites, the National Ecological Observatory Network and several other ecological observatories. The individual metagenomes were processed, assembled, annotated and binned into potential environmental/population genomes (metagenome-assembled genomes, MAGs) using bioinformatics workflows at the Joint Genome Institute. The data products are stored in the public Integrated Microbial Genomes & Microbiomes database. For most of the LTER metagenomes, less than 3% of the DNA mapped to existing genomes, which demonstrates the vast discovery space that still exists in soil communities.
To increase the reusability of the data, we applied an updated algorithm, which increased the number of MAGs associated with each metagenome. Taxonomic classification of the MAGs using the Genome Taxonomy Database and the emerging SeqCode Initiative will provide consistent, updatable nomenclature for bacteria across these and newer data sets. This will incentivize the discovery of new organisms from metagenome samples. Although a community standard, "Minimum Information about a Metagenome Sequence", exists, the ecological metadata terms were still inconsistent across data sets, and "unclassified" was the most abundant term under the "Specific Ecosystem" category. This highlights the extent of the challenge of making data findable for ecosystem studies and interoperable with other ecological data sets.
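The kind of metadata audit described above can be illustrated with a minimal sketch. It is not the working group's actual workflow: the records, the `specific_ecosystem` field name, and the `normalize` and `tally_terms` helpers are all hypothetical, chosen only to show how inconsistent spellings and empty values can be folded together before counting the most abundant term.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are illustrative
# only and are not taken from any real database export.
records = [
    {"sample": "A", "specific_ecosystem": "Unclassified"},
    {"sample": "B", "specific_ecosystem": "forest soil"},
    {"sample": "C", "specific_ecosystem": " unclassified "},
    {"sample": "D", "specific_ecosystem": "Forest Soil"},
    {"sample": "E", "specific_ecosystem": ""},
]


def normalize(term):
    """Trim whitespace and lowercase; treat missing or empty values as 'unclassified'."""
    cleaned = (term or "").strip().lower()
    return cleaned or "unclassified"


def tally_terms(rows, field):
    """Count normalized values of one metadata field across all rows."""
    return Counter(normalize(row.get(field)) for row in rows)


counts = tally_terms(records, "specific_ecosystem")
most_common_term, n = counts.most_common(1)[0]
total = sum(counts.values())
print(f"{most_common_term}: {n} of {total} samples ({n / total:.0%})")
```

Normalizing before counting matters here: without it, "Unclassified", " unclassified " and an empty field would be tallied as three separate terms, which understates how often a sample lacks a usable ecosystem label.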