Session: Vital Connections in Ecology: Novel Collaborations with Community Stakeholders
Connecting disciplines and data in ecosystem sciences: Practices for efficient sample tracking, integration, and reuse
Monday, August 2, 2021
Link To Share This Presentation: https://cdmcd.co/XGg4J4
Joan Ball-Damerow, Charuleka Varadharajan, Eoin Brodie, Madison Burrus, Robert Crystal-Ornelas, Ricardo Eloy Alves, Zarine Kakalia, Emily Robles and Patrick Sorensen, Earth and Environmental Sciences Area, Lawrence Berkeley National Lab, Berkeley, CA; Kristin Boye, SLAC National Accelerator Laboratory, Menlo Park, CA; K. Dana Chadwick, Dept of Earth System Science, Stanford University, Stanford, CA; Kim S. Ely, Environmental & Climate Sciences Department, Brookhaven National Laboratory, Upton, NY; Amy E. Goldman, Energy and Environment Directorate, Pacific Northwest National Laboratory, Richland, WA; Ted Haberman, Metadata Game Changers, Boulder, CO; Valerie Hendrix, Computational Research Division, Lawrence Berkeley National Lab, Berkeley, CA; Kenneth M. Kemner and Pamela Weisenhorn, Biosciences Division, Argonne National Laboratory, Argonne, IL; Annie B. Kersting, Nancy Merino and Mavrik Zavarin, Lawrence Livermore National Lab, Livermore, CA; Zach Perzan, Department of Earth System Science, Stanford University, Palo Alto, CA; James C. Stegen, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA; Ramona L. Walls, CyVerse, University of Arizona, Tucson, AZ; Deborah A. Agarwal, Lawrence Berkeley National Laboratory, Berkeley, CA
Earth and Environmental Sciences Area, Lawrence Berkeley National Lab, Berkeley, CA, USA
Background/Question/Methods
The study of natural ecosystems requires multidisciplinary science teams to understand and model multi-scale processes. Research on these complex processes involves diverse collections of samples and associated field or laboratory measurements. For example, studies of organic matter cycling through plants and soil involve analysis of samples that represent soil biogeochemistry, microbial communities, plant structures, and ecophysiological traits of the specific organisms involved. When such multidisciplinary data are published, however, they are often disconnected and missing information needed for interpretation, integration, and reuse. Clear links between datasets help represent interacting processes across related ecosystem data and support future data discovery and usability.
While there are widely adopted conventions within certain disciplinary communities to describe sample data, these have gaps when applied in a multidisciplinary context. In this study, we compared existing practices for identifying, characterizing, and linking related environmental samples. We then conducted a pilot test involving eight United States Department of Energy projects to assess the practicalities of assigning persistent identifiers to samples with standardized metadata. Participants collected a variety of sample types, with analyses conducted across multiple facilities. We addressed terminology gaps for multidisciplinary research and made recommendations for assigning identifiers and metadata that support tracking, integration, and reuse. Our goal was to provide a practical approach for sample documentation, geared towards ecosystem scientists who contribute and reuse sample data.
Results/Conclusions
Many multidisciplinary projects have complicated workflows and need an efficient system for tracking samples as they are sent to collaborators, labs, and user facilities and then published online. Despite growing need and interest, there was previously no straightforward guidance on how to describe collections of multidisciplinary samples. We therefore recommend registering samples with International Geo Sample Numbers (IGSNs), using our modified metadata template for ecosystem sciences (IGSN-ESS). The downloadable template, terminology definitions, and instructions for IGSN registration using IGSN-ESS are detailed in an associated GitHub repository.
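As a rough illustration of why persistent identifiers with linked metadata help with tracking, the sketch below follows a derived sample back to the material it came from. It is only a minimal sketch under stated assumptions: the field names, the `parent_id` link, and the `EX-` identifiers are hypothetical placeholders, not the IGSN-ESS template or real IGSNs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SampleRecord:
    """One physical sample. Field names are hypothetical, not the IGSN-ESS template."""
    sample_id: str                   # placeholder identifier, not a real IGSN
    material: str                    # free-text description, e.g. "soil core"
    collection_date: str             # ISO 8601 date string
    parent_id: Optional[str] = None  # assumed link to the sample this one was derived from


def lineage(sample_id: str, registry: Dict[str, SampleRecord]) -> List[str]:
    """Return identifiers from the given sample up to its root parent."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = sample_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current}")
        if current not in registry:
            raise KeyError(f"unregistered sample: {current}")
        seen.add(current)
        chain.append(current)
        current = registry[current].parent_id
    return chain


# Made-up records: a core, a subsample sent to a lab, and an extract made from it.
records = [
    SampleRecord("EX-0001", "soil core", "2021-06-01"),
    SampleRecord("EX-0002", "soil subsample", "2021-06-01", parent_id="EX-0001"),
    SampleRecord("EX-0003", "DNA extract", "2021-06-15", parent_id="EX-0002"),
]
registry = {r.sample_id: r for r in records}

print(lineage("EX-0003", registry))  # ['EX-0003', 'EX-0002', 'EX-0001']
```

Because every record carries its own identifier and, where relevant, its parent's, a measurement made on the extract at one facility can be traced to the original core even when the datasets are published separately.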
Overcoming complex challenges that require communities to change behavior and provide standardized data will require a coordinated effort; only coalitions of key stakeholders can establish community consensus, enforce guidelines, and help solve problems. These stakeholders include diverse data contributors and users from different scientific domains, as well as laboratory facilities, repositories, funders, and publishers that take part in institutionalizing and rewarding good data management practices. Community coordination on sample reporting conventions and linked cyberinfrastructure will help solve data management problems, expand access pathways, and make our sample data more useful over time.