Pipelines, Databases, Workflows—Making Phenotypic Data FAIR in a Single Facility and a Global Resource
Tuesday, January 26, 2021
8:30 AM – 9:00 AM EST
The UK National Phenotypic Screening Centre (NPSC) provides phenotypic screening services using advanced cellular models for academic and commercial partners across the UK and Europe. Assays include quantitative live-cell assays of sperm motility, bronchoepithelial damage responses, inflammatory responses in skin, and T cell exhaustion, among others. All of these assays produce large, heterogeneous data collections that combine imaging data with chemical and analytic metadata. The scale and heterogeneity of these data, and the fact that most critical data are stored in proprietary file formats, present acute informatics challenges. To store, process, share, analyse and publish these data, we use tools from the Open Microscopy Environment (OME; http://openmicroscopy.org), an open-source software consortium that builds data management and access platforms. OME releases specifications and software for managing image datasets and integrating them with other scientific data. OME’s Bio-Formats and OMERO are used in thousands of labs worldwide to enable discovery with imaging. OME-TIFF is an open, metadata-rich, multi-dimensional, multi-resolution data format for modern bioimaging that has been widely adopted across the bioimaging community. We will present example workflows and queries showing how Bio-Formats and OMERO can be used to make data Findable, Accessible, Interoperable and Reusable (FAIR) in the context of a single screening facility. Once data have been processed and analysed, they must be published, either within an organisation or, when they support a publication, in a public data repository. OME has used Bio-Formats and OMERO to build solutions for sharing and publishing imaging data.
The Image Data Resource (IDR; https://idr.openmicroscopy.org) includes image data linked to >90 independent studies. These cover genetic, RNAi, chemical, localisation and geographic high-content screens, super-resolution microscopy, single-cell profiling, light sheet microscopy of developing organisms and tissues, and digital pathology. Datasets range from several GB to tens of TB. Wherever possible, we have integrated image data with all relevant experimental, imaging and analytic metadata. These annotations make it possible to re-use IDR data and to connect independent imaging datasets by molecular perturbations and phenotypes. We have also built cloud-based analysis portals to catalyse the re-use of published imaging data. These include notebooks and Docker containers that package well-known tools such as CellProfiler and ilastik, making it easy to view and interact with IDR data. All of these efforts are open source and available for anyone to access and re-use.