Connecting Earth observation, high-throughput arthropod-biodiversity point samples, and joint species distribution modelling using neural networks to predict the distribution of biodiversity across a working forest landscape
Wednesday, August 4, 2021
Yuanheng Li and Mingjie Luo, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China, Christian Devenish and Douglas W. Yu, School of Biological Sciences, University of East Anglia, Norwich, United Kingdom, Marie I. Tosa and Taal Levi, Department of Fisheries and Wildlife, Oregon State University, Corvallis, OR, Damon B. Lesmeister and David M. Bell, Pacific Northwest Research Station, USDA Forest Service, Corvallis, OR
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
Background/Question/Methods
Our overall goal is to predict the distribution of arthropod communities using only remotely sensed environmental covariates, which would make large-scale, continuous mapping of arthropod biodiversity possible. In this project, we collected 121 Malaise-trap samples from 96 sample points in and around the HJ Andrews Experimental Forest, Oregon. We shotgun-sequenced each Malaise-trap sample and used the Kelpie software to carry out in-silico PCR of the COI DNA barcode gene (BF3BR2 primer pair). This yielded 889 Operational Taxonomic Units (OTUs), which we filtered to the 303 OTUs with ≥ 6 incidences. The resulting sample-by-species (OTU) table was paired with Landsat, GIS, and lidar environmental covariates in a joint species distribution model (a multivariate probit model). This model (sjSDM package in R) is one of the first to apply deep neural networks (DNNs) to ecological community data. It uses a DNN on the environmental covariates, combined with a linear model on spatial position and a species-species correlation matrix that is regularized directly. Model tuning was carried out using cross-validation, and we measured model performance in terms of both explanatory power and predictive power.

Results/Conclusions
The model currently has a mean explanatory AUC (area under the ROC curve) over all species of 0.82 and a mean predictive AUC of 0.73. Our results show that it is possible to fit a model with reasonable predictive performance to large numbers of data-poor species from a mass-sampling campaign. Preliminary tests with linear models fit with sjSDM showed that elevation and forest age are important covariates for predictive performance. Next steps are to continue making technical improvements to model fitting and to use explainable AI (xAI) to derive a mechanistic understanding of the fitted DNN model.
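
The incidence filter (OTUs with ≥ 6 incidences) and the mean AUC over species described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which uses Kelpie and the sjSDM package in R. The function names, the toy presence/absence table, and the toy predicted probabilities are all hypothetical. AUC is computed here as the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence, with ties counted as one half.

```python
from typing import Dict, List, Optional


def filter_otus(table: Dict[str, List[int]], min_incidence: int = 6) -> Dict[str, List[int]]:
    """Keep OTUs detected (value 1) in at least `min_incidence` samples."""
    return {otu: col for otu, col in table.items() if sum(col) >= min_incidence}


def auc(labels: List[int], scores: List[float]) -> Optional[float]:
    """Rank-based AUC: P(score of a presence > score of an absence), ties = 0.5.

    Returns None when an OTU has only presences or only absences,
    because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(observed: Dict[str, List[int]], predicted: Dict[str, List[float]]) -> float:
    """Average the per-OTU AUC over all OTUs for which it is defined."""
    values = [auc(observed[otu], predicted[otu]) for otu in observed]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined)


# Hypothetical toy data: 8 samples (positions in each list) and 3 OTUs.
toy_table = {
    "otu_a": [1, 1, 1, 1, 1, 1, 0, 0],  # 6 incidences, kept
    "otu_b": [1, 0, 0, 0, 1, 0, 0, 0],  # 2 incidences, dropped
    "otu_c": [1, 1, 1, 0, 1, 1, 1, 1],  # 7 incidences, kept
}
# Hypothetical predicted occurrence probabilities for the same samples.
toy_pred = {
    "otu_a": [0.9, 0.8, 0.7, 0.6, 0.9, 0.4, 0.5, 0.2],
    "otu_b": [0.3, 0.2, 0.1, 0.2, 0.4, 0.1, 0.2, 0.1],
    "otu_c": [0.9, 0.8, 0.7, 0.6, 0.5, 0.9, 0.8, 0.7],
}

kept = filter_otus(toy_table, min_incidence=6)
summary_auc = mean_auc(kept, toy_pred)
```

Applied to predictions for the samples used in fitting, a summary of this kind would correspond to an explanatory AUC; applied to predictions for held-out samples from cross-validation, it would correspond to a predictive AUC.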