Predicting species distributions and entire communities is crucial for ecologists, offering quantitative data for conservation and a deeper understanding of the drivers behind species distributions and community assembly.
Conventional Species Distribution Models (SDMs) rely on statistical and machine learning methods, but their multi-species predictions at the community level are limited by poor scalability and sensitivity to imbalanced data. This paper explores the potential of advanced deep-learning methods to overcome these challenges and provide meaningful multi-species predictions. We also discuss interpretability methods, which can reveal valuable ecological insights from complex models.
Specifically, we introduce two deep-learning models that both use site x species community data but differ in the structure of their input environmental data: (1) a Multi-Layer Perceptron (MLP) model for tabular environmental data (e.g., in-situ climate or soil data), and (2) a Convolutional Neural Network (CNN) model tailored to image data (e.g., photos, satellite imagery). Imbalance issues are addressed through adapted loss functions.
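To illustrate what an imbalance-adapted loss can look like, the following minimal sketch weights presences more heavily than absences in a binary cross-entropy loss for a single species. It is an assumption for illustration only: the abstract does not specify which adapted loss functions were used, and the names `positive_weight` and `weighted_bce`, as well as the absence-to-presence weighting rule, are hypothetical.

```python
import math
from typing import Sequence


def positive_weight(y_true: Sequence[int]) -> float:
    """Ratio of absences to presences for one species.

    Hypothetical weighting rule: a species present in few plots
    receives a large weight on its presences.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0:
        raise ValueError("species has no presences; weight is undefined")
    return n_neg / n_pos


def weighted_bce(
    y_true: Sequence[int],
    p_pred: Sequence[float],
    pos_weight: float,
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy with presences scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp probabilities so that log() never receives 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# One rare species observed in 1 of 4 plots (illustrative data only).
observed = [1, 0, 0, 0]
predicted = [0.5, 0.5, 0.5, 0.5]
weight = positive_weight(observed)
loss = weighted_bce(observed, predicted, weight)
```

With this weighting, the single presence contributes as much to the loss as the three absences combined, so a model cannot reach a low loss simply by predicting the species absent everywhere.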
We applied these two deep-learning models to a plant community dataset comprising 130,582 plots and 2,522 species located in the French Alps. The tabular environmental data consisted of high-resolution climate, terrain and soil information, while the images were derived from aerial photographs. Despite their differing information sources, both models achieved a macro-averaged True Skill Statistic (TSS) of approximately 70% on hold-out data, demonstrating high predictive capacity for community data. We further demonstrated the use of interpretability tools to maximize the utility of these methods in unraveling the intricacies of community structure.
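For readers unfamiliar with the metric, the sketch below computes a macro-averaged TSS under the standard definition TSS = sensitivity + specificity - 1, averaged with equal weight over species. The function names, the site-by-species list-of-lists layout, and the toy data are assumptions for illustration and are not taken from the study.

```python
from typing import List, Sequence


def tss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True Skill Statistic for one species: sensitivity + specificity - 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("TSS needs at least one presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def macro_tss(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-species TSS.

    Rows are sites and columns are species, so rare and common
    species count equally in the average.
    """
    n_species = len(Y_true[0])
    scores = []
    for j in range(n_species):
        col_true = [row[j] for row in Y_true]
        col_pred = [row[j] for row in Y_pred]
        scores.append(tss(col_true, col_pred))
    return sum(scores) / n_species


# Toy example: 4 sites x 2 species (illustrative data only).
Y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
score = macro_tss(Y_true, Y_pred)
```

Because each species contributes equally to the mean, the macro average is not dominated by the most frequently observed species, which matters when prevalence varies widely across a community.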
In conclusion, neural networks, though more complex than traditional models, offer a broader array of features for predicting entire species communities. They handle imbalance issues and accommodate various environmental data types, ranging from tabular datasets to images, with no clear superiority of one data type over the other, while also providing insightful interpretation tools. The last hidden layers can provide valuable input data for other species modeling endeavors, and trained models support transfer learning tasks. We assert that the field of ecology now possesses an additional, potent tool in its arsenal that can foster fundamental research.