Analysis of the first genetic engineering attribution challenge
Abstract
The ability to identify the designer of engineered biological sequences, termed genetic engineering attribution (GEA), would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA techniques. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered plasmid sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
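To make the top-1 and top-10 accuracy figures and the idea of a simple ensemble concrete, the sketch below computes top-k accuracy from per-lab scores and combines two models by averaging their scores. It is a minimal illustration under stated assumptions, not the challenge's scoring code: the function names, the toy scores, and the choice of equal-weight averaging are all hypothetical, since the abstract does not specify how the prizewinning models were combined.

```python
from typing import List, Sequence


def top_k_accuracy(probs: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of sequences whose true lab is among the k highest-scored labs."""
    hits = 0
    for row, true_lab in zip(probs, labels):
        # Rank lab indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda lab: row[lab], reverse=True)
        if true_lab in ranked[:k]:
            hits += 1
    return hits / len(labels)


def average_ensemble(
        model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Equal-weight average of per-lab scores across models (an assumption)."""
    n_models = len(model_probs)
    n_rows = len(model_probs[0])
    n_labs = len(model_probs[0][0])
    return [
        [sum(m[i][j] for m in model_probs) / n_models for j in range(n_labs)]
        for i in range(n_rows)
    ]


# Toy data: 3 sequences, 3 candidate labs, two hypothetical models.
labels = [0, 2, 2]
model_a = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]]
model_b = [[0.3, 0.6, 0.1], [0.3, 0.2, 0.5], [0.1, 0.2, 0.7]]

ensemble = average_ensemble([model_a, model_b])
for name, scores in [("model_a", model_a), ("model_b", model_b),
                     ("ensemble", ensemble)]:
    print(name, "top-1:", round(top_k_accuracy(scores, labels, 1), 3))
```

In this toy case the averaged scores rank the true lab first more often than either model alone, which is the kind of gain an ensemble can offer when the individual models make different mistakes. Real GEA datasets have far more candidate labs, which is why top-10 accuracy is reported alongside top-1.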