Creating a universal SNP and small indel variant caller with deep neural networks
Abstract
Next-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual’s genome [1] by calling genetic variants present in an individual using billions of short, errorful sequence reads [2]. Despite more than a decade of effort and thousands of dedicated researchers, the hand-crafted and parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome [3,4]. Here we show that a deep convolutional neural network [5] can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the “highest performance” award for SNPs in an FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground-truth data. We further show that, unlike existing tools, which perform well only on a specific technology, DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.
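To make the idea of "images of read pileups" and genotype likelihoods more concrete, the following is a minimal illustrative sketch, not DeepVariant's actual encoding or model. The channel layout (base identity, base quality, matches-reference flag), the quality cap of 40, the three diploid genotype states, and the function names `encode_pileup` and `call_genotype` are all assumptions made for illustration. The sketch turns reads aligned to a short reference window into a rows x columns x channels grid, then converts hypothetical network output scores into genotype likelihoods with a softmax.

```python
import math

BASES = "ACGT"
# Assumed diploid genotype states; the abstract only says "genotype calls".
GENOTYPES = ("hom_ref", "het", "hom_alt")


def encode_pileup(reference, reads):
    """Encode a reference window and its aligned reads as a 3-channel grid.

    reads: list of (start_offset, sequence, base_qualities) tuples.
    Channels (illustrative assumption): base identity scaled to (0, 1],
    base quality capped at 40 and scaled to [0, 1], and a flag that is
    1.0 where the read base matches the reference.
    Row 0 holds the reference; each later row holds one read.
    """
    width = len(reference)
    image = [[[(BASES.index(b) + 1) / 4, 1.0, 1.0] for b in reference]]
    for start, seq, quals in reads:
        row = [[0.0, 0.0, 0.0] for _ in range(width)]  # uncovered columns stay zero
        for i, (base, qual) in enumerate(zip(seq, quals)):
            col = start + i
            if 0 <= col < width:
                row[col] = [
                    (BASES.index(base) + 1) / 4,
                    min(qual, 40) / 40,
                    1.0 if base == reference[col] else 0.0,
                ]
        image.append(row)
    return image


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_genotype(scores):
    """Map three raw scores (e.g. from a CNN) to a genotype and its likelihoods."""
    probs = softmax(scores)
    best = max(range(len(GENOTYPES)), key=probs.__getitem__)
    return GENOTYPES[best], probs


# Toy window: the second read carries an A where the reference has T (column 3).
image = encode_pileup("ACGTA", [(0, "ACGTA", [30] * 5), (1, "CGAA", [35, 35, 20, 35])])
genotype, likelihoods = call_genotype([0.1, 2.0, 0.3])  # hypothetical network scores
```

In a real pipeline the grid would be fed to a trained convolutional network, and the scores passed to `call_genotype` would be that network's outputs rather than hand-written numbers.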