Enhancing sensitivity and controlling false discovery rate in somatic indel discovery
Abstract
As witnessed by various population-scale cancer genome sequencing projects, accurate discovery of somatic variants has become of central importance in modern cancer research. However, counts of the somatic insertions and deletions (indels) discovered so far indicate that a large number of them must have been missed. The reason is that the combined uncertainties arising from, for example, gap placement and alignment ambiguities, twilight zone indels, cancer heterogeneity, sample purity, sampling, and strand bias are hard to quantify accurately. Here, we provide a unifying statistical model whose dependency structure makes it possible to quantify all inherent uncertainties accurately and in a short time. As a major consequence, the false discovery rate (FDR) in somatic indel discovery can now be controlled with the utmost accuracy. As demonstrated on simulated and real data, this makes it possible to dramatically increase the number of true discoveries while safely keeping the FDR in check. With specific support for workflow integration, our approach can be used as a post-processing step in large-scale projects. The software is publicly available at https://varlociraptor.github.io and can easily be installed via Bioconda [Grüning et al., 2018].
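The abstract does not spell out how FDR control works once every uncertainty has been quantified. One common Bayesian approach, assuming the model yields for each candidate indel a posterior error probability (PEP, the posterior probability that the call is not a true somatic indel), is to report the largest set of most-confident calls whose mean PEP, i.e. the expected FDR of that set, stays at or below a target level alpha. The sketch below illustrates that generic technique only; the function name `select_calls_at_fdr` and its inputs are hypothetical and are not taken from Varlociraptor's implementation.

```python
from typing import List, Sequence


def select_calls_at_fdr(pep: Sequence[float], alpha: float) -> List[int]:
    """Return indices of calls to report so that the expected FDR is <= alpha.

    Hypothetical helper: pep[i] is assumed to be the posterior probability
    that call i is NOT a true somatic indel.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Rank calls from most to least confident (smallest PEP first).
    order = sorted(range(len(pep)), key=lambda i: pep[i])
    cumulative = 0.0
    best = 0
    for k, i in enumerate(order, start=1):
        cumulative += pep[i]
        # The expected FDR of the top-k set is the mean PEP of those k calls.
        # Because PEPs are sorted ascending, this mean never decreases,
        # so the first violation ends the search.
        if cumulative / k > alpha:
            break
        best = k
    # Report the selected calls in their original order.
    return sorted(order[:best])


if __name__ == "__main__":
    peps = [0.01, 0.2, 0.02, 0.5, 0.05]
    print(select_calls_at_fdr(peps, alpha=0.05))
```

With the example PEPs above, the three most confident calls (indices 0, 2, and 4) have a mean PEP of about 0.027, while adding the fourth raises it to 0.07, so only those three are reported at alpha = 0.05. Thresholding on the mean PEP rather than on each PEP individually lets more uncertain calls through when they are balanced by very confident ones, which is what allows more true discoveries at the same expected FDR.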