Challenges of Large-Scale Biomedical Workflows on the Cloud -- A Case Study on the Need for Reproducibility of Results
Abstract
Computational bioinformatics workflows are extensively used to analyse genomics data. With the unprecedented advancements in genomic sequencing technology and the opportunities for personalized medicine, it is essential that analysis results are repeatable by others, especially when moving into a clinical environment. To cope with the complex computational demands of huge biological datasets, a shift to distributed compute resources is unavoidable. A case study was conducted in which three well-established bioinformatics analysis groups across Australia were assigned to analyse exome sequence data from a range of patients with a rare condition: disorder of sex development. Initially these groups used their own in-house data processing pipelines, and subsequently they used a common bioinformatics workbench based upon Galaxy and offered through the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud. This paper describes the experiences in this work and the variability of results. We put forward principles that should be used to ensure reproducibility of scientific results moving forward.