A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification
Abstract
Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we introduce TALON, the ENCODE4 pipeline for platform-independent analysis of long-read transcriptomes. We apply TALON to the GM12878 cell line and show that while both PacBio and ONT technologies perform well at full-transcript discovery and quantification, each displays distinct technical artifacts. We further apply TALON to mouse hippocampus and cortex transcriptomes and find that 422 genes detected in these regions have more reads associated with novel isoforms than with annotated ones. We demonstrate that TALON is capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
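The per-gene comparison behind the "more reads associated with novel isoforms than with annotated ones" result can be illustrated with a minimal sketch. It is not TALON's implementation and does not use TALON's output format. The record layout, the gene and transcript names, and the function `genes_dominated_by_novel` are all hypothetical, and the read counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-transcript read counts: (gene, transcript, is_novel, reads).
# These values are illustrative only and are not taken from the study.
records = [
    ("geneA", "tx1", False, 40),
    ("geneA", "tx2", True, 55),
    ("geneB", "tx3", False, 120),
    ("geneB", "tx4", True, 10),
    ("geneC", "tx5", True, 7),
]


def genes_dominated_by_novel(records):
    """Return sorted genes whose novel-isoform reads exceed annotated reads."""
    known = defaultdict(int)
    novel = defaultdict(int)
    for gene, _transcript, is_novel, reads in records:
        if is_novel:
            novel[gene] += reads
        else:
            known[gene] += reads
    # Include genes seen only through novel or only through known isoforms.
    genes = set(known) | set(novel)
    return sorted(g for g in genes if novel[g] > known[g])


dominated = genes_dominated_by_novel(records)
print(dominated)
```

A strict inequality is used here, so a gene with equal novel and annotated read counts is not counted. Whether the study applied additional filters before this comparison is not stated in the abstract.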