Uncovering hidden duplicated content in public transcriptomics data
Database, 2013, Vol. 2013, pp. bat010–bat010
Abstract
As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data of different types and from different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: experiments fully or partially duplicated across independent data submissions, and Affymetrix chips reused across several experiments or within a single experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify potential future duplicates in RNA-Seq data.
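The abstract does not spell out how duplicates are detected, so the following is only an illustrative sketch, not the Bgee procedure. It assumes that a reused chip shows up as a byte-identical data file, and flags such chips by grouping them on a SHA-256 digest of their content. The function name `find_duplicate_chips`, the chip identifiers, and the toy file contents are all hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_chips(chips):
    """Group chip identifiers whose raw content is byte-identical.

    `chips` maps a hypothetical chip identifier (here "experiment/chip")
    to the raw bytes of its data file. This is an assumed input layout,
    not a format defined by GEO, ArrayExpress, or Bgee.
    """
    by_digest = defaultdict(list)
    for chip_id, content in chips.items():
        # Identical content always yields the same digest.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(chip_id)
    # Keep only digests shared by more than one chip.
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


# Toy data: chip9 in EXP_B has the same content as chip1 in EXP_A.
chips = {
    "EXP_A/chip1": b"probe1\t10.5\nprobe2\t3.2\n",
    "EXP_A/chip2": b"probe1\t7.1\nprobe2\t8.8\n",
    "EXP_B/chip9": b"probe1\t10.5\nprobe2\t3.2\n",
}
duplicates = find_duplicate_chips(chips)
print(duplicates)
```

Exact hashing only catches byte-identical files. Chips that were renormalised or reformatted before resubmission would need a comparison on the expression values themselves, which this sketch does not attempt.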