Scaling and Normalization Effects in NMR Spectroscopic Metabonomic Data Sets
Abstract
Considerable confusion appears to exist in the metabonomics literature as to the real need for, and the role of, preprocessing the acquired spectroscopic data. A number of studies have presented various data manipulation approaches, some suggesting an optimum method. In metabonomics, data are usually presented as a table where each row relates to a given sample or analytical experiment and each column corresponds to a single measurement in that experiment, typically individual spectral peak intensities or metabolite concentrations. Here we suggest definitions for and discuss the operations usually termed normalization (a table row operation) and scaling (a table column operation) and demonstrate their need in ¹H NMR spectroscopic data sets derived from urine. The problems associated with "binned" data (i.e., values integrated over discrete spectral regions) are also discussed, and the particular biological context problems of analytical data on urine are highlighted. It is shown that care must be exercised in calculation of correlation coefficients for data sets where normalization to a constant sum is used. Analogous considerations will be needed for other biofluids, other analytical approaches (e.g., HPLC-MS), and indeed for other "omics" techniques (e.g., transcriptomics or proteomics) and for integrated studies with "fused" data sets. It is concluded that data preprocessing is context dependent and there can be no single method for general use.
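The row and column operations defined in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the choice of autoscaling (unit variance) as the column operation, and the synthetic data are all assumptions made for illustration. The sketch also shows why correlation coefficients need care after constant-sum normalization: variables that are independent in the raw table become negatively correlated once every row is forced to the same total.

```python
import numpy as np

def normalize_constant_sum(X, total=1.0):
    """Row operation: rescale each sample (row) so its values sum to `total`."""
    X = np.asarray(X, dtype=float)
    return total * X / X.sum(axis=1, keepdims=True)

def autoscale(X):
    """Column operation: mean-center each variable and divide by its
    sample standard deviation (one common scaling choice among several)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical data: 200 samples x 3 independent, positive "peak intensities".
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
X[:, 0] *= 10.0  # make the first peak dominate each spectrum

# Correlations between columns before and after constant-sum normalization.
raw_corr = np.corrcoef(X, rowvar=False)
norm_corr = np.corrcoef(normalize_constant_sum(X), rowvar=False)

print("raw  r(peak 0, peak 1):", round(raw_corr[0, 1], 2))
print("norm r(peak 0, peak 1):", round(norm_corr[0, 1], 2))
```

In this synthetic case the raw correlation between the first two peaks is close to zero, while the normalized correlation is clearly negative, because an increase in the dominant peak lowers the normalized share of every other peak in the same row.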