Motif extraction and protein classification
Abstract
We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data-driven and finds motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductase class of enzymes, which contains approximately 7000 enzyme sequences, we obtain a relatively small set of motifs. This set spans a motif space that is used for functional classification of the enzymes by an SVM classifier. The classification based on MEX motifs surpasses that of two other SVM-based methods: SVMProt, a method based on the analysis of physical-chemical properties of a protein derived from its amino acid sequence, and an SVM applied to a Smith-Waterman distance matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs, supporting a successful sequence-to-function classification.
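To make the idea of a motif space concrete, the following is a minimal sketch of one way a sequence could be mapped to a feature vector once a set of motifs is available. It assumes a simple binary presence encoding, which the abstract does not specify. The function name `motif_features`, the motifs, and the sequences are all hypothetical and are not MEX output.

```python
from typing import List, Sequence


def motif_features(sequence: str, motifs: Sequence[str]) -> List[int]:
    """Encode a sequence as a binary vector over a fixed motif list.

    Position i is 1 if motifs[i] occurs as a substring of the sequence,
    otherwise 0. This presence/absence encoding is an assumption made
    for illustration, not the encoding used in the paper.
    """
    return [1 if motif in sequence else 0 for motif in motifs]


# Hypothetical motifs and sequences, for illustration only.
motifs = ["GAG", "CPH", "KDL"]
sequences = ["MKGAGTTCPHL", "MSKDLLAAV"]

# Each row is one sequence expressed in the motif space.
feature_matrix = [motif_features(seq, motifs) for seq in sequences]
print(feature_matrix)  # [[1, 1, 0], [0, 0, 1]]
```

Vectors of this form, paired with functional class labels, could then be passed to any SVM implementation (for example scikit-learn's `SVC`) for training and prediction.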