Learning to Fold RNAs in Linear Time
Abstract
RNA secondary structure is helpful for understanding RNA's functionality, so accurate prediction systems are desirable. Both thermodynamics-based models and machine learning-based models have been used in prediction systems to solve this problem. Compared to thermodynamics-based models, machine learning-based models can address the inaccurate measurement of thermodynamic parameters caused by experimental limitations. However, existing methods for training machine learning-based models remain expensive because of their cubic-time inference cost. To overcome this, we present a linear-time machine learning-based folding system that uses the recently proposed approximate folding tool LinearFold as the inference engine and the structured SVM (sSVM) as the training algorithm. Furthermore, to remedy the non-convergence of naive sSVM training with inexact search inference, we introduce a max-violation update strategy. Our system trains 41× faster than CONTRAfold for one epoch on a diverse dataset, and 14× faster than MXfold on a dataset with longer sequences. With the learned parameters, our system improves the accuracy of LinearFold and is the most accurate among the selected folding tools, including CONTRAfold, Vienna RNAfold, and MXfold.
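The training scheme summarized above (inexact beam-search inference inside structured SVM training, made convergent by a max-violation update) can be illustrated with a minimal Python sketch. It is not LinearFold's or the authors' code: it swaps RNA folding for a hypothetical toy task of labeling tokens left to right with 0/1, and every name here (`beam_search`, `max_violation_update`, the emission/transition features, the unit Hamming cost) is an illustrative assumption. The idea it shows is that, with inexact search, the full-sequence prediction may not actually outscore the gold structure, so an update on it can be invalid; updating instead on the prefix where the loss-augmented beam's best candidate most exceeds the gold prefix guarantees every update fixes a genuine margin violation.

```python
from collections import defaultdict

LABELS = (0, 1)  # hypothetical toy label set standing in for RNA structures


def prefix_features(x, y):
    """Sparse feature counts of a labeled prefix: (token, label) and (prev, label)."""
    feats = defaultdict(float)
    prev = "<s>"
    for tok, lab in zip(x, y):
        feats[("emit", tok, lab)] += 1.0
        feats[("trans", prev, lab)] += 1.0
        prev = lab
    return feats


def score(w, feats):
    """Linear model score w . phi."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def beam_search(w, x, beam_size, gold=None):
    """Inexact left-to-right beam search.

    If `gold` is given, each wrong label adds a Hamming cost of 1
    (loss-augmented decoding for sSVM training).
    Returns the top (labels, score) prefix at every step.
    """
    beam = [((), 0.0)]
    best_per_step = []
    for i, tok in enumerate(x):
        candidates = []
        for labels, s in beam:
            prev = labels[-1] if labels else "<s>"
            for lab in LABELS:
                inc = w.get(("emit", tok, lab), 0.0) + w.get(("trans", prev, lab), 0.0)
                if gold is not None and lab != gold[i]:
                    inc += 1.0
                candidates.append((labels + (lab,), s + inc))
        candidates.sort(key=lambda c: c[1], reverse=True)  # stable on ties
        beam = candidates[:beam_size]  # prune: this is what makes search inexact
        best_per_step.append(beam[0])
    return best_per_step


def max_violation_update(w, x, gold, beam_size, lr=1.0, reg=0.0):
    """One sSVM subgradient step on the most violated prefix; returns its violation."""
    best = beam_search(w, x, beam_size, gold=gold)
    worst_step, worst_gap = None, 0.0
    for i, (_, aug_score) in enumerate(best):
        gap = aug_score - score(w, prefix_features(x[: i + 1], gold[: i + 1]))
        if gap > worst_gap:
            worst_step, worst_gap = i, gap
    for f in w:
        w[f] *= 1.0 - lr * reg  # L2 regularizer as weight decay; no-op when reg=0
    if worst_step is None:
        return 0.0  # every prefix already satisfies its margin
    pred = best[worst_step][0]
    g = prefix_features(x[: worst_step + 1], gold[: worst_step + 1])
    p = prefix_features(x[: worst_step + 1], pred)
    for f in set(g) | set(p):
        w[f] = w.get(f, 0.0) + lr * (g.get(f, 0.0) - p.get(f, 0.0))
    return worst_gap


def train(data, epochs, beam_size, lr=1.0, reg=0.0):
    w = {}
    for _ in range(epochs):
        for x, y in data:
            max_violation_update(w, x, y, beam_size, lr, reg)
    return w


def decode(w, x, beam_size):
    """Plain (not loss-augmented) beam search; returns the final best labeling."""
    return beam_search(w, x, beam_size)[-1][0]


if __name__ == "__main__":
    data = [(("a", "b", "a", "a"), (1, 0, 1, 1)), (("b", "b", "a"), (0, 0, 1))]
    w = train(data, epochs=5, beam_size=2)
    for x, y in data:
        print(x, decode(w, x, beam_size=2), y)
```

Running the file trains on two toy sequences and prints each prediction next to its gold labels. A real folding system would replace `LABELS`, the features, and `beam_search` with structure-specific counterparts; only the shape of the max-violation update is meant to carry over.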