Enriched Random Forest for High Dimensional Genomic Data
Citations Over Time: Top 10% of 2021 papers
Abstract
Ensemble methods such as random forest work well on high-dimensional datasets. However, when the number of features is extremely large compared to the number of samples and the percentage of truly informative features is very small, the performance of traditional random forest declines significantly. To address this, we develop a novel approach that enhances the performance of traditional random forest by reducing the contribution of trees whose nodes are populated with less informative features. The proposed method selects the eligible feature subset at each node by weighted random sampling, as opposed to the simple random sampling used in traditional random forest. We refer to this modified random forest algorithm as "Enriched Random Forest". Using several high-dimensional microarray datasets, we evaluate the performance of our approach in both regression and classification settings. We also demonstrate the effectiveness of balanced leave-one-out cross-validation in reducing the computational load and the sample size used when computing feature weights. Overall, the results indicate that enriched random forest improves the prediction accuracy of traditional random forest, especially when relevant features are very few.
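The node-level step described in the abstract, drawing the eligible feature subset by weighted rather than simple random sampling, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how feature weights are computed, so the weights below are made up for illustration, and the names `sample_candidate_features`, `mtry`, and `informative` are hypothetical. The weighted draw without replacement uses the Efraimidis-Spirakis key method, which is a choice made for this sketch.

```python
import heapq
import random


def sample_candidate_features(weights, mtry, rng):
    """Draw mtry distinct feature indices, favouring large weights.

    Efraimidis-Spirakis weighted sampling without replacement:
    each feature gets the key u ** (1 / w) with u uniform on [0, 1),
    and the mtry features with the largest keys are kept.
    Features with zero weight are never selected.
    """
    keyed = [
        (rng.random() ** (1.0 / w), j)
        for j, w in enumerate(weights)
        if w > 0
    ]
    if mtry > len(keyed):
        raise ValueError("mtry exceeds the number of features with positive weight")
    return [j for _, j in heapq.nlargest(mtry, keyed)]


rng = random.Random(0)

# Assumed toy setting: 1000 features, of which the first 10 are informative.
n_features = 1000
informative = set(range(10))

# Assumed weights: informative features get a larger weight (illustrative only).
weights = [50.0 if j in informative else 1.0 for j in range(n_features)]

mtry = 30       # candidate features considered at each node
n_nodes = 500   # number of simulated node splits

uniform_hits = 0
weighted_hits = 0
for _ in range(n_nodes):
    # Traditional random forest: simple random sampling of candidates.
    uniform = rng.sample(range(n_features), mtry)
    # Enriched variant: weighted random sampling of candidates.
    weighted = sample_candidate_features(weights, mtry, rng)
    uniform_hits += sum(j in informative for j in uniform)
    weighted_hits += sum(j in informative for j in weighted)

print("mean informative candidates per node (uniform): ", uniform_hits / n_nodes)
print("mean informative candidates per node (weighted):", weighted_hits / n_nodes)
```

Under these assumed weights, informative features should reach the candidate set at a node far more often than under uniform sampling, which is the mechanism the abstract credits for the improvement when relevant features are very few.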