Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing
JMIR Medical Informatics, 2020, Vol. 8(7), p. e18910
Abstract
The results of this study are promising: models trained with synthetic data showed small decreases in accuracy compared with models trained with real data when both were tested on real data. Such deviations are expected and manageable. Tree-based classifiers show some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must preserve both individual privacy and data utility in order to instill confidence in health care departments that use such data to inform policy decision-making.
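The comparison described in the abstract (train one model on real data and another on synthetic data, then test both on held-out real data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the study's method: the toy dataset, the per-class Gaussian `synthesize` function, and the nearest-centroid classifier are hypothetical stand-ins for the real health data, synthesizer, and classifiers used in the paper.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy stand-in for a real dataset: two numeric features, binary label."""
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        centre = 2.0 if label else 0.0
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows


def synthesize(rows, n, rng):
    """Hypothetical synthesizer: fit an independent Gaussian to each feature
    within each class, then sample new rows from those fitted distributions."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    params = {
        y: [(statistics.mean(col), statistics.stdev(col)) for col in zip(*xs)]
        for y, xs in by_label.items()
    }
    labels = [y for _, y in rows]  # sampling from this keeps the class balance
    out = []
    for _ in range(n):
        y = rng.choice(labels)
        out.append(([rng.gauss(m, s) for m, s in params[y]], y))
    return out


def fit_centroids(rows):
    """Nearest-centroid classifier: store the mean feature vector per class."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: [statistics.mean(col) for col in zip(*xs)]
            for y, xs in by_label.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


rng = random.Random(0)
real_train = make_real_data(500, rng)
real_test = make_real_data(500, rng)          # held-out real data
synthetic_train = synthesize(real_train, 500, rng)

# Both models are evaluated on the same real test set.
acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)
print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
print(f"accuracy difference:  {acc_real - acc_synth:+.3f}")
```

The quantity of interest is the gap between the two accuracies on the same real test set. In this toy setting the synthesizer matches the data-generating process almost exactly, so the gap is expected to be small; it says nothing about the gap sizes or classifier sensitivities reported in the paper.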