A two-phase transfer learning model for cross-project defect prediction
Citations Over Time: Top 1% of 2018 papers
Abstract
Context: Previous studies have shown that TCA+, a transfer learning model proposed by Nam et al., can significantly improve the performance of cross-project defect prediction (CPDP). TCA+ achieves this improvement by reducing the difference in data distribution between the source (training data) and target (testing data) projects. However, TCA+ is unstable: its performance varies widely depending on which source project is used to build the prediction model. In practice, it is hard to choose a suitable source project for building the prediction model.

Objective: To address this limitation of TCA+, we propose a two-phase transfer learning model (TPTL) for CPDP.

Method: In the first phase, we propose a source project estimator (SPE) that automatically chooses, from the candidates, the two source projects with the highest distribution similarity to a target project. Next, the two source projects estimated to achieve the highest F1-score and cost-effectiveness are selected. In the second phase, we use TCA+ to build two prediction models based on the two selected projects and combine their prediction results to further improve prediction performance.

Results: We evaluate TPTL on 42 defect datasets from the PROMISE repository and compare it with five baselines: two versions of TCA+ (TCA+_Rnd, which randomly selects one source project, and TCA+_All, which uses all alternative source projects); TDS, a related source project selection model proposed by Herbold; a state-of-the-art CPDP model that leverages a log transformation (LT) method; and Dycom, a transfer learning model with a better form of TCA. Experimental results show that, on average across the 42 datasets, TPTL improves on these baselines by 19%, 5%, 36%, 27%, and 11%, respectively, in terms of F1-score, and by 64%, 92%, 71%, 11%, and 66%, respectively, in terms of cost-effectiveness.

Conclusion: The proposed TPTL model solves the instability problem of TCA+ and shows substantial improvements over state-of-the-art and related CPDP models.
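The abstract names the two phases of TPTL but does not say how SPE measures distribution similarity or exactly how the two models' results are combined. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method: each project is summarised by per-feature mean and standard deviation, similarity is the Euclidean distance between those summaries, and the two models' defect probabilities are averaged and thresholded at 0.5. It leaves out TCA+ itself and SPE's estimation of F1-score and cost-effectiveness. All function names (`distribution_profile`, `profile_distance`, `pick_two_sources`, `combine_predictions`) and the example projects are made up for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Row = Sequence[float]  # one module's feature values (hypothetical layout)


def distribution_profile(rows: List[Row]) -> List[float]:
    """Summarise a dataset as flattened per-feature (mean, std) pairs.

    Assumption: a stand-in for SPE's similarity features, which the
    abstract does not specify.
    """
    profile: List[float] = []
    for column in zip(*rows):  # iterate feature by feature
        profile.extend([mean(column), pstdev(column)])
    return profile


def profile_distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance between two profiles; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_two_sources(target: List[Row], candidates: Dict[str, List[Row]]) -> List[str]:
    # Phase one (simplified): rank candidate source projects by how close
    # their distribution profile is to the target's and keep the two closest.
    target_profile = distribution_profile(target)
    ranked = sorted(
        candidates,
        key=lambda name: profile_distance(target_profile, distribution_profile(candidates[name])),
    )
    return ranked[:2]


def combine_predictions(p1: List[float], p2: List[float], threshold: float = 0.5) -> List[int]:
    # Phase two (assumed combination rule): average the defect probabilities
    # from the two models and label a module defective if the mean >= threshold.
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(p1, p2)]


if __name__ == "__main__":
    candidates = {
        "proj_a": [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]],
        "proj_b": [[50.0, 1.0], [60.0, 2.0], [70.0, 3.0]],
        "proj_c": [[1.5, 11.0], [2.5, 13.0], [3.5, 15.0]],
    }
    target = [[1.2, 10.5], [2.2, 12.5], [3.2, 14.5]]
    print(pick_two_sources(target, candidates))  # ['proj_a', 'proj_c']
    print(combine_predictions([0.9, 0.2, 0.6], [0.7, 0.4, 0.3]))  # [1, 0, 0]
```

Because raw feature scales differ, a more realistic version would likely normalise features before comparing profiles; that step is an assumption and is not described in the abstract.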