Versioning for End-to-End Machine Learning Pipelines
2017, pp. 1–9
Abstract
End-to-end machine learning pipelines that run in shared environments are challenging to implement. Production pipelines typically consist of multiple interdependent processing stages. Between stages, intermediate results are persisted to reduce redundant computation and to improve robustness. Those results might take the form of datasets for data processing pipelines or of model coefficients for model training pipelines. Reusing persisted results improves efficiency but at the same time creates complicated dependencies. Every time one of the processing stages changes, whether due to a code change or a parameter change, it becomes difficult to determine which datasets can be reused and which must be recomputed.
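The reuse-versus-recompute decision described above can be sketched with content-based versioning: derive a version ID for each stage from its code, its parameters, and the version IDs of its inputs, and reuse a persisted result only when the ID is unchanged. This is a minimal illustration of the general idea, not the paper's actual mechanism; all names (`stage_version`, `run_stage`, the cache dictionary) are hypothetical.

```python
import hashlib
import json

def stage_version(code_hash: str, params: dict, input_versions: list) -> str:
    """Derive a version ID for a pipeline stage from its code, its
    parameters, and the versions of its inputs. If none of these
    change, the persisted output can safely be reused."""
    payload = json.dumps(
        {"code": code_hash, "params": params, "inputs": input_versions},
        sort_keys=True,  # deterministic serialization
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Hypothetical store of persisted intermediate results, keyed by version ID.
cache = {}

def run_stage(code_hash, params, input_versions, compute):
    """Reuse the cached result when the stage version is unchanged;
    recompute and persist it otherwise."""
    vid = stage_version(code_hash, params, input_versions)
    if vid not in cache:
        cache[vid] = compute()
    return vid, cache[vid]

# Changing a parameter yields a new version ID, forcing recomputation,
# while an identical invocation reuses the persisted result.
v1, r1 = run_stage("abc123", {"threshold": 0.5}, [], lambda: "dataset-A")
v2, r2 = run_stage("abc123", {"threshold": 0.9}, [], lambda: "dataset-B")
```

Because input version IDs feed into each downstream stage's ID, a change anywhere in the pipeline transitively invalidates exactly the results that depend on it.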