Versioning for End-to-End Machine Learning Pipelines
2017, pp. 1–9
Abstract
End-to-end machine learning pipelines that run in shared environments are challenging to implement. Production pipelines typically consist of multiple interdependent processing stages. Between stages, intermediate results are persisted to reduce redundant computation and to improve robustness. Those results might take the form of datasets for data processing pipelines or of model coefficients for model training pipelines. Reusing persisted results improves efficiency but at the same time creates complicated dependencies. Every time one of the processing stages changes, whether due to a code change or a parameter change, it becomes difficult to determine which datasets can be reused and which must be recomputed.
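The reuse-versus-recompute decision described above can be sketched with content-based versioning: derive a version ID for each stage from its code, its parameters, and the version IDs of its inputs, and reuse a persisted result only when the ID is unchanged. This is a minimal illustration of the general idea, not the paper's actual mechanism; all names (`stage_version`, `run_stage`, the cache dictionary) are hypothetical.

```python
import hashlib
import json

def stage_version(code_hash: str, params: dict, input_versions: list) -> str:
    """Derive a version ID for a pipeline stage from its code, its
    parameters, and the versions of its inputs. If none of these
    change, the persisted output can safely be reused."""
    payload = json.dumps(
        {"code": code_hash, "params": params, "inputs": input_versions},
        sort_keys=True,  # deterministic serialization
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Hypothetical store of persisted intermediate results, keyed by version ID.
cache = {}

def run_stage(code_hash, params, input_versions, compute):
    """Reuse the cached result when the stage version is unchanged;
    recompute and persist it otherwise."""
    vid = stage_version(code_hash, params, input_versions)
    if vid not in cache:
        cache[vid] = compute()
    return vid, cache[vid]

# Changing a parameter yields a new version ID, forcing recomputation,
# while an identical invocation reuses the persisted result.
v1, r1 = run_stage("abc123", {"threshold": 0.5}, [], lambda: "dataset-A")
v2, r2 = run_stage("abc123", {"threshold": 0.9}, [], lambda: "dataset-B")
```

Because input version IDs feed into each downstream stage's ID, a change anywhere in the pipeline transitively invalidates exactly the results that depend on it.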