This paper presents DIS-PIPE, a software tool that leverages well-established process mining techniques to tackle the Data Pipeline Discovery (DPD) task. Data pipelines are compositions of steps that move data from disparate sources to data consumers. As data travels through the pipeline, it can undergo various transformations performed by computational platforms. In this context, DPD aims to learn the structure and behavior of a data pipeline from an event log that records its past executions. Doing so uncovers, to some extent, specific execution-related dark data whose knowledge is critical to improving the quality of pipeline modeling. DIS-PIPE has been designed, implemented, and validated in the context of the H2020 European project DataCloud, and it can interpret XES logs enriched with information that captures the core concepts of data pipelines.
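To make the discovery idea concrete, here is a minimal sketch of one well-established process mining building block: deriving a directly-follows graph from an event log. It is an illustration, not DIS-PIPE's implementation. It assumes each trace is simply the ordered list of step names from one pipeline execution. The step names, the sample log, and the `directly_follows` and `start_end_steps` helpers are hypothetical. Real XES logs, and the pipeline-specific enrichment DIS-PIPE interprets, carry many more attributes than this.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event log: each trace is one past execution of a data pipeline,
# given as the ordered names of the steps it ran (an assumption, not XES).
EventLog = List[List[str]]


def directly_follows(log: EventLog) -> Dict[Tuple[str, str], int]:
    """Count how often step a is immediately followed by step b across all traces."""
    dfg: Counter = Counter()
    for trace in log:
        # Pair each step with its direct successor in the same execution.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg)


def start_end_steps(log: EventLog) -> Tuple[Counter, Counter]:
    """Count which steps open and close the recorded executions."""
    starts = Counter(trace[0] for trace in log if trace)
    ends = Counter(trace[-1] for trace in log if trace)
    return starts, ends


if __name__ == "__main__":
    log = [
        ["ingest", "clean", "transform", "load"],
        ["ingest", "clean", "load"],
        ["ingest", "transform", "load"],
    ]
    for (a, b), n in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {n}")
```

Directly-follows counts are a common starting point for discovery algorithms. Richer models, and the data-pipeline concepts mentioned above, need information that this sketch deliberately ignores.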
Publication details
2024, Doctoral Consortium and Demo Track 2024 at the International Conference on Process Mining 2024 (ICPM-D 2024), volume 3783
DIS-PIPE: A Tool for Data Pipeline Discovery (04b Conference paper in proceedings volume)
Agostinelli S., Benvenuti D., Marrella A., Rossi J.