Detecting drift in deep learning: A methodology primer

Abstract:

Supervised machine learning models are trained on historical data to learn a static mapping between their input and output variables. However, when such models are deployed on continuously streamed data whose nature is likely to change over time (data or concept drift), their performance may suddenly and substantially degrade, forcing practitioners to repeatedly update the models to reflect the new data distribution. Few methods are available to reliably detect data drift on heterogeneous data types (structured and unstructured), ideally without requiring labeled data at inference time. In this article, we review existing methods for dataset drift detection, discuss their applicability to deep neural networks, and evaluate them on a practical case study related to semistructured document analysis.