Autoregressive Drift Detection Method Explained
This page gives an accessible explanation of the paper "Autoregressive based Drift Detection Method" by Mansour Zoubeirou A. Mayaki and Michel Riveill.
Short summary
In many machine-learning applications, a model is trained on past data and then used on future observations. This works well when the data distribution stays stable. In real-world data streams, however, the distribution can change over time. This phenomenon is known as concept drift, and it can make a previously accurate model less reliable.
The paper proposes ADDM, an autoregressive drift detection method. Instead of looking directly at raw inputs, ADDM focuses on the model error rate. The error rate is treated as a dynamic signal, and changes in this signal are used to detect when the underlying data distribution may have changed.
Method overview
What problem does the paper address?
Standard learning systems often assume that training data and future data follow the same distribution. This assumption is called stationarity. In practice, it is frequently violated. For example, user behavior, industrial sensors, fraud patterns, medical signals, or network traffic can evolve over time.
When the data distribution changes, a model trained on old data may no longer be appropriate. Detecting this change quickly is important because it allows the system to decide when to retrain or update the model.
Why use an autoregressive model?
The key observation is that drift often appears through a change in the model error. If the model suddenly makes more mistakes, or if the error pattern changes, this can indicate that the data distribution has shifted.
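To make the idea of an error signal concrete, here is a minimal sketch of a sliding-window error rate computed over a stream of predictions. The function name `rolling_error_rate`, the window size, and the synthetic stream are assumptions made for illustration; the paper may compute its error rate differently.

```python
from collections import deque

def rolling_error_rate(y_true, y_pred, window=50):
    """Yield the fraction of wrong predictions among the last `window` samples."""
    recent = deque(maxlen=window)  # oldest outcomes fall out automatically
    for truth, pred in zip(y_true, y_pred):
        recent.append(truth != pred)
        yield sum(recent) / len(recent)

# Synthetic stream (illustrative only): the model always predicts 0.
# The labels are all 0 at first, then every other label becomes 1.
y_pred = [0] * 200
y_true = [0] * 100 + [1, 0] * 50

rates = list(rolling_error_rate(y_true, y_pred, window=20))
print(rates[99], rates[199])
```

The resulting sequence `rates` is the kind of signal a detector can monitor: it stays flat while the concept is stable and moves to a new level once the labels change.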
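ADDM models the error rate as a time series. In a simplified view, this is an autoregressive model of order \(p\):
\[ Y_t = \phi_0 + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t \]
where \(Y_t\) denotes the current error rate, the previous values \(Y_{t-1},\ldots,Y_{t-p}\) help predict its expected behavior, \(\phi_0,\ldots,\phi_p\) are the model coefficients, and \(\varepsilon_t\) is a noise term. If the observed behavior no longer matches the expected regime, the detector can signal a drift.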
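The autoregressive view of the error rate can be sketched in a few lines of Python. This is a simplified illustration, not the detection procedure from the paper: it fits a plain autoregressive model by least squares on a reference window and flags a drift when the one-step prediction error stays large for several consecutive steps. The helper names `fit_ar` and `first_drift`, the threshold `k`, the `patience` parameter, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of Y_t = phi_0 + sum_i phi_i * Y_{t-i} + eps_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Row for time t is [1, Y_{t-1}, ..., Y_{t-p}], for t = p .. n-1.
    X = np.column_stack([np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    residuals = y[p:] - X @ phi
    return phi, residuals.std(ddof=p + 1)

def first_drift(y, phi, sigma, k=4.0, patience=5):
    """Return the first index where |residual| > k * sigma has held for
    `patience` consecutive steps, or None if that never happens."""
    y = np.asarray(y, dtype=float)
    p = len(phi) - 1
    run = 0
    for t in range(p, len(y)):
        lags = y[t - p:t][::-1]  # Y_{t-1}, ..., Y_{t-p}
        predicted = phi[0] + phi[1:] @ lags
        run = run + 1 if abs(y[t] - predicted) > k * sigma else 0
        if run >= patience:
            return t
    return None

# Synthetic error rates (illustrative only): about 10% under the old concept,
# jumping to about 30% halfway through the monitored stream.
rng = np.random.default_rng(0)
reference = rng.normal(0.10, 0.01, size=200)
stream = np.concatenate([rng.normal(0.10, 0.01, size=100),
                         rng.normal(0.30, 0.01, size=100)])

phi, sigma = fit_ar(reference, p=2)
drift_at = first_drift(stream, phi, sigma)
no_drift = first_drift(stream[:100], phi, sigma)
print(drift_at, no_drift)
```

Requiring several consecutive large residuals is a design choice made here to avoid reacting to a single noisy observation; it trades a short detection delay for fewer false alarms.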
Model adaptation after drift
Detecting drift is only the first step. Once a drift is detected, the predictive model must adapt to the new distribution. The paper proposes to combine information from the old model and a new model trained on recent data, using a weight related to the severity of the drift.
The weight used in the paper is defined in terms of two quantities: \(Q_3^0\), the third quartile of the error rate under the old concept, and \(Q_3^t\), the third quartile of the error rate under the new concept. Intuitively, this weight helps decide how strongly the new model should influence the final prediction after drift.
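The adaptation idea can be illustrated with a short sketch that computes the two third quartiles and blends the predictions of the old and new models. This sketch does not reproduce the paper's weight formula: `placeholder_weight` is a hypothetical stand-in chosen only so that a larger increase in the third quartile gives the new model more influence, and the convex combination in `blend` is likewise an assumption about how the two models are combined.

```python
from statistics import quantiles

def third_quartile(error_rates):
    """Third quartile (Q3) of a sequence of error rates."""
    return quantiles(error_rates, n=4)[2]

def placeholder_weight(q3_old, q3_new):
    """Hypothetical stand-in, NOT the paper's formula:
    relative increase of Q3, clipped to the interval [0, 1]."""
    if q3_new <= 0:
        return 0.0
    return min(1.0, max(0.0, (q3_new - q3_old) / q3_new))

def blend(pred_old, pred_new, w):
    """Convex combination (an assumption): w = 0 keeps the old model's
    prediction, w = 1 uses only the new model's prediction."""
    return (1 - w) * pred_old + w * pred_new

# Illustrative error rates before and after a drift.
errors_old = [0.08, 0.10, 0.12, 0.09, 0.11, 0.10, 0.10, 0.11]
errors_new = [0.28, 0.31, 0.35, 0.30, 0.33, 0.29, 0.32, 0.34]

w = placeholder_weight(third_quartile(errors_old), third_quartile(errors_new))
combined = blend(pred_old=0.2, pred_new=0.8, w=w)
print(w, combined)
```

With a weight of this kind, a mild drift keeps most of the old model's contribution, while a severe drift shifts the final prediction toward the model trained on recent data.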
Why it matters
- It works at the model level. The method can be connected to different learning algorithms because it monitors prediction errors.
- It is useful for streams. The approach is designed for data arriving over time.
- It supports adaptation. The method does not only detect drift; it also proposes a way to update the predictive model.
- It is practical. Monitoring the error rate is often easier than modeling the full high-dimensional input distribution.