ID3 Implements AI Techniques for Data Quality Assessment
ID3’s Digital Lab is using Artificial Intelligence (AI) to enhance Data Quality evaluation.
Our Data Quality pipeline for time-series data is a two-step process, starting with non-AI univariate anomaly detection methods and ending with more complex AI models.
The first step of the pipeline leverages domain knowledge about the specific business problem of interest: we know in advance some of the relationships between measurements that follow from physical laws. For instance, if one variable is positive, we expect an associated variable to be positive as well. Any deviation from these constraints is flagged as an anomaly. The first step also applies anomaly detection algorithms that work in a univariate fashion, on a single variable or on a transformation of it.
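As a rough illustration of this first step, the sketch below combines a sign-consistency rule with a simple univariate outlier check. The variable names (flow, power), the sample values, the rule itself, and the choice of a median/MAD-based score with a 3.5 cutoff are all assumptions made for the example, not a description of the actual pipeline.

```python
from statistics import median

def sign_rule_violations(a, b):
    """Indices where a is positive but b is not (hypothetical physical rule)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x > 0 and not y > 0]

def robust_zscore_anomalies(values, threshold=3.5):
    """Univariate check: flag points far from the median, measured in MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical measurements: power is expected to be positive whenever flow is
flow = [1.2, 1.1, 1.0, 1.3, 1.2, 9.5, 1.1]
power = [3.0, 2.9, 2.7, -0.4, 3.1, 3.0, 2.8]

rule_flags = sign_rule_violations(flow, power)   # domain-rule anomalies
outlier_flags = robust_zscore_anomalies(flow)    # univariate anomalies
print(rule_flags, outlier_flags)
```

In this toy data the rule check picks out the sample where power turns negative while flow stays positive, and the univariate check picks out the spike in flow. The two checks are independent, so each can be tuned or replaced on its own.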
The second step makes use of more advanced AI models. These methods take multiple variables (and their intercorrelations) into account and can detect anomalies that span multiple variables. The models used were mostly based on Autoencoders and their probabilistic counterpart, Variational Autoencoders. Both approaches are unsupervised (or semi-supervised) deep learning architectures.
Autoencoders are a family of deep learning architectures that learn a latent representation of the input data with a significantly smaller dimensionality, and then reconstruct the input data from that latent representation. In the context of anomaly detection, the goal is for the model to reconstruct normal inputs with a low error and anomalous inputs with a high error. Anomalies are then detected by setting a threshold on the reconstruction error.
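The reconstruction-error idea can be sketched in a few lines. To keep the example small and deterministic, it uses a linear autoencoder fitted in closed form (equivalent to PCA) as a stand-in for a deep Autoencoder; the synthetic three-sensor data, the latent size of 1, and the 99th-percentile threshold are all assumptions for illustration only.

```python
import numpy as np

def fit_linear_autoencoder(X, latent_dim):
    """Fit a linear encoder/decoder pair (equivalent to PCA) on normal data."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions; keep the top latent_dim of them
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def reconstruction_error(X, mean, components):
    """Mean squared error between each row and its reconstruction."""
    z = (X - mean) @ components.T      # encode into the latent space
    X_hat = z @ components + mean      # decode back to the input space
    return np.mean((X - X_hat) ** 2, axis=1)

# Hypothetical "normal" data: three correlated sensors driven by one factor
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X_train = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(500, 3))

mean, components = fit_linear_autoencoder(X_train, latent_dim=1)
train_err = reconstruction_error(X_train, mean, components)
threshold = np.quantile(train_err, 0.99)  # assumed cutoff on training error

X_new = np.array([[1.0, 2.0, -1.0],    # respects the learned correlation
                  [1.0, 2.0,  1.0]])   # third sensor breaks it
is_anomaly = reconstruction_error(X_new, mean, components) > threshold
print(is_anomaly)
```

Each value in the second sample is plausible on its own; only the combination is inconsistent with the training data, which is why a multivariate model is needed to catch it. A deep or variational Autoencoder would replace the two helper functions, while the thresholding step would stay the same in spirit.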
This approach not only upholds the integrity and reliability of our data but also highlights our dedication to utilizing state-of-the-art technologies to advance excellence in Data Quality Assessment.