DQM: an open source library to revolutionize data quality evaluation for industrial AI

As part of the Confiance.ai program, IRT SystemX, in collaboration with Atos and CEA, has developed DQM (Data Quality Metrics), an open-source Python library. It evaluates the quality of the data used to develop and assess artificial intelligence (AI) models, particularly in complex industrial environments.

Data quality is essential to ensure the reliability of AI models. The DQM library provides a concrete response to this challenge by offering relevant and interpretable quality attributes that assess critical aspects such as the representativeness and coverage of data within specific operational domains.

The institute’s teams developed two main categories of metrics:

Data-intrinsic metrics (e.g., representativeness, diversity), to evaluate dataset quality before an AI model is implemented;
System-dependent metrics (e.g., data-model coverage), which measure the impact of data on system performance once integrated into a model.
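
To make the idea of a data-intrinsic metric concrete, here is a minimal sketch in plain Python. It is not the DQM API: the function names, the equal-width binning and the choice of a chi-square statistic are illustrative assumptions. It scores a one-dimensional sample for representativeness (distance between the observed histogram and a target distribution) and for coverage (share of bins of the operational domain that contain at least one sample).

```python
def histogram(samples, low, high, n_bins):
    """Count samples per equal-width bin over the domain [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if low <= x <= high:
            # Clamp the index so that x == high falls in the last bin.
            idx = min(int((x - low) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def chi_square_statistic(counts, expected_probs):
    """Pearson chi-square distance between observed counts and a target
    distribution; 0 means the sample matches the target exactly."""
    total = sum(counts)
    return sum(
        (c - total * p) ** 2 / (total * p)
        for c, p in zip(counts, expected_probs)
    )


def bin_coverage(counts):
    """Fraction of bins in the domain that hold at least one sample."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Hypothetical example: five samples on [0, 1], compared to a uniform target.
samples = [0.1, 0.2, 0.3, 0.6, 0.9]
counts = histogram(samples, low=0.0, high=1.0, n_bins=4)
uniform = [0.25] * 4
representativeness_gap = chi_square_statistic(counts, uniform)
coverage = bin_coverage(counts)
```

A low chi-square value together with a coverage close to 1 suggests that the sample both follows the target distribution and spans the whole domain. Both scores are computed from the data alone, which is what distinguishes this family from system-dependent metrics.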

DQM was designed as a standalone Python package, making it easy to use independently or to integrate into other tools, such as DebiAI. The library has already been integrated into the end-to-end methodologies of major industrial players such as Naval Group and Valeo, strengthening their ability to accurately assess data quality.

The DQM library shows strong potential. Our experiments have shown that it gives a detailed understanding of the data used in machine learning workflows. Integrated into the European Trustworthy Foundation created by the Confiance.ai community, the library is generating strong interest and is paving the way for new applications across various industrial sectors. A scientific paper detailing the library’s contributions was also published in ATRACC.

Faouzi Adjed, research engineer and Data AI architect, IRT SystemX

Focus: concrete results with European and international reach

4 technology transfers to CAB project partners (RTE, Orange) and European players (Flatlandet, EnliteAI);
InteractiveAI serves as the technological foundation for the H2020 AI4REALNET project, strengthening its position as a reference platform in the field of interactive AI;
The project has led to the publication of 8 scientific papers, contributing to the advancement of knowledge in AI and human-machine interaction.
