It’s Not the Algorithm: Faulty Data Derails Artificial Intelligence

3 November, 2025

A Bar-Ilan University study finds that data flaws—not code—are the root cause of AI system failures, from mislabeled images to hidden biases that distort automated decisions

In the photo: Dr. Limor Ziv, right (photo: Netanel Israel), and Dr. Maayan Nakash, left (photo: Bar-Ilan University).

In a world obsessed with models, compute power, and algorithmic breakthroughs, a new study from Bar-Ilan University shines a spotlight on the less glamorous—but perhaps most crucial—ingredient of artificial intelligence: data. According to Dr. Limor Ziv from the School of Communication and Dr. Maayan Nakash from the Department of Management, most AI failures stem not from flawed algorithms but from the data that feeds them.

Published in the scientific journal Machine Learning & Knowledge Extraction, the study is among the first to explore this issue empirically and across industries. The researchers conducted in-depth interviews with 74 senior AI and data executives from the United States, Europe, India, and Israel to uncover what really happens “behind the algorithm”: how data is collected, managed, and ultimately misused.

Data Quality – The Hidden Bottleneck of the AI Revolution

The findings are striking: the real bottleneck in AI is not computing power but the integrity of the information underlying it. The study identifies common issues such as missing data, labeling errors, duplication, and inconsistent formats—all of which lead to bias, ethical failures, and eroded public trust.
“Only organizations that invest in data cleaning, validation, and standardization can build AI systems that are reliable and valuable,” says Dr. Ziv.
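
To make those failure modes concrete, the sketch below shows what a basic audit of this kind might look like in practice. It is a minimal illustration in pandas; the column names and values are invented for the example, not tooling from the study.

```python
# A minimal data-quality audit sketch in pandas. Column names and values
# are invented for illustration; this is not tooling from the study.
import pandas as pd

def audit(df: pd.DataFrame) -> dict:
    """Flag the basic flaws the study highlights: missing values,
    duplicate rows, and columns mixing incompatible formats."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # A column holding more than one Python type often signals
        # inconsistent formats across data sources.
        "mixed_type_columns": [
            c for c in df.columns
            if df[c].dropna().map(type).nunique() > 1
        ],
    }

records = pd.DataFrame({
    "patient_id": [1, 2, 2, 3],
    "age": [34, "34", "34", 29],               # mixed int/str formats
    "diagnosis": ["flu", "flu", "flu", None],  # missing label
})
print(audit(records))  # reports one duplicate, one gap, one mixed column
```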

The research illustrates how data flaws directly translate into malfunctioning models. In one case, inconsistent manual labeling of medical images caused an algorithm to learn which hospital an image came from rather than the disease it showed. “The model was statistically brilliant but conceptually wrong,” one interviewee recalled.
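
The shortcut the interviewee describes can be probed directly. Below is a hedged sketch with synthetic features standing in for real medical images: if a simple classifier can predict the source hospital from the data, a disease model trained on that data is at risk of learning the site rather than the condition.

```python
# A "site leakage" probe with synthetic data. If hospital identity is
# predictable from the features, a disease model can exploit that
# shortcut. Nothing here comes from the study's actual dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
hospital = rng.integers(0, 3, size=n)  # 3 source hospitals
# Simulate a site artifact: each hospital shifts the feature values,
# as different scanners or labeling conventions would.
X = rng.normal(size=(n, 20)) + hospital[:, None] * 0.8

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, X, hospital, cv=5).mean()
print(f"hospital predictable from features: {acc:.2f} accuracy")
# Accuracy far above chance (about 0.33 for 3 classes) is a red flag.
```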

In another case, missing financial data led recommendation systems to “fill in the blanks” with historical assumptions, reinforcing systemic bias. Dr. Nakash calls this the “illusion of completeness”:
“When data is incomplete, the model invents what’s missing—and in doing so, it cements old biases.”
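
A toy calculation shows how this plays out. In the sketch below, which assumes invented credit figures, filling gaps with historical group averages quietly hands the old disparity to every new applicant.

```python
# The "illusion of completeness" in miniature: mean-imputing missing
# values from skewed history reproduces the skew. Figures are invented.
import pandas as pd

history = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "credit_limit": [10_000, 12_000, 5_000, 6_000],  # historical disparity
})
applicants = pd.DataFrame({
    "group": ["A", "B"],
    "credit_limit": [None, None],  # data missing for new applicants
})

group_means = history.groupby("group")["credit_limit"].mean()
applicants["credit_limit"] = applicants["credit_limit"].fillna(
    applicants["group"].map(group_means)
)
print(applicants)  # group B is again assigned the lower limit
```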

Even duplication and inconsistent file formats can derail models. “Half of our systems failed because data came from 15 sources that didn’t speak the same language,” said one data executive. Other examples included hiring algorithms that downgraded female applicants—not due to malicious code, but because the historical training data skewed male.
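
Complaints like the one about 15 sources usually come down to keys and formats that were never reconciled. A minimal sketch, assuming two hypothetical feeds with mismatched ID and date conventions (and pandas 2.0 or later for mixed-format date parsing):

```python
# Reconciling sources that "don't speak the same language": normalize
# keys and dates before deduplicating. Requires pandas >= 2.0 for
# format="mixed"; the schemas are hypothetical.
import pandas as pd

source_a = pd.DataFrame({"id": ["001", "002"], "date": ["2024-01-03", "2024-01-04"]})
source_b = pd.DataFrame({"id": [1, 2], "date": ["03/01/2024", "04/01/2024"]})

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["id"] = out["id"].astype(str).str.lstrip("0")  # "001" and 1 -> "1"
    out["date"] = pd.to_datetime(out["date"], dayfirst=True, format="mixed")
    return out

merged = pd.concat([normalize(source_a), normalize(source_b)])
print(merged.drop_duplicates())  # the same two records once each, not four
```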

The researchers also warn of data drift: models that once performed well lose accuracy as the real world changes. “The model didn’t fail,” one participant said. “Reality just moved on—but without ongoing monitoring, you only notice after the damage is done.”
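
Ongoing monitoring of this kind can be as simple as a statistical comparison between training-time and live data. Here is a hedged sketch using a two-sample Kolmogorov-Smirnov test; the distributions and threshold are illustrative only.

```python
# Drift monitoring sketch: compare a feature's training distribution
# with live data. Data and threshold are illustrative only.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, size=5_000)  # world at training time
live_feature = rng.normal(loc=0.4, size=5_000)   # "reality moved on"

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"drift detected (KS statistic {stat:.3f}); re-audit and retrain")
```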

Ziv and Nakash argue that data quality is not a technical issue but a moral and organizational one. Biased or incomplete data reproduces human inequalities even when algorithms appear neutral. The risks extend further—privacy breaches, data leaks, and opaque decision-making erode trust across industries. As one participant warned, “A serious data leak from a healthcare or financial system is only a matter of time.”

Building on their findings, the researchers propose a new conceptual model—the Data-Centric AI Lifecycle—which treats data not as a stage in development but as a living infrastructure: collected, cleaned, monitored, and constantly improved.
Organizations, they suggest, must manage data like any other critical asset—maintained, audited, and renewed continuously. “If you want trustworthy AI,” they conclude, “you must start with trustworthy data.”
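
As a rough schematic of that framing, the lifecycle reads less like a pipeline and more like a loop. The stage functions below are placeholders for an organization's own tooling, not an implementation from the paper.

```python
# The Data-Centric AI Lifecycle as a continuous loop rather than a
# one-off preprocessing stage. Function bodies are placeholders.
from typing import Any

def collect() -> list[Any]:
    return []  # ingest from sources

def clean(data: list[Any]) -> list[Any]:
    return data  # dedupe, fix formats, validate labels

def quality_regressed(data: list[Any]) -> bool:
    return False  # e.g. drift tests like the one sketched above

def improve(data: list[Any]) -> list[Any]:
    return data  # relabel, rebalance, document provenance

def lifecycle_step(store: list[Any]) -> list[Any]:
    store = clean(store + collect())
    if quality_regressed(store):
        store = improve(store)
    return store  # the loop repeats: data is maintained, never "done"
```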

The Bar-Ilan study aligns with a growing global movement toward Data-Centric AI: a cultural shift that prioritizes information integrity over algorithmic novelty. It calls for less fascination with “the magic of the model” and more focus on the invisible scaffolding that sustains it.

In an age when AI informs everything from medicine to national security, the study’s message is both timely and universal:
There can be no artificial intelligence without data intelligence.
