Are Data Engineers Sleepwalking Towards AI Catastrophe?

June 27, 2025




Since the earliest days of big data, data engineers have been the unsung heroes doing the dirty work of moving, transforming, and prepping data so highly paid data scientists and machine learning engineers can do their thing and get the glory. As the agentic AI era dawns on us, it opens up a bunch of new data engineering opportunities, as well as potentially catastrophic pitfalls.

Frank Weigel, the former Google and Microsoft executive who was recently hired by Matillion to be its new chief product officer, openly wondered to a reporter recently whether the Agentic AI Air was on a glideslope for disaster.

“Basically, we see there’s a huge problem coming for data engineering teams,” Weigel said in an interview during the recent Snowflake Summit. “I’m not sure everybody is fully aware of it.”

Here’s the issue, as Weigel explained it:

The explosion of source data is one aspect of the problem. Data engineers who are accustomed to working with structured data are now being asked to manage, prep, and transform unstructured data, which is harder to work with, but which ultimately is the fuel for most AI (i.e. words and pictures processed by neural networks).

Data engineers are already overworked. Weigel cited a study that indicated 80% of data engineering teams are already overloaded. But when you add AI and unstructured data to the mix, the workload issue becomes even more acute.

Agentic AI provides a potential solution. It’s natural that overworked data engineering teams will turn to AI for help. There’s a bevy of providers building copilots and swarms of AI agents that, ostensibly, can build, deploy, monitor, and fix data pipelines when they break. We’re already seeing agentic AI have real impacts on data engineering teams, as well as the downstream data analysts who ultimately are the ones requesting the data in the first place.


But according to Weigel, if we implement agentic AI for data engineering the wrong way, we’re potentially setting ourselves a trap that will be tough to get out of.

The problem that he’s foreseeing would stem from AI agents that access source data on their own. If an analyst can kick off an agentic AI workflow that ultimately involves the AI agent writing SQL to obtain a piece of data from some upstream system, what happens when something goes wrong with the data pipeline? AI agents might be able to fix basic problems, but what about serious ones that demand human attention?

“You will have autonomous AI agents that run entire business functions,” Weigel said. “But equally, they start to have a huge need for data. And so if the data team already was overloaded before, well, it’s now going to be like looking down the abyss and saying ‘How on earth can we do anything? How am I going to have a human data engineer answer a question from an AI agent?’”

Once human data engineers are out of the loop, bad things can start happening, Weigel said. They potentially face a situation where the volume of data requests, which originally were served by human data engineers but now are being served by AI agents, is beyond their capability to keep up.

The accuracy of data will also suffer, he said. If every AI agent writes its own SQL and pulls data directly out of its source, the odds of getting the wrong answer go up considerably.

“We’re now back in the dark ages, where we were 10 years ago [when we wondered] why we need data warehouses,” he said. “I know that if person A, B, and C ask a question, and previously they wrote their own queries, they got different results. Right now, we ask the same agent the same question, and because they’re non-deterministic, they will actually create different queries every time you ask it. And as a result, you now have the different business functions all getting different answers, insisting of course that it’s right.

Matillion CPO Frank Weigel

“You have lost all the governance and control of why you established a central data team,” Weigel continued. “And for me, that’s the angle that I think a lot of data orgs haven’t really thought about. When I get a demo of an AI agent, they never talk about that. They just have the agent access the data directly. And sure, it can. But the problem is, it shouldn’t really.”
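To make the concern concrete, here is a minimal sketch of how three independently written queries can answer the same question, “what is our revenue?”, differently, while a single vetted view returns one answer every time. The `orders` table, its rows, and the `revenue_v` view are hypothetical and exist only for illustration; they are not drawn from Matillion or from any system Weigel described.

```python
import sqlite3

# Hypothetical source table; names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
    INSERT INTO orders (amount, status) VALUES
        (100.0, 'completed'),
        (50.0,  'completed'),
        (30.0,  'refunded'),
        (20.0,  'cancelled');
    -- One vetted definition of revenue, owned by the central data team.
    CREATE VIEW revenue_v AS
        SELECT SUM(amount) AS revenue FROM orders WHERE status = 'completed';
""")

# Three plausible ad hoc readings of "revenue", as three people (or three
# runs of a non-deterministic agent) might write them against the source.
ad_hoc_queries = {
    "all_orders": "SELECT SUM(amount) FROM orders",
    "completed_only": "SELECT SUM(amount) FROM orders WHERE status = 'completed'",
    "not_cancelled": "SELECT SUM(amount) FROM orders WHERE status != 'cancelled'",
}


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


ad_hoc_answers = {name: scalar(sql) for name, sql in ad_hoc_queries.items()}

# Every consumer that goes through the governed view gets the same number.
governed_answers = [scalar("SELECT revenue FROM revenue_v") for _ in range(3)]

print(ad_hoc_answers)
print(governed_answers)
```

Each ad hoc query is defensible on its own, which is the point: without a shared definition, none of the three askers has a way to know the others got a different number.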

The answer to this dilemma, according to Weigel, is twofold. First, it’s important to keep the data warehouse, as it serves as a repository for data that has been vetted, checked, and standardized.

It’s also important to keep humans in the loop, according to Weigel. And to keep humans in the loop, human data engineers must somehow be prevented from becoming completely overwhelmed by the unstructured data requests and the new AI workflows. To accomplish that, he said, they essentially must become superhuman data engineers, augmented with AI.

Matillion is building its agentic AI solutions around this strategy. Instead of setting AI agents loose to write their own SQL against source data systems, Matillion is using AI agents as supporting cast members whose goal is to assist the human data engineer in getting the work done.

This on-demand team of virtual data engineers is dubbed Maia, which the company announced earlier this month. The agents, which run in the Matillion Data Productivity Cloud (DPC), are able to assist data engineers with a range of tasks, including creating data connectors, building data pipelines, documenting changes, testing pipelines, and analyzing failures.

“We need to supercharge the data engineering function, and we need to enable them to match the AI capabilities,” he said. “Instead of just a copilot concept, it has become a component, a series of different data engineers that have different tasks. They’ll do different things.”

Maia acts as the lead agent that controls various sub-agents. The company has three or four such data engineering sub-agents today, Weigel said, and it will have more in the future. Maia, which is built using a collection of large language models (LLMs), including Anthropic’s Claude, can even correct itself when it does something wrong.

Matillion is close to shipping a preview of Maia

“It’s really interesting,” Weigel said. “When you see it work, it will break down the problem into the steps. Then it will start doing it. It’ll look at the data and decide whether it’s going on the right track. It will roll back. ‘That wasn’t quite right.’ And so it really is like a data engineer in its job and thinking, including looking at the data. It’ll ask the human at certain points if it wants input.”

Despite the potential for agentic autonomy, that’s not part of the Matillion plan, as the company sees the human engineer as a critical backstop that can’t be eliminated from the equation.

Another important backstop that could help Matillion customers avoid agentic AI pitfalls: No AI generation of SQL.

While LLMs like Claude have gotten really, really good at writing SQL, Matillion will not hand the reins over to AI for this critical component. The ETL vendor has been automatically generating SQL as part of its data pipeline solution for Snowflake, Databricks, and other cloud data warehouses for years, and it’s not about to start from scratch.

“The secret in Matillion is we’ve abstracted that layer so we’re much closer to the user intent,” Weigel said. “So the user is building that data pipeline intent with predefined building blocks that ultimately write SQL. But it’s Matillion that writes SQL, not the user.”

This approach also avoids the problem of having spaghetti SQL code that can’t be updated and modified over time, which is a risk with AI-generated code.

“We have this abstraction of this intermediate representation of these components that in turn issues SQL,” Weigel said. “And so our agent doesn’t have to generate whatever code you want. Instead, it’s about picking the right component and configuring the right component and then sequencing them together.”
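The general idea Weigel describes, in which an agent selects, configures, and sequences predefined components and the components themselves emit the SQL, can be sketched roughly as follows. This is not Matillion’s implementation or API: the `Filter` and `Aggregate` classes, the `compile_pipeline` function, and the `orders` table are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Filter:
    """Hypothetical building block: keep rows where column equals a value."""
    column: str
    equals: str

    def apply(self, source_sql: str) -> str:
        value = self.equals.replace("'", "''")  # escape single quotes in the literal
        return f"SELECT * FROM ({source_sql}) AS t WHERE {self.column} = '{value}'"


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical building block: sum one column per group."""
    group_by: str
    sum_column: str

    def apply(self, source_sql: str) -> str:
        return (
            f"SELECT {self.group_by}, SUM({self.sum_column}) AS total "
            f"FROM ({source_sql}) AS t GROUP BY {self.group_by}"
        )


Component = Union[Filter, Aggregate]


def compile_pipeline(table: str, steps: List[Component]) -> str:
    """Render a sequence of configured components into one SQL statement.

    Assumes table and column names come from a trusted catalog; the agent
    (or user) only chooses components and their settings, never raw SQL.
    """
    sql = f"SELECT * FROM {table}"
    for step in steps:
        sql = step.apply(sql)
    return sql


# The same configuration always compiles to the same SQL text.
steps = [Filter("status", "completed"), Aggregate("region", "amount")]
sql_a = compile_pipeline("orders", steps)
sql_b = compile_pipeline("orders", steps)
print(sql_a)
```

Because the SQL is a pure function of the component configuration, two requests with the same intent produce byte-identical queries, and a human engineer can review or change the configuration rather than untangle generated code.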

It’s easy to get mesmerized by “shiny object” syndrome in the tech world. With all the advances in generative AI, it’s tempting to let these shiny new copilots loose to try to replicate the job of the overworked, under-appreciated data engineer, at a fraction of her cost.

But if replacing data engineers with AI also means replacing much of the governance and control the data engineer brings, that could spell disaster for companies. “I think data engineering teams aren’t maybe fully aware of the potential doom that’s there,” Weigel said.

Instead, companies should be looking to super-charge those overworked data engineers using AI, which Weigel said is the best hope for surviving the AI data deluge.
