RCA in the AI Era, Without Shipping Your Failure Data to the Labs
AI compresses RCA from weeks to minutes. But if that speed-up runs through a consumer chatbot, your sovereign data posture is already gone. Here is what sovereign RCA looks like in practice.

Last Monday's AI in AM Weekly argued that the previous week in AI was not a capability week but a governance week. A US federal judge ruled that AI chat transcripts are not legally privileged. Uber confirmed that agentic tools burned an entire year's AI budget in a single quarter. Treasury and the Fed summoned the biggest US banks over a model that surfaces thousands of zero-days on demand. The through line was simple: the capability arrived before the guardrails, and the gap is now your asset risk register's problem.
This article is the practical companion to that argument. It is about the one AI workflow inside asset owning organisations where the governance gap shows up first, at scale, every single day: root cause analysis.
The Friday afternoon pattern
Walk through any reliability engineering team in 2026 and you will see the same behaviour. A failure happens on an asset. The engineer pulls the work order history, the OEM manual, the operator's notes, pastes a narrative of the event into a consumer chatbot on a Friday afternoon, asks it to draft a cause analysis, and drops the result into the RCA template. The whole loop takes twenty minutes.
I want to be precise about why this works. It works because pattern finding across an unstructured pile of text is exactly what large language models do well. The engineer is not cheating. They are using the most powerful text tool ever built on a problem that is genuinely mostly text.
The problem is not that the AI is bad at this. The problem is what the workflow costs you by default.
Three things break the moment that workflow scales
Data egress. Every paste into a consumer AI is a data transfer. Your failure history, your asset identifiers, your operator notes, your customer impact, your contractor names, your safety concerns. All of it now sits on infrastructure you do not own, covered by terms of service you did not negotiate, in a jurisdiction you do not operate in. For many asset owners this is a direct violation of an existing data handling policy. For critical infrastructure operators it is a sovereignty failure before you get to the legal question.
Discoverability. The ruling reported by TheNextWeb on 15 April 2026 is not an abstract concern. The reasoning trail your engineer draws through the chatbot — the hypotheses raised, the causes considered and dismissed, the language used to describe the incident — is now a text record that can be subpoenaed. In an industry where failure investigations frequently end up in coronial inquests, audits and contractual disputes, a shadow RCA written in a US chatbot is a future exhibit. You cannot unpublish it.
No asset context. This one is the easiest to miss. The frontier model you are prompting has never seen your CMMS, your fleet, your asset register, your sensor tags, your failure mode library, your maintenance regime, your operator competency records or your three previous investigations into a similar failure. It can make educated guesses that sound fluent. Fluent is not the same as right. An RCA that sounds correct and is not correct is more dangerous than no RCA at all, because it carries the institutional weight of "the AI said so".
The 940k OBD file case
Earlier this month I ran an AI pass over roughly 940,000 on-board diagnostic (OBD) files from a high-floor tram fleet. The task was unstructured text plus numeric tag data. The old version of this work is a team of reliability engineers reading through months of files, categorising them, looking for clusters and common preconditions. That version takes weeks.
The new version took hours. Clustering, frequency counts, correlation across tag streams, a first-pass summary of the dominant failure patterns — the AI handled all of it. Pattern finding at scale is exactly what the technology is for.
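If you want a feel for the shape of that pass, here is a minimal sketch in Python. It assumes the OBD files have already been parsed into a local table with a free-text fault column and one numeric tag stream; the file path, column names and cluster count are illustrative, not the pipeline that ran on the fleet.

```python
# Minimal sketch of the pattern-finding pass, not the production pipeline.
# Assumes the OBD files are already parsed into a local table with a
# hypothetical free-text column "fault_text" and a numeric column "tag_value".
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.cluster import MiniBatchKMeans

df = pd.read_parquet("obd_files.parquet")  # hypothetical local export; never leaves the boundary

# Hashing keeps memory flat at this scale; no vocabulary has to be fit or stored.
X = HashingVectorizer(n_features=2**18, alternate_sign=False).transform(df["fault_text"])

# First-pass clustering into candidate failure patterns.
km = MiniBatchKMeans(n_clusters=40, random_state=0, batch_size=10_000)
df["cluster"] = km.fit_predict(X)

# Frequency counts plus a crude per-cluster view of one tag stream.
summary = (df.groupby("cluster")
             .agg(n_files=("fault_text", "size"),
                  mean_tag=("tag_value", "mean"))
             .sort_values("n_files", ascending=False))
print(summary.head(10))
```

The output of a pass like this is a ranked list of candidate failure patterns for the reliability engineer to interrogate, not a finished cause analysis.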
What matters here is what did not happen. Not one of those 940,000 files left the asset boundary. The inference ran inside a controlled environment. The schema the AI read was the operator's schema. The failure history the AI was biased by was the operator's failure history. The output stayed inside the asset owning organisation's governance perimeter.
That is what sovereign RCA looks like, and it is not hypothetical. It ran earlier this month on a real fleet, on real files, and it produced results the reliability engineering team could act on without any of the three breaks above.
The pattern in practice
Sovereign RCA has three ingredients. None of them are exotic, but all three have to be present for the workflow to be defensible.
Edge or sovereign inference. The model runs inside an environment you control. That can mean on-premises hardware, a sovereign cloud region under contract, or an embedded edge device sitting alongside the asset. SAS-AM's AMiPU platform is one implementation of this pattern — intentionally designed for offline operation so that the asset owning organisation's data never has to leave its boundary to get the AI speed-up.
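To make that concrete, a sovereign inference call can be as plain as an HTTP request to a model served inside your own network. The sketch below assumes a locally hosted, OpenAI-compatible endpoint; the URL, model name and payload shape are placeholders for whatever your environment actually exposes.

```python
# Minimal sketch of sovereign inference: the prompt goes to a model served
# inside your own boundary, never to a public API. Endpoint, model name and
# payload shape are assumptions (an OpenAI-compatible local server).
import requests

LOCAL_ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"  # inside the asset boundary

def draft_cause_analysis(incident_narrative: str) -> str:
    payload = {
        "model": "local-rca-model",  # hypothetical model name
        "messages": [
            {"role": "system", "content": "You are a reliability engineering assistant."},
            {"role": "user", "content": incident_narrative},
        ],
        "temperature": 0.2,
    }
    resp = requests.post(LOCAL_ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```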
Your schema, your history. The model has to be prompted, grounded or fine-tuned against your CMMS schema and your failure history. Generic failure mode libraries are a starting point, not an answer. A sovereign deployment that still reasons off a generic OEM manual throws away most of the context advantage you paid to protect.
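One way to picture the grounding step: build the prompt from your own CMMS records before it ever reaches the model. The sketch below uses a hypothetical work order schema; swap in your own field names.

```python
# Minimal sketch of grounding the prompt in your own CMMS history rather than
# a generic OEM manual. The record fields are hypothetical.
def build_grounded_prompt(incident: dict, prior_work_orders: list[dict]) -> str:
    history_lines = [
        f"- WO {wo['work_order_id']} | {wo['failure_mode']} | {wo['resolution']}"
        for wo in prior_work_orders
    ]
    return "\n".join([
        f"Asset: {incident['asset_id']} ({incident['asset_class']})",
        f"Event: {incident['narrative']}",
        "",
        "Prior failures on this asset class, from our own CMMS:",
        *history_lines,
        "",
        "Draft a cause analysis. Cite which prior work orders support each hypothesis.",
    ])
```

In practice the output of a function like this feeds straight into the sovereign inference call sketched above, so the model is reasoning over your history rather than its training-set average.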
Reliability engineering on top. The AI accelerates pattern finding. The reliability engineer closes the cause chain, judges novelty, and imports the operational context that is not in the data. The SAS-AM RCM meets AI framework (available as a gated download on our resources page) maps this explicitly — it names which steps of a reliability-centred maintenance workflow are AI suitable, which are human only, and where the handoffs sit.
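A handoff map does not have to be elaborate to be useful. The sketch below is an illustration only, with invented step names; it is not a reproduction of the SAS-AM framework, just the idea that the AI steps, the human steps and the handoffs are written down rather than implied.

```python
# Illustrative shape only; the actual step split lives in the framework itself.
# The point is that ownership and handoffs are explicit and reviewable.
RCA_WORKFLOW = {
    "collect_failure_history":   {"owner": "ai",       "handoff_to": None},
    "cluster_candidate_causes":  {"owner": "ai",       "handoff_to": "engineer"},
    "close_cause_chain":         {"owner": "engineer", "handoff_to": None},
    "judge_novelty":             {"owner": "engineer", "handoff_to": None},
    "approve_corrective_action": {"owner": "engineer", "handoff_to": "asset_manager"},
}

ai_steps    = [step for step, v in RCA_WORKFLOW.items() if v["owner"] == "ai"]
human_steps = [step for step, v in RCA_WORKFLOW.items() if v["owner"] == "engineer"]
```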
What to ship this week
- Audit where RCA actually happens today. Ask every reliability engineer on your team, by name, what tool they reach for when they need a first-pass analysis of a failure. If the answer is a consumer chatbot, you now know your data egress surface. Document it.
- Draw the asset data boundary, explicitly. One page. What is inside, what is outside, what can cross, under what conditions, approved by whom. If that page does not exist, the boundary is whatever each individual engineer decides on a Friday afternoon. A sketch of what that page can capture follows this list.
- Commission one sovereign RCA pilot. Pick a single asset class with enough failure history to be interesting and enough sensitivity that sovereignty actually matters. Prove the pattern in one corner of the estate before you scale it.
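For the boundary page, here is one illustrative shape as a machine-readable record. The field names and entries are hypothetical; the point is that what can cross, under what conditions, approved by whom becomes something you can review and version rather than a judgement call left to each engineer.

```python
# Hypothetical, illustrative boundary record; substitute your own categories.
ASSET_DATA_BOUNDARY = {
    "inside":  ["CMMS work orders", "OBD files", "operator notes", "failure mode library"],
    "outside": ["consumer chatbots", "unmanaged cloud storage", "personal devices"],
    "may_cross": [
        {"data": "anonymised failure statistics",
         "condition": "asset identifiers and personnel names removed",
         "approved_by": "asset information manager"},
    ],
    "review_date": "2026-10-01",
}
```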
The governance conversation Monday's post opened gets answered one workflow at a time. RCA is the workflow where it gets answered first, because RCA is where the AI speed-up is the most seductive and the data custody failure is the most invisible.
SAS Asset Management provides advanced analytics, expert asset management services and maturity assessments to help asset owners realise their value.
Read next
AI in AM Weekly — Governance Week covers the six stories from the week of 14 to 22 April 2026 that set the context for this article.
Download
Grab the RCM meets AI framework from the SAS-AM resources page. It names which reliability-centred maintenance steps are AI suitable, which are human only, and where the handoffs sit.
Talk to us
Want a 30 minute conversation about sovereign RCA in your asset base? Book a discovery call.
