When an AI system is assessed by the team that built it — or the vendor that sold it — the results are structurally limited. Not necessarily because of bad faith, but because of proximity. The people closest to a system are the least well-placed to evaluate it objectively. This is not a new problem; it exists in software auditing, financial accounting, and medical device evaluation. But in AI, the stakes and the complexity make it particularly consequential.
The Structural Problem with Internal Review
Internal teams know their system well — which is an asset for development and a liability for evaluation. Familiarity breeds assumptions. Metrics are chosen that play to the system's strengths. Edge cases that the team is aware of but has deprioritised tend not to surface in self-assessments. The result is not a lie, but a partial picture presented as a complete one.
Vendor Assessments Are Not Neutral
Many organisations rely on the AI vendor's own documentation — safety reports, bias evaluations, performance benchmarks. These documents are produced by organisations with a commercial interest in positive outcomes. They are not fabrications, but they are not independent. The framing, the metrics, the comparison baselines, and the omissions are all shaped by the context of their production. Procurement teams and oversight bodies are right to treat them as one input, not as the answer.
"A system that has only ever been evaluated by people who want it to succeed is not a system that has been evaluated."
What Independent Review Actually Involves
Genuine independent review means assessment conducted by people with no material interest in a particular outcome, using frameworks and criteria established before the review begins. In practice, this typically involves:
- Selecting evaluation criteria drawn from established frameworks — SC42 trustworthiness standards, MLCommons safety benchmarks, or domain-specific requirements
- Reviewing documentation, architecture, training data provenance, and deployment context
- Testing against defined scenarios, including adversarial and edge cases
- Producing a written opinion that distinguishes findings from recommendations
The output is not a pass/fail certificate. It is a structured, documented perspective that an organisation — and its stakeholders — can reason about.
When Independent Review Is Most Valuable
Independent review is particularly valuable at three points: before deployment, where it can catch issues before they affect users; after a significant incident, where an independent view of what happened and why is essential for accountability; and as part of ongoing governance, where periodic review provides an external check on a system's behaviour over time.
Regulators and procurement frameworks are increasingly expecting it. The EU AI Act's conformity assessment requirements for high-risk AI systems effectively mandate something in this direction. Being ahead of that expectation — rather than scrambling to meet it — is a strategic advantage.
The Value of the Second Opinion
Organisations sometimes resist independent review because they fear what it might find. That is understandable, but inverted. Finding issues in a structured review, with time to address them, is substantially better than finding them in production. The value of independent review is not confirmation — it is credible assurance, and the knowledge that your system has been stress-tested by people with no reason to be gentle.
Independence Check
How robust is your current AI review process? Answer 6 questions to get a score.
Looking for an independent assessment of your AI system?
We provide structured, independent review of AI systems — grounded in established frameworks and delivered with no stake in a particular outcome.
Contact us