How accurate is RoadAI? Results from a human-vs-AI repeatability study

4 min · May 11, 2026

Tuomas Keränen

Product Manager, RoadAI


Article overview

In a head-to-head test against trained CVI inspectors on the same roads, RoadAI was twice as consistent as the human surveyors. RoadAI was also more repeatable than the human inspectors on every one of the seven defect types in the UKPMS scoring schedule.

Product: RoadAI
Industry: Road maintenance

It feels like there's a new AI-powered road survey tool announced every other week. The question for road authorities today is not whether AI belongs in your workflow; it's how to tell which of these AI survey tools actually delivers reliable results.

After all, the whole point of running condition surveys year after year is to track how your network is changing: which roads are deteriorating, how previous treatments are holding up, and where to put next year's budget. That picture requires a survey method that gives you the same answer for the same road, run after run. Without that repeatability, you can’t be sure that changes in the data come from the network rather than the survey.

So we measured it. We compared RoadAI against trained human inspectors to see which produced more repeatable results.

How we set up the test

The study followed the UKPMS Coarse Visual Inspection (CVI) process, the standard for manual road condition inspections used by UK local authorities. Surveyors score 20-meter sections of road for seven defect types: wheel-track cracking, wearing-course deterioration, left-edge deterioration, right-edge deterioration, surface deterioration, settlement or subsidence, and transverse cracking. Each defect is scored on a four-point severity scale.

Four vehicles drove the same 40-kilometer route covering a mix of urban and rural surfaces. Two vehicles were operated by trained CVI inspectors using the same scoring methodology. The other two ran the RoadAI app on a standard smartphone mounted to the windscreen, with a regular driver at the wheel.

Figure: Four vehicles (two with human inspectors, two running RoadAI) each collected data over the same 40.04 km route. RoadAI's repeatability score was 103% higher than the human inspectors'.

Each vehicle produced a complete dataset in HMDIF format with 2,002 observations at 20-meter resolution over the 40.04 km length of the route. That gave us three comparisons to make:
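The observation count follows directly from the route length and the section length. A quick check, using only the figures quoted above (the variable names are illustrative):

```python
# Figures taken from the article; names are illustrative only.
ROUTE_LENGTH_M = 40_040   # 40.04 km route
SECTION_LENGTH_M = 20     # one observation per 20-meter section

observations_per_run = ROUTE_LENGTH_M // SECTION_LENGTH_M
print(observations_per_run)  # 2002
```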

Human vs. human, to see how consistent the human surveys were.
RoadAI vs. RoadAI, to see how consistent the RoadAI surveys were.
RoadAI vs. human, to see how well RoadAI agreed with experienced surveyors.
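Four runs can be paired six ways, and each pairing falls into one of the three comparison types above. A minimal sketch of that grouping, using hypothetical run labels (the article does not name the individual runs):

```python
from itertools import combinations

# Hypothetical run labels mapped to the kind of surveyor that produced them.
runs = {
    "human_1": "human",
    "human_2": "human",
    "roadai_1": "roadai",
    "roadai_2": "roadai",
}

def comparison_type(run_a, run_b):
    """Classify a pair of runs into one of the three comparison types."""
    kinds = {runs[run_a], runs[run_b]}
    if kinds == {"human"}:
        return "human vs. human"
    if kinds == {"roadai"}:
        return "RoadAI vs. RoadAI"
    return "RoadAI vs. human"

# Group every unordered pair of runs by comparison type.
pairs_by_type = {}
for run_a, run_b in combinations(runs, 2):
    pairs_by_type.setdefault(comparison_type(run_a, run_b), []).append((run_a, run_b))

for label, pairs in pairs_by_type.items():
    print(label, len(pairs))
```

Under these assumptions there is one human-vs-human pair, one RoadAI-vs-RoadAI pair, and four RoadAI-vs-human pairs.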

To compare the datasets, we used standard statistical techniques for measuring how much two observers agree on the same observations. A score of 1 means perfect agreement, while 0 means no more agreement than random chance.
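The article does not name the statistic used. One common measure that fits this description (1 for perfect agreement, 0 for chance-level agreement) is Cohen's kappa. The sketch below assumes it purely for illustration and uses made-up severity scores, not study data:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items (unweighted)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items where both raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's own score frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both raters used one identical score
    return (observed - expected) / (1 - expected)

# Made-up severity scores (0-3) for eight road sections, for illustration only.
inspector_a = [0, 0, 1, 1, 2, 2, 3, 3]
inspector_b = [0, 0, 1, 2, 2, 2, 3, 0]

kappa = cohens_kappa(inspector_a, inspector_b)
print(round(kappa, 2))  # 0.67
```

Because severity is an ordered scale, a weighted variant that penalizes near-misses less than large disagreements would also be a reasonable choice. Which variant the study used is not stated.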

Figure: Agreement scores between Humans 1-2 and RoadAI 1-2, ranging from fair (0.25-0.37) to substantial (0.63). RoadAI was twice as consistent as human surveyors in this study.

Finding 1: Human surveyors don't agree with each other as much as you'd expect

When we compared the two human inspectors, the agreement score was 0.31, which is classified as "fair agreement."
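Labels such as "fair" and "substantial" match the interpretation bands widely attributed to Landis and Koch for kappa-type statistics. The article does not say which scale it used, so the mapping below is an assumption, sketched for illustration:

```python
def agreement_label(score):
    """Map an agreement score to a descriptive band (assumed Landis-Koch scale)."""
    if score < 0:
        return "poor"
    if score <= 0.20:
        return "slight"
    if score <= 0.40:
        return "fair"
    if score <= 0.60:
        return "moderate"
    if score <= 0.80:
        return "substantial"
    return "almost perfect"

print(agreement_label(0.31))  # fair
print(agreement_label(0.63))  # substantial
```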

Both inspectors were trained to the same standard. Both surveyed the same road on the same day. And yet, one person's "moderate crack" was another person’s "severe crack." Some defects logged by one were missed entirely by the other.

This isn't a criticism of the inspectors. It's how subjective scoring works. Two experienced professionals with the same training made different judgment calls.

Finding 2: RoadAI is more repeatable than the human inspectors

The two RoadAI runs scored 0.63, which is classified as “substantial agreement” and is 103% higher than the human-vs-human score of 0.31. In plain terms, the two RoadAI runs were about twice as consistent as the two human runs.
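The 103% figure is the relative increase from the human-vs-human score to the RoadAI-vs-RoadAI score. A short check using the two scores quoted above (variable names are illustrative):

```python
# Agreement scores quoted in the article.
human_vs_human = 0.31
roadai_vs_roadai = 0.63

# Relative increase of the RoadAI score over the human score, in percent.
relative_increase_pct = (roadai_vs_roadai - human_vs_human) / human_vs_human * 100
# Ratio of the two scores, the basis for "twice as consistent".
ratio = roadai_vs_roadai / human_vs_human

print(round(relative_increase_pct))  # 103
print(round(ratio, 2))               # 2.03
```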

What’s more, RoadAI was more repeatable than the humans on every one of the seven defects scored: wheel-track cracking, wearing-course deterioration, left- and right-edge deterioration, surface deterioration, settlement, and transverse cracking.

RoadAI doesn't get tired or distracted, doesn't carry a regional bias on what counts as severe versus moderate, and applies the same definitions to every kilometer of every run.

Finding 3: RoadAI agrees with experienced surveyors more than they agree with each other

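Both RoadAI surveys agreed with one of the two human inspectors more closely than the two inspectors agreed with each other (0.31). Across all comparisons involving a human inspector, agreement stayed in the "fair" band, from 0.25 to 0.37.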

For a public works director, this is what makes RoadAI valuable in the real world. If you flag a road for immediate resurfacing based on RoadAI's data, a trained surveyor walking the same road is likely to flag the same problems.

When you replace your CVI process with RoadAI, the data behind your maintenance decisions is consistent year after year, so trends reflect real change in the network, not differences between surveyors.

Figure: Bar chart of agreement between surveys for each road deterioration category, RoadAI vs. human, based on 2,002 observations. Higher values indicate a greater level of agreement. The two RoadAI runs agreed with each other more closely than the human inspectors’ runs on every defect type.

Independent validation in three countries

RoadAI has also been comprehensively validated by national transport authorities in three countries.

PAS 2161 approval (UK). RoadAI is approved by the UK Department for Transport for surveying road conditions under PAS 2161 — the most comprehensive technical evaluation of road condition monitoring technology to date, developed with the British Standards Institution and Transport Research Laboratory.
Vejdirektoratet acceptance testing (Denmark). The Danish Road Directorate tested RoadAI against LCMS reference data and recorded a 1% variation in measurements, 99% consistency across five runs, and a 100% match in classification ranking across five test sections.
Cerema certification (France). RoadAI won the French national CIRR 2019 innovation call, with a Cerema Certificate of Achievement confirming measurement quality against the LCPC ME 38-2 M3 reference test method.

The computer vision models that passed those national assessments are the same models RoadAI uses on every customer's network.

What's left is for you to see RoadAI for yourself.

See RoadAI in action

RoadAI is not only more consistent than human inspectors but also faster. Surveys that previously took weeks can be done in days.

Book a demo, and we'll walk through the features that matter for your network and answer any questions you have about workflows, integrations, and reporting.
