Introduction

aec-bench is an open-source platform for benchmarking AI agents on Architecture, Engineering and Construction tasks.

Why aec-bench?

AEC work mixes deterministic calculation, document interpretation, tool use, and engineering judgement. General model benchmarks do not tell you whether an agent can follow an engineering brief, use the right formula, inspect source material, run a verifier, and leave an auditable trace.

aec-bench provides:

A public catalogue of built templates and proposed seed tasks across civil, electrical, ground, mechanical, and structural engineering
Agent harnesses for direct answers, tool loops, RLM, and Lambda-RLM workflows
Automated scoring through task-local verifiers and structured reward contracts
Versioned datasets so comparable runs are anchored to immutable task snapshots
Prime Lab export for local eval, hosted eval, adapter eval, and hosted training
Evolution and swarm workflows for improving agent workspaces against real benchmark failures
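One way to make "immutable task snapshots" concrete is to content-address each task and derive a single dataset digest from them. The sketch below assumes a hypothetical `Task` shape and hypothetical `task_digest` / `freeze_dataset` helpers; it is not the aec-bench dataset format or API. It only illustrates why two runs that report the same digest were scored against byte-identical task content.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Task:
    # Hypothetical task shape; the real aec-bench task format may differ.
    task_id: str
    discipline: str
    prompt: str
    expected: dict


def task_digest(task: Task) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical content
    # always serialises, and therefore hashes, the same way.
    payload = json.dumps(
        {
            "task_id": task.task_id,
            "discipline": task.discipline,
            "prompt": task.prompt,
            "expected": task.expected,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class FrozenDataset:
    version: str
    digest: str
    tasks: MappingProxyType  # read-only view: task_id -> Task


def freeze_dataset(version: str, tasks: list[Task]) -> FrozenDataset:
    by_id = {t.task_id: t for t in tasks}
    if len(by_id) != len(tasks):
        raise ValueError("duplicate task_id in dataset")
    # The dataset digest covers every task digest in a stable (sorted) order,
    # so input ordering does not matter but any content change does.
    combined = "".join(f"{tid}:{task_digest(by_id[tid])}\n" for tid in sorted(by_id))
    digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()
    return FrozenDataset(version, digest, MappingProxyType(dict(by_id)))
```

Hashing canonical content rather than file paths or timestamps means a silently edited prompt or expected value produces a new digest, which is the property comparable runs depend on.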

Disciplines

Discipline	Example coverage
Civil	Hydrology, hydraulics, drainage, roads, coastal, wind and load derivations
Electrical	Cable sizing, PV, grounding, arc flash, thermal rating, short-circuit
Ground	Bearing capacity, settlement, CPT/SPT interpretation, slope and retaining-wall checks
Mechanical	HVAC, pumps, fire services, process calculations, acoustics, wastewater
Structural	Marine, concrete, structural fire, load combinations, movement and connection checks

Mental model

The core loop is:

1. Define or generate tasks.
2. Freeze a dataset when you need comparability.
3. Run an agent harness against the selected tasks.
4. Verify outputs and write trial records.
5. Evaluate, report, and inspect traces.
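The run, verify, and evaluate steps of the loop can be sketched as a small pipeline in which a harness produces an answer and a trace, a task-local verifier returns a structured reward, and each attempt is stored as a trial record. Everything here (`Task`, `Reward`, `TrialRecord`, `verify`, `run_trials`, `pass_rate`, and the relative-tolerance check) is a hypothetical illustration under the assumption of single-number answers, not the aec-bench harness or verifier interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    expected: float          # hypothetical: a single numeric target
    tolerance: float = 0.01  # hypothetical relative tolerance


@dataclass
class Reward:
    # Hypothetical "structured reward": a score plus machine-readable details.
    score: float
    passed: bool
    details: dict = field(default_factory=dict)


@dataclass
class TrialRecord:
    task_id: str
    answer: str
    reward: Reward
    trace: list[str]


# A harness receives the task and a trace list it may append steps to.
Harness = Callable[[Task, list[str]], str]


def verify(task: Task, answer: str) -> Reward:
    # Hypothetical task-local verifier: parse a number, check relative error.
    try:
        value = float(answer)
    except ValueError:
        return Reward(0.0, False, {"error": "non-numeric answer"})
    scale = abs(task.expected) or 1.0  # fall back to absolute error at zero
    rel_err = abs(value - task.expected) / scale
    passed = rel_err <= task.tolerance
    return Reward(1.0 if passed else 0.0, passed, {"relative_error": rel_err})


def run_trials(tasks: list[Task], harness: Harness) -> list[TrialRecord]:
    records = []
    for task in tasks:
        trace: list[str] = []
        answer = harness(task, trace)
        records.append(TrialRecord(task.task_id, answer, verify(task, answer), trace))
    return records


def pass_rate(records: list[TrialRecord]) -> float:
    # Evaluation step: fraction of trials whose verifier reported a pass.
    return sum(r.reward.passed for r in records) / len(records) if records else 0.0
```

Keeping the answer, reward details, and trace together in one record is what lets later reporting, export, or workspace-improvement steps reuse the same data without re-running the agent.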

Prime Lab export, evolution, and swarm runs all build on the same task, trial, verifier, and trace records.

Next Steps

Quickstart — Generate and run a first task
Templates — Understand the built-in template catalogue
Agent Harnesses — Choose an execution strategy
Prime Lab — Export tasks for hosted eval and training
