SECTION 01 / HERO

How capable is AI at real engineering?

aec-bench measures AI performance across 500+ tasks in architecture, engineering and construction — cable sizing, seismic design, hydraulic modelling, HVAC, geotech. Real problems, real standards, automated scoring.

explore_results · read_the_docs · browse_tasks

SECTION 02 / CURRENT STANDINGS

Current standings

dataset release · 552 tasks · 5 disciplines

~/aec-bench / leaderboard.tsv · 18 rows · release eval

aec-bench ~ $ bench leaderboard --top 4 --by reward
› release ok

#     Model                                     Reward   Δ last run   Tokens    Coverage
#01   Grok 4.3 (other · tool_loop)              0.89     +0.00        31.69M    100%
#02   Grok 4.20 Reasoning (other · tool_loop)   0.87     +0.00        36.74M    100%
#03   Kimi K2.6 (other · tool_loop)             0.86     +0.00        88.65M    96%
#04   GPT-5.2 (openai · tool_loop)              0.83     +0.00        13.71M    100%

Per-discipline key: C civil · E electrical · G ground · M mechanical · S structural

→ bench leaderboard --full ↗ · 14 more models
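The ranking that `--top 4 --by reward` describes can be sketched in a few lines of Python. This is an illustration only: the column names (`model`, `reward`, `tokens_m`, `coverage_pct`) are assumptions, since the actual layout of leaderboard.tsv is not shown, and the values are the four rows listed above.

```python
import csv
import io

# Values copied from the four rows shown above; the column names are assumed.
ROWS = [
    ("model", "reward", "tokens_m", "coverage_pct"),
    ("GPT-5.2", "0.83", "13.71", "100"),
    ("Kimi K2.6", "0.86", "88.65", "96"),
    ("Grok 4.3", "0.89", "31.69", "100"),
    ("Grok 4.20 Reasoning", "0.87", "36.74", "100"),
]
TSV_TEXT = "\n".join("\t".join(row) for row in ROWS)


def top_by_reward(tsv_text, k):
    """Return the k highest-reward rows, ranked in descending order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ranked = sorted(reader, key=lambda row: float(row["reward"]), reverse=True)
    return ranked[:k]


for rank, row in enumerate(top_by_reward(TSV_TEXT, 4), start=1):
    print(f"#{rank:02d}  {row['model']:<22} {row['reward']}")
```

Rows are deliberately listed out of order so the sort does the work; ties are not handled specially here.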

SECTION 03 / REWARD × LATENCY

Reward × Latency

release results pair task performance with runtime and completion coverage

── TOP_4 ── full table ↗

#01  Grok 4.3              0.89
#02  Grok 4.20 Reasoning   0.87
#03  Kimi K2.6             0.86
#04  GPT-5.2               0.83

── REWARD × LATENCY ── explore ↗
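One way to read a reward-versus-cost chart is to look for the Pareto front: the models that no other model beats on both axes. The sketch below is an assumption-laden illustration, not the site's method. Latency figures do not appear in the text, so it uses the token counts from the standings as a stand-in cost axis.

```python
# (model, reward, tokens in millions) taken from the standings;
# tokens stand in for latency, which is not listed in the text.
RESULTS = [
    ("Grok 4.3", 0.89, 31.69),
    ("Grok 4.20 Reasoning", 0.87, 36.74),
    ("Kimi K2.6", 0.86, 88.65),
    ("GPT-5.2", 0.83, 13.71),
]


def pareto_front(results):
    """Keep entries that no other entry matches or beats on both axes."""
    front = []
    for name, reward, cost in results:
        dominated = any(
            other_reward >= reward
            and other_cost <= cost
            and (other_reward > reward or other_cost < cost)
            for _, other_reward, other_cost in results
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(RESULTS))
```

With these four rows, Grok 4.3 and GPT-5.2 remain: the first has the highest reward, the second the lowest token count.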

SECTION 04 / DISCIPLINES

Five engineering disciplines

coverage 468/468 tasks · verified against AS/NZS standards

Code     Discipline   Scope                                           Built   Proposed
CIV·01   Civil        Roads, drainage, hydraulics, earthworks.        57      + 30
ELE·02   Electrical   Cable sizing, fault current, lighting, power.   52      + 92
GND·03   Ground       Foundations, slopes, retaining walls.           10      + 3
MEC·04   Mechanical   HVAC, fire protection, piping, acoustics.       50      + 92
STR·05   Structural   Steel/concrete design, seismic, connections.    15      + 67
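A quick arithmetic check on the counts above: summing built and proposed tasks across the five disciplines gives the 468 quoted in the coverage line. The dictionary name is just a label chosen for this sketch.

```python
# (built, proposed) counts as listed for each discipline above.
DISCIPLINES = {
    "Civil": (57, 30),
    "Electrical": (52, 92),
    "Ground": (10, 3),
    "Mechanical": (50, 92),
    "Structural": (15, 67),
}

built = sum(b for b, _ in DISCIPLINES.values())
proposed = sum(p for _, p in DISCIPLINES.values())
total = built + proposed

print(f"built={built} proposed={proposed} total={total}")
```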

SECTION 05 / HOW IT WORKS

Define → run → score

six-stage pipeline · same flow every run

01  Define task · Template + params
02  Resolve instance · Jinja render
03  Stage env · Sandbox + tools
04  Execute agent · Harness drives the model
05  Score output · Automated verifier
06  Aggregate · Ledger + report
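Stages 01 and 02 amount to filling a template with one parameter set. The sketch below shows the idea using Python's standard-library `string.Template` as a stand-in for Jinja, so it runs without extra dependencies. The template wording, parameter names, and `resolve_instance` helper are hypothetical; the real aec-bench task schema is not shown here.

```python
from string import Template

# Hypothetical task template and parameters, for illustration only.
TASK_TEMPLATE = Template(
    "Check the voltage drop for a $length_m m lighting circuit "
    "carrying $current_a A on a $supply_v V supply."
)
PARAMS = {"length_m": 45, "current_a": 10, "supply_v": 230}


def resolve_instance(template, params):
    """Render one concrete task prompt from a template and a parameter set."""
    # substitute() raises KeyError if a placeholder has no matching parameter,
    # so an incomplete parameter set fails early instead of producing a bad prompt.
    return template.substitute(params)


prompt = resolve_instance(TASK_TEMPLATE, PARAMS)
print(prompt)
```

Swapping in a different `PARAMS` dictionary yields a new instance of the same task, which is presumably what lets one template produce many numbered variants.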

aec-bench ~ $ uv run aec-bench run-local \
  tasks/generated/electrical/cable-sizing/voltage-drop/sydney-suburban-residential-lighting-00 \
  --model claude-sonnet-4-20250514 --harness direct
› staging temporary workspace … ok
› executing harness direct
› verifier complete · reward 0.83 · imported as experiment local
aec-bench ~ $ uv run aec-bench evaluate --experiment local --report report.html
› done. report written to report.html
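The session above reports a single reward per run. How the verifier computes it is not described here, so the following is only a guess at the general shape: a tolerance-based numeric check for stage 05 and a mean for stage 06. The function names, the 5% tolerance, and the linear decay are all assumptions.

```python
from statistics import mean


def score_numeric(answer, reference, rel_tol=0.05):
    """Reward in [0, 1]: full credit within rel_tol, linear decay to zero at 2 * rel_tol."""
    if reference == 0:
        raise ValueError("reference must be non-zero for a relative check")
    rel_err = abs(answer - reference) / abs(reference)
    if rel_err <= rel_tol:
        return 1.0
    # Partial credit shrinks linearly once the error leaves the tolerance band.
    return max(0.0, 1.0 - (rel_err - rel_tol) / rel_tol)


def aggregate(rewards):
    """Mean reward across scored tasks (stage 06 in miniature)."""
    return mean(rewards)


rewards = [score_numeric(104.0, 100.0), score_numeric(120.0, 100.0)]
print(rewards, aggregate(rewards))
```

A graded score like this allows values between 0 and 1, which is consistent with fractional rewards such as the 0.83 shown, though the actual rule may differ.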

→ read the CLI guide

SECTION 06 / RUN IT YOURSELF

Benchmark your model against real engineering.

Open-source. Reproducible. Runs locally or against any provider.

git clone https://github.com/TheodoreGalanos/aec-bench.git

source checkout · github.com/TheodoreGalanos/aec-bench · 2.4k ★

quickstart · browse task library · contribute a task · submit your model