aec-bench Documentation

aec-bench is an open-source Python platform for benchmarking AI agents on Architecture, Engineering and Construction (AEC) tasks. It covers creating, versioning, running, evaluating, and improving those benchmark tasks.

Use these docs to move from a first local run to maintaining reproducible benchmark datasets, agent harnesses, evaluation traces, and leaderboard-ready results.

Recommended paths

New to aec-bench: start with Quickstart.
Writing benchmark content: read Tasks, Templates, and Datasets.
Running agents: read Harnesses, Configuration, and Environment.
Reviewing results: read Scoring, Traces, and Classification.
Publishing or integrating: read Library Catalogue and Prime Lab.
