aec-bench Documentation
aec-bench is an open-source Python platform for creating, versioning, running, evaluating, and improving Architecture, Engineering and Construction (AEC) benchmark tasks for AI agents.
Use these docs to move from a first local run to maintaining reproducible benchmark datasets, agent harnesses, evaluation traces, and leaderboard-ready results.
Recommended paths
- New to aec-bench: start with Quickstart.
- Writing benchmark content: read Tasks, Templates, and Datasets.
- Running agents: read Harnesses, Configuration, and Environment.
- Reviewing results: read Scoring, Traces, and Classification.
- Publishing or integrating: read Library Catalogue and Prime Lab.