aec-bench

Installation

This page covers the requirements for running aec-bench locally: Python, uv, optional Prime support, backends, and provider credentials.

System Requirements

  • Python: 3.13 or later
  • Package manager: uv
  • OS: macOS, Linux, or Windows through WSL
  • Memory: 4 GB RAM minimum, 8 GB recommended for larger generated suites

Docker, Modal, Prime, and hosted Harbor execution are optional. You only need those services when you choose the matching execution or training path.
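
The two hard requirements can be checked before cloning. The following is a minimal sketch, not part of aec-bench; the function names are hypothetical, and it only confirms the interpreter version and that a uv executable is on PATH.

```python
import shutil
import sys

# Minimum interpreter version, taken from the System Requirements list.
MIN_PYTHON = (3, 13)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def uv_available():
    """Return True when a `uv` executable is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print(f"Python >= 3.13: {python_ok()}")
    print(f"uv on PATH: {uv_available()}")
```

Both checks are read-only, so the script is safe to run on any machine before installing anything.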

From Source

Use the source checkout when developing templates, tasks, or library code:

git clone https://github.com/TheodoreGalanos/aec-bench.git
cd aec-bench
uv sync --extra webui --dev
uv run aec-bench --version

The console entry point is aec-bench. In a source checkout, run it through uv run so Python, dependencies, and local package code stay aligned.

Optional Extras

Web UI support is optional:

uv sync --extra webui --dev
uv run aec-bench web

Prime Lab support is optional:

uv sync --extra prime --dev
uv run aec-bench prime doctor

Provider integrations for direct Python agent calls may require the pydantic-ai extra:

uv sync --extra pydantic-ai --dev

Provider Credentials

Set credentials for the provider used by your model or endpoint alias:

Provider                             Environment variables
Anthropic                            ANTHROPIC_API_KEY
OpenAI                               OPENAI_API_KEY
Azure OpenAI or Azure AI Foundry v1  AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT; optional AZURE_OPENAI_API_VERSION
Together AI                          TOGETHER_API_KEY
AWS Bedrock                          AWS_BEARER_TOKEN_BEDROCK, AWS_REGION or AWS_DEFAULT_REGION
Prime hosted eval/training           authenticated prime CLI session
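
To see which variables are still missing for a provider, a small check along these lines may help. It is a sketch only: the variable names come from the table above, but the dictionary keys (`"anthropic"`, `"azure"`, and so on) and the function name are illustrative labels, not aec-bench identifiers. Prime is omitted because it relies on a CLI session rather than environment variables.

```python
import os

# Each provider maps to a list of requirement groups. A group is satisfied
# when at least one of its variables is set to a non-empty value.
# Keys are illustrative labels, not names used by aec-bench.
REQUIRED_ENV = {
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "openai": [("OPENAI_API_KEY",)],
    "azure": [("AZURE_OPENAI_API_KEY",), ("AZURE_OPENAI_ENDPOINT",)],
    "together": [("TOGETHER_API_KEY",)],
    "bedrock": [
        ("AWS_BEARER_TOKEN_BEDROCK",),
        ("AWS_REGION", "AWS_DEFAULT_REGION"),
    ],
}


def missing_credentials(provider, env=None):
    """Return a description of every unsatisfied requirement group."""
    env = os.environ if env is None else env
    missing = []
    for alternatives in REQUIRED_ENV[provider]:
        if not any(env.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        gaps = missing_credentials(provider)
        print(provider, "ok" if not gaps else f"missing: {', '.join(gaps)}")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.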

When you run the CLI from a project checkout, aec-bench loads .env at startup. Existing shell variables take precedence.
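
The precedence rule can be illustrated with a tiny loader. This is a sketch of the behaviour described above, not the loader aec-bench actually uses; the function name and the simplified `KEY=VALUE` parsing are assumptions made for illustration.

```python
import os


def load_dotenv_lines(lines, env=None):
    """Apply KEY=VALUE lines to env without overriding keys already present."""
    env = os.environ if env is None else env
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if not key:
            continue
        value = value.strip().strip('"').strip("'")
        # setdefault only writes when the key is absent, so values that
        # were already exported in the shell take precedence.
        env.setdefault(key, value)
    return env
```

For example, if the shell already exports `OPENAI_API_KEY`, a different value for the same key in `.env` is ignored, while keys that appear only in `.env` are added.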

For Azure AI Foundry deployments that expose the v1 OpenAI-compatible API, use the /openai/v1/ endpoint and pass the deployment name as --model. For Together AI, use an explicit together: model prefix.
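
The two string conventions above can be sketched as small helpers. Both function names are hypothetical, the host in the usage note is a placeholder, and the code only illustrates the expected shape of the values; it does not reflect how aec-bench parses them internally.

```python
def azure_v1_endpoint(resource_url):
    """Return resource_url ending in /openai/v1/, adding the path only once."""
    base = resource_url.rstrip("/")
    suffix = "/openai/v1"
    if not base.endswith(suffix):
        base += suffix
    return base + "/"


def together_model(name):
    """Prefix a model name with together: unless it is already prefixed."""
    prefix = "together:"
    return name if name.startswith(prefix) else prefix + name
```

Calling `azure_v1_endpoint("https://my-resource.example")` yields `https://my-resource.example/openai/v1/`, and `together_model("some/model")` yields `together:some/model`. Both helpers leave a value unchanged if it already has the right form.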

Verify Installation

Start with the CLI help and commands that do not need provider credentials:

uv run aec-bench --help
uv run aec-bench generate list-templates --discipline ground
uv run aec-bench library export --stdout --pretty

Then run a provider-backed command once credentials are present:

uv run aec-bench run-local tasks/electrical/voltage-drop \
  --model gpt-4.1-mini \
  --harness direct

Project Layout

The source checkout uses a src/ Python layout:

src/aec_bench/          # Library source
tasks/                  # Benchmark task seeds and generated instances
seeds/                  # Expert-created seed files
agents/                 # Ready-to-use agent implementations
artefacts/              # Local generated artefacts and catalogue exports
docs/                   # Architecture and library guides
workspaces/             # Evolution workspaces

Generated artefacts, local runs, Prime packages, and evolution swarm state are intentionally local outputs. Commit curated tasks, templates, source, tests, and docs rather than transient run artefacts.

Next Steps

  • Quickstart — Generate and run a first task
  • Templates — Understand the built-in template catalogue
  • CLI Reference — See the current command surface
  • Prime Lab — Export tasks for Prime eval and training
