
Environment

aec-bench reads a handful of environment variables for credentials and runtime overrides, and expects a canonical on-disk project layout.

.env loading

aec-bench loads .env from the project root at CLI startup via dotenv.load_dotenv(). Anything set in the shell takes precedence; .env fills in the gaps.

```
# .env
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://example.services.ai.azure.com/openai/v1/
AZURE_OPENAI_API_VERSION=2024-10-21
TOGETHER_API_KEY=...
```

Don't commit .env — the project template adds it to .gitignore.
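The shell-wins rule matches python-dotenv's `load_dotenv()` default of `override=False`. The sketch below reproduces only that precedence using the standard library; `load_env_text` is a hypothetical name, and its parser is deliberately simplified (no quoting, `export` prefixes, or variable interpolation), so it is not a substitute for dotenv's own parser.

```python
from typing import MutableMapping


def load_env_text(text: str, environ: MutableMapping[str, str]) -> None:
    """Apply KEY=VALUE lines from .env text without overriding variables already set."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without an assignment
        key, _, value = line.partition("=")
        # setdefault keeps any value that came from the shell: the shell wins.
        environ.setdefault(key.strip(), value.strip())
```

Called as `load_env_text(Path(".env").read_text(), os.environ)`, a key exported in the shell keeps its shell value, while keys only present in `.env` are filled in.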

Provider credentials

Which variables are required depends on which models appear in your agent configs.

| Variable | Used by | Notes |
|---|---|---|
| ANTHROPIC_API_KEY | Claude models | Required for any claude-* model |
| AZURE_OPENAI_API_KEY | Azure-routed OpenAI | Required alongside AZURE_OPENAI_ENDPOINT |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI or Azure AI Foundry v1 | Use the resource endpoint, or the /openai/v1/ endpoint for Foundry deployments |
| AZURE_OPENAI_API_VERSION | Azure OpenAI | Optional; defaults to 2024-10-21 where needed |
| TOGETHER_API_KEY | Together AI | Use with together: model prefixes |
| OPENAI_API_KEY | OpenAI direct | Fallback when Azure isn't configured |
| AWS_REGION / AWS_DEFAULT_REGION | Bedrock through SDKs | Region selector |
| AWS_BEARER_TOKEN / AWS_BEARER_TOKEN_BEDROCK | Bedrock script-style provider | Used by script-style Bedrock agent runners |
| AWS_BEDROCK_ENDPOINT | Bedrock script-style provider | Optional explicit Bedrock endpoint |

How a model name is routed to a provider depends on the harness path; see Providers.
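Because routing is harness-specific, the following is only a pre-flight sketch built from the credentials table: `claude-*` models need `ANTHROPIC_API_KEY`, `together:` models need `TOGETHER_API_KEY`, Azure needs both key and endpoint, and `OPENAI_API_KEY` is the fallback. `PREFIX_RULES`, `missing_credentials`, `openai_route`, and `azure_api_version` are hypothetical names, and the real routing described under Providers may apply different or additional rules.

```python
from __future__ import annotations

from typing import Mapping, Optional

# Hypothetical prefix rules taken from the credentials table; not aec-bench's router.
PREFIX_RULES: dict[str, tuple[str, ...]] = {
    "claude-": ("ANTHROPIC_API_KEY",),
    "together:": ("TOGETHER_API_KEY",),
}

DEFAULT_AZURE_API_VERSION = "2024-10-21"


def missing_credentials(models: list[str], env: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each model whose prefix rule is unmet to the variables it still needs."""
    missing: dict[str, list[str]] = {}
    for model in models:
        for prefix, required in PREFIX_RULES.items():
            if model.startswith(prefix):
                absent = [name for name in required if not env.get(name)]
                if absent:
                    missing[model] = absent
    return missing


def openai_route(env: Mapping[str, str]) -> Optional[str]:
    """Prefer Azure when key and endpoint are both set; otherwise fall back to OPENAI_API_KEY."""
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None


def azure_api_version(env: Mapping[str, str]) -> str:
    """Use AZURE_OPENAI_API_VERSION when set, else the documented default."""
    return env.get("AZURE_OPENAI_API_VERSION") or DEFAULT_AZURE_API_VERSION
```

For example, with only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` set, `missing_credentials(["claude-example", "together:example-model"], env)` reports `TOGETHER_API_KEY` for the (placeholder) Together model, and `openai_route(env)` returns `"openai"`.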

Agent runtime overrides

The runtime that invokes an agent inside the container reads a few environment variables. These mostly serve script-style and RLM adapters, which receive their arguments through the environment rather than as arguments to a Python call:

| Variable | Purpose | Default |
|---|---|---|
| AGENT_MODEL | Model name override | |
| AGENT_INSTRUCTION | Task instruction (if not piped through files) | |
| AGENT_MAX_TOKENS | Max output tokens | 16384 |
| AGENT_MAX_TURNS | Max turns in a multi-turn loop | 10 |
| AGENT_COMMAND_TIMEOUT | Per-command timeout (seconds) | 120 |
| AGENT_TOOLS_JSON | JSON array of tool specs | |
| AGENT_API_VERSION | Azure API version | 2024-10-21 |

These are usually set by the harness automatically; you'd override them only for custom adapter shells.
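A custom adapter shell might gather these variables into one object, as in the sketch below. `AgentRuntimeConfig` and `from_env` are hypothetical names; the defaults are copied from the table, and the sketch checks only that `AGENT_TOOLS_JSON` decodes to a JSON array, leaving the shape of each tool spec to the adapter.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class AgentRuntimeConfig:
    model: Optional[str]             # AGENT_MODEL (no default listed)
    instruction: Optional[str]       # AGENT_INSTRUCTION (may come from files instead)
    max_tokens: int = 16384          # AGENT_MAX_TOKENS
    max_turns: int = 10              # AGENT_MAX_TURNS
    command_timeout: int = 120       # AGENT_COMMAND_TIMEOUT, in seconds
    tools: list[Any] = field(default_factory=list)  # decoded AGENT_TOOLS_JSON
    api_version: str = "2024-10-21"  # AGENT_API_VERSION


def from_env(env: Mapping[str, str]) -> AgentRuntimeConfig:
    """Build a runtime config from AGENT_* variables, falling back to the table defaults."""
    raw_tools = env.get("AGENT_TOOLS_JSON")
    tools = json.loads(raw_tools) if raw_tools else []
    if not isinstance(tools, list):
        raise ValueError("AGENT_TOOLS_JSON must be a JSON array")
    return AgentRuntimeConfig(
        model=env.get("AGENT_MODEL"),
        instruction=env.get("AGENT_INSTRUCTION"),
        max_tokens=int(env.get("AGENT_MAX_TOKENS", "16384")),
        max_turns=int(env.get("AGENT_MAX_TURNS", "10")),
        command_timeout=int(env.get("AGENT_COMMAND_TIMEOUT", "120")),
        tools=tools,
        api_version=env.get("AGENT_API_VERSION", "2024-10-21"),
    )
```

Passing `os.environ` to `from_env` reads the live container environment; passing a plain dict keeps the function easy to test.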

Backend credentials

| Variable | Backend | Notes |
|---|---|---|
| MODAL_TOKEN_ID, MODAL_TOKEN_SECRET | Modal | Set via modal token set |
| HARBOR_ENDPOINT | Harbor | Service URL |
| HARBOR_TOKEN | Harbor | Auth token |
| Prime CLI auth | Prime hosted eval/training | Managed by the prime CLI |

Backend configs reference these with $VAR expansion so secrets never land in YAML.
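This page does not spell out the exact expansion rules, so the following is a sketch under stated assumptions: both `$VAR` and `${VAR}` forms are accepted, and an unset variable is a hard error rather than an empty string. Python's `string.Template` implements exactly those semantics. `expand_secrets` and the `harbor` config keys are hypothetical, and the config is assumed to be already parsed from YAML into plain dicts and lists.

```python
from __future__ import annotations

from string import Template
from typing import Any, Mapping


def expand_secrets(node: Any, env: Mapping[str, str]) -> Any:
    """Recursively substitute $VAR / ${VAR} in string values; unset variables raise KeyError."""
    if isinstance(node, str):
        return Template(node).substitute(env)
    if isinstance(node, dict):
        return {key: expand_secrets(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_secrets(item, env) for item in node]
    return node  # numbers, booleans, and None pass through unchanged
```

Failing loudly on a missing variable is a deliberate choice in this sketch: silently expanding an unset `$HARBOR_TOKEN` to an empty string would surface later as a confusing authentication error.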

File layout

A project initialised with aec-bench init follows this layout:

```text
project_root/
├── aec-bench.toml              # project config
├── suite.toml                  # generated-suite config (optional)
├── .env                        # local secrets (gitignored)
│
├── tasks/                      # task catalog
│   └── electrical/
│       └── voltage-drop/
│           ├── task.toml
│           ├── instruction.md
│           ├── rlm.toml                  # optional
│           ├── environment/
│           │   ├── Dockerfile
│           │   └── docker-compose.yaml   # optional
│           ├── tests/
│           │   ├── test.sh               # verifier entry
│           │   └── verify.py
│           └── tools/
│               └── tool_name.py
│
├── templates/                  # template definitions
│   └── voltage-drop/
│       ├── params.toml
│       ├── instruction.md
│       └── engine.py
│
├── artefacts/
│   ├── ledger/                 # trial records (append-only)
│   │   └── exp-20260412-001/
│   │       ├── trial-uuid.json
│   │       └── ...
│   ├── feedback/               # agent feedback / evolution artefacts
│   └── datasets/               # dataset manifests
│       └── electrical-v1/
│           └── 1.0.0/
│               └── manifest.json
│
├── jobs/                       # raw trial outputs
│   └── exp-20260412-001/
│       └── trial-uuid/
│           └── workspace/
│               ├── output.jsonl
│               ├── trajectory.jsonl
│               └── logs/verifier/
│                   ├── reward.json
│                   └── details.json
│
├── prime-rl/                   # generated Prime environments and eval outputs
│
├── seeds/                      # seed task fixtures
│
└── workspaces/                 # evolution workspaces (git-versioned)
    └── voltage-drop-evo/
        ├── manifest.yaml
        ├── prompts/system.md
        └── skills/
```

Every path is overridable in aec-bench.toml. Commands that consume artefacts (ledger list, report leaderboard, evaluate) resolve paths through the same project config.
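Given the task layout in the tree, a quick pre-flight check on one task directory (for example `tasks/electrical/voltage-drop`) could look like this. `missing_task_files` is a hypothetical helper, and it treats only `task.toml`, `instruction.md`, `environment/Dockerfile`, and `tests/test.sh` as required; that list is an assumption drawn from the tree, and aec-bench's own validation may check more or less.

```python
from __future__ import annotations

from pathlib import Path

# Assumed-required entries, relative to a single task directory.
REQUIRED_TASK_FILES = (
    "task.toml",
    "instruction.md",
    "environment/Dockerfile",
    "tests/test.sh",  # verifier entry
)


def missing_task_files(task_dir: Path) -> list[str]:
    """Return the assumed-required files that are absent from task_dir."""
    return [rel for rel in REQUIRED_TASK_FILES if not (task_dir / rel).is_file()]
```

Optional entries such as `rlm.toml`, `environment/docker-compose.yaml`, and `tools/` are intentionally not checked.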

Global user config

Per-user path defaults live at ~/.config/aec-bench/config.json:

```json
{
  "tasks_root": "tasks",
  "ledger_root": "artefacts/ledger",
  "feedback_root": "artefacts/feedback",
  "jobs_root": "jobs",
  "datasets_root": "artefacts/datasets"
}
```

Managed via aec-bench config view|set|reset. These are the fallbacks when a project-level setting is not specified. The project loader also has built-in defaults for source-only paths such as templates_root and seeds_root.
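The fallback chain for a path root (project setting, then user config, then built-in default) can be sketched as below. Several details are assumptions: the built-in values are taken to match the example `config.json` plus the `templates/` and `seeds/` directory names from the layout, relative paths are resolved against the project root, and `load_user_config` and `resolve_root` are hypothetical names rather than aec-bench's loader API.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Mapping

# Assumed built-in defaults; the real values live in the aec-bench project loader.
BUILTIN_DEFAULTS = {
    "tasks_root": "tasks",
    "ledger_root": "artefacts/ledger",
    "feedback_root": "artefacts/feedback",
    "jobs_root": "jobs",
    "datasets_root": "artefacts/datasets",
    "templates_root": "templates",  # assumed from the layout
    "seeds_root": "seeds",          # assumed from the layout
}

USER_CONFIG = Path.home() / ".config" / "aec-bench" / "config.json"


def load_user_config(path: Path = USER_CONFIG) -> dict[str, str]:
    """Read the per-user JSON config; a missing file means no user-level settings."""
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_root(key: str, project_root: Path,
                 project: Mapping[str, str], user: Mapping[str, str]) -> Path:
    """Project setting wins, then user config, then the built-in default."""
    for layer in (project, user, BUILTIN_DEFAULTS):
        if key in layer:
            value = Path(layer[key])
            return value if value.is_absolute() else project_root / value
    raise KeyError(key)
```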

Precedence

When the same setting can come from multiple places, aec-bench resolves in this order (highest wins):

  1. CLI flag (--backend modal)
  2. Experiment YAML (compute.backend)
  3. Project config (aec-bench.toml)
  4. Global user config (~/.config/aec-bench/config.json)
  5. Built-in defaults
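The ladder behaves like a first-match lookup over ordered layers, which Python's `collections.ChainMap` models directly. In this sketch `resolve_settings` is a hypothetical name, `compute.backend` from the experiment YAML is flattened to a single `backend` key for illustration, and any backend value not named on this page is a placeholder.

```python
from __future__ import annotations

from collections import ChainMap
from typing import Any, Mapping


def _defined(layer: Mapping[str, Any]) -> dict[str, Any]:
    """Drop None values so an unset CLI flag does not shadow lower layers."""
    return {key: value for key, value in layer.items() if value is not None}


def resolve_settings(cli: Mapping[str, Any], experiment: Mapping[str, Any],
                     project: Mapping[str, Any], user: Mapping[str, Any],
                     builtin: Mapping[str, Any]) -> ChainMap[str, Any]:
    """Lookups return the value from the highest-precedence layer that defines the key."""
    layers = (cli, experiment, project, user, builtin)
    return ChainMap(*(_defined(layer) for layer in layers))
```

Filtering out `None` matters because argument parsers such as `argparse` typically report an omitted flag as `None`; without the filter, leaving off `--backend` would mask the experiment's `compute.backend`.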

Environment variables bypass this ladder. They are either credentials required by provider SDKs at call time or agent-runtime overrides read by a container entry script.

Generated Prime packages and swarm run state are local artefacts. Regenerate them from source tasks, datasets, and workspaces rather than treating them as canonical source.
