Environment
aec-bench reads a handful of environment variables for credentials and runtime overrides, and expects a canonical on-disk project layout.
.env loading
aec-bench loads .env from the project root at CLI startup via dotenv.load_dotenv(). Anything set in the shell takes precedence; .env fills in the gaps.
# .env
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://example.services.ai.azure.com/openai/v1/
AZURE_OPENAI_API_VERSION=2024-10-21
TOGETHER_API_KEY=...

Don't commit .env; the project template adds it to .gitignore.
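The precedence rule (shell first, .env fills the gaps) can be illustrated with a small standard-library sketch. This is not the python-dotenv implementation; `load_env_file` is a hypothetical helper that only handles plain KEY=VALUE lines, and a dict stands in for `os.environ`.

```python
import tempfile
from collections.abc import MutableMapping
from pathlib import Path


def load_env_file(path: Path, environ: MutableMapping) -> None:
    """Copy KEY=VALUE pairs from a .env file into environ, keeping existing values."""
    if not path.is_file():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault leaves a variable alone if the shell already exported it.
        environ.setdefault(key.strip(), value.strip())


# Demo with a throwaway file and a plain dict standing in for os.environ.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(
        "# .env\nANTHROPIC_API_KEY=from-file\nTOGETHER_API_KEY=from-file\n",
        encoding="utf-8",
    )
    shell_env = {"ANTHROPIC_API_KEY": "from-shell"}
    load_env_file(env_file, shell_env)

print(shell_env)
```

The value exported in the shell survives, and only the missing key is taken from the file.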
Provider credentials
Which variables are required depends on which models appear in your agent configs.
| Variable | Used by | Notes |
|---|---|---|
| ANTHROPIC_API_KEY | Claude models | Required for any claude-* model |
| AZURE_OPENAI_API_KEY | Azure-routed OpenAI | Required alongside endpoint |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI or Azure AI Foundry v1 | Use the resource endpoint, or the /openai/v1/ endpoint for Foundry deployments |
| AZURE_OPENAI_API_VERSION | Azure OpenAI | Optional; defaults to 2024-10-21 where needed |
| TOGETHER_API_KEY | Together AI | Use with together: model prefixes |
| OPENAI_API_KEY | OpenAI direct | Fallback when Azure isn't configured |
| AWS_REGION / AWS_DEFAULT_REGION | Bedrock through SDKs | Region selector |
| AWS_BEARER_TOKEN / AWS_BEARER_TOKEN_BEDROCK | Bedrock script-style provider | Used by script-style Bedrock agent runners |
| AWS_BEDROCK_ENDPOINT | Bedrock script-style provider | Optional explicit Bedrock endpoint |
Model routing depends on the harness path. See Providers.
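A pre-flight check along these lines can catch a missing key before a run starts. The sketch below is an illustration only: `missing_credentials` is a hypothetical helper, not part of aec-bench, and its fallback branch (treating every other model name as Azure-routed OpenAI) is an assumption, since the real routing depends on the harness path.

```python
from collections.abc import Mapping


def missing_credentials(model: str, environ: Mapping) -> list:
    """Return the credential variables that are unset or empty for a model name."""
    if model.startswith("claude-"):
        required = ["ANTHROPIC_API_KEY"]
    elif model.startswith("together:"):
        required = ["TOGETHER_API_KEY"]
    else:
        # Assumption: anything else is Azure-routed OpenAI, which needs
        # the API key and the endpoint together.
        required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]
    return [name for name in required if not environ.get(name)]


# Example environment; the values and model names are placeholders.
env = {"ANTHROPIC_API_KEY": "sk-ant-example", "AZURE_OPENAI_API_KEY": "example"}

claude_missing = missing_credentials("claude-example", env)
together_missing = missing_credentials("together:example/model", env)
azure_missing = missing_credentials("example-azure-deployment", env)
print(claude_missing, together_missing, azure_missing)
```

In practice you would pass `os.environ` after .env has been loaded.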
Agent runtime overrides
The runtime that invokes an agent inside the container reads a few environment variables, mostly for script-style and RLM adapters, where arguments are passed through the environment rather than as arguments to a Python call:
| Variable | Purpose | Default |
|---|---|---|
| AGENT_MODEL | Model name override | (none) |
| AGENT_INSTRUCTION | Task instruction (if not piped through files) | (none) |
| AGENT_MAX_TOKENS | Max output tokens | 16384 |
| AGENT_MAX_TURNS | Max turns in a multi-turn loop | 10 |
| AGENT_COMMAND_TIMEOUT | Per-command timeout (seconds) | 120 |
| AGENT_TOOLS_JSON | JSON array of tool specs | (none) |
| AGENT_API_VERSION | Azure API version | 2024-10-21 |
These are usually set by the harness automatically; you'd override them only for custom adapter shells.
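For a custom adapter shell, reading these variables might look like the sketch below. `AgentSettings` and `read_agent_settings` are hypothetical names; the defaults are taken from the table above, and treating an unset AGENT_TOOLS_JSON as an empty list is an assumption.

```python
import json
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSettings:
    model: Optional[str]
    instruction: Optional[str]
    max_tokens: int
    max_turns: int
    command_timeout: int
    tools: list
    api_version: str


def read_agent_settings(environ: Mapping) -> AgentSettings:
    """Build settings from AGENT_* variables, applying the documented defaults."""
    return AgentSettings(
        model=environ.get("AGENT_MODEL"),
        instruction=environ.get("AGENT_INSTRUCTION"),
        max_tokens=int(environ.get("AGENT_MAX_TOKENS", "16384")),
        max_turns=int(environ.get("AGENT_MAX_TURNS", "10")),
        command_timeout=int(environ.get("AGENT_COMMAND_TIMEOUT", "120")),
        # Assumption: no tools configured means an empty list.
        tools=json.loads(environ.get("AGENT_TOOLS_JSON", "[]")),
        api_version=environ.get("AGENT_API_VERSION", "2024-10-21"),
    )


# A dict stands in for os.environ; the tool spec is a made-up placeholder.
settings = read_agent_settings(
    {"AGENT_MAX_TURNS": "3", "AGENT_TOOLS_JSON": '[{"name": "example_tool"}]'}
)
print(settings)
```

Only the overridden values change; everything else falls back to the defaults.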
Backend credentials
| Variable | Backend | Notes |
|---|---|---|
| MODAL_TOKEN_ID, MODAL_TOKEN_SECRET | Modal | Set via modal token set |
| HARBOR_ENDPOINT | Harbor | Service URL |
| HARBOR_TOKEN | Harbor | Auth token |
| Prime CLI auth | Prime hosted eval/training | Managed by the prime CLI |
Backend configs reference these with $VAR expansion so secrets never land in YAML.
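The idea behind $VAR expansion can be sketched with `string.Template` from the standard library. This is not aec-bench's own loader; `expand_secrets` and the config keys shown are hypothetical, chosen only to illustrate how a parsed config can hold references instead of secrets.

```python
from collections.abc import Mapping
from string import Template


def expand_secrets(config, environ: Mapping):
    """Recursively replace $VAR references in string values.

    Raises KeyError if a referenced variable is not set.
    """
    if isinstance(config, dict):
        return {key: expand_secrets(value, environ) for key, value in config.items()}
    if isinstance(config, list):
        return [expand_secrets(item, environ) for item in config]
    if isinstance(config, str):
        return Template(config).substitute(environ)
    return config


# Hypothetical parsed backend config: the file holds references, not secrets.
backend_config = {
    "backend": "harbor",
    "endpoint": "$HARBOR_ENDPOINT",
    "token": "$HARBOR_TOKEN",
}
env = {"HARBOR_ENDPOINT": "https://harbor.example", "HARBOR_TOKEN": "example-token"}

resolved = expand_secrets(backend_config, env)
print(resolved)
```

Using `substitute` rather than `safe_substitute` makes an unset variable fail loudly instead of passing a literal `$HARBOR_TOKEN` to the backend.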
File layout
A project initialised with aec-bench init follows this layout:
project_root/
├── aec-bench.toml              # project config
├── suite.toml                  # generated-suite config (optional)
├── .env                        # local secrets (gitignored)
│
├── tasks/                      # task catalog
│   └── electrical/
│       └── voltage-drop/
│           ├── task.toml
│           ├── instruction.md
│           ├── rlm.toml        (optional)
│           ├── environment/
│           │   ├── Dockerfile
│           │   └── docker-compose.yaml   (optional)
│           ├── tests/
│           │   ├── test.sh     # verifier entry
│           │   └── verify.py
│           └── tools/
│               └── tool_name.py
│
├── templates/                  # template definitions
│   └── voltage-drop/
│       ├── params.toml
│       ├── instruction.md
│       └── engine.py
│
├── artefacts/
│   ├── ledger/                 # trial records (append-only)
│   │   └── exp-20260412-001/
│   │       ├── trial-uuid.json
│   │       └── ...
│   ├── feedback/               # agent feedback / evolution artefacts
│   └── datasets/               # dataset manifests
│       └── electrical-v1/
│           └── 1.0.0/
│               └── manifest.json
│
├── jobs/                       # raw trial outputs
│   └── exp-20260412-001/
│       └── trial-uuid/
│           └── workspace/
│               ├── output.jsonl
│               ├── trajectory.jsonl
│               └── logs/verifier/
│                   ├── reward.json
│                   └── details.json
│
├── prime-rl/                   # generated Prime environments and eval outputs
│
├── seeds/                      # seed task fixtures
│
└── workspaces/                 # evolution workspaces (git-versioned)
    └── voltage-drop-evo/
        ├── manifest.yaml
        ├── prompts/system.md
        └── skills/

Every path is overridable in aec-bench.toml. Commands that consume artefacts (ledger list, report leaderboard, evaluate) resolve paths through the same project config.
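As an illustration of how the jobs/ layout can be walked, the sketch below gathers each trial's verifier reward for one experiment. `collect_rewards` is a hypothetical helper, it assumes the default jobs root, and the contents of reward.json in the demo are invented because the file's schema is not described here.

```python
import json
import tempfile
from pathlib import Path


def collect_rewards(jobs_root: Path, experiment: str) -> dict:
    """Map each trial directory name to its parsed reward.json."""
    rewards = {}
    pattern = "*/workspace/logs/verifier/reward.json"
    for reward_file in sorted((jobs_root / experiment).glob(pattern)):
        # parents[3] climbs verifier -> logs -> workspace -> trial directory.
        trial_id = reward_file.parents[3].name
        rewards[trial_id] = json.loads(reward_file.read_text(encoding="utf-8"))
    return rewards


# Demo against a temporary tree that mirrors the layout above.
with tempfile.TemporaryDirectory() as tmp:
    jobs_root = Path(tmp) / "jobs"
    verifier_dir = (
        jobs_root / "exp-20260412-001" / "trial-a" / "workspace" / "logs" / "verifier"
    )
    verifier_dir.mkdir(parents=True)
    # Placeholder payload; the real reward.json schema may differ.
    (verifier_dir / "reward.json").write_text('{"reward": 1.0}', encoding="utf-8")
    rewards = collect_rewards(jobs_root, "exp-20260412-001")

print(rewards)
```

If jobs_root is overridden in aec-bench.toml, pass the configured path instead of the default.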
Global user config
Per-user path defaults live at ~/.config/aec-bench/config.json:
{
"tasks_root": "tasks",
"ledger_root": "artefacts/ledger",
"feedback_root": "artefacts/feedback",
"jobs_root": "jobs",
"datasets_root": "artefacts/datasets"
}

Managed via aec-bench config view|set|reset. These are the fallbacks when a project-level setting is not specified. The project loader also has built-in defaults for source-only paths such as templates_root and seeds_root.
Precedence
When the same setting can come from multiple places, aec-bench resolves in this order (highest wins):
- CLI flag (--backend modal)
- Experiment YAML (compute.backend)
- Project config (aec-bench.toml)
- Global user config (~/.config/aec-bench/config.json)
- Built-in defaults
Environment variables bypass this ladder. They are either credentials required by provider SDKs at call time or agent-runtime overrides read by a container entry script.
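The ladder can be modelled with `collections.ChainMap`, which returns the first layer that defines a key. This is a sketch of the resolution order only, not aec-bench's actual loader: `resolve_settings` is hypothetical, the layers are flattened to simple dicts (so compute.backend appears as backend), and the example values are made up.

```python
from collections import ChainMap


def resolve_settings(cli_flags, experiment, project, global_config, defaults):
    """Layer the five sources so that earlier arguments win."""
    layers = [cli_flags, experiment, project, global_config, defaults]
    # Drop None values so an unset CLI flag does not mask a lower layer.
    cleaned = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in layers
    ]
    return ChainMap(*cleaned)


settings = resolve_settings(
    cli_flags={"backend": None},  # flag not passed on the command line
    experiment={"backend": "modal"},
    project={"backend": "harbor", "jobs_root": "out/jobs"},
    global_config={"jobs_root": "jobs", "ledger_root": "artefacts/ledger"},
    defaults={"jobs_root": "jobs", "ledger_root": "artefacts/ledger"},
)
print(settings["backend"], settings["jobs_root"], settings["ledger_root"])
```

Here the experiment YAML decides the backend because no CLI flag was given, the project config decides jobs_root, and ledger_root falls through to the global user config.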
Generated Prime packages and swarm run state are local artefacts. Regenerate them from source tasks, datasets, and workspaces rather than treating them as canonical source.