Environment

aec-bench reads a handful of environment variables for credentials and runtime overrides, and expects a canonical on-disk project layout.

`.env` loading

aec-bench loads `.env` from the project root at CLI startup via `dotenv.load_dotenv()`. Anything set in the shell takes precedence; `.env` fills in the gaps.

# .env
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://example.services.ai.azure.com/openai/v1/
AZURE_OPENAI_API_VERSION=2024-10-21
TOGETHER_API_KEY=...

Don't commit `.env`; the project template adds it to `.gitignore`.
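
The "shell wins, `.env` fills the gaps" rule matches python-dotenv's default `override=False` behaviour. The following is a stdlib-only sketch of that rule, not aec-bench's actual loader; `parse_env_lines` and `fill_from_dotenv` are hypothetical helper names, and the parser handles only plain `KEY=VALUE` lines.

```python
def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def fill_from_dotenv(text: str, environ: dict[str, str]) -> dict[str, str]:
    """Return a copy of environ with .env values added only where unset."""
    merged = dict(environ)
    for key, value in parse_env_lines(text).items():
        merged.setdefault(key, value)  # an existing shell value is kept
    return merged


shell = {"ANTHROPIC_API_KEY": "from-shell"}
dotenv_text = "# .env\nANTHROPIC_API_KEY=from-file\nTOGETHER_API_KEY=from-file\n"
merged = fill_from_dotenv(dotenv_text, shell)
print(merged["ANTHROPIC_API_KEY"])  # from-shell
print(merged["TOGETHER_API_KEY"])   # from-file
```

The practical consequence is that exporting a variable in the shell is enough to override a value in `.env` for a single run, without editing the file.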

Provider credentials

Which variables are required depends on which models appear in your agent configs.

Variable	Used by	Notes
`ANTHROPIC_API_KEY`	Claude models	Required for any `claude-*` model
`AZURE_OPENAI_API_KEY`	Azure-routed OpenAI	Required alongside endpoint
`AZURE_OPENAI_ENDPOINT`	Azure OpenAI or Azure AI Foundry v1	Use the resource endpoint, or the `/openai/v1/` endpoint for Foundry deployments
`AZURE_OPENAI_API_VERSION`	Azure OpenAI	Optional; defaults to `2024-10-21` where needed
`TOGETHER_API_KEY`	Together AI	Use with `together:` model prefixes
`OPENAI_API_KEY`	OpenAI direct	Fallback when Azure isn't configured
`AWS_REGION` / `AWS_DEFAULT_REGION`	Bedrock through SDKs	Region selector
`AWS_BEARER_TOKEN` / `AWS_BEARER_TOKEN_BEDROCK`	Bedrock script-style provider	Used by script-style Bedrock agent runners
`AWS_BEDROCK_ENDPOINT`	Bedrock script-style provider	Optional explicit Bedrock endpoint

Model routing depends on the harness path. See Providers.
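
A pre-flight check can catch a missing key before a run starts. The sketch below encodes only the two prefix rules stated in the table (`claude-*` and `together:`); `required_vars` and `missing_vars` are hypothetical helpers, the model names are placeholders, and the real routing rules live in aec-bench's provider code and may differ.

```python
from collections.abc import Mapping


def required_vars(model: str) -> list[str]:
    """Map a model name to the credentials the table above lists for it."""
    if model.startswith("claude-"):
        return ["ANTHROPIC_API_KEY"]
    if model.startswith("together:"):
        return ["TOGETHER_API_KEY"]
    return []  # other providers are not covered by this sketch


def missing_vars(model: str, env: Mapping[str, str]) -> list[str]:
    """Return required variables that are unset or empty in env."""
    return [name for name in required_vars(model) if not env.get(name)]


env = {"ANTHROPIC_API_KEY": "sk-ant-placeholder"}
print(missing_vars("claude-example", env))    # []
print(missing_vars("together:example", env))  # ['TOGETHER_API_KEY']
```

In practice you would pass `os.environ` as `env` after `.env` has been loaded.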

Agent runtime overrides

The runtime that actually invokes an agent inside the container reads a few env vars, mostly for script-style and RLM adapters where arguments are passed through the environment rather than in a Python call:

Variable	Purpose	Default
`AGENT_MODEL`	Model name override	—
`AGENT_INSTRUCTION`	Task instruction (if not piped through files)	—
`AGENT_MAX_TOKENS`	Max output tokens	`16384`
`AGENT_MAX_TURNS`	Max turns in a multi-turn loop	`10`
`AGENT_COMMAND_TIMEOUT`	Per-command timeout (seconds)	`120`
`AGENT_TOOLS_JSON`	JSON array of tool specs	—
`AGENT_API_VERSION`	Azure API version	`2024-10-21`

These are usually set by the harness automatically; you'd override them only for custom adapter shells.
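
For a custom adapter shell, reading these variables might look like the sketch below. The defaults are the ones listed in the table; `AgentRuntime` and `read_agent_runtime` are hypothetical names, and treating an unset `AGENT_TOOLS_JSON` as an empty list is an assumption rather than documented behaviour.

```python
import json
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentRuntime:
    model: Optional[str]
    instruction: Optional[str]
    max_tokens: int
    max_turns: int
    command_timeout: int
    tools: list[Any]
    api_version: str


def read_agent_runtime(env: Mapping[str, str]) -> AgentRuntime:
    """Read AGENT_* variables, falling back to the documented defaults."""
    return AgentRuntime(
        model=env.get("AGENT_MODEL"),
        instruction=env.get("AGENT_INSTRUCTION"),
        max_tokens=int(env.get("AGENT_MAX_TOKENS", "16384")),
        max_turns=int(env.get("AGENT_MAX_TURNS", "10")),
        command_timeout=int(env.get("AGENT_COMMAND_TIMEOUT", "120")),
        # Assumption: no tools when the variable is unset.
        tools=json.loads(env.get("AGENT_TOOLS_JSON", "[]")),
        api_version=env.get("AGENT_API_VERSION", "2024-10-21"),
    )


runtime = read_agent_runtime({"AGENT_MODEL": "claude-example", "AGENT_MAX_TURNS": "3"})
print(runtime.max_turns, runtime.max_tokens)  # 3 16384
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test; a real entry script would call `read_agent_runtime(os.environ)`.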

Backend credentials

Variable	Backend	Notes
`MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`	Modal	Set via `modal token set`
`HARBOR_ENDPOINT`	Harbor	Service URL
`HARBOR_TOKEN`	Harbor	Auth token
Prime CLI auth	Prime hosted eval/training	Managed by the `prime` CLI

Backend configs reference these with `$VAR` expansion so secrets never land in YAML.
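
How aec-bench performs the expansion is not described here. As an illustration of the idea, the sketch below uses the standard library's `string.Template`, which understands both `$VAR` and `${VAR}`. The `endpoint` and `token` config keys and the example URL are hypothetical.

```python
from collections.abc import Mapping
from string import Template


def expand_vars(value: str, env: Mapping[str, str]) -> str:
    """Expand $VAR / ${VAR} references; raises KeyError if one is unset."""
    return Template(value).substitute(env)


# Hypothetical backend config values as they might appear after YAML parsing.
config = {"endpoint": "$HARBOR_ENDPOINT", "token": "${HARBOR_TOKEN}"}
env = {"HARBOR_ENDPOINT": "https://harbor.example.invalid", "HARBOR_TOKEN": "secret"}

resolved = {key: expand_vars(value, env) for key, value in config.items()}
print(resolved["endpoint"])  # https://harbor.example.invalid
```

Failing loudly on an unset variable (rather than substituting an empty string) makes a missing secret show up at config-load time instead of as an authentication error later.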

File layout

A project initialised with `aec-bench init` follows this layout:

project_root/
├── aec-bench.toml              # project config
├── suite.toml                  # generated-suite config (optional)
├── .env                        # local secrets (gitignored)
│
├── tasks/                      # task catalog
│   └── electrical/
│       └── voltage-drop/
│           ├── task.toml
│           ├── instruction.md
│           ├── rlm.toml                  (optional)
│           ├── environment/
│           │   ├── Dockerfile
│           │   └── docker-compose.yaml   (optional)
│           ├── tests/
│           │   ├── test.sh               # verifier entry
│           │   └── verify.py
│           └── tools/
│               └── tool_name.py
│
├── templates/                  # template definitions
│   └── voltage-drop/
│       ├── params.toml
│       ├── instruction.md
│       └── engine.py
│
├── artefacts/
│   ├── ledger/                 # trial records (append-only)
│   │   └── exp-20260412-001/
│   │       ├── trial-uuid.json
│   │       └── ...
│   ├── feedback/               # agent feedback / evolution artefacts
│   └── datasets/               # dataset manifests
│       └── electrical-v1/
│           └── 1.0.0/
│               └── manifest.json
│
├── jobs/                       # raw trial outputs
│   └── exp-20260412-001/
│       └── trial-uuid/
│           └── workspace/
│               ├── output.jsonl
│               ├── trajectory.jsonl
│               └── logs/verifier/
│                   ├── reward.json
│                   └── details.json
│
├── prime-rl/                   # generated Prime environments and eval outputs
│
├── seeds/                      # seed task fixtures
│
└── workspaces/                 # evolution workspaces (git-versioned)
    └── voltage-drop-evo/
        ├── manifest.yaml
        ├── prompts/system.md
        └── skills/

Every path is overridable in `aec-bench.toml`. Commands that consume artefacts (`ledger list`, `report leaderboard`, `evaluate`) resolve paths through the same project config.
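
If you need to inspect raw trial outputs from your own scripts, the tree above translates directly into `pathlib` paths. This is a sketch that assumes the default `jobs` root shown in the layout; `reward_path` and `trials_with_reward` are hypothetical helpers, and the demo writes a placeholder `{}` because the schema of `reward.json` is not shown on this page.

```python
import tempfile
from pathlib import Path


def reward_path(project_root: Path, experiment: str, trial: str,
                jobs_root: str = "jobs") -> Path:
    """Build the verifier reward path for one trial under the default layout."""
    return (project_root / jobs_root / experiment / trial
            / "workspace" / "logs" / "verifier" / "reward.json")


def trials_with_reward(project_root: Path, experiment: str,
                       jobs_root: str = "jobs") -> list[str]:
    """List trial directory names that already contain a reward.json."""
    exp_dir = project_root / jobs_root / experiment
    if not exp_dir.is_dir():
        return []
    return sorted(
        trial.name for trial in exp_dir.iterdir()
        if reward_path(project_root, experiment, trial.name, jobs_root).is_file()
    )


# Demo against a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    done = reward_path(root, "exp-20260412-001", "trial-a")
    done.parent.mkdir(parents=True)
    done.write_text("{}")  # placeholder content only
    (root / "jobs" / "exp-20260412-001" / "trial-b").mkdir()
    found = trials_with_reward(root, "exp-20260412-001")
    print(found)  # ['trial-a']
```

If the project overrides the jobs path in `aec-bench.toml`, pass that value as `jobs_root` instead of relying on the default.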

Global user config

Per-user path defaults live at `~/.config/aec-bench/config.json`:

{
  "tasks_root": "tasks",
  "ledger_root": "artefacts/ledger",
  "feedback_root": "artefacts/feedback",
  "jobs_root": "jobs",
  "datasets_root": "artefacts/datasets"
}

Manage this file with `aec-bench config view|set|reset`. Its values are the fallbacks when a project-level setting is not specified. The project loader also has built-in defaults for source-only paths such as `templates_root` and `seeds_root`.

Precedence

When the same setting can come from multiple places, aec-bench resolves it in this order, highest priority first:

1. CLI flag (`--backend modal`)
2. Experiment YAML (`compute.backend`)
3. Project config (`aec-bench.toml`)
4. Global user config (`~/.config/aec-bench/config.json`)
5. Built-in defaults
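
The ladder above behaves like a first-match lookup across layers. A minimal sketch using `collections.ChainMap` follows; `resolve_setting` is a hypothetical helper, each layer is flattened to a plain dict keyed by setting name, and the `"local"` built-in default is a placeholder rather than a documented value.

```python
from collections import ChainMap
from collections.abc import Mapping
from typing import Any, Optional


def resolve_setting(
    key: str,
    *,
    cli: Optional[Mapping[str, Any]] = None,
    experiment: Optional[Mapping[str, Any]] = None,
    project: Optional[Mapping[str, Any]] = None,
    user: Optional[Mapping[str, Any]] = None,
    defaults: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Return the value from the highest-priority layer that defines key."""
    # ChainMap searches its maps left to right, so order encodes priority.
    layers = ChainMap(
        *(dict(layer or {}) for layer in (cli, experiment, project, user, defaults))
    )
    return layers[key]  # KeyError if no layer defines the setting


builtin = {"backend": "local"}  # placeholder default, not a documented value
print(resolve_setting("backend", cli={"backend": "modal"},
                      experiment={"backend": "harbor"}, defaults=builtin))  # modal
print(resolve_setting("backend", experiment={"backend": "harbor"},
                      defaults=builtin))  # harbor
```

A layer should contain a key only when the user set it explicitly; a key present with a `None` value would still shadow the lower layers in this sketch.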

Environment variables bypass this ladder. They are either credentials required by provider SDKs at call time or agent-runtime overrides read by a container entry script.

Generated Prime packages and swarm run state are local artefacts. Regenerate them from source tasks, datasets, and workspaces rather than treating them as canonical source.
