Configuration
Agent configuration covers provider credentials, experiment agent entries, harness parameters, and custom internal adapter registration.
Providers
Provider handling depends on the execution path. Agent harnesses backed by PydanticAI infer provider routing from the model string and available credentials. Script-style agents can also build sandbox environment variables from an explicit provider name.
| Runtime path | Required env vars |
|---|---|
| Anthropic API | ANTHROPIC_API_KEY |
| Azure OpenAI or Azure AI Foundry v1 | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT; optional AZURE_OPENAI_API_VERSION |
| Bedrock through PydanticAI | AWS_REGION or AWS_DEFAULT_REGION, plus AWS credentials available to the process |
| Bedrock script-style provider | AWS_BEDROCK_ENDPOINT, AWS_BEARER_TOKEN or AWS_BEARER_TOKEN_BEDROCK, AWS_REGION or AWS_DEFAULT_REGION |
| OpenAI script-style provider | OPENAI_API_KEY |
| Together AI | TOGETHER_API_KEY |
# .env
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://example.services.ai.azure.com/openai/v1/
OPENAI_API_KEY=sk-...
TOGETHER_API_KEY=...
AWS_REGION=us-west-2
Keep credentials out of config files. Agent definitions should reference environment variables or rely on the provider SDK's normal credential chain.
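The provider table above can be turned into a quick pre-flight check. The following is a minimal sketch and not part of aec-bench: the `REQUIRED_ENV` mapping, its provider labels, and the `missing_env_vars` helper are hypothetical names. The mapping only transcribes the required variables from the table (optional ones are left out), and the PydanticAI Bedrock path is omitted because its AWS credentials may come from sources other than environment variables.

```python
import os
from collections.abc import Mapping

# Hypothetical transcription of the provider table. Each inner tuple lists
# interchangeable alternatives: at least one variable in it must be set.
REQUIRED_ENV: dict[str, list[tuple[str, ...]]] = {
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "azure_openai": [("AZURE_OPENAI_API_KEY",), ("AZURE_OPENAI_ENDPOINT",)],
    "bedrock_script": [
        ("AWS_BEDROCK_ENDPOINT",),
        ("AWS_BEARER_TOKEN", "AWS_BEARER_TOKEN_BEDROCK"),
        ("AWS_REGION", "AWS_DEFAULT_REGION"),
    ],
    "openai_script": [("OPENAI_API_KEY",)],
    "together": [("TOGETHER_API_KEY",)],
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a description of each unmet requirement for `provider`."""
    missing = []
    for alternatives in REQUIRED_ENV[provider]:
        # A requirement is met when any alternative has a non-empty value.
        if not any(env.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

Calling `missing_env_vars("together")` before a run returns an empty list when the key is present, and otherwise names what still needs to be exported.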
For Azure AI Foundry deployments that expose the v1 OpenAI-compatible API, set AZURE_OPENAI_ENDPOINT to the /openai/v1/ endpoint and pass the deployment name as model. For Together AI, prefix the model with together: so routing stays explicit when multiple provider credentials are present:
uv run aec-bench run-local tasks/electrical/voltage-drop \
  --model "together:Qwen/Qwen3.7-Max" \
  --harness direct
Agent definitions
An agent entry in an experiment manifest picks the public harness name, model, and optional parameters:
# experiment.yaml
agents:
  - name: claude-sonnet-tool-loop
    harness: tool_loop
    model: claude-sonnet-4-20250514
    parameters:
      max_turns: 12
  - name: gpt4-direct
    harness: direct
    model: gpt-4.1
    parameters:
      max_tokens: 8192
The manifest parser also accepts the older adapter field, but public docs use harness. Supplying both is an error.
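The harness/adapter rule can be pictured as follows. This is an illustrative sketch of the behaviour described here, not the actual manifest parser: `resolve_harness` is a hypothetical helper, the error messages are invented, and treating a missing field as an error is an assumption.

```python
def resolve_harness(entry: dict) -> str:
    """Return the harness name for one agent entry (hypothetical helper)."""
    has_harness = "harness" in entry
    has_adapter = "adapter" in entry
    if has_harness and has_adapter:
        # Supplying both fields is documented as an error.
        raise ValueError(
            f"agent {entry.get('name')!r}: set 'harness' or 'adapter', not both"
        )
    if has_harness:
        return entry["harness"]
    if has_adapter:
        # Older manifests may still use the legacy field name.
        return entry["adapter"]
    # Assumption: an entry with neither field is rejected.
    raise ValueError(f"agent {entry.get('name')!r}: missing 'harness'")
```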
The model field supports $ENV_VAR references so pinned model IDs stay out of config:
agents:
  - name: pinned
    harness: tool_loop
    model: $ANTHROPIC_MODEL  # resolved at run time
Harness parameters
Each harness exposes its own parameters. In experiment manifests, parameters are passed to the execution layer and become request.configuration for the internal adapter.
Direct: simple generation settings such as output token budget:
max_tokens = 16384
Tool Loop: bounded turn count:
max_turns = 8
RLM: workspace-level rlm.toml, grouped into guardrails and execution:
# rlm.toml
[guardrails]
token_budget = 100_000
max_iterations = 20
max_subcall_depth = 3
max_budget_usd = 5.00
[execution]
scaffolding = true
context_limit = 1_000_000
compaction_threshold_pct = 0.85
max_parallel_workers = 4
Lambda-RLM: workspace-level lambda-rlm.toml, with template, planner, review, guardrails, and execution settings:
# lambda-rlm.toml
[template]
tier = "dependency_tree"
definition = "report_template.toml"
[planner]
context_window_chars = 200_000
max_branching_factor = 4
[review]
enabled = true
max_retries_per_source = 1
max_supplements_per_section = 1
[guardrails]
token_budget = 500_000
[execution]
max_parallel_workers = 4
RLM and Lambda-RLM configuration files live in the staged task workspace. The experiment manifest selects the harness and model; the workspace TOML controls the harness-specific runtime behaviour.
Custom Internal Adapters
Public documentation calls these agent harnesses, but the Python execution protocol is still named Adapter. Any class matching that protocol can be registered:
from aec_bench.adapters.base import AdapterRequest, AdapterResult


class MyAdapter:
    def __init__(self, model_name: str, workspace: str, **kwargs):
        self.model_name = model_name
        self.workspace = workspace

    def execute(self, request: AdapterRequest) -> AdapterResult:
        # your strategy here
        ...

    def adapter_name(self) -> str:
        return "my_adapter"

    def resolved_model(self) -> str:
        return self.model_name
Register against a LocalAdapterRegistry with a builder function:
from aec_bench.adapters.local_registry import LocalAdapterRegistry

registry = LocalAdapterRegistry()
registry.register(
    "my_adapter",
    lambda model_name, workspace, **kwargs: MyAdapter(model_name, workspace, **kwargs),
)
Once registered, the adapter kind is addressable from an experiment config through the public harness field:
agents:
  - name: my-experimental-agent
    harness: my_adapter
    model: claude-sonnet-4
What a good adapter does
- Respects the whitelist: only call tools present in request.tools.
- Writes to request.output_path: producing that file is the adapter's job.
- Records a transcript: capture model turns, tool calls, and tool results.
- Reports token usage: populate usage_input_tokens and usage_output_tokens when available.
- Classifies failures: set failure_kind so downstream reports can group errors.
The contract is deliberately thin. Keeping it thin is what lets the same task compare cleanly across many harnesses.