Templates

A template defines a parameterised engineering problem family that generates concrete task instances through parameter sampling and Jinja2 rendering.

How templates work

A template is a reusable engineering problem family. It samples realistic parameters, renders an instruction, computes ground truth, and scaffolds complete task instances that can be validated, frozen into datasets, and run through the same benchmark pipeline as hand-authored tasks.

Figure: How templates work. A template (params.toml, instruction.md, engine.py) feeds Generate, which produces concrete instances such as instance-001, instance-002, and instance-003.

The current library catalogue contains 184 built templates across five disciplines, plus proposed seed tasks that have not yet been converted into deterministic templates.

Discipline	Built templates	Proposed seeds	Typical coverage
Civil	57	30	Hydrology, hydraulics, transport geometry, coastal, drainage, wind and load derivations
Electrical	52	92	Cable sizing, PV, grounding, arc flash, busbar, thermal rating, short-circuit
Ground	10	3	Bearing capacity, settlement, CPT/SPT interpretation, slope and retaining-wall checks
Mechanical	50	92	HVAC, pumps, fire services, process calculations, acoustics, vibration, wastewater
Structural	15	67	Marine, concrete, structural fire, load combinations, movement and connection checks

The catalogue used by the site is generated by `aec-bench library export`; see Library Catalogue.

Template anatomy

Each built-in template is a directory under src/aec_bench/templates/builtin/<discipline>/<template>/:

```text
src/aec_bench/templates/builtin/electrical/voltage_drop/
├── params.toml       # metadata, parameters, archetypes, difficulty presets
├── instruction.md    # Jinja2 template for the problem statement
├── engine.py         # pure ground-truth computation
└── __init__.py
```

The three required files are the contract:

File	Role
`params.toml`	Declares metadata, inputs, sampling ranges, archetypes, outputs, tolerances, and difficulty presets
`instruction.md`	Renders the task prompt from sampled parameters and visibility rules
`engine.py`	Computes expected outputs from sampled parameters for verifier and fixture generation
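As a rough illustration of that contract, the sketch below checks that a directory contains the three required files before anything else is attempted. The function name `missing_template_files` is hypothetical. This is not how `validate-template` is implemented; the real validator is not shown on this page.

```python
import tempfile
from pathlib import Path

# The three files every template directory must provide (per the table above).
REQUIRED_FILES = ("params.toml", "instruction.md", "engine.py")


def missing_template_files(template_dir: Path) -> list[str]:
    """Return required files absent from template_dir, in contract order."""
    return [name for name in REQUIRED_FILES if not (template_dir / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory that only has params.toml.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "params.toml").write_text("[meta]\n")
        print(missing_template_files(root))  # ['instruction.md', 'engine.py']
```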

`params.toml`

params.toml is the public contract for a template:

params.toml

```toml
[meta]
name = "voltage-drop"
description = "Cable voltage drop calculation per AS/NZS 3008.1.1"
discipline = "electrical"
category = "cable-sizing"
standards = ["AS/NZS 3008.1.1"]
tool_mode = "with-tool"

[params.cable_size_mm2]
type = "enum"
unit = "mm²"
description = "Cable conductor cross-sectional area"
values = ["1.5", "2.5", "4", "6", "10", "16", "25", "35", "50", "70", "95", "120", "150", "185", "240"]

[params.length_m]
type = "float"
unit = "m"
description = "Cable route length (one way)"
min = 1
max = 500

[params.load_current_a]
type = "float"
unit = "A"
description = "Design load current"
min = 0.5
max = 500

[params.power_factor]
type = "float"
description = "Load power factor"
min = 0.5
max = 1.0
default = 0.8

[params.conductor_material]
type = "enum"
description = "Conductor material"
values = ["copper", "aluminium"]
derivable_from = "archetype"

[params.circuit_type]
type = "enum"
description = "Circuit type"
values = ["single_phase", "three_phase"]
```

Supported parameter types include float, int, and enum. Templates can also use archetype-derived values so generated cases remain realistic rather than random-but-implausible.
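A minimal sketch of independent sampling for the three parameter types follows. It assumes `params.toml` has already been parsed into a dict: the `PARAMS` literal is a hand-abbreviated stand-in, and `sample_param` and `sample_all` are hypothetical names. Seeding a private `random.Random` keeps each draw reproducible. The page does not say how the built-in sampler treats `default` or `derivable_from`, so the sketch ignores both.

```python
import random
from typing import Any

# Hypothetical parsed form of the [params.*] tables above (abbreviated).
PARAMS: dict[str, dict[str, Any]] = {
    "cable_size_mm2": {"type": "enum", "values": ["2.5", "4", "6", "10"]},
    "length_m": {"type": "float", "min": 1, "max": 500},
    "load_current_a": {"type": "float", "min": 0.5, "max": 500},
    "circuit_type": {"type": "enum", "values": ["single_phase", "three_phase"]},
}


def sample_param(spec: dict[str, Any], rng: random.Random) -> Any:
    """Draw one value for a float, int, or enum parameter spec."""
    kind = spec["type"]
    if kind == "float":
        return rng.uniform(spec["min"], spec["max"])
    if kind == "int":
        return rng.randint(spec["min"], spec["max"])  # inclusive on both ends
    if kind == "enum":
        return rng.choice(spec["values"])
    raise ValueError(f"unsupported parameter type: {kind!r}")


def sample_all(params: dict[str, dict[str, Any]], seed: int) -> dict[str, Any]:
    """Sample every parameter independently from one seeded generator."""
    rng = random.Random(seed)
    return {name: sample_param(spec, rng) for name, spec in params.items()}


if __name__ == "__main__":
    print(sample_all(PARAMS, seed=42))
```

Because every parameter is drawn independently here, nothing stops an implausible combination such as a 240-metre lighting run carrying 450 A, which is the gap archetypes address.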

Archetypes

Archetypes bundle values that should move together:

params.toml

```toml
[archetypes.sydney_suburban_lighting]
description = "Suburban lighting circuit with moderate route length"
site_contexts = ["sydney-suburban", "melbourne-suburban"]
length_m = { min = 5, max = 30 }
load_current_a = { min = 1, max = 10 }
```

This matters in AEC tasks because sampling each input independently often produces implausible combinations: a geotechnical soil profile, hydraulic duty point, cable route, or structural load case usually has correlated values.
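One way to realise this is sketched below, under stated assumptions. An archetype's `{ min, max }` overrides narrow each parameter's own range, and the correlated inputs are then drawn from the narrowed ranges together. Intersecting with the parameter bounds and raising an error when the ranges do not overlap are choices made for this example only; the built-in behaviour is not documented here. `sample_from_archetype`, `PARAM_BOUNDS`, and `SUBURBAN_LIGHTING` are hypothetical names.

```python
import random

# Parameter-level bounds from [params.*] (abbreviated to the two ranged inputs).
PARAM_BOUNDS = {"length_m": (1.0, 500.0), "load_current_a": (0.5, 500.0)}

# Hypothetical parsed form of [archetypes.sydney_suburban_lighting].
SUBURBAN_LIGHTING = {
    "length_m": {"min": 5, "max": 30},
    "load_current_a": {"min": 1, "max": 10},
}


def sample_from_archetype(
    archetype: dict[str, dict[str, float]],
    bounds: dict[str, tuple[float, float]],
    rng: random.Random,
) -> dict[str, float]:
    """Sample each ranged parameter, narrowed to the archetype's range where one is given."""
    values = {}
    for name, (lo, hi) in bounds.items():
        override = archetype.get(name)
        if override is not None:
            # Intersect the archetype range with the parameter's own bounds.
            lo, hi = max(lo, override["min"]), min(hi, override["max"])
            if lo > hi:
                raise ValueError(f"archetype range for {name!r} is outside the parameter bounds")
        values[name] = rng.uniform(lo, hi)
    return values


if __name__ == "__main__":
    print(sample_from_archetype(SUBURBAN_LIGHTING, PARAM_BOUNDS, random.Random(7)))
```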

Difficulty presets

Difficulty controls which archetypes can be sampled and how much information is visible:

params.toml

```toml
[difficulty.easy]
description = "All calculation inputs are visible"
visibility = "all_given"
archetypes = ["residential_lighting", "residential_power"]

[difficulty.hard]
description = "Some inputs must be inferred from the scenario"
visibility = "partial"
archetypes = ["commercial_submain", "industrial_feeder"]
hidden_params = ["conductor_material"]
replacement_text = "Use the stated project context to select a suitable conductor material."
```
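To make the preset's role concrete, the sketch below selects an archetype for a requested difficulty and reports which parameters that preset hides. `DIFFICULTY` is a hypothetical parsed form of the tables above, `pick_archetype` is an illustrative name, and the uniform choice among listed archetypes is an assumption.

```python
import random

# Hypothetical parsed form of the [difficulty.*] tables above.
DIFFICULTY = {
    "easy": {
        "visibility": "all_given",
        "archetypes": ["residential_lighting", "residential_power"],
    },
    "hard": {
        "visibility": "partial",
        "archetypes": ["commercial_submain", "industrial_feeder"],
        "hidden_params": ["conductor_material"],
    },
}


def pick_archetype(difficulty: str, rng: random.Random) -> tuple[str, list[str]]:
    """Return (archetype, hidden_params) for a difficulty preset."""
    if difficulty not in DIFFICULTY:
        raise ValueError(f"no difficulty preset named {difficulty!r}")
    preset = DIFFICULTY[difficulty]
    # Uniform choice among the archetypes this preset allows.
    archetype = rng.choice(preset["archetypes"])
    # Presets without hidden_params hide nothing.
    return archetype, list(preset.get("hidden_params", []))


if __name__ == "__main__":
    print(pick_archetype("hard", random.Random(3)))
```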

The built-in convention is:

Difficulty	Expected shape
`easy`	Direct calculation with all or nearly all values visible
`medium`	More steps, more distractors, or a modest inference
`hard`	Wider context, hidden values, richer unit handling, or more opportunities for wrong assumptions

`instruction.md`

Instructions are Jinja2 templates:

instruction.md

```jinja
## Given

| Parameter | Value | Unit |
|-----------|-------|------|
| Cable size | {{ cable_size_mm2 }} | mm² |
| Cable route length | {{ length_m }} | m |
| Design load current | {{ load_current_a }} | A |
| Power factor | {{ power_factor }} | - |
{% if conductor_material is defined %}
| Conductor material | {{ conductor_material }} | - |
{% endif %}
| Circuit type | {{ circuit_type }} | - |

## Required

Calculate the voltage drop percentage and state whether it is within the allowable limit.
```

Difficulty visibility decides which variables are rendered into the prompt. Hidden values are left out of the prompt but are still sampled and passed to the engine and verifier.
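A minimal sketch of that split follows. It assumes visibility is applied by leaving hidden names out of the template context, which is what the `{% if conductor_material is defined %}` guard relies on, while the engine receives every sampled value. `split_contexts` is a hypothetical helper, and returning `replacement_text` as a separate note is an assumption about where that text ends up.

```python
from typing import Any, Optional


def split_contexts(
    sampled: dict[str, Any],
    hidden_params: list[str],
    replacement_text: Optional[str] = None,
) -> tuple[dict[str, Any], dict[str, Any], list[str]]:
    """Return (prompt_context, engine_context, prompt_notes)."""
    hidden = set(hidden_params)
    # The prompt only sees visible values, so hidden names are undefined there.
    prompt_context = {k: v for k, v in sampled.items() if k not in hidden}
    # The engine and verifier always see every sampled value.
    engine_context = dict(sampled)
    # Guidance to show instead of the hidden values, if the preset supplies any.
    notes = [replacement_text] if hidden and replacement_text else []
    return prompt_context, engine_context, notes


if __name__ == "__main__":
    sampled = {"length_m": 12.5, "conductor_material": "aluminium"}
    print(split_contexts(sampled, ["conductor_material"], "Use the stated project context."))
```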

`engine.py`

The engine is intentionally small and deterministic:

engine.py

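```python
def compute(
    cable_size_mm2: str,
    length_m: float,
    load_current_a: float,
    power_factor: float,
    conductor_material: str = "copper",
    circuit_type: str = "single_phase",
) -> dict[str, float]:
    # `...` marks computation elided from this excerpt.
    vc_mv_per_a_m = ...  # voltage-drop factor Vc, in mV/(A·m)
    voltage_drop_v = ...  # absolute voltage drop, in volts
    voltage_drop_pct = ...  # voltage drop as a percentage
    return {
        "vc_mv_per_a_m": vc_mv_per_a_m,
        "voltage_drop_v": voltage_drop_v,
        "voltage_drop_percent": voltage_drop_pct,
        # Pass/fail against a 5 % limit, encoded as 1.0/0.0 so every output is numeric.
        "compliant": 1.0 if voltage_drop_pct <= 5.0 else 0.0,
    }
```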
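For readers who want to see how the three quantities relate, here is a sketch of standard voltage-drop arithmetic with a factor in mV/(A·m). It is not the built-in engine. It takes Vc as an input instead of looking it up from AS/NZS 3008.1.1 tables, ignores power factor and circuit type, and assumes Vc applies per metre of route length and that the caller supplies a nominal voltage. `voltage_drop_from_vc` and its `nominal_voltage_v` and `limit_pct` parameters are hypothetical; the 5 % default mirrors the threshold in the code above.

```python
def voltage_drop_from_vc(
    vc_mv_per_a_m: float,
    length_m: float,
    load_current_a: float,
    nominal_voltage_v: float,
    limit_pct: float = 5.0,
) -> dict[str, float]:
    """Hypothetical pure calculation from an already-known Vc value."""
    # mV/(A·m) × A × m gives millivolts; divide by 1000 for volts.
    voltage_drop_v = vc_mv_per_a_m * load_current_a * length_m / 1000.0
    # Express the drop as a percentage of the assumed nominal voltage.
    voltage_drop_pct = 100.0 * voltage_drop_v / nominal_voltage_v
    return {
        "vc_mv_per_a_m": vc_mv_per_a_m,
        "voltage_drop_v": voltage_drop_v,
        "voltage_drop_percent": voltage_drop_pct,
        "compliant": 1.0 if voltage_drop_pct <= limit_pct else 0.0,
    }


if __name__ == "__main__":
    # Illustrative inputs only, not values from any standard table.
    print(voltage_drop_from_vc(10.0, 50.0, 20.0, 230.0))
```

With Vc = 10 mV/(A·m), 20 A, and a 50 m route, the drop is 10 V. That is about 4.35 % of 230 V, so `compliant` is 1.0.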

Good engines are pure functions over sampled parameters. They should not call model APIs, inspect the generated prompt, depend on local machine state, or rely on unstated intent or prose-only judgement.
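Some of these properties can be spot-checked mechanically. The hypothetical `check_engine_purity` helper below calls an engine twice with identical inputs and checks three things: the results match, the inputs were not mutated, and every output is numeric. It cannot prove purity, because a function could read the clock or environment and still happen to agree. Treat it as a cheap smoke test.

```python
import copy
from typing import Any, Callable


def check_engine_purity(
    compute: Callable[..., dict[str, Any]], params: dict[str, Any]
) -> dict[str, Any]:
    """Raise AssertionError if compute() looks impure; otherwise return its outputs."""
    snapshot = copy.deepcopy(params)
    first = compute(**params)
    second = compute(**params)
    if first != second:
        raise AssertionError("compute() returned different results for identical inputs")
    if params != snapshot:
        raise AssertionError("compute() mutated its input parameters")
    non_numeric = [k for k, v in first.items() if not isinstance(v, (int, float))]
    if non_numeric:
        raise AssertionError(f"non-numeric outputs: {non_numeric}")
    return first
```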

Generating instances

Generate concrete tasks from a built-in template:

```shell
uv run aec-bench generate task voltage-drop \
  --instances 5 \
  --difficulty easy,medium \
  --seed 42 \
  --output tasks/generated
```

List and filter the catalogue:

```shell
uv run aec-bench generate list-templates
uv run aec-bench generate list-templates --discipline structural
```

Validate a custom template before using it:

```shell
uv run aec-bench generate validate-template ./my-template
```

Generate a configured suite:

```shell
uv run aec-bench generate suite --config suite.toml --dry-run
uv run aec-bench generate suite --config suite.toml
```

Each generated instance records its template name, sampled values, difficulty, and seed so the task can be reproduced.
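To illustrate why recording the seed is enough, the sketch below expands one command-line seed into per-instance seeds and stores each instance's metadata as JSON. The seed-derivation scheme, the `sample_length` stand-in sampler, and the record field names are all hypothetical. The real generator's metadata layout is not described here.

```python
import json
import random


def derive_instance_seeds(base_seed: int, count: int) -> list[int]:
    """Deterministically expand one base seed into `count` per-instance seeds."""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(count)]


def sample_length(seed: int) -> float:
    """Stand-in sampler: any pure function of the seed reproduces the same way."""
    return round(random.Random(seed).uniform(1.0, 500.0), 3)


def instance_record(template: str, difficulty: str, seed: int) -> str:
    """Serialise the information needed to regenerate one instance."""
    record = {
        "template": template,
        "difficulty": difficulty,
        "seed": seed,
        "params": {"length_m": sample_length(seed)},
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    for seed in derive_instance_seeds(42, 3):
        print(instance_record("voltage-drop", "easy", seed))
```

Reading `seed` back from a record and calling the same sampler reproduces the stored parameters exactly.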

Built-in template scope

The built-in templates are strongest when the engineering contract is explicit:

- Deterministic calculations with numeric or categorical inputs
- Stable formulae, embedded lookup tables, or clearly bounded reductions of a design method
- Outputs with concrete tolerances
- Verifiers that can score mechanically without relying on unstated intent or prose-only judgement
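The sketch below shows the kind of mechanical check this enables. Each expected output is compared with the submitted value using a per-output relative and absolute tolerance via `math.isclose`. The `TOLERANCES` shape and the `score_outputs` name are assumptions for this example. This page does not show how tolerances are actually declared in `params.toml`.

```python
import math

# Hypothetical per-output tolerances: (relative, absolute).
TOLERANCES = {
    "voltage_drop_v": (0.01, 0.0),
    "voltage_drop_percent": (0.01, 0.0),
    "compliant": (0.0, 0.0),  # categorical 1.0/0.0 output must match exactly
}


def score_outputs(
    expected: dict[str, float],
    submitted: dict[str, float],
    tolerances: dict[str, tuple[float, float]],
) -> dict[str, bool]:
    """Return pass/fail per expected output; a missing submission counts as a fail."""
    results = {}
    for name, want in expected.items():
        got = submitted.get(name)
        rel_tol, abs_tol = tolerances.get(name, (0.0, 0.0))
        results[name] = got is not None and math.isclose(
            got, want, rel_tol=rel_tol, abs_tol=abs_tol
        )
    return results
```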

Tasks are deferred rather than templated when they depend on open-ended document review, hidden standards tables, iterative solvers without a reduced contract, or broad design judgement that has not been made explicit.

Writing your own

Start with the smallest useful deterministic contract:

1. Define the expected inputs and outputs in `params.toml`.
2. Add realistic archetypes so sampled values make sense together.
3. Render a clear `instruction.md` with difficulty-aware visibility.
4. Implement `compute()` in `engine.py`.
5. Run `uv run aec-bench generate validate-template ./my-template`.
6. Generate easy, medium, and hard instances and validate the generated task directories.
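Before running `validate-template`, it can help to exercise `compute()` at the corners of its declared ranges, where division by zero or overflow tends to show up. The sketch below is a hypothetical local check, not part of the aec-bench CLI. `corner_cases`, `smoke_test`, and the toy engine in the demo are illustrative names.

```python
import itertools
import math
from typing import Any, Callable, Iterator, Optional


def corner_cases(bounds: dict[str, tuple[float, float]]) -> Iterator[dict[str, float]]:
    """Yield every combination of each parameter's (min, max) bounds."""
    names = list(bounds)
    for corner in itertools.product(*(bounds[name] for name in names)):
        yield dict(zip(names, corner))


def smoke_test(
    compute: Callable[..., dict[str, float]],
    bounds: dict[str, tuple[float, float]],
    fixed: Optional[dict[str, Any]] = None,
) -> list[tuple[dict[str, float], str]]:
    """Run compute() at each corner; return (inputs, problem) for every failure."""
    failures = []
    for case in corner_cases(bounds):
        try:
            outputs = compute(**(fixed or {}), **case)
        except Exception as exc:  # any crash at a bound is a finding
            failures.append((case, repr(exc)))
            continue
        bad = [name for name, value in outputs.items() if not math.isfinite(value)]
        if bad:
            failures.append((case, f"non-finite outputs: {bad}"))
    return failures


if __name__ == "__main__":
    # Toy engine that breaks when the current bound is zero.
    def toy_compute(length_m: float, current_a: float) -> dict[str, float]:
        return {"metres_per_amp": length_m / current_a}

    for case, problem in smoke_test(toy_compute, {"length_m": (1.0, 500.0), "current_a": (0.0, 10.0)}):
        print(case, problem)
```

The demo reports the two corners where `current_a` is 0.0, both as `ZeroDivisionError`.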

Use templates for reproducible benchmark families. Use hand-authored tasks for bespoke workflows that are not yet reducible to a stable generation contract.
