docs: add 0.2.1 release note (load-time optimisations + fix)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
perf(load): flatten step list in one pass; fix nested-list duplication
2026-05-31 15:33:13 +02:00 · 2026-05-31 14:40:46 +02:00 · 2026-05-31 11:22:26 +02:00 · 2026-05-31 10:41:42 +02:00 · 2026-05-31 10:41:42 +02:00 · 2026-05-31 10:17:54 +02:00
16 changed files with 686 additions and 258 deletions
--- a/release_note.txt
+++ b/release_note.txt
@@ -1,3 +1,16 @@
+version 0.2.1
+==============
+- Faster test loading, especially for large tests built from jinja
+  templates and ``!include``: compiled jinja templates are cached and
+  reused (a file included many times is compiled once), rendering happens
+  in memory instead of through a temporary file, and YAML is parsed with
+  the libyaml C loader when available. Typical load time is 3-6x lower on
+  include / template-heavy tests; behaviour is unchanged.
+- Fix: a nested list holding more than one step under ``steps`` no longer
+  duplicates its entries while the step tree is built.
+- New load-time benchmark under ``test/benchmark/`` (synthetic-tree
+  generator + in-process timing harness) to measure the load pipeline.
+
 version 0.2
 ==============
 - Test items: each item type now declares its accepted parameters
--- a/src/testium/interpreter/test_set.py
+++ b/src/testium/interpreter/test_set.py
@@ -29,6 +29,51 @@ def _build_item_path(item) -> str:
    return " > ".join(reversed(parts))


+def _flatten_actions(actions, out, parent_seq_name):
+    """Expand nested lists and included ``sequence`` entries into ``out`` as a
+    flat list of single test-item dicts, propagating each sequence's source
+    filename onto its items.
+
+    Replaces the previous approach, which spliced each entry into the step
+    list and rebuilt the whole list every time (O(n^2) over the step list, and
+    a rebuild that duplicated entries when a nested list held more than one
+    element). This single forward pass is linear.
+    """
+    for idx, action in enumerate(actions):
+        # a bare list raises its elements to the same level
+        if isinstance(action, (list, tuple)):
+            _flatten_actions(action, out, parent_seq_name)
+            continue
+        # a None entry (e.g. one pointing at an unused alias) contributes nothing
+        if action is None:
+            continue
+        # a 'sequence' (an included file) is spliced in, with its filename
+        # propagated onto each of its items
+        if isinstance(action, dict) and "sequence" in action:
+            sequence = action["sequence"]["data"]
+            f = action["sequence"]["filename"]
+            if isinstance(sequence, dict):
+                sequence = [{k: v} for k, v in sequence.items()]
+            # Case of an empty sequence
+            elif sequence is None:
+                tm.print_info(
+                    f"An empty sequence is loaded in '{parent_seq_name}'."
+                )
+                sequence = []
+            elif not isinstance(sequence, list):
+                raise ETUMSyntaxError(
+                    f"Syntax error in '{parent_seq_name}' step number {idx+1}. Sequence definition: '{str(action)}'",
+                    f
+                )
+            for s in sequence:
+                if isinstance(s, dict) and s:
+                    s[list(s.keys())[0]]["seq_filename"] = f
+            _flatten_actions(sequence, out, parent_seq_name)
+            continue
+
+        out.append(action)
+
+
 class TestSet:
    def __init__(
        self,
@@ -434,56 +479,16 @@ class TestSet:
                f"No valid list of actions in sequence {parent_seq_name}",
                file_name
            )
-        # first we merged to the same level 'sequence dict entries and list within the list
-        counter = 0
        test_dir = tm.gd("test_directory")
-        la = len(parent_seq_actions)
-        while counter < la:
-            action = parent_seq_actions[counter]
-            # if action is a list raise up to the the same level,
-            # ie insert action element into the parent_seq_actions
-            if isinstance(action, (list, tuple)):
-                parent_seq_actions[counter : counter + 1] = action
-                parent_seq_actions = (
-                    parent_seq_actions[:counter]
-                    + action
-                    + parent_seq_actions[counter + 1 :]
-                )
-                la = len(parent_seq_actions)
-                continue
-            # if action is a NoneType skip and continue
-            # (when pointing to an unused alias for instance)
-            if action is None:
-                counter += 1
-                continue
-            # if action is a sequence we insert its entry into the action list
-            if "sequence" in action:
-                sequence = action["sequence"]["data"]
-                f = action["sequence"]["filename"]
-                if isinstance(sequence, dict):
-                    sequence = [{k: v} for k, v in sequence.items()]
-                # Case of an empty sequence
-                elif sequence is None:
-                    tm.print_info(
-                        f"An empty sequence is loaded in '{parent_seq_name}'."
-                    )
-                    sequence = []
-                elif not isinstance(sequence, list):
-                    raise ETUMSyntaxError(
-                        f"Syntax error in '{parent_seq_name}' step number {counter+1}. Sequence definition: '{str(action)}'",
-                        f
-                    )
-                for s in sequence:
-                    s[list(s.keys())[0]]["seq_filename"] = f
-                parent_seq_actions = (
-                    parent_seq_actions[:counter]
-                    + sequence
-                    + parent_seq_actions[counter + 1 :]
-                )
-                la = len(parent_seq_actions)
-                continue

-            # Action is now for sure a list of dict of length 1
+        # Flatten nested lists and included 'sequence' entries to the same level
+        # in one linear pass (was an in-place splice + full list rebuild per
+        # entry: O(n^2) over the step list).
+        flat_actions = []
+        _flatten_actions(parent_seq_actions, flat_actions, parent_seq_name)
+
+        for action in flat_actions:
+            # Action is now for sure a dict of length 1
            k = list(action.keys())[0]
            if action[k].get("seq_filename", None) is None:
                action[k]["seq_filename"] = file_name
@@ -546,8 +551,6 @@ class TestSet:
                    action[k]["seq_filename"]
                )

-            counter += 1
-
        return ret

    def tree(self):
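
Review note on the `test_set.py` change: the duplication fixed here can be reproduced outside testium. The sketch below is a simplified stand-in, not the project's code. `flatten_old` and `flatten_new` are hypothetical names, the steps are plain strings, and `sequence` handling is left out. It only mimics the removed "splice in place, then rebuild" pair and the new single forward pass.

```python
def flatten_old(actions):
    """Mimic the removed loop: in-place splice followed by a full rebuild."""
    actions = list(actions)
    i = 0
    while i < len(actions):
        item = actions[i]
        if isinstance(item, (list, tuple)):
            # First the nested list is spliced in place ...
            actions[i:i + 1] = item
            # ... then the list is rebuilt from the already-spliced list,
            # so every nested element after the first is copied twice.
            actions = actions[:i] + list(item) + actions[i + 1:]
            continue
        i += 1
    return actions


def flatten_new(actions, out):
    """Single forward pass: recurse into nested lists, skip None entries."""
    for item in actions:
        if isinstance(item, (list, tuple)):
            flatten_new(item, out)
        elif item is not None:
            out.append(item)
    return out


steps = ["a", ["b", "c"], None, "d"]
print(flatten_old(["a", ["b", "c"], "d"]))  # ['a', 'b', 'c', 'c', 'd']
print(flatten_new(steps, []))               # ['a', 'b', 'c', 'd']
```

A nested list with a single element goes through `flatten_old` unharmed, which matches the release note's "more than one step" wording.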
--- a/src/testium/interpreter/utils/include.py
+++ b/src/testium/interpreter/utils/include.py
@@ -6,10 +6,10 @@ from runtime.tum_except import ETUMFileError
 from interpreter.utils.template import template_to_test
 from copy import copy
 from interpreter.utils.globdict import global_dict
-from interpreter.utils.yaml_load import yaml_load
+from interpreter.utils.yaml_load import yaml_load, YAML_BASE_LOADER


-class TUMLoaderNoIncludes(yaml.Loader):
+class TUMLoaderNoIncludes(YAML_BASE_LOADER):

    def __init__(self, stream):

--- a/src/testium/interpreter/utils/template.py
+++ b/src/testium/interpreter/utils/template.py
@@ -1,33 +1,74 @@
+import io
 import os
 from sys import exc_info
-from jinja2 import Template
+from jinja2 import Environment
 from jinja2.exceptions import TemplateSyntaxError, TemplateError, UndefinedError
-from tempfile import TemporaryFile
 from interpreter.utils.yaml_load import print_yaml
 from runtime.tum_except import ETUMSyntaxError


+# One Environment reused for every render (default settings, i.e. identical
+# behaviour to jinja2.Template), plus a compiled-template cache so a file that
+# is included many times — or a test that is reloaded — is compiled only once.
+# Jinja compilation is the expensive step; render (variable substitution) stays
+# per-call. Cache is keyed on path + mtime + size so an edited file recompiles.
+_ENV = Environment()
+_template_cache = {}  # abspath -> (mtime_ns, size, compiled_template)
+
+
+class _RenderedStream(io.StringIO):
+    """A rendered template kept in memory.
+
+    Carries ``root`` (and ``name``) so the YAML loader resolves ``!include``
+    paths exactly as it did from the on-disk temp file this replaces — without
+    the write + seek + read round-trip (one temp file per included file). That
+    round-trip is pure overhead, and especially costly on slow storage.
+    """
+
+
+def _compiled_template(filename: str):
+    """Return the compiled jinja template for *filename*, reusing the cached
+    one when the file is unchanged (path + mtime + size)."""
+    key = os.path.abspath(filename)
+    try:
+        st = os.stat(filename)
+    except OSError:
+        st = None
+    if st is not None:
+        cached = _template_cache.get(key)
+        if (cached is not None
+                and cached[0] == st.st_mtime_ns
+                and cached[1] == st.st_size):
+            return cached[2]
+    with open(filename, "r") as f:
+        source = f.read()
+    template = _ENV.from_string(source)  # compile (may raise TemplateSyntaxError)
+    if st is not None:
+        _template_cache[key] = (st.st_mtime_ns, st.st_size, template)
+    return template
+
+
 def template_to_test(filename: str, params: list):
    """ Function which processes an eventual jinja2 template to a test file
    """
-    # Temporary file created to receive the processed include
-    # file
-    tmpf = TemporaryFile('w+t')
-    with open(filename, 'r') as f:
-        try:
-            j2_template = Template(f.read())
-        except TemplateError as e:
+    # Compile (cached) — a syntax error in the template surfaces here.
+    try:
+        j2_template = _compiled_template(filename)
+    except TemplateError as e:
+        with open(filename, "r") as f:
            print_yaml(f, filename)
-            type, value, tb = exc_info()
-            msg = "Template error"
-            if hasattr(value, 'lineno'):
-                msg = msg + f" on line {value.lineno}: "
-            else:
-                msg += ": "
-            raise ETUMSyntaxError(msg + str(e), filename)
+        type, value, tb = exc_info()
+        msg = "Template error"
+        if hasattr(value, 'lineno'):
+            msg = msg + f" on line {value.lineno}: "
+        else:
+            msg += ": "
+        raise ETUMSyntaxError(msg + str(e), filename)
+
+    # Render into memory (no temp file).
    try:
        params["include_directory"] = os.path.dirname(os.path.abspath(filename))
-        tmpf.write(j2_template.render(params))
+        rendered = j2_template.render(params)
    except TemplateSyntaxError as e:
        raise ETUMSyntaxError(f"""Template loading of file '{filename}' with following parameters '{str(params)}'
 Syntax error in template: {e.message}""")
@@ -42,8 +83,7 @@ Template rendering error: {e.message}""")
        raise ETUMSyntaxError(f"""Template loading of file '{filename}' with following parameters '{str(params)}'
 Unexpected error: {str(e)}""")

-    # return to begining of the temp file
-    tmpf.seek(0, os.SEEK_SET)
-    tmpf.root = os.path.dirname(filename)
-
-    return tmpf
+    stream = _RenderedStream(rendered)
+    stream.root = os.path.dirname(filename)
+    stream.name = filename
+    return stream
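
Review note on the `template.py` change: the cache pattern (absolute path as key, `st_mtime_ns` and `st_size` as the freshness check) can be tried in isolation. In this sketch `string.Template` from the standard library stands in for a compiled jinja template so that it runs without third-party packages. `compiled` and `compile_count` are hypothetical names, and the second write deliberately changes the file size so that invalidation does not depend on the filesystem's timestamp resolution.

```python
import os
import tempfile
from string import Template  # stand-in for a compiled jinja template

_cache = {}        # abspath -> (mtime_ns, size, compiled template)
compile_count = 0  # how many times the "expensive" compile step ran


def compiled(path):
    """Return the cached template unless the file's mtime or size changed."""
    global compile_count
    key = os.path.abspath(path)
    st = os.stat(path)
    hit = _cache.get(key)
    if hit is not None and hit[:2] == (st.st_mtime_ns, st.st_size):
        return hit[2]
    with open(path, "r") as f:
        tpl = Template(f.read())
    compile_count += 1
    _cache[key] = (st.st_mtime_ns, st.st_size, tpl)
    return tpl


with tempfile.TemporaryDirectory() as d:
    leaf = os.path.join(d, "leaf.tum")
    with open(leaf, "w") as f:
        f.write("name: leaf_$idx\n")
    # Three renders of the same file: compiled once, rendered three times.
    renders = [compiled(leaf).substitute(idx=i) for i in range(3)]
    # Editing the file (different size) invalidates the cache entry.
    with open(leaf, "w") as f:
        f.write("name: edited_leaf_$idx\n")
    after_edit = compiled(leaf).substitute(idx=9)

print(compile_count)  # 2
```

The limit of this scheme is an edit that keeps the size unchanged within the timestamp resolution of the filesystem, which would still be served from the cache.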
--- a/src/testium/interpreter/utils/test_init.py
+++ b/src/testium/interpreter/utils/test_init.py
@@ -11,7 +11,7 @@ import api.testium as tm
 import interpreter.utils.globdict as globdict
 import interpreter.utils.settings as prefs
 from interpreter.utils.paths import testium_path
-from interpreter.utils.yaml_load import yaml_load
+from interpreter.utils.yaml_load import yaml_load, YAML_BASE_LOADER
 from interpreter.utils import clear_recursively
 from runtime.tum_except import ETUMSyntaxError
 from interpreter.utils.params import expanse, eval_func_init
@@ -89,7 +89,7 @@ def locate_report_file(rep_file):
 def yamltodict(param_file, silent=True):
    # load of the file
    with open(param_file, "r") as fd:
-        dp = yaml_load(fd, param_file, yaml.Loader)
+        dp = yaml_load(fd, param_file, YAML_BASE_LOADER)

    if dp is None:
        tm.print_info(f"The YAML file '{param_file}' is empty.")
--- a/src/testium/interpreter/utils/yaml_load.py
+++ b/src/testium/interpreter/utils/yaml_load.py
@@ -1,3 +1,4 @@
+import yaml
 from yaml.parser import ParserError
 from yaml import load, Loader
 from yaml.scanner import ScannerError
@@ -5,6 +6,12 @@ from api.testium import print_debug
 from runtime.tum_except import ETUMSyntaxError
 import io

+# Use the libyaml-backed loader (much faster parsing) when PyYAML was built
+# with it, falling back to the pure-Python loader otherwise. The C loader
+# raises the same ParserError/ScannerError and supports the same custom
+# constructors (!include) and construct_* helpers the TUM loaders rely on.
+YAML_BASE_LOADER = yaml.CLoader if getattr(yaml, "__with_libyaml__", False) else yaml.Loader
+

 def print_yaml(file: io.TextIOWrapper, file_name):
    """ Prints YAML file if debug mode is activated.
@@ -21,10 +28,10 @@ def yaml_load(file, real_file_name: str, loader: Loader):
        return load(file, loader)

    except ParserError as e:
-        if isinstance(file, io.TextIOWrapper):
+        if isinstance(file, (io.TextIOWrapper, io.StringIO)):
            print_yaml(file, real_file_name)
        raise ETUMSyntaxError(f"yaml file parsing error: " + str(e), real_file_name)
    except ScannerError as e:
-        if isinstance(file, io.TextIOWrapper):
+        if isinstance(file, (io.TextIOWrapper, io.StringIO)):
            print_yaml(file, real_file_name)
        raise ETUMSyntaxError("yaml file scanning error: " + str(e), real_file_name)
--- a/test/benchmark/.gitignore
+++ b/test/benchmark/.gitignore
@@ -0,0 +1 @@
+cases/
--- a/test/benchmark/README.md
+++ b/test/benchmark/README.md
@@ -0,0 +1,116 @@
+# Load-time benchmark
+
+Measures how long *testium* takes to **load** a `.tum` test tree — template
+rendering (jinja) + YAML parsing + test-tree construction — *without* executing
+it. Purpose: get reproducible numbers before/after load-path optimisations, and
+attribute any gain to a specific part of the pipeline.
+
+It is meant for *very long* tests, the kind you can build with `jinja` loops and
+`!include`, where load time becomes noticeable.
+
+## Files
+
+| File | Role |
+|------|------|
+| `gen_bench_test.py` | Generates a synthetic `.tum` tree (the test input). |
+| `load_bench.py` | Drives the **real** loader in-process and times it. |
+| `run.sh` | Convenience: generate + time across profiles, using the project venv. |
+| `cases/` | Generated trees (git-ignored, recreated on demand). |
+
+The benchmark `.tum` files are **generated**, not committed — the generator is
+the artifact. They use only `let` leaves and `group` containers, so loading has
+no runtime side effect (no subprocess, no `<| |>` eval) and the timing reflects
+the parse/build pipeline alone.
+
+## Quick start
+
+```bash
+# default matrix (all profiles), 5 repeats each
+./test/benchmark/run.sh
+
+# one profile at one size
+./test/benchmark/run.sh repeat 2000
+
+# more repeats for a tighter min
+REPEAT=10 ./test/benchmark/run.sh includes 1000
+```
+
+`test/benchmark/run.sh` uses the project venv at `test/tmp/.venv`, which the
+top-level `./run.sh` creates. If the venv is missing, run `./run.sh` once first.
+
+To drive the harness directly on any `.tum` (not just generated ones):
+
+```bash
+test/tmp/.venv/bin/python3 test/benchmark/load_bench.py --repeat 5 --quiet path/to/main.tum
+```
+
+## Profiles
+
+Each profile isolates one cost. `--size` is the profile-specific count.
+
+| Profile | What it builds | Stresses |
+|---------|----------------|----------|
+| `flat` | one main file, N inline `let` steps | big YAML parse + linear object build |
+| `includes` | main `!include`s N **distinct** sub-files | per-include template+YAML+tempfile, `sequence` splice |
+| `repeat` | main `!include`s the **same** parametrised leaf N times | jinja **recompilation** of an identical template |
+| `jinja` | one main file, `{% for %}` emitting N steps | single large render + single large parse |
+| `deep` | nested includes, depth N | include recursion (see caveat) |
+| `mix` | groups + jinja loop + distinct + repeated includes | realistic blend |
+
+## Reading the output
+
+```
+phase              min      median
+initial         0.1131      0.1285   <- pass 1: discover config files (no includes)
+loadtest        1.0724      1.0900   <- config fixpoint loop + full recursive include load
+build           0.1850      0.1976   <- TestSet: load_test_recursively tree build
+total           1.3886      1.4227
+counters  (last run):
+  templates :    1003 calls   0.5247s  (exclusive: jinja compile+render+tempfile)
+  yaml      :    1004 parses   1.4696s  (inclusive of nested includes)
+```
+
+- **min** is the headline (least noisy); median is a sanity check.
+- **initial / loadtest / build** map to the three pipeline stages in
+  `interpreter/process.py` and `interpreter/test_set.py`. The main file is
+  rendered+parsed across `initial` *and* `loadtest` (the loader does ~3 passes).
+- **templates** = number of `template_to_test()` calls and their *exclusive*
+  wall time (one file render each — pure jinja compile+render+tempfile I/O).
+  A high count with the same source file = recompilation, the `repeat` case.
+- **yaml** = number of `yaml_load()` parses. Its time is *inclusive* of nested
+  includes, so use the **count** for attribution, not the seconds.
+
+## Mapping to the optimisation axes
+
+| Axis (see DESIGN / discussion) | Watch | Best profile to prove it |
+|--------------------------------|-------|--------------------------|
+| 1 — cache compiled jinja templates | `templates` time drops, count unchanged | `repeat` |
+| 2 — drop the tempfile round-trip | `templates` time drops | `includes`, `repeat`, `mix` |
+| 3 — C YAML loader (libyaml) | `yaml` time / `loadtest` drops | `flat`, `jinja` |
+| 6 — O(n²) sequence splice | `build` drops | `includes`, `mix` |
+
+## How to compare before/after a change
+
+1. Run the matrix on the current code, keep the output.
+2. Apply one axis.
+3. Re-run the **same** profiles/sizes; compare `min` per phase and the counters.
+
+Change one axis at a time so the attribution is clean. Run on an idle machine
+(and note the disk: on a USB stick the tempfile round-trip of axis 2 weighs
+more).
+
+## Caveat: deep includes
+
+The loader is recursive and spends ~10 stack frames per include level, so
+`deep` hits Python's `RecursionError` at roughly 90 nested levels. The harness
+reports this cleanly instead of crashing. Real tests are *wide* (many steps /
+many includes), not deep, so `includes`/`repeat`/`jinja`/`mix` are the
+representative "very long" cases.
+
+## Notes
+
+- No execution is triggered — timing stops where `Batch` would mark the test
+  *loaded*.
+- The profiles contain no `<| |>`, so the external eval process is not started.
+  Pass `--with-eval` to `load_bench.py` for trees that evaluate at load time.
+- Numbers are machine- and disk-specific; only compare runs from the same host.
--- a/test/benchmark/gen_bench_test.py
+++ b/test/benchmark/gen_bench_test.py
@@ -0,0 +1,179 @@
+#!/usr/bin/env python3
+"""Generate synthetic ``.tum`` test trees to benchmark *load* time.
+
+The generated trees are deliberately cheap to *build* (only ``let`` leaves and
+``group`` containers — no subprocess, no runtime side effect) so the load
+benchmark measures the parse / template / tree-build pipeline and nothing else.
+
+Profiles, each targeting a specific cost in the loader:
+
+  flat      one main file, N inline ``let`` steps, no include, no jinja.
+            Baseline: YAML parse of a big document + linear object build.
+
+  includes  main ``!include``s N *distinct* sub-files (a few steps each).
+            Stresses the per-include template+YAML+tempfile round-trip and the
+            ``sequence`` splice in test_set.load_test_recursively.
+
+  repeat    main ``!include``s the *same* parametrised leaf file N times.
+            Stresses jinja *recompilation*: the compiled template is identical
+            every time, only the render params (idx) differ -> the case a
+            template cache collapses.
+
+  jinja     one main file whose ``{% for %}`` loop emits N steps.
+            Stresses a single large jinja render + a single large YAML parse.
+
+  deep      nested includes, depth N (main -> d0 -> d1 -> ...).
+            Stresses include recursion and per-level template+YAML.
+
+  mix       a realistic blend: groups, a jinja loop, distinct includes and a
+            repeated parametrised include.
+
+Usage:
+    gen_bench_test.py --profile repeat --size 1000 --out cases/repeat_1000
+    -> writes <out>/main.tum (+ includes, + param.yaml) and prints the path.
+"""
+import argparse
+import os
+import shutil
+
+
+def _let(indent, i, name=None):
+    name = name if name is not None else f"s{i}"
+    pad = " " * indent
+    return (
+        f"{pad}- let:\n"
+        f"{pad}    name: {name}\n"
+        f"{pad}    values:\n"
+        f"{pad}        - k{i}: {i}\n"
+    )
+
+
+def gen_flat(out, n):
+    body = "".join(_let(8, i) for i in range(n))
+    main = f"main:\n    name: bench flat {n}\n    steps:\n{body}"
+    _write(out, "main.tum", main)
+
+
+def gen_includes(out, n):
+    steps = "".join(f"        - !include inc_{i}.tum\n" for i in range(n))
+    main = f"main:\n    name: bench includes {n}\n    steps:\n{steps}"
+    _write(out, "main.tum", main)
+    for i in range(n):
+        # each include is a YAML *sequence* (list of steps)
+        seq = "".join(_let(0, i * 3 + j, name=f"inc{i}_{j}") for j in range(3))
+        _write(out, f"inc_{i}.tum", seq)
+
+
+def gen_repeat(out, n):
+    steps = "".join(
+        f"        - !include {{file: leaf.tum, idx: {i}}}\n" for i in range(n)
+    )
+    main = f"main:\n    name: bench repeat {n}\n    steps:\n{steps}"
+    _write(out, "main.tum", main)
+    leaf = (
+        "- let:\n"
+        "    name: leaf_{{ idx }}\n"
+        "    values:\n"
+        "        - leaf_{{ idx }}: {{ idx }}\n"
+    )
+    _write(out, "leaf.tum", leaf)
+
+
+def gen_jinja(out, n):
+    main = (
+        f"main:\n    name: bench jinja {n}\n    steps:\n"
+        "{% for i in range(" + str(n) + ") %}\n"
+        "        - let:\n"
+        "            name: j{{ i }}\n"
+        "            values:\n"
+        "                - k{{ i }}: {{ i }}\n"
+        "{% endfor %}\n"
+    )
+    _write(out, "main.tum", main)
+
+
+def gen_deep(out, n):
+    main = (
+        f"main:\n    name: bench deep {n}\n    steps:\n"
+        "        - let:\n            name: top\n            values:\n                - a: 0\n"
+        "        - !include d_0.tum\n"
+    )
+    _write(out, "main.tum", main)
+    for i in range(n):
+        seq = _let(0, i, name=f"d{i}")
+        if i < n - 1:
+            seq += f"- !include d_{i + 1}.tum\n"
+        _write(out, f"d_{i}.tum", seq)
+
+
+def gen_mix(out, n):
+    # n groups, each: 2 inline lets, one distinct include, one repeated include,
+    # plus a small jinja loop. About 7*n leaf steps (2 + 1 + 1 + 3 per group).
+    per = max(1, n)
+    parts = [f"main:\n    name: bench mix {n}\n    steps:\n"]
+    for g in range(per):
+        parts.append(
+            f"        - group:\n"
+            f"            name: grp{g}\n"
+            f"            steps:\n"
+        )
+        parts.append(_let(16, g * 2, name=f"g{g}_a"))
+        parts.append(_let(16, g * 2 + 1, name=f"g{g}_b"))
+        parts.append(f"                - !include inc_{g}.tum\n")
+        parts.append(f"                - !include {{file: leaf.tum, idx: {g}}}\n")
+        parts.append(
+            "{% for i in range(3) %}\n"
+            f"                - let:\n"
+            f"                    name: g{g}_j{{{{ i }}}}\n"
+            f"                    values:\n"
+            f"                        - g{g}_k{{{{ i }}}}: {{{{ i }}}}\n"
+            "{% endfor %}\n"
+        )
+    _write(out, "main.tum", "".join(parts))
+    for g in range(per):
+        _write(out, f"inc_{g}.tum", _let(0, g, name=f"mixinc{g}"))
+    _write(
+        out,
+        "leaf.tum",
+        "- let:\n    name: mixleaf_{{ idx }}\n    values:\n        - mixleaf_{{ idx }}: {{ idx }}\n",
+    )
+
+
+PROFILES = {
+    "flat": gen_flat,
+    "includes": gen_includes,
+    "repeat": gen_repeat,
+    "jinja": gen_jinja,
+    "deep": gen_deep,
+    "mix": gen_mix,
+}
+
+
+def _write(out, name, content):
+    with open(os.path.join(out, name), "w") as f:
+        f.write(content)
+
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--profile", required=True, choices=sorted(PROFILES))
+    ap.add_argument("--size", type=int, default=1000,
+                    help="profile-specific count (steps / includes / depth)")
+    ap.add_argument("--out", required=True, help="output directory (recreated)")
+    args = ap.parse_args()
+
+    out = os.path.abspath(args.out)
+    if os.path.isdir(out):
+        shutil.rmtree(out)
+    os.makedirs(out)
+
+    # minimal config file so the loader does not emit "no param file" noise
+    _write(out, "param.yaml", "bench_dummy: 1\n")
+
+    PROFILES[args.profile](out, args.size)
+    print(os.path.join(out, "main.tum"))
+
+
+if __name__ == "__main__":
+    main()
--- a/test/benchmark/load_bench.py
+++ b/test/benchmark/load_bench.py
@@ -0,0 +1,200 @@
+#!/usr/bin/env python3
+"""Time the testium *load* pipeline on a given ``.tum`` tree.
+
+It drives the real loader code (``TestProcess._load_initial_params`` /
+``_load_test`` then ``TestSet(...)``) in-process, so the numbers track the
+production path and stay honest as the code evolves. Execution is never
+triggered — we stop exactly where ``Batch`` would report the test as *loaded*.
+
+Reported per run, over ``--repeat`` iterations (min is the headline, least
+noisy):
+
+    initial   first pass: discover config files (template+YAML, no includes)
+    loadtest  config-file fixpoint loop + full recursive include/template/YAML
+    build     TestSet construction: the load_test_recursively tree build
+    total     sum of the three
+
+Plus instrumentation counters (exact call counts, wall time) for the two
+hot leaves the optimisation axes target:
+
+    templates  jinja template_to_test() calls   (axis 1 compile cache, axis 2 tempfile)
+    yaml       yaml_load() parses               (axis 3 C loader)
+
+template time is exclusive (one file render); yaml time is wall-inclusive of
+nested includes, so lean on the *counts* for attribution.
+
+Must run inside the project venv (jinja2, pyyaml, telnetlib3, ...). The
+benchmark profiles contain no ``<| |>`` so the external eval process is not
+needed; pass --with-eval to start it for faithfulness on eval-heavy trees.
+
+Usage (see run.sh for the convenience wrapper):
+    test/tmp/.venv/bin/python3 test/benchmark/load_bench.py [--repeat 5] <main.tum>
+"""
+import argparse
+import os
+import statistics
+import sys
+from queue import Queue
+from time import perf_counter
+
+# --- bootstrap: src/testium for flat imports, src for `import testium` --------
+HERE = os.path.dirname(os.path.abspath(__file__))
+ROOT = os.path.abspath(os.path.join(HERE, "..", ".."))
+sys.path.insert(0, os.path.join(ROOT, "src"))
+sys.path.insert(0, os.path.join(ROOT, "src", "testium"))
+
+import api.testium as tm  # noqa: E402
+from interpreter.utils.test_init import env_init, apply_overrides  # noqa: E402
+from interpreter.utils.test_ctrl import TestSetController  # noqa: E402
+from interpreter.process import TestProcess  # noqa: E402
+from interpreter.test_set import TestSet  # noqa: E402
+from interpreter.utils.py_eval import eval_process_init  # noqa: E402
+from interpreter.utils.api_srv import api_request  # noqa: E402
+
+# --- instrumentation: count + time the two hot leaves -------------------------
+import interpreter.process as _proc  # noqa: E402
+import interpreter.utils.include as _inc  # noqa: E402
+import interpreter.utils.test_init as _ti  # noqa: E402
+import interpreter.utils.template as _tpl  # noqa: E402
+import interpreter.utils.yaml_load as _yl  # noqa: E402
+
+_C = {"tpl_n": 0, "tpl_t": 0.0, "yaml_n": 0, "yaml_t": 0.0}
+_orig_tpl = _tpl.template_to_test
+_orig_yaml = _yl.yaml_load
+
+
+def _wrap_tpl(*a, **k):
+    t = perf_counter()
+    try:
+        return _orig_tpl(*a, **k)
+    finally:
+        _C["tpl_t"] += perf_counter() - t
+        _C["tpl_n"] += 1
+
+
+def _wrap_yaml(*a, **k):
+    t = perf_counter()
+    try:
+        return _orig_yaml(*a, **k)
+    finally:
+        _C["yaml_t"] += perf_counter() - t
+        _C["yaml_n"] += 1
+
+
+# rebind in every module that did `from ... import template_to_test / yaml_load`
+for _m in (_proc, _inc):
+    _m.template_to_test = _wrap_tpl
+for _m in (_proc, _inc, _ti):
+    _m.yaml_load = _wrap_yaml
+
+
+def _reset_counters():
+    _C.update(tpl_n=0, tpl_t=0.0, yaml_n=0, yaml_t=0.0)
+
+
+def load_once(tp, fname, test_dir):
+    """One full load (no execution). Returns (initial, loadtest, build) seconds."""
+    t0 = perf_counter()
+    init_pf, gv = tp._load_initial_params(test_dir)
+    t1 = perf_counter()
+    test_dict, _pf = tp._load_test(init_pf, gv)
+    t2 = perf_counter()
+    TestSet(fname, test_dict, Queue())
+    t3 = perf_counter()
+    return (t1 - t0, t2 - t1, t3 - t2)
+
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("main_tum", help="path to the generated main.tum")
+    ap.add_argument("--repeat", type=int, default=5)
+    ap.add_argument("--with-eval", action="store_true",
+                    help="start the external eval process (needed only for <| |> at load)")
+    ap.add_argument("--quiet", action="store_true",
+                    help="silence the loader's INFO output during runs")
+    args = ap.parse_args()
+
+    fname = os.path.abspath(args.main_tum)
+    if not os.path.isfile(fname):
+        ap.error(f"not found: {fname}")
+    test_dir = os.path.dirname(fname)
+
+    env_init()
+    apply_overrides({}, {})
+
+    eval_proc = None
+    if args.with_eval:
+        eval_proc = eval_process_init(api_request, 10, test_dir)
+        eval_proc.start()
+        eval_proc.wait_ready(10)
+
+    if args.quiet:
+        # the loader prints a couple of INFO lines per config file; mute stdout
+        # around the measured section to avoid I/O skew.
+        devnull = open(os.devnull, "w")
+        real_stdout = sys.stdout
+
+    tp = TestProcess(fname, Queue(), TestSetController(),
+                     config_files=[], defines={}, gui_defaults={}, text_mode=True)
+
+    samples = []  # list of (initial, loadtest, build)
+    last_counters = None
+    try:
+        for r in range(args.repeat):
+            _reset_counters()
+            if args.quiet:
+                sys.stdout = devnull
+            try:
+                samples.append(load_once(tp, fname, test_dir))
+            except RecursionError:
+                if args.quiet:
+                    sys.stdout = real_stdout
+                print(f"file      : {fname}")
+                print("ERROR     : RecursionError during load — the include "
+                      "nesting is too deep for the recursive loader.\n"
+                      "            (each include level costs ~10 stack frames; "
+                      "raise sys.setrecursionlimit to probe further.)")
+                return 2
+            except Exception as e:  # noqa: BLE001 - report, don't crash the bench
+                if args.quiet:
+                    sys.stdout = real_stdout
+                print(f"file      : {fname}")
+                print(f"ERROR     : load failed: {type(e).__name__}: {e}")
+                return 2
+            finally:
+                if args.quiet:
+                    sys.stdout = real_stdout
+            last_counters = dict(_C)
+    finally:
+        if eval_proc is not None:
+            eval_proc.stop()
+            eval_proc.join()
+        if args.quiet:
+            devnull.close()
+
+    initial = [s[0] for s in samples]
+    loadtest = [s[1] for s in samples]
+    build = [s[2] for s in samples]
+    total = [sum(s) for s in samples]
+
+    def stat(xs):
+        return min(xs), statistics.median(xs)
+
+    print(f"file      : {fname}")
+    print(f"repeats   : {args.repeat}   (showing  min | median, seconds)")
+    print(f"{'phase':<10}{'min':>12}{'median':>12}")
+    for name, xs in (("initial", initial), ("loadtest", loadtest),
+                     ("build", build), ("total", total)):
+        mn, md = stat(xs)
+        print(f"{name:<10}{mn:>12.4f}{md:>12.4f}")
+    if last_counters:
+        print("counters  (last run):")
+        print(f"  templates : {last_counters['tpl_n']:>7d} calls   "
+              f"{last_counters['tpl_t']:>8.4f}s  (exclusive: jinja compile+render+tempfile)")
+        print(f"  yaml      : {last_counters['yaml_n']:>7d} parses  "
+              f"{last_counters['yaml_t']:>8.4f}s  (inclusive of nested includes)")
+
+
+if __name__ == "__main__":
+    sys.exit(main() or 0)
--- a/test/benchmark/run.sh
+++ b/test/benchmark/run.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+# Load-time benchmark driver: generate synthetic .tum trees and time the
+# testium load pipeline on them, using the project venv.
+#
+# Usage:
+#   ./test/benchmark/run.sh                     # default matrix (all profiles)
+#   ./test/benchmark/run.sh <profile> <size>    # one profile at one size
+#   REPEAT=10 ./test/benchmark/run.sh repeat 2000
+#
+# Profiles: flat includes repeat jinja deep mix   (see gen_bench_test.py)
+#
+# Generated trees go under test/benchmark/cases/ (git-ignored). The numbers
+# are wall-clock; run on an otherwise idle machine and compare min values.
+set -e
+
+SCRIPT_DIR="$(realpath "$(dirname "$(readlink -f "$0")")")"
+PROJECT_DIR="$(realpath "$SCRIPT_DIR/../..")"
+VPY="$PROJECT_DIR/test/tmp/.venv/bin/python3"
+CASES="$SCRIPT_DIR/cases"
+REPEAT="${REPEAT:-5}"
+
+if [ ! -x "$VPY" ]; then
+    echo "ERROR: project venv not found at $VPY — run ./run.sh once to create it." >&2
+    exit 1
+fi
+
+bench() {
+    local profile="$1" size="$2"
+    local out="$CASES/${profile}_${size}"
+    local main
+    main="$("$VPY" "$SCRIPT_DIR/gen_bench_test.py" --profile "$profile" --size "$size" --out "$out")"
+    echo "===== profile=$profile size=$size ====="
+    "$VPY" "$SCRIPT_DIR/load_bench.py" --repeat "$REPEAT" --quiet "$main"
+    echo
+}
+
+if [ $# -eq 2 ]; then
+    bench "$1" "$2"
+    exit 0
+fi
+
+# Default matrix. 'deep' is kept small: the recursive loader hits Python's
+# recursion limit at roughly 90 nested include levels.
+bench flat     2000
+bench includes 1000
+bench repeat   1000
+bench jinja    2000
+bench deep     40
+bench mix      300
--- a/serial/rsl_terminal_robustness.tum
+++ b/serial/rsl_terminal_robustness.tum
@@ -1,69 +0,0 @@
-# Main
-################################################################################
-main:
-  name:                               Serial Terminal bug reproducer
-  version:                            0.1
-  steps:
-    - group:
-        name:                         Test preparation
-        steps:
-          - console:
-              name:             Open RSL Simulator Terminal
-              console_name:     RSL_simulator
-              steps:
-                  - open:
-                      protocol:   terminal
-                      terminal_path: $(rslsimulatorpath)
-                  - writeln: "pwd"
-                  - read_until: {expected: "$", timeout: 5}
-                  - writeln: "./RSverify $(rsTx)" # /dev/ttyMUE1
-                  - read_until: {expected: "RSL controller>", timeout: 5}
-                  - writeln: "setportconf 0 115200 none 8 1 1 255"
-                  - read_until: {expected: "RSL controller>", timeout: 5}
-                  - writeln: "send4ever 0 0"
-                  - read_until: {expected: "RSL controller>", timeout: 5}
-
-          - console:
-              name:                   Open the EUT console
-              console_name:           cons_target
-              doc:                    Initiates the console of the target in order
-                                      to be ready to capture its traces.
-              stop_on_failure:        True
-              steps:
-                - open:
-                    protocol:         serial
-                    serial_port:      $(rsRx) # /dev/ttyMUE2
-                    serial_baudrate:  115200
-
-    - loop:
-        name:                         Qualification loop
-        stop_on_failure:              False
-        steps:
-          - py_func:
-              name:                   Capture the RS serial output
-              file:                   $(test_directory)/terminal_bug_reproducer.py
-              func_name:              RetreiveData
-              param:
-                - cons_target
-
-          - sleep: {timeout: 1}
-
-# Cleanup sequence
-#-------------------------------------------------------------------------------
-    - group:
-        name:                         Cleanup
-        execute_on_stop:              True
-        steps:
-          - console:
-              name:                   Close the target console
-              console_name:           cons_target
-              execute_on_stop:        True
-              steps:
-                - close:
-
-          - console:
-              name:                   Close the RSL_simulator
-              console_name:           RSL_simulator
-              execute_on_stop:        True
-              steps:
-                - close:
--- a/serial/terminal_bug_reproducer.py
+++ b/serial/terminal_bug_reproducer.py
@@ -1,26 +0,0 @@
-import api.testium as tm
-
-def RetreiveData(console_name):
-    print("--------------- retrieving data ---------------")
-    result = 0
-    cons   = tm.console(console_name)
-
-    if cons is None:
-        print("--------------- The console does not exist ---------------")
-    else:
-        try:
-            is_finished = False
-            while not is_finished:
-                status, d = cons.read_until('\n', timeout=0, return_data=True, mute=True)
-                if 0 == status:
-                    print("--------------- Data ---------------")
-                    print(d)
-                else:
-                    print("--------------- No data ---------------")
-                    print("Status: ", status)
-                    is_finished = True
-        except:
-            print("--------------- Error retrieving data ---------------")
-            result = -1
-
-    return result
--- a/terminal/generate_char.sh
+++ b/terminal/generate_char.sh
@@ -1,9 +0,0 @@
-chars='<=>| -,;:!/."()[]{}*\&#%+012345689abcdefghiklmnopqrstuvwxyzABCD'
-for j in {1..256} ;
-do
-    for i in {1..256} ; do
-        echo -n "${chars:RANDOM%${#chars}:1}"
-    done
-    echo
-    sleep 0.01
-done
--- a/terminal/terminal_bug_reproducer.py
+++ b/terminal/terminal_bug_reproducer.py
@@ -1,26 +0,0 @@
-import api.testium as tm
-
-def RetreiveData(console_name):
-    print("--------------- retrieving data ---------------")
-    result = 0
-    cons   = tm.console(console_name)
-
-    if cons is None:
-        print("--------------- The console does not exist ---------------")
-    else:
-        try:
-            is_finished = False
-            while not is_finished:
-                status, d = cons.read_until('\n', timeout=0, return_data=True, mute=True)
-                if 0 == status:
-                    print("--------------- Data ---------------")
-                    print(d)
-                else:
-                    print("--------------- No data ---------------")
-                    print("Status: ", status)
-                    is_finished = True
-        except:
-            print("--------------- Error retrieving data ---------------")
-            result = -1
-
-    return result
--- a/terminal/terminal_robustness.tum
+++ b/terminal/terminal_robustness.tum
@@ -1,50 +0,0 @@
-# Main
-################################################################################
-main:
-  name:                               Terminal bug reproducer
-  version:                            0.1
-  steps:
-    - group:
-        name:                         Test preparation
-        steps:
-          - console:
-              name:                   Open the EUT console
-              console_name:           cons_target
-              doc:                    Initiates the console of the target in order
-                                      to be ready to capture its traces.
-              stop_on_failure:        True
-              steps:
-                - open:
-                    protocol: terminal
-
-    - loop:
-        name:                         Qualification loop
-        stop_on_failure:              False
-        steps:
-          - console:
-              name:                   write random data
-              console_name:           cons_target
-              steps:
-                  - writeln: bash $(test_directory)/generate_char.sh
-
-          - py_func:
-              name:                   Capture the terminal output
-              file:                   $(test_directory)/terminal_bug_reproducer.py
-              func_name:              RetreiveData
-              param:
-                - cons_target
-
-          - sleep: {timeout: 1}
-
-# Cleanup sequence
-#-------------------------------------------------------------------------------
-    - group:
-        name:                         Cleanup
-        execute_on_stop:              True
-        steps:
-          - console:
-              name:                   Close the target console
-              console_name:           cons_target
-              execute_on_stop:        True
-              steps:
-                - close:
Author	SHA1	Message	Date
François	f2eedb5606	docs: add 0.2.1 release note (load-time optimisations + fix) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 15:33:13 +02:00
François	f02616dc3a	perf(load): flatten step list in one pass; fix nested-list duplication load_test_recursively expanded nested lists and included 'sequence' entries by splicing each into the step list and rebuilding the whole list every time (O(n^2)). The list branch also rebuilt after an in-place splice, duplicating entries when a nested list held more than one item. Replace both with a single linear _flatten_actions pass. Build phase ~12% faster at 6k items; the real fix is the duplication (a nested 2-element list now yields a,b,c,d not a,b,c,c,d). Validation suite identical (post-exec SUCCESS, same verdicts/tracebacks). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 14:40:46 +02:00
François	5adba7fcd5	perf(load): use libyaml CLoader when available Base the TUM loaders (and the param-file load) on yaml.CLoader when PyYAML is built with libyaml, falling back to the pure-Python Loader otherwise. Same ParserError/ScannerError, same custom !include constructors. YAML parse time ~8x lower; validation suite identical (same verdicts, same 8 expected-fail tracebacks, post-exec SUCCESS). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 11:22:26 +02:00
François	5086aa6c0e	perf(load): cache compiled jinja templates, render in memory Shared jinja Environment + compiled-template cache keyed on (path, mtime, size), and render to an in-memory StringIO instead of a temp file. Behaviour unchanged (validation suite passes); template time -10..40x, total load -20..30% on template-heavy trees. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:41:42 +02:00
François	ef49789780	test: add load-time benchmark (jinja/include trees) Generator + in-process harness timing the real loader's three stages and template/YAML call counts, across tunable profiles. cases/ git-ignored; see test/benchmark/README.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:41:42 +02:00
François	6e31ae971a	removed unused robustness.	2026-05-31 10:17:54 +02:00
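
The single-pass flattening described in commit f02616dc3a can be illustrated with a minimal sketch. It is not the project's `_flatten_actions`: the name `flatten_steps` and the sample step dicts are hypothetical, and it only covers nested lists (not included `sequence` entries or filename propagation). The idea is that every entry is appended to one output list exactly once, so nothing is spliced or rebuilt and a nested two-element list cannot be duplicated.

```python
def flatten_steps(actions, out=None):
    """Append every non-list entry of ``actions`` to ``out``, depth first.

    Hypothetical helper for illustration only: each entry is visited once
    and appended once, so the cost is linear in the number of entries.
    """
    if out is None:
        out = []
    for entry in actions:
        if isinstance(entry, list):
            # Nested list: recurse into it and keep appending to the same
            # output list instead of splicing and rebuilding the parent.
            flatten_steps(entry, out)
        else:
            out.append(entry)
    return out


# A step list with a nested two-element list, as in the commit message.
steps = [{"a": {}}, [{"b": {}}, {"c": {}}], {"d": {}}]
names = [next(iter(step)) for step in flatten_steps(steps)]
print(names)  # ['a', 'b', 'c', 'd']
```

Passing the accumulator down the recursion is what keeps the pass linear; returning and concatenating a fresh list at each level would reintroduce repeated copying.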