test: add load-time benchmark (jinja/include trees)
Generator + in-process harness timing the real loader's three stages and template/YAML call counts, across tunable profiles. cases/ git-ignored; see test/benchmark/README.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
1
test/benchmark/.gitignore
vendored
Normal file
1
test/benchmark/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
|||||||
|
cases/
|
||||||
116
test/benchmark/README.md
Normal file
116
test/benchmark/README.md
Normal file
@@ -0,0 +1,116 @@
|
|||||||
|
# Load-time benchmark
|
||||||
|
|
||||||
|
Measures how long *testium* takes to **load** a `.tum` test tree — template
|
||||||
|
rendering (jinja) + YAML parsing + test-tree construction — *without* executing
|
||||||
|
it. Purpose: get reproducible numbers before/after load-path optimisations, and
|
||||||
|
attribute any gain to a specific part of the pipeline.
|
||||||
|
|
||||||
|
It is meant for *very long* tests, the kind you can build with `jinja` loops and
|
||||||
|
`!include`, where load time becomes noticeable.
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
| File | Role |
|
||||||
|
|------|------|
|
||||||
|
| `gen_bench_test.py` | Generates a synthetic `.tum` tree (the test input). |
|
||||||
|
| `load_bench.py` | Drives the **real** loader in-process and times it. |
|
||||||
|
| `run.sh` | Convenience: generate + time across profiles, using the project venv. |
|
||||||
|
| `cases/` | Generated trees (git-ignored, recreated on demand). |
|
||||||
|
|
||||||
|
The benchmark `.tum` files are **generated**, not committed — the generator is
|
||||||
|
the artifact. They use only `let` leaves and `group` containers, so loading has
|
||||||
|
no runtime side effect (no subprocess, no `<| |>` eval) and the timing reflects
|
||||||
|
the parse/build pipeline alone.
|
||||||
|
|
||||||
|
## Quick start
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# default matrix (all profiles), 5 repeats each
|
||||||
|
./test/benchmark/run.sh
|
||||||
|
|
||||||
|
# one profile at one size
|
||||||
|
./test/benchmark/run.sh repeat 2000
|
||||||
|
|
||||||
|
# more repeats for a tighter min
|
||||||
|
REPEAT=10 ./test/benchmark/run.sh includes 1000
|
||||||
|
```
|
||||||
|
|
||||||
|
`run.sh` uses the project venv at `test/tmp/.venv` (created by `./run.sh`). If it
|
||||||
|
is missing, run `./run.sh` once first.
|
||||||
|
|
||||||
|
To drive the harness directly on any `.tum` (not just generated ones):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
test/tmp/.venv/bin/python3 test/benchmark/load_bench.py --repeat 5 --quiet path/to/main.tum
|
||||||
|
```
|
||||||
|
|
||||||
|
## Profiles
|
||||||
|
|
||||||
|
Each profile isolates one cost. `--size` is the profile-specific count.
|
||||||
|
|
||||||
|
| Profile | What it builds | Stresses |
|
||||||
|
|---------|----------------|----------|
|
||||||
|
| `flat` | one main file, N inline `let` steps | big YAML parse + linear object build |
|
||||||
|
| `includes` | main `!include`s N **distinct** sub-files | per-include template+YAML+tempfile, `sequence` splice |
|
||||||
|
| `repeat` | main `!include`s the **same** parametrised leaf N times | jinja **recompilation** of an identical template |
|
||||||
|
| `jinja` | one main file, `{% for %}` emitting N steps | single large render + single large parse |
|
||||||
|
| `deep` | nested includes, depth N | include recursion (see caveat) |
|
||||||
|
| `mix` | groups + jinja loop + distinct + repeated includes | realistic blend |
|
||||||
|
|
||||||
|
## Reading the output
|
||||||
|
|
||||||
|
```
|
||||||
|
phase min median
|
||||||
|
initial 0.1131 0.1285 <- pass 1: discover config files (no includes)
|
||||||
|
loadtest 1.0724 1.0900 <- config fixpoint loop + full recursive include load
|
||||||
|
build 0.1850 0.1976 <- TestSet: load_test_recursively tree build
|
||||||
|
total 1.3886 1.4227
|
||||||
|
counters (last run):
|
||||||
|
templates : 1003 calls 0.5247s (exclusive: jinja compile+render+tempfile)
|
||||||
|
yaml : 1004 parses 1.4696s (inclusive of nested includes)
|
||||||
|
```
|
||||||
|
|
||||||
|
- **min** is the headline (least noisy); median is a sanity check.
|
||||||
|
- **initial / loadtest / build** map to the three pipeline stages in
|
||||||
|
`interpreter/process.py` and `interpreter/test_set.py`. The main file is
|
||||||
|
rendered+parsed across `initial` *and* `loadtest` (the loader does ~3 passes).
|
||||||
|
- **templates** = number of `template_to_test()` calls and their *exclusive*
|
||||||
|
wall time (one file render each — pure jinja compile+render+tempfile I/O).
|
||||||
|
A high count with the same source file = recompilation, the `repeat` case.
|
||||||
|
- **yaml** = number of `yaml_load()` parses. Its time is *inclusive* of nested
|
||||||
|
includes, so use the **count** for attribution, not the seconds.
|
||||||
|
|
||||||
|
## Mapping to the optimisation axes
|
||||||
|
|
||||||
|
| Axis (see DESIGN / discussion) | Watch | Best profile to prove it |
|
||||||
|
|--------------------------------|-------|--------------------------|
|
||||||
|
| 1 — cache compiled jinja templates | `templates` time drops, count unchanged | `repeat` |
|
||||||
|
| 2 — drop the tempfile round-trip | `templates` time drops | `includes`, `repeat`, `mix` |
|
||||||
|
| 3 — C YAML loader (libyaml) | `yaml` time / `loadtest` drops | `flat`, `jinja` |
|
||||||
|
| 6 — O(n²) sequence splice | `build` drops | `includes`, `mix` |
|
||||||
|
|
||||||
|
## How to compare before/after a change
|
||||||
|
|
||||||
|
1. Run the matrix on the current code, keep the output.
|
||||||
|
2. Apply one axis.
|
||||||
|
3. Re-run the **same** profiles/sizes; compare `min` per phase and the counters.
|
||||||
|
|
||||||
|
Change one axis at a time so the attribution is clean. Run on an idle machine
|
||||||
|
(and note the disk: on a USB stick the tempfile round-trip of axis 2 weighs
|
||||||
|
more).
|
||||||
|
|
||||||
|
## Caveat: deep includes
|
||||||
|
|
||||||
|
The loader is recursive and spends ~10 stack frames per include level, so
|
||||||
|
`deep` hits Python's `RecursionError` around ~90 nested levels. The harness
|
||||||
|
reports this cleanly instead of crashing. Real tests are *wide* (many steps /
|
||||||
|
many includes), not deep, so `includes`/`repeat`/`jinja`/`mix` are the
|
||||||
|
representative "very long" cases.
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- No execution is triggered — timing stops where `Batch` would mark the test
|
||||||
|
*loaded*.
|
||||||
|
- The profiles contain no `<| |>`, so the external eval process is not started.
|
||||||
|
Pass `--with-eval` to `load_bench.py` for trees that evaluate at load time.
|
||||||
|
- Numbers are machine- and disk-specific; only compare runs from the same host.
|
||||||
179
test/benchmark/gen_bench_test.py
Executable file
179
test/benchmark/gen_bench_test.py
Executable file
@@ -0,0 +1,179 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Generate synthetic ``.tum`` test trees to benchmark *load* time.
|
||||||
|
|
||||||
|
The generated trees are deliberately cheap to *build* (only ``let`` leaves and
|
||||||
|
``group`` containers — no subprocess, no runtime side effect) so the load
|
||||||
|
benchmark measures the parse / template / tree-build pipeline and nothing else.
|
||||||
|
|
||||||
|
Profiles, each targeting a specific cost in the loader:
|
||||||
|
|
||||||
|
flat one main file, N inline ``let`` steps, no include, no jinja.
|
||||||
|
Baseline: YAML parse of a big document + linear object build.
|
||||||
|
|
||||||
|
includes main ``!include``s N *distinct* sub-files (a few steps each).
|
||||||
|
Stresses the per-include template+YAML+tempfile round-trip and the
|
||||||
|
``sequence`` splice in test_set.load_test_recursively.
|
||||||
|
|
||||||
|
repeat main ``!include``s the *same* parametrised leaf file N times.
|
||||||
|
Stresses jinja *recompilation*: the compiled template is identical
|
||||||
|
every time, only the render params (idx) differ -> the case a
|
||||||
|
template cache collapses.
|
||||||
|
|
||||||
|
jinja one main file whose ``{% for %}`` loop emits N steps.
|
||||||
|
Stresses a single large jinja render + a single large YAML parse.
|
||||||
|
|
||||||
|
deep nested includes, depth N (main -> d0 -> d1 -> ...).
|
||||||
|
Stresses include recursion and per-level template+YAML.
|
||||||
|
|
||||||
|
mix a realistic blend: groups, a jinja loop, distinct includes and a
|
||||||
|
repeated parametrised include.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
gen_bench_test.py --profile repeat --size 1000 --out cases/repeat_1000
|
||||||
|
-> writes <out>/main.tum (+ includes, + param.yaml) and prints the path.
|
||||||
|
"""
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import shutil
|
||||||
|
|
||||||
|
|
||||||
|
def _let(indent, i, name=None):
|
||||||
|
name = name if name is not None else f"s{i}"
|
||||||
|
pad = " " * indent
|
||||||
|
return (
|
||||||
|
f"{pad}- let:\n"
|
||||||
|
f"{pad} name: {name}\n"
|
||||||
|
f"{pad} values:\n"
|
||||||
|
f"{pad} - k{i}: {i}\n"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_flat(out, n):
|
||||||
|
body = "".join(_let(8, i) for i in range(n))
|
||||||
|
main = f"main:\n name: bench flat {n}\n steps:\n{body}"
|
||||||
|
_write(out, "main.tum", main)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_includes(out, n):
|
||||||
|
steps = "".join(f" - !include inc_{i}.tum\n" for i in range(n))
|
||||||
|
main = f"main:\n name: bench includes {n}\n steps:\n{steps}"
|
||||||
|
_write(out, "main.tum", main)
|
||||||
|
for i in range(n):
|
||||||
|
# each include is a YAML *sequence* (list of steps)
|
||||||
|
seq = "".join(_let(0, i * 3 + j, name=f"inc{i}_{j}") for j in range(3))
|
||||||
|
_write(out, f"inc_{i}.tum", seq)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_repeat(out, n):
|
||||||
|
steps = "".join(
|
||||||
|
f" - !include {{file: leaf.tum, idx: {i}}}\n" for i in range(n)
|
||||||
|
)
|
||||||
|
main = f"main:\n name: bench repeat {n}\n steps:\n{steps}"
|
||||||
|
_write(out, "main.tum", main)
|
||||||
|
leaf = (
|
||||||
|
"- let:\n"
|
||||||
|
" name: leaf_{{ idx }}\n"
|
||||||
|
" values:\n"
|
||||||
|
" - leaf_{{ idx }}: {{ idx }}\n"
|
||||||
|
)
|
||||||
|
_write(out, "leaf.tum", leaf)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_jinja(out, n):
|
||||||
|
main = (
|
||||||
|
f"main:\n name: bench jinja {n}\n steps:\n"
|
||||||
|
"{% for i in range(" + str(n) + ") %}\n"
|
||||||
|
" - let:\n"
|
||||||
|
" name: j{{ i }}\n"
|
||||||
|
" values:\n"
|
||||||
|
" - k{{ i }}: {{ i }}\n"
|
||||||
|
"{% endfor %}\n"
|
||||||
|
)
|
||||||
|
_write(out, "main.tum", main)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_deep(out, n):
|
||||||
|
main = (
|
||||||
|
f"main:\n name: bench deep {n}\n steps:\n"
|
||||||
|
" - let:\n name: top\n values:\n - a: 0\n"
|
||||||
|
" - !include d_0.tum\n"
|
||||||
|
)
|
||||||
|
_write(out, "main.tum", main)
|
||||||
|
for i in range(n):
|
||||||
|
seq = _let(0, i, name=f"d{i}")
|
||||||
|
if i < n - 1:
|
||||||
|
seq += f"- !include d_{i + 1}.tum\n"
|
||||||
|
_write(out, f"d_{i}.tum", seq)
|
||||||
|
|
||||||
|
|
||||||
|
def gen_mix(out, n):
|
||||||
|
# n groups, each: 2 inline lets, one distinct include, one repeated include,
|
||||||
|
# plus a small jinja loop. Roughly ~6*n steps.
|
||||||
|
per = max(1, n)
|
||||||
|
parts = [f"main:\n name: bench mix {n}\n steps:\n"]
|
||||||
|
for g in range(per):
|
||||||
|
parts.append(
|
||||||
|
f" - group:\n"
|
||||||
|
f" name: grp{g}\n"
|
||||||
|
f" steps:\n"
|
||||||
|
)
|
||||||
|
parts.append(_let(16, g * 2, name=f"g{g}_a"))
|
||||||
|
parts.append(_let(16, g * 2 + 1, name=f"g{g}_b"))
|
||||||
|
parts.append(f" - !include inc_{g}.tum\n")
|
||||||
|
parts.append(f" - !include {{file: leaf.tum, idx: {g}}}\n")
|
||||||
|
parts.append(
|
||||||
|
"{% for i in range(3) %}\n"
|
||||||
|
f" - let:\n"
|
||||||
|
f" name: g{g}_j{{{{ i }}}}\n"
|
||||||
|
f" values:\n"
|
||||||
|
f" - g{g}_k{{{{ i }}}}: {{{{ i }}}}\n"
|
||||||
|
"{% endfor %}\n"
|
||||||
|
)
|
||||||
|
_write(out, "main.tum", "".join(parts))
|
||||||
|
for g in range(per):
|
||||||
|
_write(out, f"inc_{g}.tum", _let(0, g, name=f"mixinc{g}"))
|
||||||
|
_write(
|
||||||
|
out,
|
||||||
|
"leaf.tum",
|
||||||
|
"- let:\n name: mixleaf_{{ idx }}\n values:\n - mixleaf_{{ idx }}: {{ idx }}\n",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
PROFILES = {
|
||||||
|
"flat": gen_flat,
|
||||||
|
"includes": gen_includes,
|
||||||
|
"repeat": gen_repeat,
|
||||||
|
"jinja": gen_jinja,
|
||||||
|
"deep": gen_deep,
|
||||||
|
"mix": gen_mix,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _write(out, name, content):
|
||||||
|
with open(os.path.join(out, name), "w") as f:
|
||||||
|
f.write(content)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
ap = argparse.ArgumentParser(description=__doc__,
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||||
|
ap.add_argument("--profile", required=True, choices=sorted(PROFILES))
|
||||||
|
ap.add_argument("--size", type=int, default=1000,
|
||||||
|
help="profile-specific count (steps / includes / depth)")
|
||||||
|
ap.add_argument("--out", required=True, help="output directory (recreated)")
|
||||||
|
args = ap.parse_args()
|
||||||
|
|
||||||
|
out = os.path.abspath(args.out)
|
||||||
|
if os.path.isdir(out):
|
||||||
|
shutil.rmtree(out)
|
||||||
|
os.makedirs(out)
|
||||||
|
|
||||||
|
# minimal config file so the loader does not emit "no param file" noise
|
||||||
|
_write(out, "param.yaml", "bench_dummy: 1\n")
|
||||||
|
|
||||||
|
PROFILES[args.profile](out, args.size)
|
||||||
|
print(os.path.join(out, "main.tum"))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
200
test/benchmark/load_bench.py
Executable file
200
test/benchmark/load_bench.py
Executable file
@@ -0,0 +1,200 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Time the testium *load* pipeline on a given ``.tum`` tree.
|
||||||
|
|
||||||
|
It drives the real loader code (``TestProcess._load_initial_params`` /
|
||||||
|
``_load_test`` then ``TestSet(...)``) in-process, so the numbers track the
|
||||||
|
production path and stay honest as the code evolves. Execution is never
|
||||||
|
triggered — we stop exactly where ``Batch`` would report the test as *loaded*.
|
||||||
|
|
||||||
|
Reported per run, over ``--repeat`` iterations (min is the headline, least
|
||||||
|
noisy):
|
||||||
|
|
||||||
|
initial first pass: discover config files (template+YAML, no includes)
|
||||||
|
loadtest config-file fixpoint loop + full recursive include/template/YAML
|
||||||
|
build TestSet construction: the load_test_recursively tree build
|
||||||
|
total sum of the three
|
||||||
|
|
||||||
|
Plus instrumentation counters (exact call counts, wall time) for the two
|
||||||
|
hot leaves the optimisation axes target:
|
||||||
|
|
||||||
|
templates jinja template_to_test() calls (axis 1 compile cache, axis 2 tempfile)
|
||||||
|
yaml yaml_load() parses (axis 3 C loader)
|
||||||
|
|
||||||
|
template time is exclusive (one file render); yaml time is wall-inclusive of
|
||||||
|
nested includes, so lean on the *counts* for attribution.
|
||||||
|
|
||||||
|
Must run inside the project venv (jinja2, pyyaml, telnetlib3, ...). The
|
||||||
|
benchmark profiles contain no ``<| |>`` so the external eval process is not
|
||||||
|
needed; pass --with-eval to start it for faithfulness on eval-heavy trees.
|
||||||
|
|
||||||
|
Usage (see run.sh for the convenience wrapper):
|
||||||
|
test/tmp/.venv/bin/python3 test/benchmark/load_bench.py [--repeat 5] <main.tum>
|
||||||
|
"""
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import statistics
|
||||||
|
import sys
|
||||||
|
from queue import Queue
|
||||||
|
from time import perf_counter
|
||||||
|
|
||||||
|
# --- bootstrap: src/testium for flat imports, src for `import testium` --------
|
||||||
|
HERE = os.path.dirname(os.path.abspath(__file__))
|
||||||
|
ROOT = os.path.abspath(os.path.join(HERE, "..", ".."))
|
||||||
|
sys.path.insert(0, os.path.join(ROOT, "src"))
|
||||||
|
sys.path.insert(0, os.path.join(ROOT, "src", "testium"))
|
||||||
|
|
||||||
|
import api.testium as tm # noqa: E402
|
||||||
|
from interpreter.utils.test_init import env_init, apply_overrides # noqa: E402
|
||||||
|
from interpreter.utils.test_ctrl import TestSetController # noqa: E402
|
||||||
|
from interpreter.process import TestProcess # noqa: E402
|
||||||
|
from interpreter.test_set import TestSet # noqa: E402
|
||||||
|
from interpreter.utils.py_eval import eval_process_init # noqa: E402
|
||||||
|
from interpreter.utils.api_srv import api_request # noqa: E402
|
||||||
|
|
||||||
|
# --- instrumentation: count + time the two hot leaves -------------------------
|
||||||
|
import interpreter.process as _proc # noqa: E402
|
||||||
|
import interpreter.utils.include as _inc # noqa: E402
|
||||||
|
import interpreter.utils.test_init as _ti # noqa: E402
|
||||||
|
import interpreter.utils.template as _tpl # noqa: E402
|
||||||
|
import interpreter.utils.yaml_load as _yl # noqa: E402
|
||||||
|
|
||||||
|
_C = {"tpl_n": 0, "tpl_t": 0.0, "yaml_n": 0, "yaml_t": 0.0}
|
||||||
|
_orig_tpl = _tpl.template_to_test
|
||||||
|
_orig_yaml = _yl.yaml_load
|
||||||
|
|
||||||
|
|
||||||
|
def _wrap_tpl(*a, **k):
|
||||||
|
t = perf_counter()
|
||||||
|
try:
|
||||||
|
return _orig_tpl(*a, **k)
|
||||||
|
finally:
|
||||||
|
_C["tpl_t"] += perf_counter() - t
|
||||||
|
_C["tpl_n"] += 1
|
||||||
|
|
||||||
|
|
||||||
|
def _wrap_yaml(*a, **k):
|
||||||
|
t = perf_counter()
|
||||||
|
try:
|
||||||
|
return _orig_yaml(*a, **k)
|
||||||
|
finally:
|
||||||
|
_C["yaml_t"] += perf_counter() - t
|
||||||
|
_C["yaml_n"] += 1
|
||||||
|
|
||||||
|
|
||||||
|
# rebind in every module that did `from ... import template_to_test / yaml_load`
|
||||||
|
for _m in (_proc, _inc):
|
||||||
|
_m.template_to_test = _wrap_tpl
|
||||||
|
for _m in (_proc, _inc, _ti):
|
||||||
|
_m.yaml_load = _wrap_yaml
|
||||||
|
|
||||||
|
|
||||||
|
def _reset_counters():
|
||||||
|
_C.update(tpl_n=0, tpl_t=0.0, yaml_n=0, yaml_t=0.0)
|
||||||
|
|
||||||
|
|
||||||
|
def load_once(tp, fname, test_dir):
|
||||||
|
"""One full load (no execution). Returns (initial, loadtest, build) seconds."""
|
||||||
|
t0 = perf_counter()
|
||||||
|
init_pf, gv = tp._load_initial_params(test_dir)
|
||||||
|
t1 = perf_counter()
|
||||||
|
test_dict, _pf = tp._load_test(init_pf, gv)
|
||||||
|
t2 = perf_counter()
|
||||||
|
TestSet(fname, test_dict, Queue())
|
||||||
|
t3 = perf_counter()
|
||||||
|
return (t1 - t0, t2 - t1, t3 - t2)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
ap = argparse.ArgumentParser(description=__doc__,
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||||
|
ap.add_argument("main_tum", help="path to the generated main.tum")
|
||||||
|
ap.add_argument("--repeat", type=int, default=5)
|
||||||
|
ap.add_argument("--with-eval", action="store_true",
|
||||||
|
help="start the external eval process (needed only for <| |> at load)")
|
||||||
|
ap.add_argument("--quiet", action="store_true",
|
||||||
|
help="silence the loader's INFO output during runs")
|
||||||
|
args = ap.parse_args()
|
||||||
|
|
||||||
|
fname = os.path.abspath(args.main_tum)
|
||||||
|
if not os.path.isfile(fname):
|
||||||
|
ap.error(f"not found: {fname}")
|
||||||
|
test_dir = os.path.dirname(fname)
|
||||||
|
|
||||||
|
env_init()
|
||||||
|
apply_overrides({}, {})
|
||||||
|
|
||||||
|
eval_proc = None
|
||||||
|
if args.with_eval:
|
||||||
|
eval_proc = eval_process_init(api_request, 10, test_dir)
|
||||||
|
eval_proc.start()
|
||||||
|
eval_proc.wait_ready(10)
|
||||||
|
|
||||||
|
if args.quiet:
|
||||||
|
# the loader prints a couple of INFO lines per config file; mute stdout
|
||||||
|
# around the measured section to avoid I/O skew.
|
||||||
|
devnull = open(os.devnull, "w")
|
||||||
|
real_stdout = sys.stdout
|
||||||
|
|
||||||
|
tp = TestProcess(fname, Queue(), TestSetController(),
|
||||||
|
config_files=[], defines={}, gui_defaults={}, text_mode=True)
|
||||||
|
|
||||||
|
samples = [] # list of (initial, loadtest, build)
|
||||||
|
last_counters = None
|
||||||
|
try:
|
||||||
|
for r in range(args.repeat):
|
||||||
|
_reset_counters()
|
||||||
|
if args.quiet:
|
||||||
|
sys.stdout = devnull
|
||||||
|
try:
|
||||||
|
samples.append(load_once(tp, fname, test_dir))
|
||||||
|
except RecursionError:
|
||||||
|
if args.quiet:
|
||||||
|
sys.stdout = real_stdout
|
||||||
|
print(f"file : {fname}")
|
||||||
|
print("ERROR : RecursionError during load — the include "
|
||||||
|
"nesting is too deep for the recursive loader.\n"
|
||||||
|
" (each include level costs ~10 stack frames; "
|
||||||
|
"raise sys.setrecursionlimit to probe further.)")
|
||||||
|
return 2
|
||||||
|
except Exception as e: # noqa: BLE001 - report, don't crash the bench
|
||||||
|
if args.quiet:
|
||||||
|
sys.stdout = real_stdout
|
||||||
|
print(f"file : {fname}")
|
||||||
|
print(f"ERROR : load failed: {type(e).__name__}: {e}")
|
||||||
|
return 2
|
||||||
|
finally:
|
||||||
|
if args.quiet:
|
||||||
|
sys.stdout = real_stdout
|
||||||
|
last_counters = dict(_C)
|
||||||
|
finally:
|
||||||
|
if eval_proc is not None:
|
||||||
|
eval_proc.stop()
|
||||||
|
eval_proc.join()
|
||||||
|
if args.quiet:
|
||||||
|
devnull.close()
|
||||||
|
|
||||||
|
initial = [s[0] for s in samples]
|
||||||
|
loadtest = [s[1] for s in samples]
|
||||||
|
build = [s[2] for s in samples]
|
||||||
|
total = [sum(s) for s in samples]
|
||||||
|
|
||||||
|
def stat(xs):
|
||||||
|
return min(xs), statistics.median(xs)
|
||||||
|
|
||||||
|
print(f"file : {fname}")
|
||||||
|
print(f"repeats : {args.repeat} (showing min | median, seconds)")
|
||||||
|
print(f"{'phase':<10}{'min':>12}{'median':>12}")
|
||||||
|
for name, xs in (("initial", initial), ("loadtest", loadtest),
|
||||||
|
("build", build), ("total", total)):
|
||||||
|
mn, md = stat(xs)
|
||||||
|
print(f"{name:<10}{mn:>12.4f}{md:>12.4f}")
|
||||||
|
if last_counters:
|
||||||
|
print("counters (last run):")
|
||||||
|
print(f" templates : {last_counters['tpl_n']:>7d} calls "
|
||||||
|
f"{last_counters['tpl_t']:>8.4f}s (exclusive: jinja compile+render+tempfile)")
|
||||||
|
print(f" yaml : {last_counters['yaml_n']:>7d} parses "
|
||||||
|
f"{last_counters['yaml_t']:>8.4f}s (inclusive of nested includes)")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main() or 0)
|
||||||
49
test/benchmark/run.sh
Executable file
49
test/benchmark/run.sh
Executable file
@@ -0,0 +1,49 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Load-time benchmark driver: generate synthetic .tum trees and time the
|
||||||
|
# testium load pipeline on them, using the project venv.
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# ./test/benchmark/run.sh # default matrix (all profiles)
|
||||||
|
# ./test/benchmark/run.sh <profile> <size> # one profile at one size
|
||||||
|
# REPEAT=10 ./test/benchmark/run.sh repeat 2000
|
||||||
|
#
|
||||||
|
# Profiles: flat includes repeat jinja deep mix (see gen_bench_test.py)
|
||||||
|
#
|
||||||
|
# Generated trees go under test/benchmark/cases/ (git-ignored). The numbers
|
||||||
|
# are wall-clock; run on an otherwise idle machine and compare min values.
|
||||||
|
set -e
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(realpath "$(dirname "$(readlink -f "$0")")")"
|
||||||
|
PROJECT_DIR="$(realpath "$SCRIPT_DIR/../..")"
|
||||||
|
VPY="$PROJECT_DIR/test/tmp/.venv/bin/python3"
|
||||||
|
CASES="$SCRIPT_DIR/cases"
|
||||||
|
REPEAT="${REPEAT:-5}"
|
||||||
|
|
||||||
|
if [ ! -x "$VPY" ]; then
|
||||||
|
echo "ERROR: project venv not found at $VPY — run ./run.sh once to create it." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
bench() {
|
||||||
|
local profile="$1" size="$2"
|
||||||
|
local out="$CASES/${profile}_${size}"
|
||||||
|
local main
|
||||||
|
main="$("$VPY" "$SCRIPT_DIR/gen_bench_test.py" --profile "$profile" --size "$size" --out "$out")"
|
||||||
|
echo "===== profile=$profile size=$size ====="
|
||||||
|
"$VPY" "$SCRIPT_DIR/load_bench.py" --repeat "$REPEAT" --quiet "$main"
|
||||||
|
echo
|
||||||
|
}
|
||||||
|
|
||||||
|
if [ $# -eq 2 ]; then
|
||||||
|
bench "$1" "$2"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Default matrix. 'deep' is kept small: the recursive loader hits Python's
|
||||||
|
# recursion limit around ~90 nested include levels.
|
||||||
|
bench flat 2000
|
||||||
|
bench includes 1000
|
||||||
|
bench repeat 1000
|
||||||
|
bench jinja 2000
|
||||||
|
bench deep 40
|
||||||
|
bench mix 300
|
||||||
Reference in New Issue
Block a user