validation: dedicated venv + fix python_bin override timing

eval_proc was started before -d/GUI defines reached gd, so ``-d python_bin=...`` and the GUI ``python_bin`` preference were silently ignored by the very subprocess that runs ``<| ... |>`` evals (and only took effect for later items once the discovery cache had already been seeded with the system interpreter). apply_overrides() is now applied before eval_process_init(), and bins._resolve()'s cache is keyed by (name, override) so a later param.yaml change re-resolves on the next lookup. The validation suite now ships a wrapper (run.sh / run.bat) that creates a dedicated venv in the system temp dir and pins it via ``-d python_bin=...``. A new ``venv`` item asserts the override took effect for both eval_proc and py_func paths, with a ``sys.prefix != sys.base_prefix`` marker to catch the case where the override happens to be a system interpreter (path-equality alone would miss it, the venv's ``bin/python3`` being a symlink to the host). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 08:19:57 +02:00
parent 6f832cd67b
commit 4d8cafb5a0
10 changed files with 321 additions and 17 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -114,11 +114,20 @@ To add a new API call usable from subprocesses:
 ### External interpreter resolution (`bins.py`)
 `src/testium/interpreter/utils/bins.py` — single source of truth for the paths to the external Python and Lua interpreters used by subprocesses.

- `python_bin()` / `lua_bin()` : resolve once, cache in memory. User can override via the `python_bin` / `lua_bin` global dict keys (typically populated from the YAML config). Falls back to discovery on PATH (candidates: `python3`/`python` and `lua`/`lua5.5`/`lua5.4`/`lua5.3`/`lua5.2`/`lua5.1`).
+- `python_bin()` / `lua_bin()` : resolve and cache. The cache is keyed by `(name, override)` so that a later change to `gd[python_bin]` (typically when a `param.yaml` sets the key) triggers a re-resolution on the next lookup instead of returning the stale auto-discovered path. Falls back to discovery on PATH (candidates: `python3`/`python` and `lua`/`lua5.5`/`lua5.4`/`lua5.3`/`lua5.2`/`lua5.1`).
 - `ensure(*names)` : called by `TestSet._validate_runtime_deps()` at test load. Always requires `python` (the eval engine always runs); requires `lua` only if a `lua_func` item is in the tree. Fails fast with a clear error citing tried candidates and override key.

 Engines (`PyProcessBase`, `LuaProcessBase`, `EvalExecEngine`) call `bins.python_bin()`/`bins.lua_bin()` themselves — call sites never pass an explicit binary path.

+#### Override-timing contract (`apply_overrides`)
+`bins.python_bin()` is called for the **first** time inside `eval_process_init()` (the long-lived inline-`<| … |>` subprocess), which happens **before** the YAML param files are loaded. To make `-d python_bin=…` and the GUI `python_bin` preference take effect for `eval_proc` itself, `process.py:run()` applies them to gd **before** `eval_process_init()` via the `apply_overrides()` helper extracted from `update_global()`. The post-load `update_global()` call then re-applies the same overrides (after `prepare_global()` clears gd), keeping the gd value in sync with the cached resolution.
+
+| Override source | `eval_proc` | `py_func` / `cycle` / `post_exec` |
+|---|---|---|
+| `-d python_bin=…` (CLI) | ✅ | ✅ |
+| GUI `python_bin` preference | ✅ | ✅ |
+| `python_bin: …` in `param.yaml` | ❌ (eval_proc already started) | ✅ (cache re-resolves on key change) |
+
 ## Key files

 | Path | Role |
@@ -261,12 +270,18 @@ Both Flatpak and AppImage export `TESTIUM_VERSION` from a launcher (Flatpak: lau
 - `unittest` item: renamed from `unittest_file`.
 - GUI test tree: check and fold state preserved across same-file reloads.
 - Licence: EUPL-1.2.
+- Interpreter override timing: `apply_overrides()` extracted from `update_global()` and called by `process.py:run()` before `eval_process_init()`, so `-d python_bin=…` / GUI prefs reach `bins.python_bin()` on its first lookup. `bins._resolve()` cache is now keyed by `(name, override)` so later `param.yaml` changes are picked up by subsequently constructed engines.

 ## Validation tests
-Located in `test/validation/`. Run with `-b` flag:
+Located in `test/validation/`. Two entry points:
 ```
-./run.sh -b -- test/validation/main.tum
+./test/validation/run.sh        # wrapper — uses a dedicated venv (see below)
+./run.sh -b -- test/validation/main.tum   # direct — testium's own python is used for test execution
 ```
+The `run.sh` / `run.bat` wrappers create a dedicated Python venv at `${TMPDIR:-/tmp}/testium-validation-venv` (Linux) or `%TEMP%\testium-validation-venv` (Windows), with `--system-site-packages` + `pip install junit-xml`, and run the suite with `-d python_bin=…` so every test-execution subprocess (eval_proc, py_func, cycle, post_exec) runs inside the venv. testium itself keeps running in the project's own environment. `clean` as the first argument recreates the venv.
+
+The `venv` item (`test/validation/items/venv/`) asserts that the override actually took effect: `python_bin` is set, `sys.executable` matches it, `sys.prefix == dirname(dirname(python_bin))`, and `sys.prefix != sys.base_prefix` (the last marker catches the case where `python_bin` happens to be a system interpreter, which path-equality alone would miss because the venv's `bin/python3` is a symlink to the host). Both `eval_proc` (inline `<| … |>`) and `py_func` paths are exercised.
+
 Parallel item tests: `test/validation/items/parallel/test.tum`

 ## Dependencies