validation: dedicated venv + fix python_bin override timing

eval_proc was started before the -d/GUI defines reached gd, so ``-d python_bin=...`` and the GUI ``python_bin`` preference were silently ignored by the very subprocess that runs ``<| ... |>`` evals (and only took effect for later items once the discovery cache had already been seeded with the system interpreter).

apply_overrides() is now applied before eval_process_init(), and bins._resolve()'s cache is keyed by (name, override) so a later param.yaml change re-resolves on the next lookup.

The validation suite now ships a wrapper (run.sh / run.bat) that creates a dedicated venv in the system temp dir and pins it via ``-d python_bin=...``. A new ``venv`` item asserts the override took effect for both the eval_proc and py_func paths. It adds a ``sys.prefix != sys.base_prefix`` marker to catch the case where the override happens to be a system interpreter: the path-equality checks alone would pass trivially there. The comparisons avoid realpath because the venv's ``bin/python3`` is a symlink to the host.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 08:19:57 +02:00
parent 6f832cd67b
commit 4d8cafb5a0
10 changed files with 321 additions and 17 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -114,11 +114,20 @@ To add a new API call usable from subprocesses:
 ### External interpreter resolution (`bins.py`)
 `src/testium/interpreter/utils/bins.py` — single source of truth for the paths to the external Python and Lua interpreters used by subprocesses.

- `python_bin()` / `lua_bin()` : resolve once, cache in memory. User can override via the `python_bin` / `lua_bin` global dict keys (typically populated from the YAML config). Falls back to discovery on PATH (candidates: `python3`/`python` and `lua`/`lua5.5`/`lua5.4`/`lua5.3`/`lua5.2`/`lua5.1`).
+- `python_bin()` / `lua_bin()` : resolve and cache. The cache is keyed by `(name, override)` so that a later change to `gd["python_bin"]` (typically when a `param.yaml` sets the key) triggers a re-resolution on the next lookup instead of returning the stale auto-discovered path. Falls back to discovery on PATH (candidates: `python3`/`python` and `lua`/`lua5.5`/`lua5.4`/`lua5.3`/`lua5.2`/`lua5.1`).
 - `ensure(*names)` : called by `TestSet._validate_runtime_deps()` at test load. Always requires `python` (the eval engine always runs); requires `lua` only if a `lua_func` item is in the tree. Fails fast with a clear error citing tried candidates and override key.

 Engines (`PyProcessBase`, `LuaProcessBase`, `EvalExecEngine`) call `bins.python_bin()`/`bins.lua_bin()` themselves — call sites never pass an explicit binary path.

+#### Override-timing contract (`apply_overrides`)
+`bins.python_bin()` is called for the **first** time inside `eval_process_init()` (the long-lived inline-`<| … |>` subprocess), which happens **before** the YAML param files are loaded. To make `-d python_bin=…` and the GUI `python_bin` preference take effect for `eval_proc` itself, `process.py:run()` applies them to gd **before** `eval_process_init()` via the `apply_overrides()` helper extracted from `update_global()`. The post-load `update_global()` call then re-applies the same overrides (after `prepare_global()` clears gd), keeping the gd value in sync with the cached resolution.
+
+| Override source | `eval_proc` | `py_func` / `cycle` / `post_exec` |
+|---|---|---|
+| `-d python_bin=…` (CLI) | ✅ | ✅ |
+| GUI `python_bin` preference | ✅ | ✅ |
+| `python_bin: …` in `param.yaml` | ❌ (eval_proc already started) | ✅ (cache re-resolves on key change) |
+
 ## Key files

 | Path | Role |
@@ -261,12 +270,18 @@ Both Flatpak and AppImage export `TESTIUM_VERSION` from a launcher (Flatpak: lau
 - `unittest` item: renamed from `unittest_file`.
 - GUI test tree: check and fold state preserved across same-file reloads.
 - Licence: EUPL-1.2.
+- Interpreter override timing: `apply_overrides()` extracted from `update_global()` and called by `process.py:run()` before `eval_process_init()`, so `-d python_bin=…` / GUI prefs reach `bins.python_bin()` on its first lookup. `bins._resolve()` cache is now keyed by `(name, override)` so later `param.yaml` changes are picked up by subsequently constructed engines.

 ## Validation tests
-Located in `test/validation/`. Run with `-b` flag:
+Located in `test/validation/`. Two entry points:
 ```
-./run.sh -b -- test/validation/main.tum
+./test/validation/run.sh        # wrapper — uses a dedicated venv (see below)
+./run.sh -b -- test/validation/main.tum   # direct — testium's own python is used for test execution
 ```
+The `run.sh` / `run.bat` wrappers create a dedicated Python venv at `${TMPDIR:-/tmp}/testium-validation-venv` (Linux) or `%TEMP%\testium-validation-venv` (Windows), with `--system-site-packages` + `pip install junit-xml`, and run the suite with `-d python_bin=…` so every test-execution subprocess (eval_proc, py_func, cycle, post_exec) runs inside the venv. testium itself keeps running in the project's own environment. `clean` as the first argument recreates the venv.
+
+The `venv` item (`test/validation/items/venv/`) asserts that the override actually took effect: `python_bin` is set, `sys.executable` matches it, `sys.prefix == dirname(dirname(python_bin))`, and `sys.prefix != sys.base_prefix`. The last marker catches the case where `python_bin` happens to be a system interpreter, for which the path-equality checks alone would pass trivially. The comparisons use `abspath`/`normpath` rather than `realpath` because the venv's `bin/python3` is a symlink to the host. Both `eval_proc` (inline `<| … |>`) and `py_func` paths are exercised.
+
 Parallel item tests: `test/validation/items/parallel/test.tum`

 ## Dependencies
--- a/src/testium/interpreter/process.py
+++ b/src/testium/interpreter/process.py
@@ -16,6 +16,7 @@ from interpreter.utils.test_init import (
    env_init,
    prepare_global,
    update_global,
+    apply_overrides,
    set_standard_gd_keys,
    test_run_init,
    test_run_header,
@@ -210,6 +211,19 @@ class TestProcess(Process):

                env_init()

+                # Apply GUI defaults and CLI defines to the global dict
+                # *before* eval_proc starts: bins.python_bin() reads
+                # ``python_bin`` from gd on its very first call (during
+                # eval_process_init) and caches the result. Without this,
+                # ``-d python_bin=...`` and the GUI ``python_bin`` preference
+                # would only take effect for items spawned *after* the cache
+                # was already populated with the auto-discovered interpreter,
+                # i.e. they would silently be ignored for eval_proc itself.
+                # _load_initial_params re-applies the same overrides after
+                # ``prepare_global()`` clears gd, so the gd value stays in
+                # sync with the cached path.
+                apply_overrides(self.__defs, self.__gui_defaults)
+
                # Creation of the python evaluation process for loading of the complete test
                eval_proc = eval_process_init(api_request, 10, test_dir)
                eval_proc.start()
--- a/src/testium/interpreter/utils/bins.py
+++ b/src/testium/interpreter/utils/bins.py
@@ -202,16 +202,24 @@ _SPECS = {
    "lua":    ("Lua 5.1+",  "lua_bin",    _LUA_CANDIDATES,    _is_lua51),
 }

+# Cached per (name, override) so that runtime changes to gd[gd_key] —
+# e.g. ``python_bin`` set from a YAML config file loaded *after*
+# eval_proc has already resolved its own interpreter — are picked up by
+# the next lookup instead of returning the stale, auto-discovered path.
+# Long-lived subprocesses (eval_proc) keep whatever they captured at
+# construction time, but every new PyProcessBase / FuncExecEngine spawned
+# afterwards sees the current override.
 _resolved = {}


 def _resolve(name):
-    if name in _resolved:
-        return _resolved[name]
-
    display, gd_key, candidates, validator = _SPECS[name]
    override = tm.gd(gd_key, "") or ""

+    cached = _resolved.get(name)
+    if cached is not None and cached[0] == override:
+        return cached[1]
+
    path = ""
    if override:
        # Absolute path: accept as-is (user knows exactly what they want).
@@ -239,7 +247,7 @@ def _resolve(name):
                path = p
                break

-    _resolved[name] = path
+    _resolved[name] = (override, path)
    return path


--- a/src/testium/interpreter/utils/test_init.py
+++ b/src/testium/interpreter/utils/test_init.py
@@ -165,11 +165,14 @@ def env_init():
    _constants_init()


-def update_global(config_files, defines, gui_defaults, silent=False):
-    """Global dict updated with the content of the config file and a dict provided.
-    this function returns the resulting dict.
+def apply_overrides(defines, gui_defaults):
+    """Push GUI defaults then CLI defines into the global dict.
+
+    Extracted from update_global so it can be called *before* eval_proc
+    starts: interpreter overrides (python_bin, lua_bin) must be visible
+    to bins.python_bin() on its first lookup, which happens during
+    eval_process_init.
    """
-    # GUI preferences applied first
    for k, v in gui_defaults.items():
        try:
            val = ast.literal_eval(v)
@@ -177,7 +180,6 @@ def update_global(config_files, defines, gui_defaults, silent=False):
            val = v
        tm.setgd(k, val)

-    # Then command line defines
    for k, v in defines.items():
        try:
            val = ast.literal_eval(v)
@@ -185,6 +187,14 @@ def update_global(config_files, defines, gui_defaults, silent=False):
            val = v
        tm.setgd(k, val)

+
+def update_global(config_files, defines, gui_defaults, silent=False):
+    """Global dict updated with the content of the config file and a dict provided.
+    This function returns the resulting dict.
+    """
+    # GUI preferences applied first, then command line defines
+    apply_overrides(defines, gui_defaults)
+
    # Then the configuration files
    # load global dic before test item
    _feed_gd_with_params(config_files, silent)
--- a/test/validation/README.md
+++ b/test/validation/README.md
@@ -1,10 +1,43 @@
 # Validation

-This directory contains the necessary material to run the testium validation.
+This directory contains the testium validation suite.

-Here is the documentation on how to configure the validation, run it and check that the
-results are correct.
+## Running the suite

-# Tests
+```sh
+./test/validation/run.sh           # Linux
+test\validation\run.bat            # Windows
+```

-TBD
+The wrapper creates a dedicated Python venv in the system temp dir
+(`${TMPDIR:-/tmp}/testium-validation-venv` on Linux, `%TEMP%\testium-validation-venv`
+on Windows), using `--system-site-packages` so existing system packages
+stay visible. The validation suite is then run with that venv pinned as
+`python_bin`. Every test-execution subprocess (inline `<| ... |>`
+evaluation, `py_func`, `cycle`, `post_execution`, ...) runs inside the
+venv, while testium itself keeps running in the project's own
+environment.
+
+Pass `clean` as the first argument to recreate the venv from scratch
+(useful after a system Python upgrade):
+
+```sh
+./test/validation/run.sh clean
+```
+
+## What is checked
+
+The `venv` item under `items/venv/` asserts that the venv is actually
+being used:
+
+* `python_bin` is set in the global dict.
+* The eval subprocess (used for `<| ... |>` expressions) has
+  `sys.executable == python_bin`, `sys.prefix == dirname(dirname(python_bin))`,
+  and `sys.prefix != sys.base_prefix` (i.e. is actually inside a venv).
+* A `py_func` subprocess passes the same three checks.
+
+These checks use `abspath`/`normpath` rather than `realpath` on
+purpose: the venv's `bin/python3` is a symlink to the host interpreter,
+so `realpath` would map both venv and non-venv interpreters to the same
+target. `sys.prefix != sys.base_prefix` is the venv-specific marker
+that distinguishes the two cases.
--- a/test/validation/items/venv/param.yaml
+++ b/test/validation/items/venv/param.yaml
@@ -0,0 +1 @@
+no_param: Null
--- a/test/validation/items/venv/test.tum
+++ b/test/validation/items/venv/test.tum
@@ -0,0 +1,53 @@
+# venv test: assert that the dedicated validation venv is the python
+# being used for every test-execution subprocess (eval_proc / py_func /
+# cycle / ...). The venv path is pinned by ``-d python_bin=...`` in
+# test/validation/run.sh (or run.bat).
+#
+# We use ``abspath``/``normpath`` rather than ``realpath`` on purpose:
+# the venv's ``bin/python3`` is a symlink to the host python, so
+# realpath would map every venv interpreter to the same system path and
+# the comparison would silently pass even *without* a venv.
+# ``sys.prefix != sys.base_prefix`` is the venv-specific marker that
+# catches that case.
+
+- check:
+    name: python_bin is set in the global dict
+    key: $(test)_PASS
+    values:
+        - <| bool(r"$(python_bin)") |>
+
+- check:
+    name: eval_proc subprocess runs in the validation venv
+    key: $(test)_PASS
+    values:
+        - <| os.path.normpath(os.path.abspath(sys.executable)) == os.path.normpath(os.path.abspath(r"$(python_bin)")) |>
+
+- check:
+    name: eval_proc sys.prefix matches python_bin venv root
+    key: $(test)_PASS
+    values:
+        - <| os.path.normpath(os.path.abspath(sys.prefix)) == os.path.dirname(os.path.dirname(os.path.normpath(os.path.abspath(r"$(python_bin)")))) |>
+
+- check:
+    name: eval_proc is actually inside a venv (sys.prefix != sys.base_prefix)
+    key: $(test)_PASS
+    values:
+        - <| os.path.normpath(os.path.abspath(sys.prefix)) != os.path.normpath(os.path.abspath(sys.base_prefix)) |>
+
+- py_func:
+    name: py_func subprocess runs in the validation venv
+    key: $(test)_PASS
+    file: $(test_path)$(psep)verify_venv.py
+    func_name: check_sys_executable
+
+- py_func:
+    name: py_func sys.prefix matches python_bin venv root
+    key: $(test)_PASS
+    file: $(test_path)$(psep)verify_venv.py
+    func_name: check_sys_prefix_in_venv
+
+- py_func:
+    name: py_func is actually inside a venv
+    key: $(test)_PASS
+    file: $(test_path)$(psep)verify_venv.py
+    func_name: check_is_venv
--- a/test/validation/items/venv/verify_venv.py
+++ b/test/validation/items/venv/verify_venv.py
@@ -0,0 +1,62 @@
+import os
+import sys
+
+import py_func.tm as tm
+
+
+def _norm(p):
+    # normpath + normcase, without resolving symlinks. realpath() would
+    # follow the venv's ``python3`` symlink to ``/usr/bin/python3.X`` and
+    # defeat the comparison.
+    return os.path.normcase(os.path.normpath(os.path.abspath(p)))
+
+
+def _venv_dir():
+    # python_bin is at ``<venv>/(bin|Scripts)/python*`` so the venv root
+    # is two levels above the executable.
+    exe = tm.gd("python_bin", "")
+    if not exe:
+        return ""
+    return os.path.dirname(os.path.dirname(_norm(exe)))
+
+
+def check_sys_executable():
+    """py_func subprocess: sys.executable must match the configured python_bin."""
+    expected = _norm(tm.gd("python_bin", ""))
+    actual = _norm(sys.executable)
+    if expected and actual == expected:
+        return True
+    return (
+        -1,
+        f"sys.executable={actual!r} differs from python_bin={expected!r}",
+    )
+
+
+def check_sys_prefix_in_venv():
+    """py_func subprocess: sys.prefix must match the venv root derived
+    from python_bin (two levels up from the executable)."""
+    venv = _venv_dir()
+    if not venv:
+        return (-1, "python_bin is not set in the global dict")
+    actual = _norm(sys.prefix)
+    if actual == venv:
+        return True
+    return (
+        -1,
+        f"sys.prefix={actual!r} is not the validation venv {venv!r}",
+    )
+
+
+def check_is_venv():
+    """py_func subprocess: confirm we are inside a venv, i.e. sys.prefix
+    differs from sys.base_prefix. This catches the case where python_bin
+    happens to be a system interpreter and the path-based check would
+    pass trivially."""
+    actual = _norm(sys.prefix)
+    base = _norm(sys.base_prefix)
+    if actual != base:
+        return True
+    return (
+        -1,
+        f"sys.prefix == sys.base_prefix == {actual!r}: not running in a venv",
+    )
--- a/test/validation/run.bat
+++ b/test/validation/run.bat
@@ -0,0 +1,61 @@
+@echo off
+SETLOCAL EnableExtensions
+
+REM Runs the testium validation suite with a dedicated Python venv used
+REM by every py_func / cycle / inline-eval subprocess. testium itself
+REM keeps running in the project's own environment; the validation venv
+REM only isolates *test execution*.
+REM
+REM   test\validation\run.bat [clean] [extra testium args]
+REM
+REM Requires the project venv to already exist (run the project's
+REM run.bat once first, or any other testium install method).
+
+SET "SCRIPT_DIR=%~dp0"
+SET "PROJECT_DIR=%SCRIPT_DIR%..\.."
+REM Venv in the user temp dir (Windows equivalent of /tmp).
+SET "VENV_DIR=%TEMP%\testium-validation-venv"
+SET "PROJECT_VENV=%PROJECT_DIR%\test\tmp\testium_venv"
+
+IF /I "%~1"=="clean" (
+    rmdir /s /q "%VENV_DIR%"
+    SHIFT
+)
+
+REM Locate a host Python.
+SET "PYTHON_EXE=python"
+py --version >nul 2>&1
+IF %ERRORLEVEL% EQU 0 (
+    SET "PYTHON_EXE=py"
+    goto :PYTHON_FOUND
+)
+python --version >nul 2>&1
+IF %ERRORLEVEL% EQU 0 (
+    SET "PYTHON_EXE=python"
+    goto :PYTHON_FOUND
+)
+echo ERROR : Python could not be found on this system.
+exit /b 1
+
+:PYTHON_FOUND
+
+IF NOT EXIST "%VENV_DIR%" (
+    echo Creating validation venv at %VENV_DIR%
+    %PYTHON_EXE% -m venv --system-site-packages "%VENV_DIR%"
+    IF ERRORLEVEL 1 (
+        echo ERROR while creating the validation venv.
+        exit /b 1
+    )
+    call "%VENV_DIR%\Scripts\python.exe" -m pip install --quiet --upgrade pip
+    call "%VENV_DIR%\Scripts\pip" install --quiet junit-xml
+)
+
+SET "VENV_PYTHON=%VENV_DIR%\Scripts\python.exe"
+
+IF NOT EXIST "%PROJECT_VENV%" (
+    echo ERROR : project venv not found at %PROJECT_VENV%. Run the project run.bat once first.
+    exit /b 1
+)
+
+call "%PROJECT_VENV%\Scripts\activate"
+python "%PROJECT_DIR%\src\testium" -b -d "python_bin=%VENV_PYTHON%" -- "%SCRIPT_DIR%main.tum" %*
--- a/test/validation/run.sh
+++ b/test/validation/run.sh
@@ -0,0 +1,47 @@
+#!/bin/bash
+# Runs the testium validation suite with a dedicated Python venv used by
+# every py_func / cycle / inline-eval subprocess (i.e. everything that
+# goes through ``bins.python_bin()``). testium itself keeps running in
+# the project's own environment — the validation venv only isolates
+# *test execution*.
+#
+#   ./test/validation/run.sh [clean] [extra testium args]
+#
+# ``clean`` (optional, must be the first arg) removes the venv before
+# recreating it; this is the way to refresh the venv after a system
+# Python upgrade.
+
+set -e
+
+SCRIPT_PATH="$(readlink -f "$0")"
+SCRIPT_DIR="$(realpath "$(dirname "$SCRIPT_PATH")")"
+PROJECT_DIR="$(realpath "$SCRIPT_DIR/../..")"
+# Venv lives in the system temp dir so it stays out of the project tree
+# (and is naturally cleaned up by tmpfiles/reboot on most distros).
+VENV_DIR="${TMPDIR:-/tmp}/testium-validation-venv"
+
+if [ "${1:-}" = "clean" ]; then
+    rm -rf "$VENV_DIR"
+    shift
+fi
+
+if [ ! -d "$VENV_DIR" ]; then
+    echo "Creating validation venv at $VENV_DIR"
+    # --system-site-packages so we don't have to reinstall pyside6, lxml
+    # & friends just to support the validation helpers. We still pip
+    # install junit-xml below because it is the one dep that does *not*
+    # ship as a system package on most distros and is required by
+    # post_execution.py.
+    python3 -m venv --system-site-packages "$VENV_DIR"
+    "$VENV_DIR/bin/pip" install --quiet --upgrade pip
+    "$VENV_DIR/bin/pip" install --quiet junit-xml
+fi
+
+VENV_PYTHON="$VENV_DIR/bin/python3"
+
+# Delegate to the project's run.sh so testium itself still runs in the
+# project venv (with pyside6, gitpython, ...). ``-d python_bin=...``
+# pins every test-execution subprocess to the validation venv.
+exec "$PROJECT_DIR/run.sh" -b \
+    -d "python_bin=$VENV_PYTHON" \
+    -- "$SCRIPT_DIR/main.tum" "$@"
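
The three checks that the ``venv`` item applies can be sketched as one standalone function. This is an illustration only: ``check_interpreter`` is a hypothetical name, and it takes the values as parameters so it can be tried without testium. The real checks live in ``verify_venv.py`` and ``test.tum``.

```python
import os
import sys


def norm(p):
    # Normalise without resolving symlinks, as verify_venv.py does.
    return os.path.normcase(os.path.normpath(os.path.abspath(p)))


def check_interpreter(python_bin, executable, prefix, base_prefix):
    """Return (exe_ok, root_ok, in_venv) for the given interpreter values."""
    exe_ok = norm(executable) == norm(python_bin)
    # python_bin sits at <root>/(bin|Scripts)/python*, two levels below the root.
    root_ok = norm(prefix) == os.path.dirname(os.path.dirname(norm(python_bin)))
    in_venv = norm(prefix) != norm(base_prefix)
    return exe_ok, root_ok, in_venv


# A system interpreter passes both path checks; only the marker flags it.
system = check_interpreter("/usr/bin/python3", "/usr/bin/python3", "/usr", "/usr")

# A venv interpreter passes all three (the paths here are made up).
venv = check_interpreter("/tmp/v/bin/python3", "/tmp/v/bin/python3", "/tmp/v", "/usr")

# The same function applied to the interpreter running this sketch.
current = check_interpreter(sys.executable, sys.executable, sys.prefix, sys.base_prefix)
print(system, venv, current)
```
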