Compare commits

10 Commits

Author SHA1 Message Date
43d291418b Merge branch 'main' of ssh://cahute.beafrancois.fr:8329/electronics/bs_explorer 2026-06-05 19:38:33 +02:00
1fcc9bdae6 doc: design note for boundary-scan board test
Capture the plan for the strongest fit of the tool: feed a board netlist
+ the chain devices' BSDL and auto-generate/run boundary-scan
interconnect tests (opens/shorts/stuck-at) + chain integrity.

Covers inputs (netlist, BSDL, board.yaml), a new bstest/ layer whose
central primitive is a whole-chain boundary register (lifts the current
single-device assumption), the test types, the hard parts (safety/
contention, control-cell mapping, multi-device bit order, vector gen),
and a 5-phase plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 20:53:19 +02:00
22feb66467 doc: describe the ARM7 memory-read operating context
The ARM7TDMI memory read (cpu_read/cpu_halt/cpu_resume) works; document
when and how:
- tutorial: rewrite the "CPU targets" section from "structure only" to a
  working cpu_read walkthrough (dump LPC2103 flash to Intel HEX), state
  the operating envelope (power-on -> one halt -> dump; reads clobber
  r0..r14/PC, no context save/restore, so resume isn't clean and a
  re-halt in the same session can time out -> power-cycle), plus a
  troubleshooting row for the sys-speed timeout.
- CLAUDE.md: roadmap phase 7 + ARM-debug note now say the read works
  (flash dump validated), with context save/restore + arm_flash write as
  the remaining steps.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 20:32:46 +02:00
44cb9dfbae arm_debug: doc - memory read validated by 32 KB flash dump
cpu_read dumped the LPC2103's full 32 KB flash to Intel HEX
(objcopy-verified: all records/checksums valid, correct vectors). Update
the comments to reflect the working state and the power-on -> one halt ->
dump flow (context save/restore for repeated reads is the next step).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 20:23:04 +02:00
cdbeea7b61 arm_debug: OpenOCD-style debug entry + warm-up read
Make the system-speed memory read reliable within a halt:
- debug entry mirrors OpenOCD's arm7_9_debug_entry: change_to_arm when
  the core halted in Thumb, then read all 16 core registers. That fixed
  STMIA+NOP+NOP+16 sequence flushes the firmware out of the pipeline and
  leaves a deterministic state for both Thumb and ARM halts.
- warm-up read: the first system-speed read after debug entry normalizes
  the sys-speed pipeline but its own result is unreliable, so do one
  throwaway read block and discard it. Every read after it is consistent
  and correct (analogous to the FTDI stale-first-read).

Within one clean halt, reads now come back correct (no misalignment).
Repeated halt/read cycles without a power-cycle still degrade (the read
clobbers r0..r14, so a later re-halt/resume is messy) - the intended
flow is power-on -> one halt -> dump.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 20:09:39 +02:00
2c16a66beb arm_debug: cycle-exact JTAG layer; system-speed memory read works
Read CPU memory over JTAG via system-speed LDM. Validated on the LPC2103:
reads the real ARM reset vectors and contiguous multi-block code.

The core only advances on Run-Test/Idle debug clocks (not Update-DR), so
the trick is keeping that clock count exact:
- "quiet" TAP ops (quiet_set_ir / quiet_shift_dr / quiet_chain_select /
  quiet_eice_read / quiet_latch_chain1) pass through Update but park in
  Pause, never RTI -> they switch chains and read EmbeddedICE WITHOUT
  clocking the core, so they can't clobber the registers a sys-speed LDM
  just loaded.
- clock_core(n) is the only thing that advances the core (n RTI clocks).
- execute_sys_speed: RESTART, then drive the access one clock at a time
  with a quiet DBG_STATUS check between, stopping the instant
  SYSCOMP & DBGACK appear (no over-clock past re-entry).
- after sys-speed: quiet-switch to chain 1, quiet-latch a NOP to displace
  the stale LDMIA, then read_core_regs.
- pre-read pipeline normalization: change_to_arm (17 clocked instrs) for
  a Thumb halt; 17 ARM NOPs for an ARM halt.

WIP: not yet reliable across all halt states - the first read after some
halts times out (SYSCOMP never appears) and leaves the core running.
Within one good halt, reads are consistent and correct. Diagnosis and
next steps in the arm7-debug-dclk-timing note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 20:00:51 +02:00
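
The "quiet" TAP path described in the commit above can be checked on paper with a small model of the IEEE 1149.1 TAP state machine. This is an illustrative sketch, not code from bs_explorer: `tap_state`, `tap_next` and `tap_walk` are hypothetical names, and it assumes (as the commit message states) that the core advances only on clocks that land in Run-Test/Idle.

```c
#include <assert.h>
#include <stddef.h>

/* The 16 IEEE 1149.1 TAP controller states. */
typedef enum {
    TLR, RTI, SEL_DR, CAP_DR, SH_DR, EX1_DR, PAUSE_DR, EX2_DR, UPD_DR,
    SEL_IR, CAP_IR, SH_IR, EX1_IR, PAUSE_IR, EX2_IR, UPD_IR
} tap_state;

/* Next state for TMS = 0 / TMS = 1, per the standard state diagram. */
static const tap_state tap_next[16][2] = {
    [TLR]      = { RTI,      TLR    },
    [RTI]      = { RTI,      SEL_DR },
    [SEL_DR]   = { CAP_DR,   SEL_IR },
    [CAP_DR]   = { SH_DR,    EX1_DR },
    [SH_DR]    = { SH_DR,    EX1_DR },
    [EX1_DR]   = { PAUSE_DR, UPD_DR },
    [PAUSE_DR] = { PAUSE_DR, EX2_DR },
    [EX2_DR]   = { SH_DR,    UPD_DR },
    [UPD_DR]   = { RTI,      SEL_DR },
    [SEL_IR]   = { CAP_IR,   TLR    },
    [CAP_IR]   = { SH_IR,    EX1_IR },
    [SH_IR]    = { SH_IR,    EX1_IR },
    [EX1_IR]   = { PAUSE_IR, UPD_IR },
    [PAUSE_IR] = { PAUSE_IR, EX2_IR },
    [EX2_IR]   = { SH_IR,    UPD_IR },
    [UPD_IR]   = { RTI,      SEL_DR },
};

/* Walk a TMS sequence from state s. Returns the final state and stores
 * in *rti_clocks how many clocks landed in Run-Test/Idle (the clocks
 * that, by assumption, advance the debug core). */
static tap_state tap_walk(tap_state s, const int *tms, size_t n, int *rti_clocks)
{
    int hits = 0;
    assert(tms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        s = tap_next[s][tms[i] ? 1 : 0];
        if (s == RTI)
            hits++;
    }
    if (rti_clocks)
        *rti_clocks = hits;
    return s;
}
```

Walking TMS = 1,1,1,0,0 from Pause-DR reaches Shift-DR through Exit2, Update and Select-DR without a single Run-Test/Idle clock, whereas the conventional Shift-DR exit 1,1,0 lands in Run-Test/Idle once.
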
68229339e9 arm_debug: fix stale c1_xfer comment (bscan, not Pause-DR)
The c1_ctx/c1_xfer comments still described the abandoned Pause-DR
parking model; the access is a single bscan_shift_dr (one debug clock
per access). Comment-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 18:28:55 +02:00
fda6aed077 arm_debug: cycle-exact chain-1, Thumb->ARM, sys-speed re-entry
The chain-1 access is now deterministic: a bare 33-bit bscan_shift_dr
(breakpoint | flip32(instr)) is exactly one debug clock per access (the
Update->Run-Test/Idle transition). The earlier "+1 idle" double-clocked
the pipeline and the earlier all-zero/constant reads were the core being
in Thumb state. Validated on the LPC2103 by a known-pattern register
round-trip (write r1..r15, read back -> exact match; r15/PC differs by
the expected pipeline offset).

- c1_xfer: drop the extra idle dwell (one access == one debug clock).
- mem_read: detect Thumb (ITBIT) and change_to_arm in one continuous
  chain-1 session so no chain switch clocks the core mid-sequence.
- execute_sys_speed: drop the post-RESTART idle burst and poll
  DBG_STATUS straight away (matches OpenOCD); the system-speed LDM now
  re-enters debug (DBGACK & SYSCOMP) instead of running free.

WIP: the read_core_regs after a system-speed access is phase-shifted by
the EmbeddedICE<->chain-1 switch, so memory reads come back misaligned
(capturing injected instructions). Next step + diagnosis in the
arm7-debug-dclk-timing note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 18:23:45 +02:00
e0dc58d09c arm_debug: debug entry (DBGRQ-clear), Thumb->ARM, cpu_read (WIP)
Bring up ARM7TDMI debug toward reading CPU memory/flash over JTAG.
Validated on the LPC2103 (Olimex ARM-USB-OCD): halt holds DBGACK,
RESTART resumes, the Thumb->ARM switch clears ITBIT, and real register
data streams out of the STMIA injection.

- arm_debug:
  - halt: after DBGACK, reprogram DBG_CTRL = DBGACK|INTDIS (deassert
    DBGRQ) per OpenOCD's debug entry; without this, injected
    instructions don't execute. Warn on Thumb (ITBIT).
  - change_to_arm: switch a Thumb-state core to ARM (duplicated-halfword
    Thumb opcodes), needed because the firmware may halt in either state.
  - chain-1 instruction injection: c1_xfer/read_core_regs/
    write_core_regs/load_word_regs + execute_sys_speed (RESTART, poll
    DBGACK&SYSCOMP); arm_debug_mem_read does word-block system-speed LDM.
- script: cpu_read <dev> <addr> <len> <file> <bin|hex> command +
  built-in Intel HEX writer (type 04/00/01 records).

WIP: c1_xfer (on bscan_shift_dr) is not yet cycle-exact (one debug clock
per access), so memory reads can be misaligned. Remaining work and the
diagnosis are in the arm7-debug-dclk-timing note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 17:58:45 +02:00
aecaebdaf1 ftdi+arm_debug: honor last-bit TMS; ARM7 EmbeddedICE halt/resume
The FTDI MPSSE xfer ignored TMS on data bits, so bscan_set_ir never
latched the IR — the bscan exit needs the last bit to clock
Shift->Exit1 so the following Update latches. It only ever worked on the
Digilent driver. Now the final TMS-flagged bit is clocked through the
TMS pin (carrying TDI/TDO), so bscan_set_ir/bscan_shift_dr reach
Exit1->Update correctly.

Implement ARM7TDMI EmbeddedICE access (SCAN_N + INTEST, 38-bit scan
chain 2 register R/W with pipelined read) and halt (force DBGRQ, poll
DBGACK) / resume (clear DBGRQ + RESTART). New cpu_halt / cpu_resume
commands; arm_debug links bscan.

Validated on an LPC2103 over the ARM-USB-OCD: set_ir(IDCODE) reads
0x4F1F0F0F, EmbeddedICE registers round-trip, cpu_halt -> DBGACK,
cpu_resume releases the core.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 16:34:08 +02:00
6 changed files with 1027 additions and 50 deletions

111
CLAUDE.md

@@ -45,7 +45,7 @@ src/
├── svf/ SVF player (svf_play): SIR/SDR/RUNTEST/STATE, masked compare ├── svf/ SVF player (svf_play): SIR/SDR/RUNTEST/STATE, masked compare
├── probes/ Probe-config profiles loader (parses data/probes.yaml, libyaml) ├── probes/ Probe-config profiles loader (parses data/probes.yaml, libyaml)
├── program/ `program` dispatch: routes a target to its backend by `prog` ├── program/ `program` dispatch: routes a target to its backend by `prog`
└── arm_debug/ ARM (EmbeddedICE) debug + flash backend (not implemented yet) └── arm_debug/ ARM7TDMI (EmbeddedICE) debug: halt/resume, Thumb->ARM, memory read (works); flash-write backend TODO
data/ — runtime resources, looked up CWD-relative — data/ — runtime resources, looked up CWD-relative —
├── targets.yaml Target registry (FPGAs + CPUs: IDCODE, BSDL/proxy, debug, flash, prog) ├── targets.yaml Target registry (FPGAs + CPUs: IDCODE, BSDL/proxy, debug, flash, prog)
├── probes.yaml Probe-config profiles (defaults + per-probe overrides) ├── probes.yaml Probe-config profiles (defaults + per-probe overrides)
@@ -72,7 +72,7 @@ Adding a feature usually means adding a new script command in
| 4 | script commands | **done** (commit `d6f843e`) | `flash_detect`, `flash_read` (+file), `flash_erase`, `flash_write`, `flash_verify`. Full set validated on KCU105 (save/erase/write-random/verify/restore round-trip). ~100 KB/s write once the proxy is loaded. | | 4 | script commands | **done** (commit `d6f843e`) | `flash_detect`, `flash_read` (+file), `flash_erase`, `flash_write`, `flash_verify`. Full set validated on KCU105 (save/erase/write-random/verify/restore round-trip). ~100 KB/s write once the proxy is loaded. |
| 5 | `probes/` + JTAG-link | **done** | `data/probes.yaml` probe-config profiles (`jtag_open <idx> <profile>`, `jtag_profiles`, `jtag_close`); driver-neutral `JTAG_TCK_FREQ_KHZ`/`JTAG_RTCK`; device `max_tck_khz` clock cap resolved at `jtag_autoinit`; `prog` method tag. See the config-strategy design note. Validated on the IGLOO2 (FlashPro). | | 5 | `probes/` + JTAG-link | **done** | `data/probes.yaml` probe-config profiles (`jtag_open <idx> <profile>`, `jtag_profiles`, `jtag_close`); driver-neutral `JTAG_TCK_FREQ_KHZ`/`JTAG_RTCK`; device `max_tck_khz` clock cap resolved at `jtag_autoinit`; `prog` method tag. See the config-strategy design note. Validated on the IGLOO2 (FlashPro). |
| 6 | `svf/` | **done** (subset, commit `c77d86e`) | SVF player + `svf_play`: SIR/SDR with masked TDO compare, RUNTEST, STATE — single-device. Validated on the IGLOO2 IDCODE. | | 6 | `svf/` | **done** (subset, commit `c77d86e`) | SVF player + `svf_play`: SIR/SDR with masked TDO compare, RUNTEST, STATE — single-device. Validated on the IGLOO2 IDCODE. |
| 7 | `target/` + `program/` + `arm_debug/` | **structure done; ARM impl TODO** | Generalized `fpga/` into a kind-aware `target/` registry (FPGA \| CPU). `program <dev> <file>` dispatches by `prog` (svf wired; proxy_spi points at the flash workflow). `arm_debug/` (EmbeddedICE) + `arm_flash` backend are declared but not implemented; `arm-usb-ocd` probe profile added. FPGA path re-validated on the IGLOO2. See the ARM-debug design note. | | 7 | `target/` + `program/` + `arm_debug/` | **structure done; ARM read works, flash-write TODO** | Generalized `fpga/` into a kind-aware `target/` registry (FPGA \| CPU). `program <dev> <file>` dispatches by `prog` (svf wired; proxy_spi points at the flash workflow). `arm_debug/` (ARM7TDMI EmbeddedICE) does halt/resume, Thumb->ARM, and system-speed **memory read** (`cpu_read`/`cpu_halt`/`cpu_resume`); validated by dumping an LPC2103's 32 KB flash to Intel HEX. Context save/restore + the `arm_flash` write backend are TODO. `arm-usb-ocd` probe profile added. See the ARM-debug design note. |
| 8 | FTDI driver → libftdi1 | **done** | Replaced the proprietary libftd2xx with open-source libftdi1 (libusb): any VID:PID + auto kernel-detach. Detected an NXP LPC2103 (ARM7TDMI-S, IDCODE 0x4F1F0F0F) over an Olimex ARM-USB-OCD — the probe the old lib couldn't enumerate. Vendored `src/libs/libftd2xx` removed. | | 8 | FTDI driver → libftdi1 | **done** | Replaced the proprietary libftd2xx with open-source libftdi1 (libusb): any VID:PID + auto kernel-detach. Detected an NXP LPC2103 (ARM7TDMI-S, IDCODE 0x4F1F0F0F) over an Olimex ARM-USB-OCD — the probe the old lib couldn't enumerate. Vendored `src/libs/libftd2xx` removed. |
@@ -318,8 +318,12 @@ tool.
## Programming CPUs over JTAG: ARM7/9 via EmbeddedICE (design note) ## Programming CPUs over JTAG: ARM7/9 via EmbeddedICE (design note)
Structure in place (`target/` kind=cpu, `program/` dispatch, `arm_debug/` Memory **read works** (`cpu_read`/`cpu_halt`/`cpu_resume` on ARM7TDMI
+ `arm_flash` declared); the debug/flash code is the next real work. EmbeddedICE): halt, Thumb->ARM switch, system-speed `LDMIA` read, dumped
to bin/Intel HEX — validated by an LPC2103 32 KB flash dump. Context
save/restore (for clean resume + repeated reads) and the `arm_flash`
write backend are the remaining work. See "What's left" and the
arm7-debug-dclk-timing note in `~/.claude/` for the cycle-exact timing.
### Why CPUs are a different shape ### Why CPUs are a different shape
@@ -358,9 +362,14 @@ filling from the Olimex schematic / OpenOCD's interface config.
### What's left (the implementation) ### What's left (the implementation)
EmbeddedICE scan-chain access + halt/resume + memory R/W, then a per-MCU Done: EmbeddedICE scan-chain access, halt/resume, Thumb->ARM, debug-speed
RAM flash loader (LPC2xxx, AT91SAM7, …) and the `arm_flash` backend. The register read/write, and system-speed **memory read** (`cpu_read`).
registry, dispatch, probe-profile and config layers are ready for it. Reliable in a power-on → one-halt → dump flow; reads clobber r0..r14/PC
with no context save/restore, so resume isn't clean and repeated halts in
one session degrade (power-cycle between dumps). Left: register **context
save/restore** (clean resume + repeated reads), then a per-MCU RAM flash
loader (LPC2xxx, AT91SAM7, …) and the `arm_flash` write backend. The
registry, dispatch, probe-profile and config layers are ready.
## Embedded port (design note) ## Embedded port (design note)
@@ -407,6 +416,94 @@ images (huge SVF vectors) stay on the host or a large MCU.
- 3.3 V GPIO → **level-shift** for 1.8 V JTAG targets (e.g. KCU105), - 3.3 V GPIO → **level-shift** for 1.8 V JTAG targets (e.g. KCU105),
same as on the host side. same as on the host side.
## Boundary-scan board test (design note)
Not yet implemented — captured for the strongest fit of this tool.
Guiding idea: **feed a board's netlist + the BSDL of each device on the
JTAG chain, and automatically generate & run boundary-scan interconnect
tests** (opens / shorts / stuck-at) plus chain-integrity checks. This is
bs_explorer's real niche: a job badly served by open source (OpenOCD does
almost no boundary scan; the good tools are commercial — XJTAG, JTAG
Technologies — or legacy like UrJTAG) and where proprietary debuggers
(ST-LINK, JTAGICE) don't play at all.
### What it tests (and doesn't)
Tests the **interconnect between BS-accessible pins**: drive a net from
one boundary-scan output (EXTEST) and sense it on the other BS pins of
that net. Detects opens (driven value not seen), shorts (two nets read
alike when driven differently), stuck-at, and chain integrity. Cannot
test nets with no BS pin, analog, or logic inside a chip. The Viveris
`bus_over_jtag` (SPI/I²C/parallel over EXTEST) also enables testing
*connected memory* as an optional cluster test.
### Inputs
1. **Netlist** — `net → [(refdes, pin), …]`. Start with a neutral format
(CSV `net,refdes,pin` or YAML); a KiCad importer later.
2. **BSDL** per scannable device (already handled by `bsdl_parser`:
IDCODE, IR length, boundary register, per-cell function, pin↔cell,
control/enable cell + disable value).
3. **Board file (`board.yaml`)** — the glue: chain order (TDI→TDO),
`refdes → bsdl`, netlist path, power/ground/clock nets to **exclude**,
pull-up/down and series-resistor info, compliance/"safe" pins to force.
### Architecture (a new `bstest/` layer)
Reuses `jtag_core` (TAP, IR/DR shift), `bsdl_parser`, `bscan_*`, the YAML
config pattern. **The central new primitive is a whole-chain boundary
register**: today the Viveris pin API is single-device, but board test
must drive/sense **all pins of all BS devices in one DR pass** — the
chain BSR is the concatenation of every device's boundary register
(others in BYPASS), in the right bit order. This is the enabler, and it
**lifts the "single device on the chain" assumption** baked into the
current `bscan_*` primitives — validate it early on a real 2-device
chain.
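
A minimal sketch of the bit bookkeeping behind such a whole-chain register, assuming the usual conventions: a device in BYPASS contributes one DR bit, the device nearest TDO shifts out first, and BSDL cell 0 is the cell closest to TDO. `chain_dev`, `chain_bit_offset` and the other names are hypothetical, not existing bs_explorer API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device descriptor for the chain model. */
typedef struct {
    int bsr_len;   /* boundary register length from the BSDL */
    int in_extest; /* 1 = boundary register selected, 0 = BYPASS */
} chain_dev;

/* DR bits this device contributes in its current mode. */
static int dev_dr_len(const chain_dev *d)
{
    return d->in_extest ? d->bsr_len : 1;
}

/* Total length of the concatenated chain DR. */
static int chain_total_len(const chain_dev *devs, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += dev_dr_len(&devs[i]);
    return total;
}

/* Offset of device idx in the shifted bit stream. devs[0] is nearest
 * TDI, devs[n-1] nearest TDO; stream bit 0 is the first bit out of TDO,
 * so every device closer to TDO sits in front of device idx. */
static int chain_bit_offset(const chain_dev *devs, size_t n, size_t idx)
{
    int off = 0;
    assert(idx < n);
    for (size_t i = idx + 1; i < n; i++)
        off += dev_dr_len(&devs[i]);
    return off;
}

/* Global stream bit of one boundary cell of a device in EXTEST. */
static int chain_cell_bit(const chain_dev *devs, size_t n, size_t idx, int cell)
{
    assert(devs[idx].in_extest && cell >= 0 && cell < devs[idx].bsr_len);
    return chain_bit_offset(devs, n, idx) + cell;
}
```

With three devices of 100, 50 and 20 cells where the middle one is in BYPASS, the chain DR is 121 bits long and cell 5 of the TDI-side device lands at stream bit 26.
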
Layers:
- **A. Chain model** — ordered devices (BSDL + IDCODE + length), checked
against a live `jtag_scan`; map `(refdes, pin) → device → BSDL port →
global BSR bit(s)` (data cell + control/enable cell + disable value).
- **B. Netlist ingest + net classification** — drivable (≥1 BS
output/bidir), sense-only, untestable (no BS pin), excluded
(power/clock).
- **C. Vector generation** — give each testable net a unique binary code
over N steps (N ≈ log2(#nets) to disambiguate every short; counting
sequence). Per step: exactly **one driver per net** sets its bit, all
other pins of the net go Hi-Z, shift the chain BSR (EXTEST), capture
the input cells.
- **D. Execution** — per step, build the full-chain DR, shift, capture.
- **E. Diagnosis + report** — compare captured vs expected → open / short
(with the net pair) / stuck-at / missing device; pass-fail per net +
fault list (refdes/pin) + a **coverage report** (which pins are
BS-testable).
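
Layers C and E can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: each net gets the code `index + 1`, and the all-zeros and all-ones codes are left unused so that a stuck net cannot look like a valid code (a common refinement of the plain counting sequence, assumed here). All names are hypothetical.

```c
#include <assert.h>

/* Steps needed so that nets codes fit while skipping the all-zeros
 * and all-ones patterns: smallest N with 2^N >= nets + 2. */
static unsigned steps_for(unsigned nets)
{
    unsigned n = 1;
    while ((1u << n) < nets + 2u)
        n++;
    return n;
}

/* Unique code of a net (never 0). */
static unsigned net_code(unsigned net)
{
    return net + 1u;
}

/* Value the single active driver of net puts on the wire at step. */
static int drive_bit(unsigned net, unsigned step)
{
    return (int)((net_code(net) >> step) & 1u);
}

typedef enum { NET_OK, NET_STUCK0, NET_STUCK1, NET_SHORT, NET_FAULT } net_verdict;

/* Classify one net from the codes reassembled out of the captured
 * input cells (captured[i] = bits sensed on net i, one per step).
 * Simplified: a wired-AND/OR short that collapses to all 0 / all 1
 * is reported as stuck, which a real diagnosis would refine. */
static net_verdict diagnose(const unsigned *captured, unsigned nets, unsigned net)
{
    unsigned all1 = (1u << steps_for(nets)) - 1u;
    unsigned got = captured[net];
    assert(net < nets);
    if (got == 0u)
        return NET_STUCK0;
    if (got == all1)
        return NET_STUCK1;
    for (unsigned i = 0; i < nets; i++)
        if (i != net && captured[i] == got)
            return NET_SHORT; /* two nets read alike */
    return got == net_code(net) ? NET_OK : NET_FAULT;
}
```

For six nets this gives three steps; a seventh net needs a fourth step because the all-ones code is reserved.
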
### The hard parts (honest)
- **Safety**: EXTEST drives real pins on a powered board → contention
risk (a BS driver against a non-BS output). The generator must
guarantee **one active driver per net per step**, tristate the rest,
exclude power/clock nets, and honour the BSDL **compliance patterns**.
Treat this as a generator invariant, not an option.
- **Control/bidir cells**: driving needs the enable cell set, sensing
needs Hi-Z — all in the BSDL, but the mapping is the bulk of the work.
- **Multi-device bit ordering** (TDI-side device shifts last; IR
concatenation).
- **Pulls / series resistors** skew sensing of undriven nets — model from
`board.yaml`.
- **Minimal, safe vector generation** (counting sequence / adjacency
colouring) — well documented, but the "smart" piece.
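
The safety rule above lends itself to a check that runs on every generated step before anything is shifted. The sketch below is one possible shape for it, with hypothetical types (`pin_plan`, `pin_mode`) and a deliberately simple quadratic scan; it only covers the "one active driver per net" and "never drive an excluded net" parts, not compliance patterns.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PIN_HIZ, PIN_DRIVE0, PIN_DRIVE1 } pin_mode;

/* Planned state of one boundary-scan pin for one test step. */
typedef struct {
    int net;       /* index of the net the pin is attached to */
    pin_mode mode; /* Hi-Z (sense only) or actively driving */
} pin_plan;

/* Returns 1 if the step is safe: no excluded net (power, ground,
 * clock) is driven and no net has more than one active driver.
 * excluded[net] != 0 marks a net that must never be driven. */
static int step_is_safe(const pin_plan *pins, size_t npins,
                        const unsigned char *excluded, size_t nnets)
{
    for (size_t i = 0; i < npins; i++) {
        assert(pins[i].net >= 0 && (size_t)pins[i].net < nnets);
        if (pins[i].mode == PIN_HIZ)
            continue;
        if (excluded[pins[i].net])
            return 0; /* driving a power/clock net */
        for (size_t j = i + 1; j < npins; j++)
            if (pins[j].mode != PIN_HIZ && pins[j].net == pins[i].net)
                return 0; /* two drivers on one net */
    }
    return 1;
}
```

Treating a failed check as a hard error, rather than a warning, matches the idea of a generator invariant.
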
### Phasing
| Phase | Content | Value |
|-------|---------|-------|
| 1 | Chain model + infrastructure test (IDCODE / BYPASS / chain length) | Immediate, mostly on existing code |
| 2 | **Whole-chain BSR primitive** (drive/sense all pins, multi-device, EXTEST) | The enabler |
| 3 | Netlist ingest + `(refdes,pin)→bit` mapping + net classification | — |
| 4 | Interconnect vector gen + execution + open/short/stuck diagnosis + report | The deliverable |
| 5 | Pulls/series, bidir, clusters, connected-memory tests via `bus_over_jtag` | Refinement |
## External references ## External references
- **BSCAN proxy bitstreams**: `quartiq/bscan_spi_bitstreams` (MIT). - **BSCAN proxy bitstreams**: `quartiq/bscan_spi_bitstreams` (MIT).

@@ -482,16 +482,55 @@ bs_explorer> program 0 design.svf # prog=svf -> plays the SVF
(`bscan_load_bitstream` + `flash_write`/`flash_verify`); `arm_flash` (`bscan_load_bitstream` + `flash_write`/`flash_verify`); `arm_flash`
routes to the ARM backend. routes to the ARM backend.
### CPU targets (ARM7/9) — structure only ### CPU targets (ARM7/9): reading memory over JTAG
The registry also describes **CPUs** (`kind: cpu`): an ARM debug The registry also describes **CPUs** (`kind: cpu`): an ARM debug
transport (`debug: embeddedice`), work-RAM and an on-chip flash region. transport (`debug: embeddedice`), work-RAM and an on-chip flash region.
`target_list` shows them and `program` routes `prog: arm_flash` to the An Olimex ARM-USB-OCD is an FT2232, so it opens with the existing FTDI
ARM backend — but that backend (halt the core over JTAG, load a RAM driver via the `arm-usb-ocd` probe profile.
flasher, program internal flash) is **not implemented yet**. An Olimex
ARM-USB-OCD is an FT2232, so it opens with the existing FTDI driver via For an ARM7TDMI core (EmbeddedICE) three commands work today:
the `arm-usb-ocd` probe profile. See the ARM-debug design note in
`CLAUDE.md`. ```
bs_explorer> jtag_open 0 arm-usb-ocd
bs_explorer> jtag_scan # IDCODE, e.g. 0x4F1F0F0F (LPC2103)
bs_explorer> cpu_read 0 0x0 0x8000 flash.hex hex # dump 32 KB flash to Intel HEX
bs_explorer> cpu_halt 0 # halt only (DBGACK)
bs_explorer> cpu_resume 0 # release from debug
```
`cpu_read <dev> <addr> <len> <file> <bin|hex>` halts the core, reads
memory by **instruction injection** (halt via EmbeddedICE, switch a
Thumb-state core to ARM, then a system-speed `LDMIA` reads real memory),
and writes the bytes as raw binary or Intel HEX. Omit `<file>` for a
console hex-dump. Validated by dumping an LPC2103's full 32 KB flash and
round-tripping the `.hex` through `objcopy` (all records/checksums valid,
correct ARM vector table). Debug-speed core-register read/write also
works (it is how the address is set up and the loaded words are read
back).
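
For readers unfamiliar with the Intel HEX output, the record layout is simple enough to sketch. The function below is an illustration of the format (record types 00 data, 01 end-of-file, 04 extended linear address), not the writer built into bs_explorer; `ihex_record` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one Intel HEX record ":LLAAAATT<data>CC" into out.
 * The checksum is the two's complement of the byte sum of length,
 * address, type and data. Returns the line length, or -1 if out is
 * too small (needs 2 * len + 12 bytes including the NUL). */
static int ihex_record(char *out, size_t outsz, uint8_t type, uint16_t addr,
                       const uint8_t *data, uint8_t len)
{
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFFu) + type);
    size_t pos;

    assert(data != NULL || len == 0);
    if (outsz < (size_t)len * 2u + 12u)
        return -1;
    snprintf(out, outsz, ":%02X%04X%02X",
             (unsigned)len, (unsigned)addr, (unsigned)type);
    pos = 9; /* ':' + 2 + 4 + 2 hex digits */
    for (unsigned i = 0; i < len; i++) {
        snprintf(out + pos, outsz - pos, "%02X", (unsigned)data[i]);
        pos += 2;
        sum = (uint8_t)(sum + data[i]);
    }
    snprintf(out + pos, outsz - pos, "%02X", (unsigned)(uint8_t)(0u - sum));
    return (int)strlen(out);
}
```

A dump that crosses a 64 KB boundary would emit a type 04 record carrying the upper 16 address bits before the data records of that region, and every file ends with the fixed end-of-file record `:00000001FF`.
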
**Operating context — when it works, and the limits.** Reading is
reliable in this flow:
- **One halt per power-cycle.** The intended sequence is *power on the
board → `jtag_scan` → one `cpu_read` (which halts, reads, leaves the
core halted)*. A single `cpu_read` call dumps any length in one halt
(it reads in 14-register blocks internally), so dumping all of flash is
one command.
- **Reads clobber r0..r14 and the PC**, and there is **no register
context save/restore yet**. So `cpu_resume` cannot cleanly continue the
original firmware, and a *second* `cpu_read` (or `cpu_halt`) in the
same session re-halts an already-halted, register-clobbered core, which
is messy and can time out (`sys-speed access timed out`). If that
happens, **power-cycle the board** and run one `cpu_read` again.
- ARM7TDMI only so far (the EmbeddedICE scan-chain debug). Cortex-M
(ADIv5/SWD) is a different transport.
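
To make the "14-register blocks" concrete, here is a rough sketch of how a read length splits into `LDMIA` blocks. The `ARM_LDMIA` encoding is the one used in `arm_debug`; everything else is an assumption for illustration: that r0 holds the address with writeback and r1..r14 receive the data, plus the hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the ARM_LDMIA macro in arm_debug. */
#define ARM_LDMIA(rn, list, w) \
    (0xe8900000u | ((unsigned)(w) << 21) | ((unsigned)(rn) << 16) | (unsigned)(list))

#define BLOCK_WORDS 14u /* r1..r14; r0 is assumed to hold the address */

/* Register list r1..rk for a block of k words (1 <= k <= 14). */
static uint32_t block_reglist(unsigned k)
{
    assert(k >= 1u && k <= BLOCK_WORDS);
    return ((1u << k) - 1u) << 1;
}

/* Opcode for one block: LDMIA r0!, {r1..rk}. */
static uint32_t block_opcode(unsigned k)
{
    return ARM_LDMIA(0, block_reglist(k), 1);
}

/* Number of blocks for len_bytes, rounding up to whole words. */
static uint32_t blocks_for(uint32_t len_bytes)
{
    uint32_t words = (len_bytes + 3u) / 4u;
    return (words + BLOCK_WORDS - 1u) / BLOCK_WORDS;
}
```

Under these assumptions the 32 KB dump above is 8192 words, that is 585 full blocks plus a final block of two words.
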
The why-and-how of the cycle-exact JTAG timing this relies on is in the
ARM-debug design note in `CLAUDE.md`. The next step toward clean resume
and repeated reads is register **context save/restore**; the `arm_flash`
*write* backend (program internal flash via a RAM loader) builds on that
and is not implemented yet.
## Troubleshooting cheat sheet ## Troubleshooting cheat sheet
@@ -507,6 +546,7 @@ the `arm-usb-ocd` probe profile. See the ARM-debug design note in
| Detected fine, then reads turn to garbage / `0x00000000` mid-session | Target board lost power — JTAG floats (the USB probe stays enumerated regardless). Re-power the board. | | Detected fine, then reads turn to garbage / `0x00000000` mid-session | Target board lost power — JTAG floats (the USB probe stays enumerated regardless). Re-power the board. |
| FT4232H FlashPro: `jtag_scan` finds 0 devices | JTAG is on channel A (index 0) and needs `ADBUS4` high-Z — open with the profile: `jtag_open 0 flashpro`. | | FT4232H FlashPro: `jtag_scan` finds 0 devices | JTAG is on channel A (index 0) and needs `ADBUS4` high-Z — open with the profile: `jtag_open 0 flashpro`. |
| `svf_play` mismatches only on the very first compare | FTDI link warm-up; `svf_play` handles it, but a bare `bscan_shift_dr` straight after `jtag_open` may need a `jtag_scan` first. | | `svf_play` mismatches only on the very first compare | FTDI link warm-up; `svf_play` handles it, but a bare `bscan_shift_dr` straight after `jtag_open` may need a `jtag_scan` first. |
| `cpu_read`: `sys-speed access timed out` | The core was re-halted in a degraded state (a previous `cpu_read`/`cpu_halt` left it halted with clobbered registers). Power-cycle the board, then run one `cpu_read`. |
## Where to go from here ## Where to go from here

@@ -3,3 +3,6 @@ file(GLOB_RECURSE ALL_SOURCES "*.c")
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..) include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..)
add_library(arm_debug ${ALL_SOURCES}) add_library(arm_debug ${ALL_SOURCES})
# arm_debug drives the EmbeddedICE scan chains via the bscan TAP primitives.
target_link_libraries(arm_debug PUBLIC bscan)

@@ -1,32 +1,648 @@
#include <stdio.h> #include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "bscan/bscan.h"
#include "arm_debug.h" #include "arm_debug.h"
/* Not implemented yet — every entry point reports failure for now. The /*
* real work — EmbeddedICE scan chains, halt/resume, memory access over * ARM7TDMI debug over JTAG (EmbeddedICE), built on the bscan_* TAP
* the bscan_* primitives, and a per-MCU RAM flash loader — slots in * primitives. Bring-up state:
* behind these signatures without touching callers. */ * - done: EmbeddedICE register access; halt / resume; Thumb->ARM
* switch; cycle-exact chain-1 access (one debug clock per access);
* debug-speed register read/write (known-pattern round-trip);
* system-speed LDM memory read. The key to memory reads is keeping
* the core's debug-clock count exact: chain switches and EICE reads
* use "quiet" ops that never enter Run-Test/Idle (so they don't clock
* the core and clobber loaded registers), and the sys-speed poll
* drives the access one clock at a time, stopping the instant
* SYSCOMP appears. Validated by dumping the LPC2103's full 32 KB
* flash to Intel HEX (objcopy-verified, correct vectors + code).
* - caveat: the read clobbers r0..r14 and there is no context
* save/restore, so the intended flow is power-on -> one halt -> dump.
* Repeated halt/read cycles without a power-cycle degrade (a later
* re-halt of the clobbered core is messy and may time out).
* - todo: context save/restore (clean resume + repeated reads),
* memory write, arm_flash.
*/
/* ARM7TDMI public JTAG instructions (IR length 4). */
#define ARM7_IR_LEN 4
#define IR_SCAN_N 0x2
#define IR_INTEST 0xC
#define IR_RESTART 0x4
/* Scan chains: #1 = debug (instruction/data bus), #2 = EmbeddedICE. */
#define SC_DEBUG 1
#define SC_EICE 2
/* EmbeddedICE register addresses. */
#define EICE_DBG_CTRL 0x00
#define EICE_DBG_STATUS 0x01
/* EmbeddedICE Debug Control register bits (write). */
#define DBG_CTRL_DBGACK (1u << 0) /* force DBGACK */
#define DBG_CTRL_DBGRQ (1u << 1) /* request debug entry */
#define DBG_CTRL_INTDIS (1u << 2) /* disable interrupts in debug */
/* EmbeddedICE Debug Status register bits (read). */
#define DBG_STATUS_DBGACK (1u << 0)
#define DBG_STATUS_SYSCOMP (1u << 3)
#define DBG_STATUS_ITBIT (1u << 4) /* core was in Thumb state */
/* ARMv4 opcodes used for register/memory access via instruction
* injection (see ARM7TDMI TRM, debug chapter; mirrors OpenOCD). */
#define ARM_NOP 0xe1a08008u /* mov r8, r8 */
#define ARM_STMIA(rn, list, w) (0xe8800000u | ((unsigned)(w) << 21) | ((unsigned)(rn) << 16) | (unsigned)(list))
#define ARM_LDMIA(rn, list, w) (0xe8900000u | ((unsigned)(w) << 21) | ((unsigned)(rn) << 16) | (unsigned)(list))
/* Thumb opcodes (16-bit, duplicated into both halfwords as the debug
* data bus presents them) used only to switch a Thumb-state core to ARM
* state on debug entry. */
#define THUMB_DUP(op) ((unsigned)(op) | ((unsigned)(op) << 16))
#define ARM_T_NOP THUMB_DUP(0x46c0) /* mov r8, r8 */
#define ARM_T_STR(rd, rn) THUMB_DUP(0x6000 | (rd) | ((rn) << 3))
#define ARM_T_MOV(rd, rm) THUMB_DUP(0x4600 | ((rd) & 0x7) | (((rd) & 0x8) << 4) | \
(((rm) & 0x7) << 3) | (((rm) & 0x8) << 3))
#define ARM_T_LDR_PCREL(rd) THUMB_DUP(0x4800 | ((rd) << 8))
#define ARM_T_BX(rm) THUMB_DUP(0x4700 | ((rm) << 3))
/* Reverse the 32 bits of a word. Scan chain 1 shifts instructions and
* data with the bit order flipped (TRM); match OpenOCD's flip_u32. */
static uint32_t flip32(uint32_t v)
{
    v = ((v & 0xFFFF0000u) >> 16) | ((v & 0x0000FFFFu) << 16);
    v = ((v & 0xFF00FF00u) >> 8) | ((v & 0x00FF00FFu) << 8);
    v = ((v & 0xF0F0F0F0u) >> 4) | ((v & 0x0F0F0F0Fu) << 4);
    v = ((v & 0xCCCCCCCCu) >> 2) | ((v & 0x33333333u) << 2);
    v = ((v & 0xAAAAAAAAu) >> 1) | ((v & 0x55555555u) << 1);
    return v;
}
/* One EmbeddedICE scan-chain-2 access: 38 bits LSB-first =
* data[0..31] | address[32..36] | read/write[37] (1 = write). On a read,
* the captured data belongs to the *previously* addressed register. */
static int eice_scan(jtag_core *jc, int addr, int rw, uint32_t data, uint32_t *out)
{
    uint8_t buf[5], cap[5];
    int i, r;
    memset(buf, 0, sizeof(buf));
    for (i = 0; i < 32; i++)
        if (data & (1u << i)) buf[i >> 3] |= (uint8_t)(1u << (i & 7));
    for (i = 0; i < 5; i++)
        if (addr & (1 << i)) { int b = 32 + i; buf[b >> 3] |= (uint8_t)(1u << (b & 7)); }
    if (rw) buf[37 >> 3] |= (uint8_t)(1u << (37 & 7));
    r = bscan_shift_dr(jc, buf, out ? cap : NULL, 38);
    if (r < 0) return -1;
    if (out) {
        uint32_t v = 0;
        for (i = 0; i < 32; i++)
            if (cap[i >> 3] & (1u << (i & 7))) v |= (1u << i);
        *out = v;
    }
    return 0;
}
/* Select a scan chain via SCAN_N (4-bit register) and enter INTEST so
* subsequent DR shifts hit that chain. */
static int chain_select(jtag_core *jc, int chain)
{
    uint8_t sc = (uint8_t)chain;
    if (bscan_set_ir(jc, IR_SCAN_N, ARM7_IR_LEN) < 0) return -1;
    if (bscan_shift_dr(jc, &sc, NULL, ARM7_IR_LEN) < 0) return -1;
    if (bscan_set_ir(jc, IR_INTEST, ARM7_IR_LEN) < 0) return -1;
    return 0;
}
static int eice_select(jtag_core *jc)
{
    return chain_select(jc, SC_EICE);
}
static int eice_write(jtag_core *jc, int addr, uint32_t val)
{
    return eice_scan(jc, addr, 1, val, NULL);
}
/* Read an EmbeddedICE register (two scans: request, then capture). */
static int eice_read(jtag_core *jc, int addr, uint32_t *val)
{
    if (eice_scan(jc, addr, 0, 0, NULL) < 0) return -1; /* request */
    if (eice_scan(jc, addr, 0, 0, val) < 0) return -1; /* capture */
    return 0;
}
/* ---- "Quiet" TAP ops: park in Pause, never enter Run-Test/Idle -------
* The debug core advances one step per Run-Test/Idle clock (Update-DR
* alone does NOT clock it). After a system-speed access has re-entered
* debug, any stray RTI clock executes a garbage instruction and clobbers
* the registers the LDM just loaded. These ops pass through Update but
* never RTI, so they move the TAP / shift IR & DR without clocking the
* core. They start from and leave the TAP in a Pause state. */
/* Run-Test/Idle -> Pause-DR (the one transition that leaves RTI; the core
* is either running on MCLK (post-RESTART) or this is a deliberate flush). */
static void quiet_enter(jtag_core *jc)
{
    unsigned char tms[4];
    tms[0] = JTAG_STR_TMS; /* RTI -> Select-DR */
    tms[1] = 0; /* -> Capture-DR */
    tms[2] = JTAG_STR_TMS; /* -> Exit1-DR */
    tms[3] = 0; /* -> Pause-DR */
    jc->io_functions.drv_TX_TMS(jc, tms, 4);
}
/* Pause-* -> ... -> Run-Test/Idle (re-arms the normal RTI-based ops). */
static void quiet_exit(jtag_core *jc)
{
    unsigned char tms[3];
    tms[0] = JTAG_STR_TMS; /* Pause -> Exit2 */
    tms[1] = JTAG_STR_TMS; /* -> Update */
    tms[2] = 0; /* -> Run-Test/Idle */
    jc->io_functions.drv_TX_TMS(jc, tms, 3);
}
/* Set IR = opcode (len bits), parking in Pause-IR. Starts from any Pause. */
static int quiet_set_ir(jtag_core *jc, unsigned int opcode, int len)
{
    unsigned char tms[6], *d;
    int i;
    /* Pause -> Exit2 -> Update -> Select-DR -> Select-IR -> Capture-IR -> Shift-IR */
    tms[0] = JTAG_STR_TMS; tms[1] = JTAG_STR_TMS; tms[2] = JTAG_STR_TMS;
    tms[3] = JTAG_STR_TMS; tms[4] = 0; tms[5] = 0;
    jc->io_functions.drv_TX_TMS(jc, tms, 6);
    d = malloc(len);
    if (!d) return -1;
    for (i = 0; i < len; i++) {
        d[i] = ((opcode >> i) & 1u) ? JTAG_STR_DOUT : 0;
        if (i == len - 1) d[i] |= JTAG_STR_TMS; /* last bit -> Exit1-IR */
    }
    jc->io_functions.drv_TXRX_DATA(jc, d, NULL, len);
    free(d);
    tms[0] = 0; /* Exit1-IR -> Pause-IR */
    jc->io_functions.drv_TX_TMS(jc, tms, 1);
    return 0;
}
/* Shift DR (n bits), parking in Pause-DR. Starts from any Pause. */
static int quiet_shift_dr(jtag_core *jc, const uint8_t *tdi, uint8_t *tdo, int n)
{
unsigned char tms[5], *out, *in = NULL;
int i;
/* Pause -> Exit2 -> Update -> Select-DR -> Capture-DR -> Shift-DR */
tms[0] = JTAG_STR_TMS; tms[1] = JTAG_STR_TMS; tms[2] = JTAG_STR_TMS;
tms[3] = 0; tms[4] = 0;
jc->io_functions.drv_TX_TMS(jc, tms, 5);
out = malloc(n);
if (!out) return -1;
if (tdo) { in = malloc(n); if (!in) { free(out); return -1; } }
for (i = 0; i < n; i++) {
uint8_t b = tdi ? ((tdi[i / 8] >> (i & 7)) & 1u) : 0;
out[i] = b ? JTAG_STR_DOUT : 0;
if (i == n - 1) out[i] |= JTAG_STR_TMS; /* last bit -> Exit1-DR */
}
jc->io_functions.drv_TXRX_DATA(jc, out, in, n);
if (tdo && in) {
memset(tdo, 0, (size_t)((n + 7) / 8));
for (i = 0; i < n; i++) if (in[i]) tdo[i / 8] |= (uint8_t)(1u << (i & 7));
}
free(out); free(in);
tms[0] = 0; /* Exit1-DR -> Pause-DR */
jc->io_functions.drv_TX_TMS(jc, tms, 1);
return 0;
}
/* Select a scan chain without clocking the core (quiet). Starts from a
* Pause state, ends in Pause-IR (INTEST loaded). */
static int quiet_chain_select(jtag_core *jc, int chain)
{
uint8_t sc = (uint8_t)chain;
if (quiet_set_ir(jc, IR_SCAN_N, ARM7_IR_LEN) < 0) return -1;
if (quiet_shift_dr(jc, &sc, NULL, ARM7_IR_LEN) < 0) return -1;
if (quiet_set_ir(jc, IR_INTEST, ARM7_IR_LEN) < 0) return -1;
return 0;
}
/* Build a 38-bit EmbeddedICE scan-2 frame (data|addr|rw) into buf. */
static void eice_frame(uint8_t buf[5], int addr, int rw, uint32_t data)
{
int i;
memset(buf, 0, 5);
for (i = 0; i < 32; i++)
if (data & (1u << i)) buf[i >> 3] |= (uint8_t)(1u << (i & 7));
for (i = 0; i < 5; i++)
if (addr & (1 << i)) { int b = 32 + i; buf[b >> 3] |= (uint8_t)(1u << (b & 7)); }
if (rw) buf[37 >> 3] |= (uint8_t)(1u << (37 & 7));
}
/* Latch an instruction into the scan-chain-1 instruction register
* WITHOUT clocking the core: shift the 33-bit frame, pass through
* Update-DR (latches the parallel output) but never Run-Test/Idle. Used
* to displace a stale instruction (e.g. the system-speed LDMIA) that
* would otherwise be re-executed and clobber registers on the next
* debug clock. Chain 1 + INTEST must be selected; starts/ends in Pause. */
static int quiet_latch_chain1(jtag_core *jc, uint32_t instr)
{
unsigned char tms[6], out[33];
uint32_t f = flip32(instr);
int i;
/* Pause -> Exit2 -> Update -> Select-DR -> Capture-DR -> Shift-DR */
tms[0] = JTAG_STR_TMS; tms[1] = JTAG_STR_TMS; tms[2] = JTAG_STR_TMS;
tms[3] = 0; tms[4] = 0;
jc->io_functions.drv_TX_TMS(jc, tms, 5);
out[0] = 0; /* no SYSSPEED */
for (i = 0; i < 32; i++)
out[1 + i] = (f & (1u << i)) ? JTAG_STR_DOUT : 0;
out[32] |= JTAG_STR_TMS; /* last bit -> Exit1-DR */
jc->io_functions.drv_TXRX_DATA(jc, out, NULL, 33);
/* Exit1 -> Update-DR (latch, no clock) -> Select-DR -> Capture-DR
* -> Exit1 -> Pause-DR */
tms[0] = JTAG_STR_TMS; tms[1] = JTAG_STR_TMS; tms[2] = 0;
tms[3] = JTAG_STR_TMS; tms[4] = 0;
jc->io_functions.drv_TX_TMS(jc, tms, 5);
return 0;
}
/* Clock the debug core exactly n steps: enter Run-Test/Idle for n TCKs,
* then return to a Pause state. This is the ONLY thing that advances the
* core, so callers control the debug-clock count precisely (the chain
* switches and EICE reads above never enter RTI, hence never clock it).
* Starts from any Pause state, ends in Pause-DR. */
static void clock_core(jtag_core *jc, int n)
{
unsigned char *tms;
int i, k = 0;
if (n < 1) n = 1; /* the Update -> Run-Test/Idle step is itself one core clock */
tms = malloc((size_t)(n + 8));
if (!tms) return;
tms[k++] = JTAG_STR_TMS; /* Pause -> Exit2 */
tms[k++] = JTAG_STR_TMS; /* -> Update */
tms[k++] = 0; /* -> Run-Test/Idle (1st in-RTI clock) */
for (i = 1; i < n; i++) tms[k++] = 0; /* dwell: n total RTI clocks */
tms[k++] = JTAG_STR_TMS; /* RTI -> Select-DR */
tms[k++] = 0; /* -> Capture-DR */
tms[k++] = JTAG_STR_TMS; /* -> Exit1-DR */
tms[k++] = 0; /* -> Pause-DR */
jc->io_functions.drv_TX_TMS(jc, tms, k);
free(tms);
}
/* Read an EmbeddedICE register without clocking the core (quiet). The
* EmbeddedICE chain (#2) must already be selected via quiet ops. */
static int quiet_eice_read(jtag_core *jc, int addr, uint32_t *val)
{
uint8_t buf[5], cap[5];
uint32_t v = 0;
int i;
eice_frame(buf, addr, 0, 0);
if (quiet_shift_dr(jc, buf, NULL, 38) < 0) return -1; /* request */
if (quiet_shift_dr(jc, buf, cap, 38) < 0) return -1; /* capture */
for (i = 0; i < 32; i++)
if (cap[i >> 3] & (1u << (i & 7))) v |= (1u << i);
*val = v;
return 0;
}
int arm_debug_halt(jtag_core *jc, const jtag_target *t)
{
- (void)jc; (void)t;
- fprintf(stderr, "arm_debug: halt not implemented yet\n");
uint32_t status = 0;
(void)t;
bscan_tap_reset(jc);
if (eice_select(jc) < 0)
return -1;
/* DBGRQ -> core enters debug at the next instruction boundary;
* poll DBGACK (it isn't instantaneous). */
if (eice_write(jc, EICE_DBG_CTRL, DBG_CTRL_DBGRQ) < 0)
return -1;
{
int tries;
for (tries = 0; tries < 100; tries++) {
bscan_idle_cycles(jc, 64);
if (eice_read(jc, EICE_DBG_STATUS, &status) < 0)
return -1;
if (status & DBG_STATUS_DBGACK)
break;
}
if (!(status & DBG_STATUS_DBGACK)) {
fprintf(stderr, "arm_debug: halt requested but no DBGACK (status 0x%08x)\n", status);
return -1;
}
}
/* Debug entry: force DBGACK, deassert DBGRQ (else the core keeps
* re-requesting debug and injected instructions can't execute), and
* disable interrupts. Matches OpenOCD's arm7_9_debug_entry. */
if (eice_write(jc, EICE_DBG_CTRL, DBG_CTRL_DBGACK | DBG_CTRL_INTDIS) < 0)
return -1;
if (status & DBG_STATUS_ITBIT)
fprintf(stderr, "arm_debug: warning - core halted in Thumb state; "
"ARM instruction injection will be wrong\n");
return 0;
}
int arm_debug_resume(jtag_core *jc, const jtag_target *t)
{
- (void)jc; (void)t;
- fprintf(stderr, "arm_debug: resume not implemented yet\n");
(void)t;
if (eice_select(jc) < 0)
return -1;
/* Clear DBGRQ, then RESTART exits debug state. */
if (eice_write(jc, EICE_DBG_CTRL, 0x0) < 0)
return -1;
if (bscan_set_ir(jc, IR_RESTART, ARM7_IR_LEN) < 0)
return -1;
bscan_idle_cycles(jc, 16);
return 0;
}
/* Scan-chain-1 (debug bus) access session. Each access is one
* bscan_shift_dr of the 33-bit frame, which captures the bus at
* Capture-DR, applies the instruction at Update-DR and advances the core
* exactly one debug step (Update -> Run-Test/Idle) — one access == one
* debug clock. The captured value reflects the bus from the previous
* step's instruction, the standard ARM7TDMI pipeline that the NOP padding
* in read/write_core_regs accounts for. (c1_init/c1_end bracket a run of
* accesses; c1_end is currently a no-op since bscan_shift_dr self-completes
* each access, but callers must still avoid chain switches mid-run — those
* clock the halted core and shift the pipeline phase.) */
typedef struct {
jtag_core *jc;
int started;
} c1_ctx;
static void c1_init(c1_ctx *c, jtag_core *jc) { c->jc = jc; c->started = 0; }
/* One chain-1 access: shift 33 bits = breakpoint[0] | flip32(instr)[1..32].
* sysspeed=1 marks the following instruction to run at system speed.
* capture != NULL reads back the 32-bit debug data bus. */
static int c1_xfer(c1_ctx *c, uint32_t instr, int sysspeed, uint32_t *capture)
{
uint8_t buf[5], cap[5];
uint32_t f = flip32(instr);
int i;
memset(buf, 0, sizeof(buf));
if (sysspeed) buf[0] |= 1u; /* bit 0 = breakpoint/SYSSPEED */
for (i = 0; i < 32; i++) /* bits 1..32 = flip32(instr) */
if (f & (1u << i)) { int b = 1 + i; buf[b >> 3] |= (uint8_t)(1u << (b & 7)); }
/* Shift 33 bits: captures the bus at Capture-DR, applies the
* instruction at Update-DR and advances the core exactly one debug
* step via the Update->Run-Test/Idle transition. One access == one
* debug clock (an extra idle dwell would double-clock the pipeline). */
if (bscan_shift_dr(c->jc, buf, capture ? cap : NULL, 33) < 0)
return -1;
c->started = 1;
if (capture) {
uint32_t raw = 0;
for (i = 0; i < 32; i++) {
int b = 1 + i;
if (cap[b >> 3] & (1u << (b & 7))) raw |= (1u << i);
}
*capture = flip32(raw);
}
return 0;
}
static int c1_end(c1_ctx *c) { (void)c; return 0; }
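Review note: the 33-bit chain-1 frame layout described above (bit 0 = SYSSPEED, bits 1..32 = bit-reversed instruction) can be exercised in isolation. The sketch below is an illustration only: `flip32` is not shown in this diff, so `flip32_sketch` assumes it is a plain 32-bit bit reversal, and `c1_pack` / `c1_unpack` are hypothetical helpers that mirror the packing and unpacking loops in `c1_xfer`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed behaviour of flip32: reverse the bit order of a 32-bit word. */
static uint32_t flip32_sketch(uint32_t v)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 32; i++)
        if (v & (1u << i)) r |= 1u << (31 - i);
    return r;
}

/* Pack a frame: bit 0 = SYSSPEED flag, bits 1..32 = reversed instruction. */
static void c1_pack(uint8_t buf[5], uint32_t instr, int sysspeed)
{
    uint32_t f = flip32_sketch(instr);
    int i;
    memset(buf, 0, 5);
    if (sysspeed) buf[0] |= 1u;
    for (i = 0; i < 32; i++) {
        int b = 1 + i;
        if (f & (1u << i)) buf[b >> 3] |= (uint8_t)(1u << (b & 7));
    }
}

/* Recover the 32-bit word from bits 1..32 of a captured frame. */
static uint32_t c1_unpack(const uint8_t buf[5])
{
    uint32_t raw = 0;
    int i;
    for (i = 0; i < 32; i++) {
        int b = 1 + i;
        if (buf[b >> 3] & (1u << (b & 7))) raw |= 1u << i;
    }
    return flip32_sketch(raw);
}

/* Round-trip check with sample data (0xE1A00000 encodes MOV r0, r0). */
static void check_c1_roundtrip(void)
{
    uint8_t buf[5];
    c1_pack(buf, 0xE1A00000u, 1);
    assert((buf[0] & 1u) == 1u);
    assert(c1_unpack(buf) == 0xE1A00000u);
}
```

Because of the reversal, instruction bit 0 lands in frame bit 32, which is the last bit shifted.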
/* Load core registers from the debug data bus (debug speed):
* LDMIA r<rn>, {regs} fed by the scanned-in values. */
static int write_core_regs(c1_ctx *c, int rn, uint32_t mask, const uint32_t *vals)
{
int i;
if (c1_xfer(c, ARM_LDMIA(rn, mask & 0xffff, 0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1; /* DECODE */
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1; /* EXECUTE 1 */
for (i = 0; i <= 15; i++)
if (mask & (1u << i))
if (c1_xfer(c, vals[i], 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1;
return 0;
}
/* Read core registers from the debug data bus (debug speed):
* STMIA r<rn>, {regs}; values appear from the 4th DCLK on. */
static int read_core_regs(c1_ctx *c, int rn, uint32_t mask, uint32_t *out)
{
int i;
if (c1_xfer(c, ARM_STMIA(rn, mask & 0xffff, 0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1; /* DECODE */
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1; /* EXECUTE 1 */
for (i = 0; i <= 15; i++)
if (mask & (1u << i))
if (c1_xfer(c, ARM_NOP, 0, &out[i]) < 0) return -1;
return 0;
}
/* Queue a system-speed load-multiple from real memory into {regs}, with
* base writeback so r0 advances for the next block. The instruction
* preceding it carries the SYSSPEED bit. */
static int load_word_regs(c1_ctx *c, uint32_t mask)
{
if (c1_xfer(c, ARM_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_NOP, 1, NULL) < 0) return -1; /* SYSSPEED marker */
if (c1_xfer(c, ARM_LDMIA(0, mask & 0xffff, 1), 0, NULL) < 0) return -1;
return 0;
}
/* Switch a Thumb-state core to ARM state so the rest of the debug logic
* can use ARM instructions (mirrors OpenOCD's arm7tdmi_change_to_arm).
* Clobbers r0 and PC (fine for a read-then-power-cycle flow): loads r0
* with an even address and BX r0. Thumb instructions are injected as
* 16-bit opcodes duplicated into both halfwords. Assumes chain 1 +
* INTEST selected; the caller wraps it in a c1 session. */
static int change_to_arm(c1_ctx *c)
{
/* save r0 (STR r0,[r0]); value discarded */
if (c1_xfer(c, ARM_T_STR(0, 0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, 0, 0, NULL) < 0) return -1; /* data-in slot */
/* read pc (MOV r0,r15; STR r0,[r0]); value discarded */
if (c1_xfer(c, ARM_T_MOV(0, 15), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_STR(0, 0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, 0, 0, NULL) < 0) return -1; /* data-in slot */
/* LDR r0,[PC,#0] with data 0 -> r0 = 0 (bits[1:0] cleared) */
if (c1_xfer(c, ARM_T_LDR_PCREL(0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, 0x0, 0, NULL) < 0) return -1; /* LDR data word */
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
/* BX r0 -> ARM state */
if (c1_xfer(c, ARM_T_BX(0), 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
if (c1_xfer(c, ARM_T_NOP, 0, NULL) < 0) return -1;
return 0;
}
/* RESTART, then wait for the system-speed access to complete (DBGACK &
* SYSCOMP). Leaves the TAP on the EmbeddedICE chain. */
static int execute_sys_speed(jtag_core *jc)
{
uint32_t status = 0;
int tries;
/* RESTART resumes the core to run the one system-speed access. The
* core needs debug clocks to step through it, but once it re-enters
* debug any further clock executes a stale instruction and clobbers
* the loaded registers. So drive it ONE clock at a time and check
* DBG_STATUS QUIETLY (no clock) between, stopping the instant
* SYSCOMP & DBGACK appear. Leave the TAP parked (Pause) on EICE. */
if (bscan_set_ir(jc, IR_RESTART, ARM7_IR_LEN) < 0) return -1;
quiet_enter(jc);
if (quiet_chain_select(jc, SC_EICE) < 0) return -1;
for (tries = 0; tries < 2000; tries++) {
if (quiet_eice_read(jc, EICE_DBG_STATUS, &status) < 0) return -1;
if ((status & DBG_STATUS_DBGACK) && (status & DBG_STATUS_SYSCOMP))
return 0;
clock_core(jc, 1);
}
fprintf(stderr, "arm_debug: sys-speed access timed out (status 0x%08x)\n", status);
return -1;
}
/* Read memory by instruction injection. Reads word-aligned blocks
* covering [addr, addr+len) and copies the requested bytes out.
* Core registers r0..r14 and the PC are clobbered (acceptable for a
* read-then-power-cycle flow). The core must already be halted (DBGACK).
*
* Reads real memory correctly (validated by an objcopy-verified 32 KB
* flash dump of the LPC2103). Intended flow is power-on -> one halt ->
* dump; see the file header and the arm7-debug-dclk-timing note for the
* repeated-halt caveat. */
int arm_debug_mem_read(jtag_core *jc, const jtag_target *t,
unsigned long addr, void *buf, unsigned long len)
{
- (void)jc; (void)t; (void)addr; (void)buf; (void)len;
- fprintf(stderr, "arm_debug: mem_read not implemented yet\n");
- return -1;
unsigned long base = addr & ~3UL;
unsigned long end = addr + len;
unsigned long total_words = (((end + 3) & ~3UL) - base) / 4;
unsigned long done = 0;
uint32_t r0, status = 0;
uint8_t *out = buf;
c1_ctx c1;
(void)t;
if (!buf || len == 0) return -1;
/* If the core halted in Thumb state, switch it to ARM. Do the EICE
* status read first (the chain switch clocks the halted core), then
* the switch in one continuous chain-1 session so no stray clocks
* land between change_to_arm and the first instruction. */
if (chain_select(jc, SC_EICE) < 0) return -1;
if (eice_read(jc, EICE_DBG_STATUS, &status) < 0) return -1;
if (chain_select(jc, SC_DEBUG) < 0) return -1;
/* Debug entry, mirroring OpenOCD's arm7_9_debug_entry to leave a
* deterministic pipeline regardless of halt state: switch Thumb->ARM
* if needed, then read all 16 core registers. That STMIA+NOP+NOP+16
* sequence flushes the firmware out of the pipeline and ends in the
* same known state for both the Thumb and ARM paths, so the first
* system-speed read reliably re-enters debug. */
{
uint32_t scratch[16];
c1_init(&c1, jc);
if (status & DBG_STATUS_ITBIT)
if (change_to_arm(&c1) < 0) return -1;
memset(scratch, 0, sizeof(scratch));
if (read_core_regs(&c1, 0, 0xffff, scratch) < 0) return -1;
c1_end(&c1);
}
/* WARM-UP: the first system-speed read after debug entry normalizes
* the sys-speed pipeline but its own result is unreliable. Do one
* throwaway read block and discard it; every read after it is
* consistent and correct. (Like the FTDI stale-first-read, but for
* the ARM debug pipeline.) */
{
uint32_t scratch[16];
r0 = (uint32_t)base;
c1_init(&c1, jc);
if (write_core_regs(&c1, 0, 0x1, &r0) < 0) return -1;
if (load_word_regs(&c1, 0x7ffe) < 0) return -1; /* r1..r14 */
c1_end(&c1);
if (execute_sys_speed(jc) < 0) return -1;
if (quiet_chain_select(jc, SC_DEBUG) < 0) return -1;
if (quiet_latch_chain1(jc, ARM_NOP) < 0) return -1;
quiet_exit(jc);
memset(scratch, 0, sizeof(scratch));
c1_init(&c1, jc);
if (read_core_regs(&c1, 0, 0x7ffe, scratch) < 0) return -1;
c1_end(&c1);
}
r0 = (uint32_t)base;
while (done < total_words) {
uint32_t regs[16];
uint32_t reg_list;
unsigned long n = total_words - done;
unsigned long i;
if (n > 14) n = 14;
/* r1..rn, base (r0) excluded so it can be the autoincrement ptr. */
reg_list = (uint32_t)((0xffffu >> (15 - n)) & 0xfffe);
/* On chain 1 (from change_to_arm or the previous read): set r0
* once, then queue the system-speed LDM. */
c1_init(&c1, jc);
if (done == 0) /* LDM writeback advances r0 after */
if (write_core_regs(&c1, 0, 0x1, &r0) < 0) return -1;
if (load_word_regs(&c1, reg_list) < 0) return -1;
c1_end(&c1);
/* RESTART + quiet poll; leaves the TAP parked on EmbeddedICE. */
if (execute_sys_speed(jc) < 0) return -1;
/* Switch back to chain 1 WITHOUT clocking the core (a normal
* chain_select would clobber the just-loaded registers), then
* read them out. execute_sys_speed left us parked (Pause) on the
* EmbeddedICE chain. */
if (quiet_chain_select(jc, SC_DEBUG) < 0) return -1;
/* Displace the stale (system-speed LDMIA) instruction with a NOP
* so the first debug clock re-executes a NOP, not the LDMIA
* (which would reload r1..rn from the debug bus and lose the
* memory data). */
if (quiet_latch_chain1(jc, ARM_NOP) < 0) return -1;
quiet_exit(jc);
memset(regs, 0, sizeof(regs));
c1_init(&c1, jc);
if (read_core_regs(&c1, 0, reg_list, regs) < 0) return -1;
c1_end(&c1);
for (i = 0; i < n; i++) {
unsigned long word_addr = base + (done + i) * 4;
uint32_t w = regs[1 + i];
int b;
for (b = 0; b < 4; b++) {
unsigned long byte_addr = word_addr + b;
if (byte_addr >= addr && byte_addr < end)
out[byte_addr - addr] = (uint8_t)(w >> (8 * b));
}
}
done += n;
}
return 0;
}
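Review note: the block arithmetic in `arm_debug_mem_read` (word count, LDM register list, byte extraction) is easy to test without hardware. The sketch below lifts those three expressions into standalone helpers; the helper names are hypothetical, and it assumes the target is little-endian, as the `w >> (8 * b)` extraction in the function does.

```c
#include <assert.h>
#include <stdint.h>

/* Register list r1..rn for an LDMIA that keeps r0 as the base pointer. */
static uint32_t block_reg_list(unsigned n)
{
    assert(n >= 1 && n <= 14);
    return (0xffffu >> (15 - n)) & 0xfffeu;
}

/* Number of 32-bit words needed to cover [addr, addr + len). */
static unsigned long words_covering(unsigned long addr, unsigned long len)
{
    unsigned long base = addr & ~3UL;
    unsigned long end = addr + len;
    return (((end + 3) & ~3UL) - base) / 4;
}

/* Copy the bytes of one word that fall inside [addr, addr + len) to out,
 * where out[0] corresponds to addr. Assumes little-endian byte order. */
static void copy_word_bytes(uint8_t *out, unsigned long addr, unsigned long len,
                            unsigned long word_addr, uint32_t w)
{
    unsigned long end = addr + len;
    int b;
    for (b = 0; b < 4; b++) {
        unsigned long byte_addr = word_addr + (unsigned long)b;
        if (byte_addr >= addr && byte_addr < end)
            out[byte_addr - addr] = (uint8_t)(w >> (8 * b));
    }
}
```

For an unaligned request such as 4 bytes at 0x1001, two words are read and only bytes 0x1001..0x1004 are copied out.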
int arm_debug_mem_write(jtag_core *jc, const jtag_target *t,


@@ -406,6 +406,9 @@ int drv_FTDI_TDOTDI_xfer(jtag_core * jc, unsigned char * str_out, unsigned char
int nbtosend;
unsigned char opcode, data;
int body;
int last_tms;
(void)jc;
rd_bit_index = 0;
wr_bit_index = 0;
@@ -413,11 +416,18 @@ int drv_FTDI_TDOTDI_xfer(jtag_core * jc, unsigned char * str_out, unsigned char
if (!size)
return 0;
/* The bscan_* primitives mark TMS on the LAST bit to leave Shift-DR/IR
* (-> Exit1, so the following Update latches). Honour it: shift the
* body via TDI, then clock the final bit through TMS (carrying its TDI
* and TDO). Without this the IR never latches. */
last_tms = (str_out[size - 1] & JTAG_STR_TMS) ? 1 : 0;
body = last_tms ? size - 1 : size;
memset(ftdi_out_buf, 0, 16);
memset(ftdi_in_buf, 0, 16);
/* First bit, if it carries TMS (entering a state): bit-mode TMS shift. */
- if (str_out[wr_bit_index] & JTAG_STR_TMS) {
if (body > 0 && (str_out[wr_bit_index] & JTAG_STR_TMS)) {
if (str_in)
opcode = (OP_WR_TMS | OP_LSB_FIRST | OP_BIT_MODE | OP_FEDGE_WR | OP_RD_TDO);
else
@@ -440,11 +450,8 @@ int drv_FTDI_TDOTDI_xfer(jtag_core * jc, unsigned char * str_out, unsigned char
}
}
- if (wr_bit_index >= size)
- return 0;
/* Whole bytes via byte-mode TDI shift. */
- rounded_size = (size - wr_bit_index) & ~(0x7);
rounded_size = (body - wr_bit_index) & ~(0x7);
if (rounded_size) {
if (str_in)
opcode = (OP_WR_TDI | OP_LSB_FIRST | OP_FEDGE_WR | OP_RD_TDO);
@@ -482,14 +489,14 @@ int drv_FTDI_TDOTDI_xfer(jtag_core * jc, unsigned char * str_out, unsigned char
}
/* Trailing bits via bit-mode TDI shift. */
- while (wr_bit_index < size) {
while (wr_bit_index < body) {
if (str_in)
opcode = (OP_WR_TDI | OP_LSB_FIRST | OP_BIT_MODE | OP_FEDGE_WR | OP_RD_TDO);
else
opcode = (OP_WR_TDI | OP_LSB_FIRST | OP_BIT_MODE | OP_FEDGE_WR);
nbtosend = 0;
- bitscnt = (size - wr_bit_index);
bitscnt = (body - wr_bit_index);
if (bitscnt > 8) bitscnt = 8;
ftdi_out_buf[nbtosend++] = opcode;
@@ -518,6 +525,28 @@ int drv_FTDI_TDOTDI_xfer(jtag_core * jc, unsigned char * str_out, unsigned char
}
}
/* Final bit carrying TMS=1: clock it through the TMS pin (Shift ->
* Exit1) with its TDI on bit 7, capturing TDO. This lets the following
* Update latch the IR/DR. */
if (last_tms) {
opcode = str_in ? (OP_WR_TMS | OP_LSB_FIRST | OP_BIT_MODE | OP_FEDGE_WR | OP_RD_TDO)
: (OP_WR_TMS | OP_LSB_FIRST | OP_BIT_MODE | OP_FEDGE_WR);
nbtosend = 0;
ftdi_out_buf[nbtosend++] = opcode;
ftdi_out_buf[nbtosend++] = 0x00; /* 1 bit */
data = 0x01; /* TMS=1 (bit 0) */
if (str_out[body] & JTAG_STR_DOUT) data |= 0x80; /* TDI on bit 7 */
ftdi_out_buf[nbtosend++] = data;
if (opcode & OP_RD_TDO) ftdi_out_buf[nbtosend++] = CMD_SEND_IMMEDIATE;
if (ft_write(ftdi_out_buf, nbtosend) < 0) return -1;
if (opcode & OP_RD_TDO) {
if (ft_read(ftdi_in_buf, 1) < 0) return -1;
str_in[rd_bit_index++] = (ftdi_in_buf[0] & 0x80) ? JTAG_STR_DOUT : 0x00;
}
}
return 0;
}
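Review note: the body / final-bit split introduced in this hunk can be unit-tested apart from the FTDI transport. The sketch below is a hedged illustration: `STR_DOUT` and `STR_TMS` are stand-in flag values (the real `JTAG_STR_DOUT` / `JTAG_STR_TMS` values live in the project headers and may differ), and `plan_final_bit` is a hypothetical helper that only computes what the driver would send, namely how many bits go through TDI and which data byte accompanies the one-bit TMS command (TMS level in bit 0, TDI level in bit 7).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag values for this sketch only; not the project's definitions. */
#define STR_DOUT 0x01
#define STR_TMS  0x02

typedef struct {
    size_t body;            /* bits shifted through TDI with TMS held low */
    int last_tms;           /* 1 if the final bit must be clocked via TMS */
    unsigned char tms_byte; /* data byte for the 1-bit TMS command, else 0 */
} final_bit_plan;

/* Decide how to split a shift of `size` bits described by str_out. */
static final_bit_plan plan_final_bit(const unsigned char *str_out, size_t size)
{
    final_bit_plan p = {0, 0, 0};
    if (size == 0)
        return p;
    p.last_tms = (str_out[size - 1] & STR_TMS) ? 1 : 0;
    p.body = p.last_tms ? size - 1 : size;
    if (p.last_tms) {
        p.tms_byte = 0x01;                 /* TMS = 1 on bit 0 */
        if (str_out[p.body] & STR_DOUT)
            p.tms_byte |= 0x80;            /* TDI level on bit 7 */
    }
    return p;
}

/* Example: a 3-bit shift whose last bit carries TMS and TDI = 1. */
static void check_final_bit(void)
{
    const unsigned char s[] = {STR_DOUT, 0, STR_DOUT | STR_TMS};
    final_bit_plan p = plan_final_bit(s, 3);
    assert(p.body == 2 && p.last_tms == 1 && p.tms_byte == 0x81);
}
```

A one-bit shift that carries TMS yields `body == 0`, which is the case the new `body > 0` guard on the first-bit path protects.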


@@ -42,6 +42,7 @@
#include "bscan/bscan.h"
#include "svf/svf.h"
#include "program/program.h"
#include "arm_debug/arm_debug.h"
#include "spi_flash/spi_flash.h"
#include "env.h"
@@ -3586,6 +3587,194 @@ static int cmd_program(script_ctx *ctx, char *line)
return JTAG_CORE_NO_ERROR;
}
const char *cmd_cpu_halt_help[] = {
"<dev>(int)",
"Halt the ARM CPU at chain device <dev> over JTAG debug (EmbeddedICE,",
"ARM7/9): forces DBGRQ and checks DBGACK. Run jtag_scan first.",
""
};
static int cmd_cpu_halt(script_ctx *ctx, char *line)
{
jtag_core *jc;
char dev_s[32];
int dev;
const jtag_target *t;
jc = (jtag_core *)ctx->app_ctx;
if (get_param(ctx, line, 1, dev_s) <= 0) {
ctx->script_printf(ctx, MSG_ERROR, "Usage: cpu_halt <dev>\n");
return JTAG_CORE_BAD_PARAMETER;
}
dev = (int)strtol(dev_s, NULL, 0);
t = target_lookup_by_idcode(jtagcore_get_dev_id(jc, dev));
if (arm_debug_halt(jc, t) < 0) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_halt: no DBGACK (debug not entered)\n");
return JTAG_CORE_ACCESS_ERROR;
}
ctx->script_printf(ctx, MSG_INFO_0, "CPU halted (DBGACK).\n");
return JTAG_CORE_NO_ERROR;
}
const char *cmd_cpu_resume_help[] = {
"<dev>(int)",
"Release the ARM CPU at chain device <dev> from debug (clear DBGRQ +",
"RESTART).",
""
};
static int cmd_cpu_resume(script_ctx *ctx, char *line)
{
jtag_core *jc;
char dev_s[32];
int dev;
const jtag_target *t;
jc = (jtag_core *)ctx->app_ctx;
if (get_param(ctx, line, 1, dev_s) <= 0) {
ctx->script_printf(ctx, MSG_ERROR, "Usage: cpu_resume <dev>\n");
return JTAG_CORE_BAD_PARAMETER;
}
dev = (int)strtol(dev_s, NULL, 0);
t = target_lookup_by_idcode(jtagcore_get_dev_id(jc, dev));
if (arm_debug_resume(jc, t) < 0) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_resume failed\n");
return JTAG_CORE_ACCESS_ERROR;
}
ctx->script_printf(ctx, MSG_INFO_0, "CPU resumed.\n");
return JTAG_CORE_NO_ERROR;
}
/* Write a buffer as Intel HEX: type-00 data records (16 bytes each,
* never crossing a 64 KB boundary) preceded by a type-04 extended linear
* address record whenever the upper 16 bits of the address change, ended
* by the type-01 EOF record. */
static int write_intel_hex(const char *path, unsigned long base,
const uint8_t *data, unsigned long len)
{
FILE *f;
unsigned long i = 0;
unsigned int cur_upper = 0xFFFFFFFFu; /* force a leading ELA record */
f = fopen(path, "wb");
if (!f) return -1;
while (i < len) {
unsigned long addr = base + i;
unsigned int upper = (unsigned int)((addr >> 16) & 0xFFFFu);
unsigned int chunk = (len - i > 16) ? 16 : (unsigned int)(len - i);
unsigned int to_boundary = 0x10000u - (unsigned int)(addr & 0xFFFFu);
unsigned int sum, j;
if (chunk > to_boundary) chunk = to_boundary;
if (upper != cur_upper) {
unsigned int s = 0x02 + 0x04 + ((upper >> 8) & 0xFF) + (upper & 0xFF);
fprintf(f, ":02000004%04X%02X\r\n", upper, (unsigned int)((0u - s) & 0xFF));
cur_upper = upper;
}
sum = chunk + ((addr >> 8) & 0xFF) + (addr & 0xFF); /* type 00 adds 0 */
fprintf(f, ":%02X%04X00", chunk, (unsigned int)(addr & 0xFFFFu));
for (j = 0; j < chunk; j++) {
fprintf(f, "%02X", data[i + j]);
sum += data[i + j];
}
fprintf(f, "%02X\r\n", (unsigned int)((0u - sum) & 0xFF));
i += chunk;
}
fprintf(f, ":00000001FF\r\n");
fclose(f);
return 0;
}
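Review note: the record checksum used above is the two's complement of the low byte of the sum of every byte in the record (count, both address bytes, type, data). The sketch below shows that rule on its own; `ihex_checksum` and `ihex_record` are hypothetical helpers written for illustration, not functions from this change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Two's-complement checksum over count, address, type and data bytes. */
static uint8_t ihex_checksum(uint8_t count, uint16_t addr, uint8_t type,
                             const uint8_t *data)
{
    unsigned int sum = count + (addr >> 8) + (addr & 0xFFu) + type;
    unsigned int i;
    for (i = 0; i < count; i++)
        sum += data[i];
    return (uint8_t)((0u - sum) & 0xFFu);
}

/* Format one record (without line ending) into out.
 * Returns the string length, or -1 if cap is too small. */
static int ihex_record(char *out, size_t cap, uint8_t count, uint16_t addr,
                       uint8_t type, const uint8_t *data)
{
    size_t need = 11u + 2u * (size_t)count + 1u; /* ':' + fields + NUL */
    unsigned int i;
    int p;
    if (cap < need)
        return -1;
    p = snprintf(out, cap, ":%02X%04X%02X",
                 (unsigned int)count, (unsigned int)addr, (unsigned int)type);
    for (i = 0; i < count; i++)
        p += snprintf(out + p, cap - (size_t)p, "%02X", (unsigned int)data[i]);
    p += snprintf(out + p, cap - (size_t)p, "%02X",
                  (unsigned int)ihex_checksum(count, addr, type, data));
    return p;
}

/* The fixed EOF record must come out as the literal used by the writer. */
static void check_ihex_eof(void)
{
    char rec[32];
    assert(ihex_record(rec, sizeof rec, 0, 0, 0x01, NULL) == 11);
    assert(strcmp(rec, ":00000001FF") == 0);
}
```

Summing every byte of a finished record, checksum included, gives a multiple of 256, which is the property a verifier checks.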
const char *cmd_cpu_read_help[] = {
"<dev>(int) <addr>(hex) <len>(hex) [<file>(str) <format>(bin|hex)]",
"Halt the ARM CPU at chain device <dev> and read <len> bytes from",
"memory at <addr> over JTAG debug (EmbeddedICE instruction injection),",
"writing <file> as raw binary (format 'bin') or Intel HEX ('hex').",
"Omit <file> for a console hex-dump. The CPU is left halted (use",
"cpu_resume or power-cycle). Run jtag_scan first. Core registers",
"r0..r14 are clobbered by the read.",
""
};
static int cmd_cpu_read(script_ctx *ctx, char *line)
{
jtag_core *jc;
char dev_s[32], addr_s[32], len_s[32], path[MAX_PATH + 1], fmt[16];
int dev, have_file;
unsigned long addr, len, i;
const jtag_target *t;
uint8_t *buf;
jc = (jtag_core *)ctx->app_ctx;
if (get_param(ctx, line, 1, dev_s) <= 0 ||
get_param(ctx, line, 2, addr_s) <= 0 ||
get_param(ctx, line, 3, len_s) <= 0) {
ctx->script_printf(ctx, MSG_ERROR,
"Usage: cpu_read <dev> <addr> <len> [<file> <bin|hex>]\n");
return JTAG_CORE_BAD_PARAMETER;
}
have_file = (get_param(ctx, line, 4, path) > 0);
if (have_file) {
if (get_param(ctx, line, 5, fmt) <= 0 ||
(strcmp(fmt, "bin") != 0 && strcmp(fmt, "hex") != 0)) {
ctx->script_printf(ctx, MSG_ERROR,
"cpu_read: a file needs a format: 'bin' (raw) or 'hex' (Intel HEX)\n");
return JTAG_CORE_BAD_PARAMETER;
}
}
dev = (int)strtol(dev_s, NULL, 0);
addr = strtoul(addr_s, NULL, 0);
len = strtoul(len_s, NULL, 0);
if (len == 0) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_read: length must be > 0\n");
return JTAG_CORE_BAD_PARAMETER;
}
t = target_lookup_by_idcode(jtagcore_get_dev_id(jc, dev));
if (arm_debug_halt(jc, t) < 0) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_read: halt failed (no DBGACK)\n");
return JTAG_CORE_ACCESS_ERROR;
}
buf = malloc(len);
if (!buf) return JTAG_CORE_MEM_ERROR;
if (arm_debug_mem_read(jc, t, addr, buf, len) < 0) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_read: memory read failed\n");
free(buf);
return JTAG_CORE_ACCESS_ERROR;
}
if (have_file) {
int ok;
if (strcmp(fmt, "hex") == 0) {
ok = (write_intel_hex(path, addr, buf, len) == 0);
} else {
FILE *f = fopen(path, "wb");
ok = (f && fwrite(buf, 1, len, f) == len);
if (f) fclose(f);
}
if (!ok) {
ctx->script_printf(ctx, MSG_ERROR, "cpu_read: cannot write '%s'\n", path);
free(buf);
return JTAG_CORE_IO_ERROR;
}
ctx->script_printf(ctx, MSG_INFO_0,
"Read 0x%lX bytes from 0x%lX -> %s (%s)\n", len, addr, path, fmt);
} else {
for (i = 0; i < len; i += 16) {
char hexs[16 * 3 + 1];
int p = 0;
unsigned long j;
for (j = i; j < i + 16 && j < len; j++)
p += snprintf(hexs + p, sizeof(hexs) - p, "%02X ", buf[j]);
ctx->script_printf(ctx, MSG_INFO_0, "%08lX %s\n", addr + i, hexs);
}
}
free(buf);
return JTAG_CORE_NO_ERROR;
}
cmd_list script_commands_list[] =
{
{"print", cmd_print, cmd_print_help},
@@ -3643,6 +3832,9 @@ cmd_list script_commands_list[] =
{"flash_verify", cmd_flash_verify, cmd_flash_verify_help},
{"svf_play", cmd_svf_play, cmd_svf_play_help},
{"program", cmd_program, cmd_program_help},
{"cpu_halt", cmd_cpu_halt, cmd_cpu_halt_help},
{"cpu_resume", cmd_cpu_resume, cmd_cpu_resume_help},
{"cpu_read", cmd_cpu_read, cmd_cpu_read_help},
{0, 0}};
///////////////////////////////////////////////////////////////////////////////