THE BIG INTERRUPT/MEMORY CORRUPT PROBLEM
06 nov 2025
For a solid two months I chased a heisenbug whose symptom was the
Z80 executing random instructions. The entire time the error
was related, somehow, to VGA screen drawing. This was narrowed down
to full-screen or nearly full-screen writes, e.g. scrolling. It
finally reduced to a repeatable test: alternating between two
console 0 resizes would trigger the fault ("Z80 halt" or other
bad instruction) about 50% of the time.
Massive effort was spent on the emulator (@davidly's; it was solid,
no errors found) and on my window scrolling (array index+1 errors
abounded, lol). Testing rendered that solid too: every single
runtime array index was bound-checked, with errors reported.
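A minimal sketch of that kind of bound-check, for illustration only.
The names checkedIndex and g_boundErrors, and the policy of falling
back to index 0 (which assumes size > 0), are my assumptions here,
not the actual fZ80 code.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical counter: how many out-of-range indexes were caught.
static unsigned long g_boundErrors = 0;

// Return idx if it lies in [0, size). Otherwise report the error and
// return 0, so the caller touches a valid cell instead of stray memory.
// Assumes size > 0.
inline std::size_t checkedIndex(long idx, std::size_t size,
                                const char *where) {
    if (idx < 0 || static_cast<std::size_t>(idx) >= size) {
        ++g_boundErrors;
        std::fprintf(stderr, "index %ld out of [0,%lu) in %s\n",
                     idx, static_cast<unsigned long>(size), where);
        return 0;
    }
    return static_cast<std::size_t>(idx);
}
```

Used as buf[checkedIndex(i, sizeof buf, "scroll")], an off-by-one
becomes a counted, logged event rather than silent corruption.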
After the window bounds checking was done, which also resulted in
rewrites that reduced complexity and increased speed, I found most
of my display complexity was in cursor management (VGA4BIT handled
it well for the screen, but fZ80 has four windows...), so this
entailed some substantial changes to VGA4BIT (see below).
Z80 interrupts were clearly involved from the start. With interrupts
disabled (via the RTC Z80 port, from the Z80 side) it never failed.
But hundreds of hours of test-hammering, using two MP/M machines
running four test programs designed to poke at weaknesses, through
many tens of millions of Z80 interrupts, revealed not one single
error (apart from an IFF2 error of mine, found and fixed).
Interrupt behavior was testably correct.
The breakthrough came when I got a repeatable fault trigger:
window-resize keys that let me quickly alternate between
one size and another in console 0. The error, when it happened,
occurred only after Con's REDRAWTIMER fired; but all of the code
under that tested out impeccably.
In exasperation I commented out the RTC IntervalTimer -- faults
stopped, 100%. It was now impossible to induce a fault -- with the
usual heisenbug caveat, which applied here (nearly any change to
the code changed how the fault manifested).
But now it was testable, simply. IntervalTimer running: faults.
Not running: no faults.
Also, none of this happens, not even once, on CP/M. Only MP/M.
But the fault manifests on the Teensy, not the Z80. Same emulator,
same windows, same interrupts; the CP/M :: MP/M distinction
is entirely within the emulation.
Except memory usage.
WTF HYPOTHESIS
CP/M runs entirely within one single 64K static byte array.
MP/M consumes 210K of memory, and since I've used "all" memory,
the four MP/M banks are split across RAM1 and RAM2 (that is,
ordinary RAM1 and "DMAMEM").
MP/M uses large amounts of RAM in the same physical memory
as the DMA-accessed VGA display.
A lot of my testing was related to this adjacency of Z80 banks
and the DMA-accessed VGA arrays.
I assume this is related. Add in something-something interrupts.
THE SOLUTION
Not entirely satisfying... but reliable. The Z80 timer tick, 50 Hz
(20 ms), is done with a software timer in Dev's loop(). Because the
SD card can block for 15 ms, this loses time; but the task-switch
ints lost amount to nothing because the task is blocking on SD/disk
(by MP/M design). So the only problem was TOD, the ONESECOND tick.
That was repaired by having the timer tick read the RTC device
one-second register, and generate the XDOS ONESECOND flag event
on one-second transitions. Short-term jitter is terrible, but that
amounts to nothing, and long-term stability is excellent.
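A rough sketch of that polled tick, written so it compiles off the
Teensy. SoftTick, TickEvents and poll() are names I made up for the
illustration; on the real hardware nowMs would presumably come from
millis() and rtcSeconds from the RTC device's seconds register. This
is one way to do it under those assumptions, not the fZ80 source.

```cpp
#include <cstdint>

// What one poll of the software timer produced.
struct TickEvents {
    bool tick;       // a 20 ms (50 Hz) tick is due
    bool oneSecond;  // RTC seconds changed since the previous tick
};

// Hypothetical polled timer, meant to be called from the main loop.
class SoftTick {
public:
    static constexpr uint32_t kPeriodMs = 20;  // 50 Hz

    SoftTick(uint32_t nowMs, uint8_t rtcSeconds)
        : lastTickMs_(nowMs), lastSeconds_(rtcSeconds) {}

    TickEvents poll(uint32_t nowMs, uint8_t rtcSeconds) {
        TickEvents ev{false, false};
        // Unsigned subtraction stays correct when nowMs wraps around.
        if (nowMs - lastTickMs_ < kPeriodMs) return ev;
        // Restart from "now": ticks missed while the loop was blocked
        // (e.g. on the SD card) are dropped, not replayed.
        lastTickMs_ = nowMs;
        ev.tick = true;
        // The one-second event follows the RTC, not the tick count,
        // so dropped ticks cannot make the time of day drift.
        if (rtcSeconds != lastSeconds_) {
            lastSeconds_ = rtcSeconds;
            ev.oneSecond = true;
        }
        return ev;
    }

private:
    uint32_t lastTickMs_;
    uint8_t lastSeconds_;
};
```

The caller would raise the Z80 tick interrupt when ev.tick is set
and the ONESECOND flag when ev.oneSecond is set. The one-second
event can land up to a tick (or an SD stall) late, which is the
short-term jitter; the RTC comparison is what keeps the long term
honest.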
CHANGES TO VGA4BIT
VGA4BIT is kinda the center around which fZ80 revolves; without it
there would be no fZ80.
As this project scaled up, screen performance and complexity
became an issue; the "automatic" cursor became a curse (lol),
so I turned it off and rolled my own, and complexity
plummeted.
But when I got into this "interrupt" problem I suspected
everything. I cloned VGA4BIT and started chopping, hoping for
a nice dumb array-out-of-bounds error (it's C, after all!).
Alas (lol!) the code seemed flawless. This work coincided
with stress-testing Con.h (H19Out and the window manager),
and I found a big performance increase in the reduced
code... so I ended up retaining the chopped-down,
feature-reduced version of VGA4BIT as SR-VGA_4BIT_T4-MINIMAL
in the SRResources library.
SR-VGA_4BIT_T4-MINIMAL dropped everything that was not
write-char-to-RAM; the cursor and its overhead, gone.
write() now takes a (row, col) argument for every write.
Cursor is entirely done in Con:loop() and is a dozen lines.
Though I'd done it initially as throw-away debug/exploration,
the performance increase was so massive I decided to
keep it.
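To illustrate the split (stateless write(row, col) in the display
code, cursor owned by the caller), here is a small stand-alone
sketch. TextScreen, Cursor, the 30x80 size and the '_' cursor glyph
are all hypothetical stand-ins; the real SR-VGA_4BIT_T4-MINIMAL
write() signature and the Con cursor code may well differ.

```cpp
// Hypothetical character-cell buffer standing in for the display.
struct TextScreen {
    static constexpr int kRows = 30;   // assumed size
    static constexpr int kCols = 80;
    char cells[kRows][kCols];

    TextScreen() {
        for (auto &row : cells)
            for (char &c : row) c = ' ';
    }

    // Stateless write: the caller names the cell every time.
    bool write(int row, int col, char ch) {
        if (row < 0 || row >= kRows || col < 0 || col >= kCols)
            return false;
        cells[row][col] = ch;
        return true;
    }
};

// Cursor kept entirely by the caller, outside the display code.
struct Cursor {
    int row = 0, col = 0;
    bool shown = false;
    char under = ' ';  // character hidden by the cursor glyph

    void show(TextScreen &s) {
        if (shown) return;
        under = s.cells[row][col];
        s.write(row, col, '_');
        shown = true;
    }
    void hide(TextScreen &s) {
        if (!shown) return;
        s.write(row, col, under);
        shown = false;
    }
    // Move the cursor, restoring the old cell; rejects bad positions.
    bool moveTo(TextScreen &s, int r, int c) {
        if (r < 0 || r >= TextScreen::kRows ||
            c < 0 || c >= TextScreen::kCols) return false;
        bool was = shown;
        hide(s);
        row = r;
        col = c;
        if (was) show(s);
        return true;
    }
};
```

With no cursor state inside write(), a full-screen scroll is just a
run of plain cell writes; the caller hides the cursor before and
shows it after.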