OK, here are some results based on a revised version of Paul's original test program, combined with an adapted version of @jmarsh's PR#708. As previously noted, the differing constraints on page boundaries between 8MB and 16MB parts mean that the AHB prefetch has to be reined in a bit with the latter, so you can't reliably achieve the same speed gains.
The revised test reports actual memory transfer speeds, which can't usefully be inferred from the overall test duration because of test overheads. The theoretical speed may be a bit suspect - I tried to calculate it based on addressing overhead and prefetch burst length, but it may well be wrong.
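As a rough cross-check of the theoretical speed, here is a minimal sketch of that kind of calculation for a quad-SPI part. The function name and parameters are hypothetical, and the figures used in the note below (14 overhead clocks per burst, i.e. 2 command + 6 address + 6 dummy, and a 32-byte burst) are assumptions, not values taken from the test program; they are just one combination that happens to reproduce the theoretical figures in the table.

```cpp
// Estimated sustained read throughput in MB/s for a quad-SPI PSRAM.
//   clockMHz:       serial clock frequency
//   overheadClocks: command + address + dummy clocks per burst (assumed)
//   burstBytes:     bytes fetched per burst (assumed)
// With four data lines each byte takes 2 clocks, so a burst costs
// 2 * burstBytes + overheadClocks clocks in total.
constexpr double theoreticalMBps(double clockMHz, int overheadClocks, int burstBytes)
{
    return clockMHz * burstBytes / (2.0 * burstBytes + overheadClocks);
}
```

With those assumed values, `theoreticalMBps(105.6, 14, 32)` comes out at about 43.32 and `theoreticalMBps(120, 14, 32)` at about 49.23. A longer burst spreads the fixed overhead over more data, which is presumably why limiting the prefetch size costs speed.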
| Size [MB] | Speed [MHz] | Prefetch | Block size [bytes] | To mem [MB/s] | From mem [MB/s] | DMA to [MB/s] | DMA from [MB/s] | Tests | Test duration [sec] | CPU speed [MB/s] | DMA speed [MB/s] | Test speed [MB/s] | Theoretical speed [MB/s] | Versus base: per block | Versus base: overall test |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 105.6 | FALSE | 2044 | 34.27 | 34.54 | 15.49 | 8.67 | 57 | 77.38 | 34.41 | 12.08 | 23.57 | 43.32 | | |
| 16 | 120 | TRUE | 2044 | 40.5 | 47.05 | 16.94 | 30.64 | 57 | 65.31 | 43.78 | 23.79 | 27.93 | 49.23 | 27.2% | 18.5% |
| 16 | 105.6 | TRUE | 2044 | 34.49 | 41.66 | 15.47 | 29.91 | 57 | 71.03 | 38.08 | 22.69 | 25.68 | 43.32 | 10.7% | 8.9% |
| 16 | 120 | FALSE | 2044 | 40.1 | 38.66 | 16.94 | 9.41 | 57 | 72.35 | 39.38 | 13.18 | 25.21 | 49.23 | 14.5% | 7.0% |
| 8 | 105.6 | TRUE | 2044 | 34.49 | 51.8 | 15.47 | 36.93 | 57 | 31.7 | 43.15 | 26.20 | 28.77 | 43.32 | 25.4% | 22.1% |
| 8 | 120 | TRUE | 2044 | 41 | 58.84 | 16.94 | 37 | 57 | 29.22 | 49.92 | 26.97 | 31.21 | 49.23 | 45.1% | 32.4% |
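For anyone wanting to reproduce the derived columns, they appear to follow from the measured ones roughly as sketched here. This is inferred from the numbers rather than stated anywhere, and the helper names are hypothetical: the CPU and DMA speeds look like the mean of the two transfer directions, and the "Versus base" columns look like the gain over the first row (16MB, 105.6MHz, no prefetch).

```cpp
// Mean of the "to mem" and "from mem" figures, as the "CPU speed" and
// "DMA speed" columns appear to be computed.
constexpr double meanSpeed(double toMem, double fromMem)
{
    return (toMem + fromMem) / 2.0;
}

// Percentage gain of `speed` over `base`, as the "Versus base" columns
// appear to be computed (per block from CPU speed, overall from test speed).
constexpr double gainPercent(double speed, double base)
{
    return (speed / base - 1.0) * 100.0;
}
```

For the 16MB, 120MHz, prefetch-on row, `meanSpeed(40.5, 47.05)` gives 43.775, and `gainPercent(43.78, 34.41)` and `gainPercent(27.93, 23.57)` come out at about 27.2 and 18.5, in line with the table. Feeding in the rounded table values can differ from the listed percentages by a tenth or so.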
The revised test creates blocks of test data in Teensy's RAM, then copies them to and from PSRAM using either memcpy() or DMA. Only a couple of the 57 tests are set to use DMA. The blocks are 2044 bytes long, so they actually cross two page boundaries; that doesn't seem to make a lot of difference, and the size can easily be changed:
C++:
//#define BLK_SIZE 255 // 255*uint32_t is 1020 bytes
#define BLK_SIZE 511 // 511*uint32_t is 2044 bytes
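To illustrate the memcpy() part of the approach, here is a minimal host-side sketch of a block round trip plus a throughput measurement. It is not the actual test code: the buffer and function names are hypothetical, both buffers are ordinary RAM (on a Teensy the second would be placed in PSRAM, e.g. with `EXTMEM`), and `std::chrono` stands in for whatever timer the real test uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>

#define BLK_SIZE 511  // 511*uint32_t is 2044 bytes

// Stand-ins for the real buffers (hypothetical names). On a Teensy the
// second one would live in PSRAM; here both are plain RAM.
static uint32_t ramBlock[BLK_SIZE];
static uint32_t psramBlock[BLK_SIZE];

// Fill a block with test data, copy it out and back, and check it survived.
bool roundTripOk()
{
    static uint32_t readBack[BLK_SIZE];
    for (uint32_t i = 0; i < BLK_SIZE; i++) ramBlock[i] = i * 2654435761u;
    std::memcpy(psramBlock, ramBlock, sizeof(ramBlock));    // "to mem"
    std::memcpy(readBack, psramBlock, sizeof(psramBlock));  // "from mem"
    return std::memcmp(readBack, ramBlock, sizeof(ramBlock)) == 0;
}

// Copy one block `reps` times and return the throughput in MB/s.
double copySpeedMBps(uint32_t *dst, const uint32_t *src, int reps)
{
    assert(reps > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < reps; i++) {
        std::memcpy(dst, src, sizeof(ramBlock));
    }
    const std::chrono::duration<double> secs = Clock::now() - start;
    return double(sizeof(ramBlock)) * reps / secs.count() / 1e6;
}
```

Timing only the copy loop is what lets the per-block speed be separated from the test overheads. On a desktop the figure will be meaningless for PSRAM, and an optimising compiler may shorten the loop, so treat the number as illustrative only.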
At the moment the prefetch is similarly controlled by macro definitions:
C++:
#define USE_PREFETCH
#define LIMIT_PREFETCH_SIZE // needed for ISSI 16MB part
If you have a 16MB part, and define USE_PREFETCH but do not define LIMIT_PREFETCH_SIZE, then the test will fail. The 8MB part seems fine with using unlimited prefetch, and thus gives the maximum possible PSRAM speed.
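The rule above can be written down as a small compile-time check. This is only a sketch: the function name is hypothetical, and the part size is passed in explicitly, whereas the actual test has no such parameter and relies on the macros alone.

```cpp
// Returns false for the one combination reported to fail: a 16MB part
// with prefetch enabled but the prefetch size not limited.
constexpr bool prefetchConfigOk(int psramSizeMB, bool usePrefetch, bool limitPrefetchSize)
{
    return !(psramSizeMB == 16 && usePrefetch && !limitPrefetchSize);
}

static_assert(prefetchConfigOk(16, true, true),   "16MB with limited prefetch is expected to work");
static_assert(!prefetchConfigOk(16, true, false), "16MB with unlimited prefetch is expected to fail");
static_assert(prefetchConfigOk(8, true, false),   "8MB seems fine with unlimited prefetch");
```

Whether other part sizes or vendors behave like the 8MB or the 16MB case is not covered by these results.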
I confess I don't understand why the DMA copy speed is so poor - I wouldn't have expected contention for memory bandwidth to be that significant, but maybe it is. Maybe a proper DMA guru can enlighten us!