Teensy 4.0 First Beta Test

Status
Not open for further replies.
Here is a hack (proof-of-concept) of using PIT to clock ADC via XBAR (and ADC_ETC), derived from EVKB SDK example
boards/evkbimxrt1050/driver_examples/adc_etc/adc_etc_hardware_trigger_conv/
Reads 12-bit ADC values every second from A0 and A1, similar to Teensy3 PDB clocking.
https://github.com/manitou48/teensy4/blob/master/pitxbaradc.ino
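For anyone wanting a different trigger rate, here is a rough sketch of the load-value arithmetic for a periodic timer like the PIT. The 24 MHz input clock is an assumption on my part (check the clock setup in the sketch itself), and `pitLoadValue` is a made-up helper name, not something from the SDK.

```cpp
#include <cstdint>

// Assumption: the PIT counts down at 24 MHz. Verify against the actual
// clock configuration before relying on this value.
constexpr uint32_t kAssumedPitClockHz = 24000000u;

// A PIT channel times out every (LDVAL + 1) input clock cycles, so the
// load value for a given period is the cycle count minus one.
constexpr uint32_t pitLoadValue(uint32_t clockHz, uint32_t periodMs) {
    return static_cast<uint32_t>(
               static_cast<uint64_t>(clockHz) * periodMs / 1000u) - 1u;
}

// One trigger per second at the assumed clock.
static_assert(pitLoadValue(kAssumedPitClockHz, 1000u) == 23999999u,
              "1 s period");
```

With these assumptions, a 1 ms trigger would use a load value of 23999.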

The next layer to add is DMA ....?

EDIT: see Mike's more robust implementation
https://github.com/mjs513/WIP/tree/master/adc_etc_hardware_trigger_conv

After talking with manitou about porting the example over for T4 use, I managed to get it done, if anyone is interested. It basically takes the drivers and defines and puts them into a sketch. A lot of consolidation can be done with the defines; this was just a fast port, even though it took me all day. It combines @manitou's sketch and the SDK example, which made it easier to port. If you want to change pins from A0/A1, check Table 14-1 in the manual. The sketch is on my T4 WIP GitHub page: https://github.com/mjs513/WIP
 
Paul: I didn't watch first upload verbose - did T_loader update the T4 bootloader by any chance? Any change to T4 USB code?

Nope & nope.

So far we've had only 1 bootloader update (don't recall which beta, one of the earliest). That update fixed the ARM cycle counter issue.

The USB code hasn't significantly changed. I'm finally getting better, still pretty congested & coughing here, but probably will be back up to speed next week. USB is near the top of my list.


I believe there may be a bug lurking in teensy_ports, causing some of the trouble you're seeing with 3+ boards. Also high on my list of stuff to investigate.
 
@PaulStoffregen
I'm finally getting better, still pretty congested & coughing here, but probably will be back up to speed next week. USB is near the top of my list.
Theraflu and Hall's cough drops worked for me (took me about 3 weeks).

I believe there may be a bug lurking in teensy_ports, causing some of the trouble you're seeing with 3+ boards. Also high on my list of stuff to investigate.
I have had similar problems when I was using a USB hub attached to the PC. I could work with 3 boards attached for a while but would lose the Teensy identification at some point, although the port would still show. But then I saw something you posted about not all hubs being created equal.
 
That may also explain the response to the nxp forum question - limit is imposed by bus arbitration, etc.... They never did explain the etc. part of the response.

It's really looking like NXP "designed" this chip with many cases where the fast CPU has to wait long times for sync with peripherals running at slower speeds. Every individual piece seems to be amazing, but it's hard to escape the feeling they just blindly lashed everything together with little care or concern for the performance impact of bus arbitration.

I'm really glad we decided to switch to the 1062 chip, so at least GPIO will be faster.
 
Just so you know I'm not ignoring this... I've tried reading this 3 times now and I still can't make any sense of it.

I would really like to know a way to reproduce this "T4 again in the ODD state" problem.

I was half ignoring it myself - in between recurrences {3 or 4 now?} - leaving notes as I see it happen just in case it rang a bell in passing or in the future. Funny that nobody else has seen anything like it for a meToo.

Notes are contemporaneous/disjointed as/when I remember/recognize what I'm doing to get the system unstuck, sometimes I've done an 'obvious' thing or two before finding it won't go away and stuff gets lost. It is a disconcerting time finding it 'stuck' doing something else and no clear indicators as to problem source or solution.

I'm hoping the updated bi-directional USB stack lead to resolution - directly or indirectly as it matures or gets debugged. The issue of a T4 after '15 second reset' not presenting a valid USB device suggests there is some lurking miscue on USB startup.

AFAIK - The repeats have some common elements: T4 and T_3.1's for Serial1 and Serial4, Win 10. Seems to involve sending fresh SerMon USB<>Serial# echo code to the T_3.1's. At first I was using TyComm with TLoader active - recent cases were IDE board change and TLoader update. Also, oddly, at some rare times the T4 won't go online when there is a powered T_3 on USB before it. I do need additional SerMon windows - so I've been using TyCommander and assuming it is not interfering with T_Loader programming, as it does not on T_3's. There was a beta TyComm that was T4 aware and I went back to the prior version, so it should not be touching the unrecognized T4. It will do SerMon on T4 - but I keep it 'Serial Off' on T4.

T4 generally works really well, as I've done some hundreds of uploads with no problem - would be interesting if TLoader or MCU Bootloader were counting. I have a sketchbook\T4 subdirectory with 74 sketch folders created - surprised if all were not uploaded at least 10 times - and some many dozens - and that doesn't include the ones in the debug library folder.

Not only when ODD happens, I see IDE Teensy_ports deliver a shortened list of which Teensys are online. In fact right now it only shows one of the T_3's beside the T4. And sometimes it is the T4 that is missing. But CMDLine just gave this, where COM19 is not on the IDE TPort list:
T:\arduino-1.8.8T4_146\hardware\tools>teensy_ports.exe
{
"address": "usb:0/140000/0/8/2",
"online": true,
"label": "COM19 (Teensy 3.2) Serial",
"vid": "16C0",
"pid": "0483",
"iserial": "86366",
"boardName": "Teensy 3.2",
"protocol": "Teensy"
}
{
"address": "usb:0/140000/0/8/7",
"online": true,
"label": "COM20 (Teensy 4-Beta) Serial",
"vid": "16C0",
"pid": "0483",
"iserial": "100002",
"boardName": "Teensy 4-Beta",
"protocol": "Teensy"
}
{
"address": "usb:0/140000/0/8/3",
"online": true,
"label": "COM13 (Teensy 3.2) Serial",
"vid": "16C0",
"pid": "0483",
"iserial": "86813",
"boardName": "Teensy 3.2",
"protocol": "Teensy"
}

So no focus on it yet is not a problem - so far it has come back working and there are more important fish to fry. Current WIN machine uptime is 6 days 11 hours. And closing IDE and all that is Teensy for a minute (?) and repowering seems to start fresh.
 
Okay I didn't expect there were changes, but this last time seemed a bit different and wanted to be sure. Github didn't suggest USB changed, but that doesn't show bootloader.

We cross posted - but yes, something is confusing or exposing a bug in t_ports with online devices not shown or usable in IDE. Sometimes it seems an 'off by 1' error tracking devices coming and going. AFAIK the IDE Serial ports list is always correct and works when T4 not on T_ports list - though here IDE has two 'static' [ COM5, COM6 ] that are always there - which I assume are recently used devices currently offline. The T4 seems to have something unique going on causing my ODD offline behavior - not sure if T_ports issue could cause that - or something in T4 is confusing T_ports. So this is me being patient waiting for the T4 USB stack update before I try to find a way to make it happen.

Good luck getting back to full health, and not relapsing as Mike notes. My wife just rests when she gets sick and gets better more quickly [unlike her co-workers that come in making her sick]- last time I tried powering through it just made it last longer.
 
Perhaps the word 'Crossover' isn't there by accident - while sounding cool and positive it is actually the most honest admission 'marketing' would allow that this thing is caught with the compromises of bringing worlds together - taking two revs of the silicon to get this close.

Indeed the 1062 extra RAM is important to allow buffer space to minimize moving the data elsewhere at inopportune times, but even so half of RAM gets wait stated. What does a memcpy from FLASH look like timewise?

I saw faster GPIO noted - will be good to see that work out.
 
How to reach 44100Hz with the given Audio-PLL-freq? I don't know...
I have Audio from the SGTL now, but not with 44100Hz.
@Paul... a value I could find is to use 768MHz for the Audio-PLL.. and yes...this gives "our" well-known 44117.6471Hz... :)
But that's not good for SPDIF and other formats.
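To make the 44117.6471 Hz figure concrete, here is a small sketch of the arithmetic. It assumes MCLK = 256 x fs and a total clock divider of 68 between the Audio PLL and MCLK; the divider is inferred from the numbers quoted here rather than read out of the code, and `sampleRateMilliHz` is just an illustrative helper.

```cpp
#include <cstdint>

// Sample rate in millihertz for a given PLL frequency, total divider
// between PLL and MCLK, and MCLK-per-sample ratio. Integer math, so the
// result is truncated to whole millihertz.
constexpr uint64_t sampleRateMilliHz(uint64_t pllHz,
                                     uint32_t totalDivider,
                                     uint32_t mclkPerSample) {
    return pllHz * 1000u /
           (static_cast<uint64_t>(totalDivider) * mclkPerSample);
}

// 768 MHz / 68 / 256 = 44117.647 Hz (assumed divider chain).
static_assert(sampleRateMilliHz(768000000ull, 68u, 256u) == 44117647ull,
              "the well-known 44117.647 Hz");
```

An exact 44100 Hz would need a PLL frequency that is an integer multiple of 256 x 44100 = 11289600 Hz, which 768 MHz is not.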
 
@Tim, most of the time copy from Flash is not that bad because of the cache. But if the data is not cached, it will be slow.. well most of the newer chips use external flash.
We need some wise decisions on the memory model.. or some we can select from the Arduino-Menu?
 
I was wondering about reading flash to the different memory areas … if some might have extra bus conflicts - or perhaps benefits. Data cache will only cover 32kb area in the best case if not already covering something else.
 
The issue of a T4 after '15 second reset' not presenting a valid USB device suggests there is some lurking miscue on USB startup.

Nope, that's just a known bug in the restore image. I discovered this bug the day before we started the beta and decided to leave it.

To explain briefly... I tried to improve upon the full erase Teensy 3.5 & 3.6 do, where the flash is left in a completely erased state. If you power cycle or cause a reboot using Teensy Loader, there is no program at all in flash memory, so no USB works. You have to press the button again (just a short press should do).

For Teensy 4, I created a small program that's written to the beginning of the flash, which implements RawHID and is able to hear the auto-reboot request. The idea is your board gets fully erased and then flashed with a program very similar to the one we write during testing, rather than being left with a completely empty flash.

The bug is I didn't get that tiny RawHID program quite right. Linux and Mac are happy with it, but Windows doesn't like it. I probably got some minor detail in the descriptors wrong. So after you do the 15 sec button press, the T4 is restored to a not-quite-right RawHID which isn't recognized by Windows.

It's not a problem with the restore process. The bug is in the 4K image that gets written to the flash. The restore code is properly erasing and properly writing the image, but the image itself has something wrong in its USB descriptors.

I will fix this before we switch to 1062 chips. Whether I make any way to update the 1052 boards is a good question. The restore image is not part of the bootloader. It's stored in the top 4K of the flash chip. By default, neither Teensy Loader nor the bootloader normally write there, so updating that code for the boards you already have isn't easy.
 
@Frank.
Glad you got it working. Audio was always like a black box for me, so excuse if I don't get this right. I have a question though, you said that 44.1kHz is good for SPDIF and other formats. Was wondering what is considered a good sampling rate for SPDIF? You got me curious so I looked at the SGTL5000 datasheet and it looks like it would support 48kHz as well as 96kHz (there is a caveat for 96kHz though).

Mike
 
SPDIF has no dedicated clock connection, instead, the clock-signal is encoded in the transmitted data. Usually there is a pll on the receiving side that reconstructs the clock signal.
The requirements for the samplerate are much harder than for other digital formats, it's a few percent. 44118 is close to max allowed... A few Hz more and "nothing works". ADAT and AES/EBU have even harder requirements.
There were users who experienced problems with the 18Hz off already.. not every SPDIF input likes that and switches to error-state.
The library requires a fixed samplerate for all objects. So 44100 would be much better. It's possible on the imxrt - but we need to modify the Audio-PLL clock for this. I don't think the default will work. Or maybe I'm wrong and just don't see the solution..
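To put a number on how far 44117.647 Hz sits from the nominal rate, here is a small sketch that expresses the offset in parts per million. It only quantifies the deviation; how much a given SPDIF receiver tolerates is not something this code knows. `deviationPpm` is a hypothetical helper name.

```cpp
#include <cstdint>

// Deviation of an actual rate from a nominal rate in parts per million.
// Both rates are given in millihertz; the result is truncated toward zero.
constexpr int64_t deviationPpm(int64_t actualMilliHz, int64_t nominalMilliHz) {
    return (actualMilliHz - nominalMilliHz) * 1000000 / nominalMilliHz;
}

// 44117.647 Hz vs 44100 Hz: about 17.6 Hz high, roughly 400 ppm.
static_assert(deviationPpm(44117647, 44100000) == 400, "about 400 ppm");
```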
 
@Frank
Thanks for the explanation, didn't know it was that sensitive. If you don't have the SDK I can extract the clock sections for you. Looking at it this morning, it does something with the audio PLL to get it configured properly. Not just setting the clock, which looks like it's at 12 MHz to start.
 
SPI non blocking writes support and memory

I put up some SPI asynch transfer support up in my github branch: https://github.com/KurtE/SPI/tree/T4_Async_Support

I believe that at least for smaller buffers it is working, except maybe how we handle memory... For me there are a couple of issues or things I wonder about.

Example on T3.5 or T3.6:
I can create a buffer for the ili9341 display in normal memory like: uint8_t screen_buffer[320*240*2];

But on current beta, this won't compile (link) saying it does not fit into memory...
I can malloc one of them... If I want two of them the second malloc fails...
Can allocate one in DMAMEM, have not tried allocating two of them there...

So wondering...

Note: buffer is malloc and rxBuffer is DMAMEM

Now usage: Some of the time I get garbage...
Code:
  for (uint32_t i = 0; i < BUFFER_SIZE; i++) buffer[i] = i & 0xff;
  for (uint32_t i = 0; i < BUFFER_SIZE; i++)  rxBuffer[i] = 0x5a;
  DBGSerial.println("Async read Small"); DBGSerial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.setTransferWriteFill(0x42);
  SPIT.transfer(NULL, rxBuffer, SMALL_TRANSFER_SIZE, event);
//  arm_dcache_delete(rxBuffer, SMALL_TRANSFER_SIZE);
  while (!event_happened) ;
  event_happened = false;
  dumpBuffer(rxBuffer, SMALL_TRANSFER_SIZE);
  validateTXBuffer(0);
  delay(5);
With the above test, some of rxBuffer is garbage... For example, I did a second test like this with a large transfer and got results like:
Code:
5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 
5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 
5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
Where full 32-byte cache lines still hold old values (the 5A values).
With the arm_dcache_delete(rxBuffer, SMALL_TRANSFER_SIZE); call I get proper output (at least in the small case).

This is with or without @Frank B's hack to startup - which I think only fixed writes being flushed from cache into RAM, not updates made to RAM by a device being picked up by the cache...

So wondering if I need to add this call internal to SPI? Do I probably also need to add a call to flush the write buffer, in that case probably arm_dcache_flush?
But that may slow some things down, as it loops over all 32-byte blocks within the specified memory range.
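To get a feel for the cost and the alignment pitfalls, here is a host-side sketch of the cache-line arithmetic, assuming 32-byte data cache lines as in the dump above. The helper names are made up for illustration; on the target the actual maintenance would still go through arm_dcache_flush / arm_dcache_delete.

```cpp
#include <cstdint>

constexpr uint32_t kCacheLineBytes = 32u;  // line size seen in the dump

// Start address of the cache line that contains addr.
constexpr uint32_t alignDown(uint32_t addr) {
    return addr & ~(kCacheLineBytes - 1u);
}

// Number of cache lines a [addr, addr + size) range overlaps, i.e. how
// many iterations a per-line maintenance loop would need.
constexpr uint32_t cacheLinesTouched(uint32_t addr, uint32_t size) {
    return size == 0u
               ? 0u
               : (addr + size - 1u) / kCacheLineBytes - addr / kCacheLineBytes + 1u;
}

// True if the buffer starts and ends on line boundaries. If it does not,
// invalidating it can also discard cached data of neighbouring variables
// that share the first or last line.
constexpr bool isLineAligned(uint32_t addr, uint32_t size) {
    return addr % kCacheLineBytes == 0u && size % kCacheLineBytes == 0u;
}

// A full 320x240x2 frame buffer spans 4800 lines.
static_assert(cacheLinesTouched(0x20200000u, 153600u) == 4800u, "frame");
```

So a whole-screen transfer means a few thousand loop iterations per maintenance call, while small transfers only touch a handful of lines. Keeping DMA buffers 32-byte aligned and a multiple of 32 bytes long avoids the shared-line problem.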
 
You could try to set 44100Hz and look what PLL frequency is set. If it's not the default, the values which set this freq. would be good to know. If it is default, the I2S clock settings are interesting.. :)
Thank you :)

Edit: I'd go for 903168000 Hz - that allows multiples like 2x44100 or 4x44100 with MCLK=256*fs. But still, the SDK settings for this case would be interesting :)
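A quick sanity check of that frequency, as a sketch: it assumes the usual fractional PLL relation Fout = Fref * (DIV_SELECT + NUM / DENOM) with a 24 MHz reference. The particular values 37 + 79/125 are one combination I worked out by hand, not something taken from the SDK.

```cpp
#include <cstdint>

// Fractional PLL output: Fref * (DIV_SELECT + NUM / DENOM), integer math.
constexpr uint64_t audioPllHz(uint64_t frefHz, uint32_t divSelect,
                              uint32_t num, uint32_t denom) {
    return frefHz * divSelect + frefHz * num / denom;
}

// Assumed 24 MHz reference: 24 MHz * (37 + 79/125) = 903.168 MHz.
static_assert(audioPllHz(24000000ull, 37u, 79u, 125u) == 903168000ull,
              "903.168 MHz");

// 903168000 Hz is exactly 80 * 256 * 44100, so MCLK = 256 * fs can be
// derived with integer dividers for 44100 Hz and its multiples.
static_assert(903168000ull / (256ull * 44100ull) == 80ull, "divide by 80");
static_assert(903168000ull % (256ull * 44100ull) == 0ull, "no remainder");
```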

@Kurt: Yes, set it to MEM_NOCACHE if you want to read. This disables the cache for that region. Or use the functions..
 
Here you go. The first two sections come from sai_interrupt_transfer.c, while the clock function is extracted from fsl_clock.c. Not sure if it helps or not, but you never know. Then there are other functions for initializing Tx, not sure if you want that.

Code:
/* Select Audio/Video PLL (786.48 MHz) as sai1 clock source */
#define DEMO_SAI1_CLOCK_SOURCE_SELECT (2U)
/* Clock pre divider for sai1 clock source */
#define DEMO_SAI1_CLOCK_SOURCE_PRE_DIVIDER (0U)
/* Clock divider for sai1 clock source */
#define DEMO_SAI1_CLOCK_SOURCE_DIVIDER (63U)
/* Get frequency of sai1 clock */
#define DEMO_SAI_CLK_FREQ                                                        \
    (CLOCK_GetFreq(kCLOCK_AudioPllClk) / (DEMO_SAI1_CLOCK_SOURCE_DIVIDER + 1U) / \
     (DEMO_SAI1_CLOCK_SOURCE_PRE_DIVIDER + 1U))

	 
    /*Clock setting for SAI1*/
    CLOCK_SetMux(kCLOCK_Sai1Mux, DEMO_SAI1_CLOCK_SOURCE_SELECT);
    CLOCK_SetDiv(kCLOCK_Sai1PreDiv, DEMO_SAI1_CLOCK_SOURCE_PRE_DIVIDER);
    CLOCK_SetDiv(kCLOCK_Sai1Div, DEMO_SAI1_CLOCK_SOURCE_DIVIDER);
//--------------------------------------------------------
/*
 * AUDIO PLL setting: Frequency = Fref * (DIV_SELECT + NUM / DENOM)
 *                              = 24 * (32 + 77/100)
 *                              = 786.48 MHz
 */
const clock_audio_pll_config_t audioPllConfig = {
    .loopDivider = 32,  /* PLL loop divider. Valid range for DIV_SELECT divider value: 27~54. */
    .postDivider = 1,   /* Divider after the PLL, should only be 1, 2, 4, 8, 16. */
    .numerator = 77,    /* 30 bit numerator of fractional loop divider. */
    .denominator = 100, /* 30 bit denominator of fractional loop divider */
};

CLOCK_InitAudioPll(&audioPllConfig);
//--------------------------------------------------------------

//From fsl_clock.c
/*!
 * brief Initializes the Audio PLL.
 *
 * This function initializes the Audio PLL with specific settings
 *
 * param config Configuration to set to PLL.
 */
void CLOCK_InitAudioPll(const clock_audio_pll_config_t *config)
{
    uint32_t pllAudio;
    uint32_t misc2 = 0;

    /* Bypass PLL first */
    CCM_ANALOG->PLL_AUDIO = (CCM_ANALOG->PLL_AUDIO & (~CCM_ANALOG_PLL_AUDIO_BYPASS_CLK_SRC_MASK)) |
                            CCM_ANALOG_PLL_AUDIO_BYPASS_MASK | CCM_ANALOG_PLL_AUDIO_BYPASS_CLK_SRC(config->src);

    CCM_ANALOG->PLL_AUDIO_NUM = CCM_ANALOG_PLL_AUDIO_NUM_A(config->numerator);
    CCM_ANALOG->PLL_AUDIO_DENOM = CCM_ANALOG_PLL_AUDIO_DENOM_B(config->denominator);

    /*
     * Set post divider:
     *
     * ------------------------------------------------------------------------
     * | config->postDivider | PLL_AUDIO[POST_DIV_SELECT]  | MISC2[AUDIO_DIV] |
     * ------------------------------------------------------------------------
     * |         1           |            2                |        0         |
     * ------------------------------------------------------------------------
     * |         2           |            1                |        0         |
     * ------------------------------------------------------------------------
     * |         4           |            2                |        3         |
     * ------------------------------------------------------------------------
     * |         8           |            1                |        3         |
     * ------------------------------------------------------------------------
     * |         16          |            0                |        3         |
     * ------------------------------------------------------------------------
     */
    pllAudio =
        (CCM_ANALOG->PLL_AUDIO & (~(CCM_ANALOG_PLL_AUDIO_DIV_SELECT_MASK | CCM_ANALOG_PLL_AUDIO_POWERDOWN_MASK))) |
        CCM_ANALOG_PLL_AUDIO_ENABLE_MASK | CCM_ANALOG_PLL_AUDIO_DIV_SELECT(config->loopDivider);

    switch (config->postDivider)
    {
        case 16:
            pllAudio |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(0);
            misc2 = CCM_ANALOG_MISC2_AUDIO_DIV_MSB_MASK | CCM_ANALOG_MISC2_AUDIO_DIV_LSB_MASK;
            break;

        case 8:
            pllAudio |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(1);
            misc2 = CCM_ANALOG_MISC2_AUDIO_DIV_MSB_MASK | CCM_ANALOG_MISC2_AUDIO_DIV_LSB_MASK;
            break;

        case 4:
            pllAudio |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(2);
            misc2 = CCM_ANALOG_MISC2_AUDIO_DIV_MSB_MASK | CCM_ANALOG_MISC2_AUDIO_DIV_LSB_MASK;
            break;

        case 2:
            pllAudio |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(1);
            break;

        default:
            pllAudio |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(2);
            break;
    }

    CCM_ANALOG->MISC2 =
        (CCM_ANALOG->MISC2 & ~(CCM_ANALOG_MISC2_AUDIO_DIV_LSB_MASK | CCM_ANALOG_MISC2_AUDIO_DIV_MSB_MASK)) | misc2;

    CCM_ANALOG->PLL_AUDIO = pllAudio;

    while ((CCM_ANALOG->PLL_AUDIO & CCM_ANALOG_PLL_AUDIO_LOCK_MASK) == 0)
    {
    }

    /* Disable Bypass */
    CCM_ANALOG->PLL_AUDIO &= ~CCM_ANALOG_PLL_AUDIO_BYPASS_MASK;
}
I am attaching a slimmed-down version of the example just in case I missed something: View attachment evkbimxrt1050_sai_interrupt_transfer.zip
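As a cross-check of those defines, here is a sketch of what the DEMO_SAI_CLK_FREQ macro works out to with the 786.48 MHz PLL setting. The conversion to a sample rate assumes MCLK = 256 x fs, which is my assumption and not something the extracted code states; `saiClockHz` is an illustrative helper.

```cpp
#include <cstdint>

// Mirrors the DEMO_SAI_CLK_FREQ macro: both divider fields hold
// (divide ratio - 1), so 63 means divide by 64 and 0 means divide by 1.
constexpr uint64_t saiClockHz(uint64_t pllHz, uint32_t preDivField,
                              uint32_t divField) {
    return pllHz / (divField + 1u) / (preDivField + 1u);
}

// 786.48 MHz / 64 / 1 = 12.28875 MHz.
static_assert(saiClockHz(786480000ull, 0u, 63u) == 12288750ull,
              "SAI1 clock with the SDK example settings");

// Assuming MCLK = 256 * fs, that is about 48002.9 Hz (value in mHz).
static_assert(12288750ull * 1000ull / 256ull == 48002929ull, "~48 kHz");
```

If that assumption holds, the SDK example is tuned for the 48 kHz family rather than 44.1 kHz, which would explain why its PLL value differs from the 768 MHz one.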
 
@Kurt: Yes, set it to MEM_NOCACHE if you want to read. This disables the cache for that region. Or use the functions..
Thanks Frank,

But I am thinking more about the end user and what do they need to do and/or if the libraries need to reflect on this and handle it.

Right now I am thinking of doing some form of hack inside of SPI, that maybe does something like:

if ((uint32_t)tx_buffer >= 0x20200000U) arm_dcache_flush((void *)tx_buffer, count);
if ((uint32_t)rx_buffer >= 0x20200000U) arm_dcache_delete((void *)rx_buffer, count);

But not sure if the rx_buffer delete could be done at the start of the DMA or do I need to wait until the end of the DMA... Will try start and see what happens.
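Here is a host-testable sketch of that address check, pulled out into a helper. The 0x20200000 start comes from the check above; the end address assumes 256 KB of OCRAM, and `isCachedOcram` is a hypothetical name, not part of the SPI library.

```cpp
#include <cstdint>

constexpr uint32_t kOcramStart = 0x20200000u;
constexpr uint32_t kOcramEnd   = 0x20240000u;  // assumption: 256 KB of OCRAM

// True if the address lies in the (assumed cached) OCRAM region, where a
// DMA transfer needs explicit cache maintenance.
constexpr bool isCachedOcram(uint32_t addr) {
    return addr >= kOcramStart && addr < kOcramEnd;
}

// On the target this could gate the maintenance calls, for example:
//   if (isCachedOcram((uint32_t)tx)) arm_dcache_flush(tx, count);   // before DMA starts
//   if (isCachedOcram((uint32_t)rx)) arm_dcache_delete(rx, count);  // see note below

static_assert(isCachedOcram(0x20209C48u), "heap-style address in OCRAM");
static_assert(!isCachedOcram(0x2001FFC4u), "stack-style address in low RAM");
```

On ordering: the flush for the TX buffer has to happen before the DMA starts. For the RX buffer, invalidating after the DMA completes is the conservative choice, since anything that touches the buffer while the transfer is running could pull stale lines back into the cache.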

But again wondering about how the memory works.
My assumption is, there is lower memory, where variables and stack go: If I read the stuff correctly, we have: 128k of DTCM memory.
We then have 256KB of OCRAM where currently DMAMEM and heap go...

I did some quick and dirty prints in test app:
Code:
buffer = (uint8_t *)malloc(BUFFER_SIZE);
  //rxBuffer = (uint8_t *)malloc(BUFFER_SIZE);

  DBGSerial.print("Buffer: ");
  DBGSerial.print((uint32_t)buffer, HEX);
  DBGSerial.print(" RX Buffer: ");
  DBGSerial.print((uint32_t)rxBuffer, HEX);
  DBGSerial.print(" ");
  DBGSerial.println(BUFFER_SIZE, DEC);
  DBGSerial.printf("Static buffer: %x, Stack Buffer: %x\n", (uint32_t)static_buffer, (uint32_t)stack_buffer);
  DBGSerial.printf("Heap Start: %x, Heap End: %x\n", (uint32_t)&_heap_start, (uint32_t)&_heap_end);
  event.attachImmediate(&asyncEventResponder);
Output shows:
Code:
SPI Test program
Buffer: 20209C48 RX Buffer: 20200000 40000
Static buffer: 20001790, Stack Buffer: 2001ffc4
Heap Start: 20209c40, Heap End: 20240000

So the current definition of DMAMEM implies it does not work with DMA? Unless you force the cache to work for you?

Also wondering about the lower memory. That is wondering if we should allow the unused parts to be used for heap as well?
That is if you subtract my buffer I put on stack, from my static buffer, they differ by about 120K in memory location. Note: I did not look to see where the actual last variable was actually allocated, probably including buffers for things like Serial objects and the like.

But if we don't allow something like malloc and new to get to this space, then maybe we should by default enlarge system buffers to make more use of this space?

Edit: Should mention that I understand that there is not enough memory available for two full ILI9341 buffers on these chips.
That is each one takes 153600 bytes, so 2 times that would take over 300K. But could potentially do it using multiple memory locations...
That is, if one allocated, let's say, the full 153600 bytes for one buffer, the other buffer could be done with two allocations, like 100K from high memory and the other 53K from lower memory... Note: At one point I was playing with, I believe, the ESP32, where I wanted a backing buffer for the ILI9341 and did something similar, as you could not allocate that much memory in one allocation...
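A rough sketch of the bookkeeping such a split buffer would need: mapping a linear byte offset in the frame onto one of two segments. The 100 KB first segment is just the example split mentioned here, and `Segment` / `locate` are names made up for the illustration.

```cpp
#include <cstddef>

constexpr size_t kFrameBytes = 320u * 240u * 2u;  // one ILI9341 frame, 16 bpp
constexpr size_t kOcramBytes = 256u * 1024u;      // assumed OCRAM size

static_assert(kFrameBytes == 153600u, "one frame");
static_assert(2u * kFrameBytes > kOcramBytes, "two frames do not fit in 256 KB");

// Which segment a byte lives in, and the offset within that segment.
struct Segment {
    int index;      // 0 = first allocation, 1 = second allocation
    size_t offset;  // byte offset inside that allocation
};

// Map a linear frame offset onto a buffer split after firstLen bytes.
constexpr Segment locate(size_t linearOffset, size_t firstLen) {
    return linearOffset < firstLen ? Segment{0, linearOffset}
                                   : Segment{1, linearOffset - firstLen};
}

// Example split: 100 KB in one region, the remaining 51200 bytes elsewhere.
static_assert(locate(102400u, 102400u).index == 1, "first byte of 2nd part");
```

A DMA transfer over such a buffer would then have to be issued as two chained transfers, one per segment.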
 
The wonders of fragmentation . . . :)
So glad the 1062 came around for an improvement! Having the doubled RAM will make the alternate areas big enough to use when we get there. But still confusion with control over what is where.

If the TLA is right (D)TCM is the No Wait memory portion - that doesn't need caching - the larger half portion not in that area open to the user will be better with cache hits - but data cache is only 32KB of the 512KB(+?) area outside that region. Then of course there is the read only Flash code space. [ it seems Paul noted that 512K of TCM is used to cover other things as well ? ]

Does the full 1MB of different types of RAM appear as a contiguous area to the 'sketch'? Does it make sense to build a second 'malloc' for FAST .vs. slow RAM that might want with or without cache ?
 
My thought was to make the heap (malloc) cacheable, and DMAMEM with cache disabled. And to provide functions that can enable or disable the caching, or even reconfigure the MPU completely. It's the user's T4, no harm will occur, so why shouldn't he.. :)
 
@Frank
I implemented the clock function that I posted to test what frequency it was giving me versus what @WMXZ code gave me:
Code:
WMXZ   -  AudioPll Clock: 768000000
SDK      -  AudioPll Clock: 786480000
The 2 are close but the sdk is what I believe is recommended.

EDIT: Haven't put a scope on the clk pin yet - have to set it up - don't exactly have much room on my desk to keep it setup permanently
 
... It the users's T4, no harm will occur, so why should'nt he.. :)

Not sure how to read that?

With the two RAM pools, 'no wait' under TCM and the 'other w/wait' RAM, I assume the no wait part would be best for DMA and not ever be under cache, right? Of course the internal bus lines may be such that DMA to the area outside the TCM will work as well, since the DMA won't be running at 'full speed' at a mere 30 to 100 MHz for SPI, for example. That was the nature of my wonder yesterday about reading FLASH to RAM - if FLASH has waits, do the wait periods get effectively shared/overlapped, or doubled one after the other?
 
Thanks Frank,

But I am thinking more about the end user and what do they need to do and/or if the libraries need to reflect on this and handle it.

So the current definition of DMAMEM implies it does not work with DMA? Unless you force the cache to work for you?

Good point, and I think the same. DMAMEM should provide memory that works with DMA... :)
Edit: If I want some caching (like write-through), I can a) modify the MPU setting or b) just use malloc (if the heap is cacheable)
 