Could there be something like an ISR template function?

Why worry about C when you can just use C++ callbacks from C? That is what I do, and then I get the benefits of both.

Yes, you would still need to compile some of the core with g++, but who cares if you still get a C API anyway?
Basically, all you need to do is disable name mangling in the .cpp file to get the C API. You do this with:
extern "C" {
// whatever mixed C with access to C++ namespaces, etc...
}

...and the same goes in the header file, wrapped in the needed #ifdef __cplusplus guard. Simple, really.
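
A minimal sketch of that pattern, in a single file for brevity. The API names (`isr_api_attach`, `isr_api_fire`, `isr_callback_t`) and the `core` namespace are hypothetical, made up for illustration and not part of any real core. The top part is what would live in the shared header; the rest is what would live in the .cpp file.

```cpp
#include <cassert>
#include <cstddef>

// ---- would go in the shared header (hypothetical names) ----
#ifdef __cplusplus
extern "C" {
#endif

typedef void (*isr_callback_t)(void);

// Plain C entry points: no name mangling, so .c files can link against them.
int  isr_api_attach(unsigned slot, isr_callback_t cb);
void isr_api_fire(unsigned slot);

#ifdef __cplusplus
}
#endif

// ---- would go in the .cpp file ----
namespace core {
constexpr std::size_t kSlots = 4;            // illustrative size only
isr_callback_t table[kSlots] = {nullptr};    // C++ side is free to use namespaces
}  // namespace core

extern "C" {

int isr_api_attach(unsigned slot, isr_callback_t cb) {
  if (slot >= core::kSlots) return -1;       // reject bad slot numbers
  core::table[slot] = cb;
  return 0;
}

void isr_api_fire(unsigned slot) {
  assert(slot < core::kSlots);
  if (core::table[slot]) core::table[slot]();
}

}  // extern "C"

// Example callback a C caller might register.
static int g_hits = 0;
extern "C" void demo_callback(void) { ++g_hits; }
```

A C file that includes the header only ever sees the typedef and the two prototypes; the namespace and anything else C++ stays hidden behind them.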
 
Some comments about Frank B's code (and the library that basically implements his idea).
Reading "The Definitive Guide to the ARM Cortex-M3", 2nd ed., I came across this:
After the vector table is ready, we can relocate the vector table to the new one in the SRAM. However,
to ensure that the transfer of the vector handler is complete, the DSB instruction is used.
p. 191ff.: 11.4 Example of Vector Table Relocation.
I've included that DSB instruction in the GitHub library.
The DSB instruction is a simple ASM instruction (asm volatile ("dsb"); ).

and

Second, the interrupt vector table should also be put into the code region, if possible. Thus, vector
fetch and stacking can be carried out at the same time. If the vector table is located in the SRAM, extra
clock cycles might result in interrupt latency because both vector fetch and stacking could share the
same system bus (unless the stack is located in the code region, which uses a D-Code bus).
p. 207: 12.4 Performance Considerations.

So moving the vector table should only happen when it's really necessary, that is, only when somebody actually asks for a DMA channel. This is now implemented in the library as well.
 
Thank you, these instructions are not mentioned in the manual.

Second, the interrupt vector table should also be put into the code region, if possible. Thus, vector
fetch and stacking can be carried out at the same time. If the vector table is located in the SRAM, extra
clock cycles might result in interrupt latency because both vector fetch and stacking could share the
same system bus (unless the stack is located in the code region, which uses a D-Code bus).
p. 207: 12.4 Performance Considerations.

Isn't this the case?
The stack is located in the upper SRAM, whereas the data section is located in the lower SRAM.
Of course, you should NOT relocate the vector table to the stack area... :)

Linkerscript:
Code:
_estack = ORIGIN(RAM) + LENGTH(RAM);

Reference Manual:
3.5.3.4 SRAM accesses
The SRAM is split into two logical arrays that are 32-bits wide.
• SRAM_L — Accessible by the code bus of the Cortex-M4 core and by the backdoor
port.
• SRAM_U — Accessible by the system bus of the Cortex-M4 core and by the
backdoor port.

So moving the vector table should only happen when it's really necessary, that is, only when somebody actually asks for a DMA channel.

Uhh.. again: this is useful for more than just DMA!
 
I'm very new to this, so I wasn't sure where the vector table is placed.
I see in the reference manual that SRAM_L is the lower SRAM (ICODE/DCODE), so indeed this is not a problem.

Also, I forgot to add that we should follow the convention that the DMA channel that you are assigned should be used for the DMAMUX channel too, to avoid conflicts there.
 
I've included that DSB instruction in the GitHub library.
The DSB instruction is a simple ASM instruction (asm volatile ("dsb"); ).

You probably also want a "Data Memory Barrier" ( asm volatile ("dmb"); ) after you write your vector table to SRAM. It "ensures that all explicit memory accesses before the DMB instruction complete before any explicit memory accesses after the DMB instruction start." http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14041.html

Although the Cortex-M doesn't reorder memory accesses, dynamically setting up DMA access can have ill effects. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHEDAAF.html

The barrier would also make the code more portable to other ARM processors that do reorder memory accesses.
 
Frank B:
My library is concerned only about the DMA stuff, but I agree that this is very useful in general. It was also very instructive for me to read about all this stuff that I had no idea about!

duff:
4.3 Instruction Descriptions, Table 4.27, p. 67:
Instruction  Description
DMB          Data memory barrier; ensures that all memory accesses are completed before new memory access is committed
DSB          Data synchronization barrier; ensures that all memory accesses are completed before next instruction is executed
ISB          Instruction synchronization barrier; flushes the pipeline and ensures that all previous instructions are completed before executing new instructions

So as I understand it, DSB goes a step further than DMB: no instruction of any kind is executed until the memory transfers finish (so no interrupts take place), while DMB only guarantees that no new memory access starts until the earlier ones are done (so interrupts may still be raised!).
 
...so here is v3:

Code:
//Blink faster than expected for delay(500)

//See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/Ciheijba.html


#define _NUM_INTERRUPTS (NVIC_NUM_INTERRUPTS + 16)
#define IRQ_SYSTICK 15

void (* _vectorsRAM[_NUM_INTERRUPTS])(void)   __attribute__ ((aligned (512))); //on teensy 3.0, we can use align(256)

const int ledPin = 13;

//Our new ISR
void my_new_systick_isr(void)
{
	systick_millis_count+=8; //make it faster :-)
}

void setup() {
  pinMode(ledPin, OUTPUT);
 
  // Copy Vector Table to RAM
  uint32_t *src = (uint32_t*) SCB_VTOR; 
  uint32_t *dst = (uint32_t*) &_vectorsRAM[0];
  int i=0;
  while (i++ < _NUM_INTERRUPTS) *dst++ = *src++;

  //Switch to RAM 
  SCB_VTOR = (uint32_t) &_vectorsRAM[0];
  
  //Assign new systick
  asm volatile ("dmb");
  _vectorsRAM[IRQ_SYSTICK] = &my_new_systick_isr;
  asm volatile ("dsb");
}

void loop() {
  digitalWrite(ledPin, HIGH);   // set the LED on
  delay(500);
  digitalWrite(ledPin, LOW);    // set the LED off
  delay(500);
}

But I don't know if this is really needed :).
Isn't the 32-bit write to VTOR atomic? Can an interrupt occur during this write?
 
What I understood from the manual (and what they have as example code) is more along these lines:
Code:
    // Copy Vector Table to RAM
    uint32_t *src = (uint32_t*) SCB_VTOR;
    uint32_t *dst = (uint32_t*) &_vectorsRAM[0];
    int i=0;
    while (i++ < NUM_INTERRUPTS) *dst++ = *src++;

    /*
    After the vector table is ready, we can relocate the vector table to the new one in the SRAM. However,
    to ensure that the transfer of the vector handler is complete, the DSB instruction is used
    See Definitive Guide to-ARM Cortex-M3, 2nd ed, p. 191ff.: 11.4 Example of Vector Table Relocation.
    */
    asm volatile ("dsb"); // Data synchronization barrier


    //Switch to RAM
    SCB_VTOR = (uint32_t) &_vectorsRAM[0];
and
Code:
//Assign new isr
    _vectorsRAM[ISR_OFFSET + i] = new_isr;
    asm volatile ("dsb"); // Data synchronization barrier

I think the idea of that dsb instruction is to make sure that all the vectors have been copied before SCB_VTOR changes. Otherwise, if "SCB_VTOR = (uint32_t) &_vectorsRAM[0];" took effect before the copying finished, an interrupt could be dispatched through the new table before its ISR entry had been copied.
The assembler example they give is (only an excerpt):
Code:
...
NVIC_VECTTBL EQU 0xE000ED08 ; Vector Table Offset Register
...

; Copy old vector table to new vector table
LDR r0,=0
LDR r1,=VectorTableBase
LDMIA r0!,{r2-r5} ; Copy 4 words
STMIA r1!,{r2-r5}

DSB ; Data synchronization barrier.

; Set vector table offset register
LDR r0,=NVIC_VECTTBL
LDR r1,=VectorTableBase
STR r1,[r0]
...

DSB ; Data synchronization barrier. Make sure
; everything ready before enabling interrupt
MOV r0, #0 ; select IRQ#0
BL EnableIRQ

...
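
Translated to C++, the same ordering (copy, barrier, switch) might look like the sketch below. The names `relocate_vectors`, `data_sync_barrier` and `fake_vtor` are made up for illustration. The barrier falls back to a compiler-only fence when not building for a Cortex-M3/M4, and VTOR is simulated by an ordinary variable, so the logic can be tried on a PC; on the chip you would write the real SCB_VTOR register instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef void (*vector_t)(void);

// DSB on Cortex-M3/M4 builds; elsewhere only a compiler barrier
// (an assumption made so the sketch also builds on a host).
static inline void data_sync_barrier(void) {
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
  asm volatile("dsb" ::: "memory");
#else
  asm volatile("" ::: "memory");
#endif
}

// Copy `count` vectors from `src` to `dst`, then point the (simulated) VTOR
// at the new table. The first barrier sits between the two steps.
static void relocate_vectors(const vector_t *src, vector_t *dst,
                             std::size_t count, volatile std::uintptr_t *vtor) {
  assert(src && dst && vtor);
  for (std::size_t i = 0; i < count; ++i) dst[i] = src[i];
  data_sync_barrier();   // all table writes are done before the switch
  *vtor = reinterpret_cast<std::uintptr_t>(dst);
  data_sync_barrier();   // the switch is done before anything relies on it
}

// Stand-ins for the flash table, the RAM table and VTOR (host simulation).
static void handler_a(void) {}
static void handler_b(void) {}
static const vector_t flash_table[2] = {handler_a, handler_b};
static vector_t ram_table[2];
static volatile std::uintptr_t fake_vtor = 0;
```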
 
First, let me apologize for ignoring this important thread far too long. Maker Faire took time, and then I wanted to get 1.19 released (lots of bug fixes) and the remaining incompatible libraries ported before introducing substantial changes to the Teensy 3.x platform. At least that's all done! :)

I *still* have a big box of breakout boards and other hardware in need of compatibility testing/porting, many libraries working but not documented on the website or included in the Teensyduino installer, and many patches and neglected plans on the Audio library, and on top of all that, maybe someday I'll even design new PJRC products.....

Even with all that other stuff pending, I believe it's important to get these features into the Teensyduino core library. Long term, we really do need these features to build awesome libraries that "just work" when used together. This really needs to be in the core library as a long-term stable API.

Here are my current thoughts. At this moment, pretty much everything is up for discussion.

First, I want to move the interrupt vectors to RAM. I'm not very excited about dynamically changing it after C++ constructors and/or user or non-core library code has run. The downside is we "lose" 252 bytes of RAM on Teensy 3.0 and 448 bytes on Teensy 3.1, or maybe a bit more if the linker doesn't pack things efficiently with the memory alignment requirements. I'm considering implementing this very early in the startup code.

I'm also considering a DMA class for the core library, which would be meant to replace hard-coded access to the DMA_TCD and DMAMUX registers, with a nice Arduino-style API. The constructor would manage a static bitmap, similar to Pedvide's DMAControl library.

Eventually, all the DMA-based libraries will need to be updated. So far, these are the ones I know exist: OctoWS2811, Audio, SmartMatrix, ADC. Are there others out there?
 
First, I want to move the interrupt vectors to RAM. I'm not very excited about dynamically changing it after C++ constructors and/or user or non-core library code has run. The downside is we "lose" 252 bytes of RAM on Teensy 3.0 and 448 bytes on Teensy 3.1, or maybe a bit more if the linker doesn't pack things efficiently with the memory alignment requirements. I'm considering implementing this very early in the startup code.

I see the latest commit on GitHub has the interrupt vectors in RAM, so a question: is this going to change how we implement ISRs in our code? What I mean is, do I need to define my own ISR functions, or does it still work like before, such as:
Code:
void dma_ch0_isr(void) { ... }

void wakeup_isr(void) { ... }
etc..
It looks to me like I won't have to change anything? Also, out of curiosity, what does having the interrupt vector table in RAM give us? I researched this a little this morning and see one use is for a bootloader, but that would not apply here since the bootloader is in the mini54. Loading a new application into RAM would also need this, but that probably does not apply to the Teensy 3.0/3.1 architecture either.

I'm also considering a DMA class for the core library, which would be meant to replace hard-coded access to the DMA_TCD and DMAMUX registers, with a nice Arduino-style API. The constructor would manage a static bitmap, similar to Pedvide's DMAControl library.
I think this would be very beneficial if this was in the core. I have a few libraries that use DMA that could really benefit from this. I can update them myself once i see how you do this.
 
Also, out of curiosity, what does having the interrupt vector table in RAM give us?

It allows someone to override the core ISRs without having to modify the core code. I have one project that uses its own handler for pin interrupts. Now consider my options for building this:

  1. Since the port ISRs are part of core, I can modify the core code (modify Teensyduino files), and get it to compile. Sure that works for that one project, but say I want to compile a project that uses the standard handler functions. For that I would have to revert all the changes back to original code, or maintain multiple Teensyduino installs.
  2. Alternatively if the vector table is in RAM, my code can swap out the default handler functions without having to modify the Teensyduino code at all. It is somewhat less efficient in Flash since the default handlers will be there even when they are not used, but frankly Flash is plentiful on these parts, and time is not (eg. the hassle of doing it any other way).
It actually doesn't matter if the core vector table is in RAM or not. The procedure has been shown how to move it from Flash to RAM anyway. The arguable benefit of having it in RAM by default is that one could wrap a simplified API around the code for changing a particular vector. Although behaviorally the only simplification of such code is that it doesn't need to move the vector table to RAM as a first step, so really it seems minimal benefit of having it there by default (I suppose there is a corner case where it will flag potential out-of-memory at compile time, versus doing it on-the-fly).
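
To make option 2 concrete, swapping a handler in a RAM table comes down to something like the following. This is a host-side sketch with made-up names (`ram_vectors`, `swap_vector`, `kNumVectors`, and the two example handlers); on the real chip the table would be the one VTOR points to, and a dsb would follow the write.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*vector_t)(void);

constexpr std::size_t kNumVectors = 8;   // illustrative; the real table is larger

static int g_default_calls = 0;
static int g_custom_calls = 0;
static void default_port_isr(void) { ++g_default_calls; }  // stands in for the core's handler
static void my_port_isr(void) { ++g_custom_calls; }        // project-specific replacement

// RAM copy of the vector table, pre-filled with the default handler.
static vector_t ram_vectors[kNumVectors] = {
    default_port_isr, default_port_isr, default_port_isr, default_port_isr,
    default_port_isr, default_port_isr, default_port_isr, default_port_isr};

// Install `fn` at `index` and return the previous handler so it can be restored.
static vector_t swap_vector(std::size_t index, vector_t fn) {
  assert(index < kNumVectors && fn != nullptr);
  vector_t previous = ram_vectors[index];
  ram_vectors[index] = fn;
  return previous;
}
```

The default handler still occupies flash even while it is swapped out, which is the small inefficiency mentioned in option 2.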
 
Yes, the main benefit to a RAM-based vector table is the ability to dynamically attach different interrupt routines.

Of course, this is possible with a performance penalty by creating a fixed interrupt routine that calls a function through a pointer. On AVR, that's the only way to accomplish such a feat. Today IntervalTimer and attachInterrupt() work this way. Eventually, they'll probably be updated, though I'm not quite sure how it can be done with IntervalTimer to maintain compatibility with existing functions that don't touch the PIT timer.

For DMA, the extra overhead of first running a flash-based ISR that reads RAM and does an indirect call seems too much. Some libraries, like OctoWS2811 and Audio, might be able to tolerate the extra overhead, since they configure fairly large DMA buffers. But even then, the whole point of DMA is higher performance, so burdening all DMA interrupts with so much extra overhead seems like the wrong trade-off.
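
For comparison, the pointer-based approach looks roughly like this. It is a simplified sketch, not the actual attachInterrupt() or IntervalTimer code; `fixed_isr`, `attach_handler` and `user_handler` are invented names.

```cpp
#include <cassert>

typedef void (*handler_t)(void);

// Pointer kept in RAM; the fixed ISR has to load it on every interrupt.
static handler_t volatile user_handler = nullptr;

// Detaching is left out of this sketch, so a null handler is rejected.
static void attach_handler(handler_t fn) {
  assert(fn != nullptr);
  user_handler = fn;
}

// This function's address is what sits in the flash vector table.
// The extra load, null check and indirect call are the overhead in question.
static void fixed_isr(void) {
  handler_t fn = user_handler;   // read once so the check and the call agree
  if (fn) fn();
}

// Example user routine.
static int g_ticks = 0;
static void count_tick(void) { ++g_ticks; }
```

With the vector table in RAM, the handler's address goes straight into the table and this extra hop disappears.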

The good news is this scheme has perfect compatibility with existing code. You can still use the pre-defined isr names.

Normally, you should not write directly to the vector table. I'm going to create a version of attachInterrupt for this. I still have some fiddling to do with converting the ISR numbers from plain constants to an enum as part of that effort.
 
Paul,
can you make the RAM-based vector table optional ?
Something like #ifndef VECTOR_TABLE_FLASH

That way, an [experienced] user can decide to switch off the RAM copy to save RAM if it is not used.
 
I'm trying to ascertain the state of the DMA, and I ran into an unexpected problem. Is it correct that the DMAMUX_CHCFG* registers are write-only?

If I try reading them it freezes, eg:
Code:
void setup() { Serial.begin(115200); }
void loop()
{
    while(!Serial.available());
    Serial.print("This is before\n");
    uint8_t val = DMAMUX0_CHCFG0;
    Serial.print("You'll never see this\n");
}
 
The DMAMUX, like most peripherals in this chip, is disabled by default. You must enable it before you can access any of its registers. Any access to disabled peripherals causes a hard fault.

Try running this:

Code:
void setup() {
  Serial.begin(115200);
  SIM_SCGC6 |= SIM_SCGC6_DMAMUX;
}
void loop()
{
    while(!Serial.available());
    Serial.print("This is before\n");
    uint8_t val = DMAMUX0_CHCFG0;
    Serial.print("You'll never see this\n");
    delay(100);
}
 
I've committed work-in-progress code on DMAChannel, so you can see what I'm thinking and comment before this is finalized.... ;)

https://github.com/PaulStoffregen/cores/blob/master/teensy3/DMAChannel.h

The basic idea is you create instances of this object for each DMA channel you want to use. The object constructor manages which channels are in use, with a simple bitmask, similar to Pedvide's getChannel() function.
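
A stripped-down sketch of that allocation idea, just to make the discussion concrete. It is not the code in DMAChannel.h: the class name `ChannelSlot`, the constant `kNumChannels` and the `kNone` marker are assumptions for illustration, and interrupt safety is ignored.

```cpp
#include <cassert>
#include <stdint.h>

class ChannelSlot {
 public:
  static constexpr uint8_t kNumChannels = 4;   // assumed channel count
  static constexpr uint8_t kNone = 0xFF;       // "no channel available" marker

  // Claim the lowest free channel, or kNone if all are taken.
  ChannelSlot() : channel_(kNone) {
    for (uint8_t ch = 0; ch < kNumChannels; ++ch) {
      if (!(in_use_ & (1u << ch))) {
        in_use_ |= static_cast<uint16_t>(1u << ch);
        channel_ = ch;
        break;
      }
    }
  }

  // Give the channel back when the object goes away.
  ~ChannelSlot() {
    if (channel_ != kNone) in_use_ &= static_cast<uint16_t>(~(1u << channel_));
  }

  ChannelSlot(const ChannelSlot &) = delete;             // a channel has one owner
  ChannelSlot &operator=(const ChannelSlot &) = delete;

  uint8_t channel() const { return channel_; }
  bool valid() const { return channel_ != kNone; }
  static uint16_t inUseMask() { return in_use_; }

 private:
  uint8_t channel_;
  static uint16_t in_use_;   // one bit per channel, shared by all instances
};

uint16_t ChannelSlot::in_use_ = 0;
```

Running out of channels shows up here as `valid()` returning false at run time, so a library can check it and fail gracefully instead of touching a channel it does not own.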

The object has the channel number and a reference to the TCD as public members. If you want direct control of the DMA channel, you can use them through the object. For example:

Code:
        myDMA.TCD.SADDR = frameBuffer;
        myDMA.TCD.SOFF = 1;
        myDMA.TCD.ATTR = DMA_TCD_ATTR_SSIZE(0) | DMA_TCD_ATTR_DSIZE(0);
        myDMA.TCD.NBYTES = 1;
        myDMA.TCD.SLAST = -bufsize;
        myDMA.TCD.DADDR = &GPIOD_PDOR;
        myDMA.TCD.DOFF = 0;
        myDMA.TCD.CITER = bufsize;
        myDMA.TCD.DLASTSGA = 0;
        myDMA.TCD.CSR = DMA_TCD_CSR_DREQ;
        myDMA.TCD.BITER = bufsize;

As you can see in DMAChannel.h, I'm trying to create an Arduino-style API for the common cases. Of course, the DMA controller has tremendously flexible options, but the really common cases are single variable, normal buffer and circular buffer, transferring either all the data at once or one item on each trigger event.

My hope is something like this can become possible, for DMA with Arduino-style simplicity :)

Code:
        myDMA.sourceBuffer(frameBuffer, sizeof(frameBuffer));
        myDMA.destination(GPIOD_PDOR);
        myDMA.options( /* TBD */ );

Of course, the code isn't finished, and I'm not entirely sure how the "options" and transfer length should be handled in this API. There are more questions in the comments in DMAChannel.h. If anyone has any suggestions or opinions on how this API should develop, now is the time to speak up!!
 
This is great Paul, I'm having a look at it right now.

Maybe it's also a good idea to have some functions to change DMA_CR, so that different kinds of priorities can be chosen (round-robin or fixed), and also a way of changing the priorities of the channels. Something simplified, maybe like channelPriority(priority), where priority = HIGH, MED, LOW or something like that. Of course, this may lead to libraries trying to claim the highest priorities for themselves...
 
Also:

// TODO: functions to configure major/minor loop
// option #1 - trigger moves 1 byte/word (minor=1, major=count)
// option #2 - trigger moves all data (minor=count, major=1)
// option ?? - more complex config, write TCD manually....

Option 2 is not really good, I think, because a minor loop can't be interrupted. Using option 1 allows the DMA to switch to other channels when the minor loop is done, so that one channel doesn't block DMA transfers for a long time.
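
In terms of TCD fields, the two options differ only in how the same total is split between NBYTES (minor loop) and CITER/BITER (major loop). A small sketch, using a made-up `LoopConfig` struct and helper names rather than the real TCD layout or anything from DMAChannel.h:

```cpp
#include <cassert>
#include <stdint.h>

// Only the loop-related fields; names mirror the TCD, but this is not the register layout.
struct LoopConfig {
  uint32_t nbytes;   // bytes moved per trigger (minor loop)
  uint16_t citer;    // major loop count, current
  uint16_t biter;    // major loop count, reload value
};

// Option 1: each trigger moves one item; `count` triggers finish the transfer.
static LoopConfig perItemTrigger(uint32_t itemSize, uint16_t count) {
  assert(itemSize > 0 && count > 0 && count <= 0x7FFF);  // CITER is 15 bits without linking
  return LoopConfig{itemSize, count, count};
}

// Option 2: a single trigger moves everything in one long minor loop.
static LoopConfig singleTrigger(uint32_t itemSize, uint16_t count) {
  assert(itemSize > 0 && count > 0);
  return LoopConfig{itemSize * count, 1, 1};
}

// Both options move the same number of bytes overall.
static uint32_t totalBytes(const LoopConfig &c) { return c.nbytes * c.biter; }
```

For a 64-byte buffer of single bytes, option 1 gives NBYTES = 1 with CITER = BITER = 64, while option 2 gives NBYTES = 64 with CITER = BITER = 1.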
 
The DMAMUX, like most peripherals in this chip, is disabled by default. You must enable it before you can access any of its registers. Any access to disabled peripherals causes a hard fault.

Ah, thanks, I figured it was something basic like that, although I was unaware clock gating had to be on even for read access.

Regarding DMA, one of the reasons I was trying to ascertain state was that I wanted to use it to initialize an in-use channel list (as opposed to assuming everyone is using the dynamic DMA system from the get-go). This information must be available in the registers already.

I realize this might be tricky or maybe not tractable, but one idea I had was to initialize the DMA state at the end of setup(). At that point any libraries that had hardcoded DMA usage would have allocated their channel, and dynamic allocation could then use whatever was left. This might smooth compatibility issues with existing hardcoded libraries. I'll code something up as an example.
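
A rough sketch of what such a scan might look like. It assumes a channel counts as "in use" when the ENBL bit (bit 7) of its DMAMUX_CHCFG register is set, which only catches code that routes a request source through the DMAMUX. The function name `scanEnabledChannels` is invented, and the registers are passed in as a plain array so the logic can run off-chip; on the Teensy the DMAMUX clock gate would have to be enabled before reading them.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

const uint8_t kChcfgEnable = 0x80;   // ENBL, bit 7 of DMAMUX_CHCFGn

// Build a bitmask with bit n set when channel n's CHCFG value has ENBL set.
// `chcfg` stands in for the DMAMUX_CHCFG registers (at most 16 channels here).
static uint16_t scanEnabledChannels(const volatile uint8_t *chcfg, size_t count) {
  assert(chcfg != nullptr && count <= 16);
  uint16_t mask = 0;
  for (size_t ch = 0; ch < count; ++ch) {
    if (chcfg[ch] & kChcfgEnable) mask |= static_cast<uint16_t>(1u << ch);
  }
  return mask;
}
```

The resulting mask could seed the dynamic allocator's in-use list, so channels already claimed by hardcoded libraries are never handed out.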

Also, I want to incorporate this new system into I2C, but one thing I'm unclear on is how to deal with error conditions. My guess is in an error condition the peripheral stops asking for data, and the DMA side just hangs there in the middle of what it was doing. Is that about right? If so I would need some method of properly clearing it out and starting again - or is simply starting a new transfer later sufficient?
 
Option 2 is not really good, I think, because a minor loop can't be interrupted. Using option 1 allows the DMA to switch to other channels when the minor loop is done, so that one channel doesn't block DMA transfers for a long time.

But that's a choice to be made by the end-user. If I have a data acquisition loop running via DMA, I don't want it to be interrupted.
 
Of course, the code isn't finished, and I'm not entirely sure how the "options" and transfer length should be handled in this API.


For my applications now, I would need to be able to change some options dynamically in the DMA's ISR or in library functions, things like destination address and transfer length. It would be nice if I could access individual DMA registers through this API (maybe an advanced options list).
Edit: never mind, I see from your post above that you can do this!

I have a situation where I change the address in DMA_TCD(n)_DADDR, which points to a buffer (uint8_t buffer[x]), if a certain byte is received through the UART. I know I could set up another DMA channel to control this, but I'm trying to use as few DMA channels as possible, and this way works well for me and is really simple.

Another thought: maybe set up some #ifdefs to cause a compile error if there are no DMA channels left? Not sure if this would be possible, but it would make debugging a lot easier than trying to track down the source of a hard fault.
 