Teensy 4.0 First Beta Test

mjs513 · Mar 30, 2019

@defragster

Probably because of the datum setting being used in setup():

Code:

tft.setTextDatum(BL_DATUM);

BL is bottom left. Options available to you are:

Code:

TL_DATUM = Top left
TC_DATUM = Top centre
TR_DATUM = Top right
ML_DATUM = Middle left
MC_DATUM = Middle centre
MR_DATUM = Middle right
BL_DATUM = Bottom left
BC_DATUM = Bottom centre
BR_DATUM = Bottom right

You could try using MC_DATUM, or try not setting a datum at all, though you may then have to adjust the y-positions. Oh, I also modified the file to shift the plot over by 80 pixels so it's not in the middle of the screen. Attached:
View attachment TFT_BUDDHABROT.zip
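To illustrate what the nine datum options listed above mean, here is a minimal sketch of how an anchor point could be turned into the top-left corner of the text box. The names `Datum`, `Point`, and `topLeftFor` are made up for this illustration and are not the display library's own code; the text width and height are assumed to be known in pixels.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the TFT library's definitions.
// Order matches the list above: top row first, left to right.
enum class Datum { TL, TC, TR, ML, MC, MR, BL, BC, BR };

struct Point { int x; int y; };

// Given the anchor point (x, y) passed to a draw call and the text's pixel
// width w and height h, return where the top-left corner of the text lands.
Point topLeftFor(Datum d, int x, int y, int w, int h) {
  int col = static_cast<int>(d) % 3;  // 0 = left, 1 = centre, 2 = right
  int row = static_cast<int>(d) / 3;  // 0 = top,  1 = middle, 2 = bottom
  assert(col >= 0 && col < 3 && row >= 0 && row < 3);
  return Point{ x - (col * w) / 2, y - (row * h) / 2 };
}
```

For example, with a bottom-left datum a 16 pixel tall string anchored at y = 0 gets a top edge at y = -16, so it sits entirely above the visible area, while a top-left datum keeps it at y = 0.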

Fair warning: the code that writes the image to the SD card is in there and is still a work in progress, so you will get an SD card not initialized message on startup. You can just comment that out; it still runs though.

Right now it writes 1 image and then hangs on the following images. I tried a delay but got the same result. I put in a bunch of prints to see where, but then the T4 just kept on resetting. I have a different SD breakout board coming today, so I will see if that helps. Also, have to figure out color spaces - argh!

EDIT: I just tried writing multiple files to the T3.6 built-in SD card and it did exactly the same thing: it wrote one file and then stopped. Now on to some debugging of the function.

EDIT2: OK, found the problem - malloc:

Code:

  // create image data; heavily modified version via:
  // http://stackoverflow.com/a/2654860
  //unsigned char *img = NULL;            // image data
  //if (img) {                            // if there's already data in the array, clear it
  //  free(img);
  //}
  //img = (unsigned char *)malloc(3*imgSize);

Just made it a global array - found the issue on the forum

mjs513 · Mar 30, 2019

@Paul

I know this may be a little late in the game for the new breakout board, but would it be possible to break out the 5V pin to one of the pins on the breakout board? If not - not to worry

Thanks
Mike

KurtE · Mar 30, 2019

defragster said:
@Paul: Are there any 'components' on Beta Breakout board in the path of SPI pins :: 9/10/11/12/14[13] ?
With DVM between T4 header and the '7 wide' header set:: I find T4[13] connects '7 wide' #14 as expected. And 10 and 12 are connected, but I get no continuity on 9 and 11.

Hi @defragster,

If you are trying to use the T4 breakout board, with the pins that are setup to plug in an Audio board, the pins are there, but not all in their normal positions:
But they are all on pins that are labeled:

I put some of this in the usbhost_t36 example program Packman:

/* Teensy 4.0 Beta Pins */
/* 23 = RST (Marked MCLK on T4 beta breakout) */
/* 10 = D/C (Marked CS) */
/* 9 = CS (Marked MEMCS) */

The pins 11, 12, 13 are under their normal names: MOSI, MISO, SCK
But these are not all in normal spots...

KurtE · Mar 30, 2019

mjs513 said:

EDIT2: OK found the problem - malloc. :

Code:

  // create image data; heavily modified version via:
  // http://stackoverflow.com/a/2654860
  //unsigned char *img = NULL;            // image data
  //if (img) {                            // if there's already data in the array, clear it
  //  free(img);
  //}
  //img = (unsigned char *)malloc(3*imgSize);

Just made it a global array - found the issue on the forum

Which issue was this? Was it that malloc/free is not working? Or is it due to issues with how the memory access is configured?

For example if I print out the address of your exposure array: Serial.printf("Address of exposure: %x\n", (uint32_t)exposure);

I see:

Code:

Address of exposure: 20001f94
Initializing SD card...initialization failed!

But if I try to move this to DMAMEM: DMAMEM byte exposure[dim*dim];

Code:

Address of exposure: 20200000

The image shown on the screen is garbage

and likewise if I change exposure to malloc:

Code:

byte *exposure; //[dim*dim];
...
  exposure = (byte*)malloc(dim*dim);
  Serial.printf("Address of exposure: %x\n", (uint32_t)exposure);
  Serial.print("Initializing SD card...");

Code:

Address of exposure: 20200008
Initializing SD card...initialization failed!

The malloc succeeds, but again the image is garbage.

Note: I remember issues with using DMA with this memory in current default configuration, where I could sort of fix by calling off to flush, with calls like:

Code:

  arm_dcache_flush(exposure, sizeof(exposure));

Before I called off to do the DMA operation. But again not doing a DMA operation here. I believe it is simply reading and writing values into this array...
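One detail worth checking when switching between the array and malloc versions above: `sizeof(exposure)` only gives the buffer length while `exposure` is a real array. Once it becomes a pointer, the same expression gives the size of the pointer itself. The sketch below shows the difference on a host compiler; the value of `dim` is an assumption, and this is not claimed to be the cause of the garbage image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t dim = 240;  // assumed value for illustration only

// A true array: sizeof reports the whole buffer, dim * dim bytes.
uint8_t exposureArray[dim * dim];
constexpr std::size_t arrayBytes = sizeof(exposureArray);

// A malloc'ed buffer: the variable is only a pointer, so sizeof reports
// the pointer size (4 bytes on a 32-bit core), not the buffer length.
std::size_t pointerBytes() {
  uint8_t *exposurePtr = static_cast<uint8_t *>(std::malloc(dim * dim));
  assert(exposurePtr != nullptr);
  std::size_t n = sizeof(exposurePtr);
  std::free(exposurePtr);
  return n;
}
```

So with a pointer, calls that take a byte count (a cache flush or a memset) would need `dim * dim` passed explicitly rather than `sizeof(exposure)`.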

Frank B · Mar 30, 2019

OCRAM is buffered by the cache. a) You need to use Paul's provided functions (or their code) or b) disable the cache in a way that is sufficient.
Disabling the cache globally in the official core is not the way we should go. It can be really useful. And I learned that it is able to detect memset() ( and automatically switches to writethrough in this case) so my biggest concerns are gone

mjs513 · Mar 30, 2019

@defragster

I am not sure whether the problem is with free and malloc, with the //unsigned char *img = NULL; line, or even a memory issue. I do know that if I use that construct it stops the sketch on the T4 or the T3.6. I tried it on both and it had the same effect. When I changed it to a hard-coded img array it worked and created multiple BMPs on the SD card.

Here is something else that has to be worked on (by me). On the T3.6 if I write to Builtin SD card the image is worse than if I used an external SD Card. Strange.

As for the image being garbage, I agree, and I think that is related to color space used to draw the image vs the colors saved to the BMP. Haven't sorted that one out yet. Right now this is all a work in progress. Right now getting ready to test it on the T4, so that one is next.

@Frank B.

Frank B said:
OCRAM is buffered by the cache. a) You need to use Paul's provided functions (or their code) or b) disable the cache in a way that is sufficient.

Ok what provided functions - I am lost on that one. Which is normal for me

mjs513 · Mar 30, 2019

@defragster
Now, on the T4, if I specify the img array size I am back to the DTCM error. If I try the flush it draws the pic and keeps resetting on the second image save.

KurtE · Mar 30, 2019

Frank B said:
OCRAM is buffered by the cache. a) You need to use Paul's provided functions (or their code) or b) disable the cache in a way that is sufficient.
Disabling the cache globally in the official core is not the way we should go. It can be really useful. And I learned that it is able to detect memset() ( and automatically switches to writethrough in this case) so my biggest concerns are gone

Frank, I might agree with you on things that the user has done that is non-standard, like marking DMAMEM...

However I am not sure I agree with the idea, that all of the code including libraries that use malloc don't work right...

That is for example in this program memory access to the array like:

Code:

  exposure[ix*dim+iy]++;
  maxexposure = max(maxexposure,exposure[i*dim+j]);
  float ramp = exposure[i*dim+j] / (maxexposure / 2.5);

I believe these are all of the statements that access this array... So you would think it would behave the same regardless of how the memory was allocated?

And I know for example with my ili9341_t3n library the code worked differently when I used DMAMEM or malloc versus normal array. Now I think I was able to resolve some of this by the call I mentioned above: arm_dcache_flush()... On the frame buffer before I started a DMA operation to do the update (one shot). But I don't know how that would work if I turned on continuous DMA updates? Who is going to flush the cache back to memory?

And likewise I don't have confidence that all graphic operations get through; after all, it is failing in the above program. Maybe it is because these are read/write operations (++), where most of the simple operations maybe only do writes? But then what if it has graphic primitives that do smoothing?

Now if you are telling me, that there is some form of operation that I can perform, like arm_dcache_dont_cache(loc, size); then there may be work around...
But then maybe it should work the other way around: That is by default things like malloc work the same as on most every other processor and if you wish to enable some faster caching, you do something like: arm_dcache_enable_cache(mem, size);

manitou · Mar 30, 2019

defragster said:
No working T4 ILI9341 wired yet - but the number of cycles for T_3.6 versus T4 follows running the batch of exposures.

T4 cycles :: 232593
T_3.6 Cyc:: 462005
T_3.5 Cyc:: 462524

Not only are there more cycles/sec on T4 - but the calc requires ~half the cycles. All compiled with Std Opt
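To put those cycle counts into wall-clock terms, a small worked conversion helps. This is only a sketch: the clock speeds used in the note below (600 MHz for the T4 and 180 MHz for the T_3.6) are assumed defaults and are not stated in the post, and `cyclesToMicros` is a name made up here.

```cpp
#include <cassert>
#include <cstdint>

// Convert a cycle count to microseconds for a given core clock in Hz.
double cyclesToMicros(uint32_t cycles, uint32_t cpuHz) {
  assert(cpuHz > 0);
  return static_cast<double>(cycles) * 1e6 / static_cast<double>(cpuHz);
}
```

Under those assumed clocks, 232593 cycles on the T4 is roughly 0.39 ms and 462005 cycles on the T_3.6 is roughly 2.57 ms, so the combined gain from fewer cycles and a faster clock would be a bit over 6x for this calculation.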

I have some comparative mandelbrot computation numbers on post #204

Frank also has a mandelbrot demo for ILI9341
https://github.com/FrankBoesing/ILI9341_DEMOS

mjs513 · Mar 30, 2019

@KurtE and @defragster and @Frank B
OK, got it working on the T4. The issue with malloc is resolved, and yes, it works. I think the issue lies in how the original author used free. I got around the issue by doing this:

Code:

unsigned char *img = NULL;

void setup() {
  ….
 …..

  img = (unsigned char *)malloc(3*w*h);
  memset(img,0,3*w*h);

}

By going back to the original code on stackoverflow the author also had the color arrays slightly backwards. Fixing it I get the following image from the BMP on the SD card:
View attachment 9PX_0016.bmp
A lot better. The fix for malloc also came from the original code on stackoverflow.
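For anyone hitting the same swapped-colors symptom: a 24-bit BMP stores each pixel as blue, green, red (not RGB), and each row is padded to a multiple of 4 bytes. I don't know exactly which part the sketch had backwards, so the following is only a generic sketch of the layout, with helper names (`bmpRowStride`, `putPixelBGR`) invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes per row in a 24-bit BMP: 3 bytes per pixel, rounded up to a multiple of 4.
std::size_t bmpRowStride(std::size_t width) {
  return (width * 3 + 3) & ~static_cast<std::size_t>(3);
}

// Store one pixel into a row buffer. BMP keeps the channels in
// blue, green, red order within each 3-byte pixel.
void putPixelBGR(uint8_t *row, std::size_t x, uint8_t r, uint8_t g, uint8_t b) {
  assert(row != nullptr);
  row[x * 3 + 0] = b;
  row[x * 3 + 1] = g;
  row[x * 3 + 2] = r;
}
```

A 240 pixel wide image needs no padding (720 bytes per row), while a 241 pixel wide one is padded from 723 to 724 bytes.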

@manitou - will check the Mandelbrot sketches out.

Thanks for the reference.

EDIT: Here is the updated "Butterbrot" sketch

View attachment TFT_BUDDHABROT.zip

KurtE · Mar 30, 2019

mjs513 said:
@Frank B.
Ok what provided functions - I am lost on that one. Which is normal for me

Hi @mjs513,

The only ones I know of are in: imxrt.h
That is they have these functions:

Code:

// Flush data from cache to memory
//
// Normally arm_dcache_flush() is used when metadata written to memory
// will be used by a DMA or a bus-master peripheral.  Any data in the
// cache is written to memory.  A copy remains in the cache, so this is
// typically used with special fields you will want to quickly access
// in the future.  For data transmission, use arm_dcache_flush_delete().
__attribute__((always_inline, unused))
static inline void arm_dcache_flush(void *addr, uint32_t size)

// Delete data from the cache, without touching memory
//
// Normally arm_dcache_delete() is used before receiving data via
// DMA or from bus-master peripherals which write to memory.  You
// want to delete anything the cache may have stored, so your next
// read is certain to access the physical memory.
__attribute__((always_inline, unused))
static inline void arm_dcache_delete(void *addr, uint32_t size)


// Flush data from cache to memory, and delete it from the cache
//
// Normally arm_dcache_flush_delete() is used when transmitting data
// via DMA or bus-master peripherals which read from memory.  You want
// any cached data written to memory, and then removed from the cache,
// because you no longer need to access the data after transmission.
__attribute__((always_inline, unused))
static inline void arm_dcache_flush_delete(void *addr, uint32_t size)

I have hacks in SPI.cpp so that if the buffer address is above a certain value and you do a DMA operation, it will call arm_dcache_flush on the source buffer and arm_dcache_delete on the destination buffer. That worked for getting a simple display (like Teensyview) to work using the SPI DMA operations...

But these three walk every 32 bytes in the location/size you specify and do something like: SCB_CACHE_DCCMVAC = location; in the case of flush...
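As a rough illustration of what "walk every 32 bytes" costs, the helper below counts how many 32-byte cache lines a maintenance loop would have to visit for a given address and size. It runs on a host and does not touch any cache registers; `cacheLinesCovering` is a made-up name, and the real functions in imxrt.h may round the range differently.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCacheLine = 32;  // data cache line size in bytes

// Number of cache lines needed to cover [addr, addr + size), counting
// from the line that contains addr to the line holding the final byte.
uint32_t cacheLinesCovering(uint32_t addr, uint32_t size) {
  if (size == 0) return 0;
  assert(addr + (size - 1) >= addr);                       // no address wrap-around
  uint32_t first = addr & ~(kCacheLine - 1);               // round start down
  uint32_t last  = (addr + size - 1) & ~(kCacheLine - 1);  // line of the last byte
  return (last - first) / kCacheLine + 1;
}
```

Note that a buffer that does not start on a 32-byte boundary, such as the malloc result at 20200008 shown earlier, spans one more line than its size alone suggests and shares its first line with whatever sits just before it.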

What I have not seen is the ability to simply turn off the cache for a range of values.

But I have not studied the cache section of manual to see what can be done... I do see several registers defined:

Code:

#define SCB_CACHE_ICIALLU	(*(volatile uint32_t *)0xE000EF50)
#define SCB_CACHE_ICIMVAU	(*(volatile uint32_t *)0xE000EF58)
#define SCB_CACHE_DCIMVAC	(*(volatile uint32_t *)0xE000EF5C)
#define SCB_CACHE_DCISW		(*(volatile uint32_t *)0xE000EF60)
#define SCB_CACHE_DCCMVAU	(*(volatile uint32_t *)0xE000EF64)
#define SCB_CACHE_DCCMVAC	(*(volatile uint32_t *)0xE000EF68)
#define SCB_CACHE_DCCSW		(*(volatile uint32_t *)0xE000EF6C)
#define SCB_CACHE_DCCIMVAC	(*(volatile uint32_t *)0xE000EF70)
#define SCB_CACHE_DCCISW	(*(volatile uint32_t *)0xE000EF74)
#define SCB_CACHE_BPIALL	(*(volatile uint32_t *)0xE000EF78)

mjs513 · Mar 30, 2019

@KurtE

Thanks for the reference - as many times as I was in imxrt.h I never noticed them. Have to keep a note and figure out when and how to use them now - argh - another project

KurtE · Mar 30, 2019

@mjs513 and @frankB and @manitou and @Paul and others ...

Again there may be (and probably are) many other things here that I don't know about or understand.

I know earlier I was able to get upper frame buffer to work by using @Frank's work around:
https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=194674&viewfull=1#post194674

I know that @manitou described some stuff about disabling the cache, like in the posting: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=197430&viewfull=1#post197430

But again not sure if there is the ability to turn off a range?

And again @mjs513, in the case of this program, you are not doing anything special like using the malloced memory to output directly to a device or the like, so by default IMO it should just work...

mjs513 · Mar 30, 2019

@KurtE

It should just work. But the problem seems to be the constant creating of the img array using malloc. I just ran a couple of tests to check:
1. I added this to the old function causing me the issue:

Code:

  unsigned char *img = NULL;            // image data
  if (img) {                            // if there's already data in the array, clear it
    free(img);
    Serial.println("IMG Freed");
  }
  img = (unsigned char *)malloc(3*imgSize);

This test never passes, so "free" is never called. So if I read the other thread right, the heap just keeps growing?

2. If I just remove the "img" array test and just call "free", the T4 just keeps resetting itself.

Looking at the original stackoverflow code, the malloc is only done once, basically in setup() like I showed in the earlier post. So it has to be something with constantly re-creating the img array and then calling malloc in a loop fashion. Just another reference point.

KurtE · Mar 30, 2019

@mjs513 - Actually I think the code is busted....

That is you are always assigning NULL to img, so you will never actually free the old one...

You probably meant to write: static unsigned char *img = NULL;

So that it is only assigned to NULL once and then next time you call it will free the old one...

Or better yet, maybe do the free right around where you do the file.close() call
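The three variants discussed here can be compared on a host with a simple counter of buffers that were allocated but not yet freed. This is only a sketch of the pattern: the function names and the `liveBuffers` counter are invented for illustration, and the file writing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

int liveBuffers = 0;  // buffers allocated but not yet freed

// Broken pattern: img is a fresh local set to NULL on every call, so the
// test is always false and the previous buffer is never freed.
void saveLeaky(std::size_t imgSize) {
  unsigned char *img = NULL;
  if (img) { std::free(img); --liveBuffers; }  // never runs
  img = static_cast<unsigned char *>(std::malloc(3 * imgSize));
  if (img) ++liveBuffers;
}

// Static pointer: initialised once, so the next call sees the old buffer
// and frees it before allocating a new one.
void saveStatic(std::size_t imgSize) {
  static unsigned char *img = NULL;
  if (img) { std::free(img); --liveBuffers; }
  img = static_cast<unsigned char *>(std::malloc(3 * imgSize));
  if (img) ++liveBuffers;
}

// Allocate, use, and free within one call (the "free near file.close()" idea).
void saveScoped(std::size_t imgSize) {
  unsigned char *img = static_cast<unsigned char *>(std::malloc(3 * imgSize));
  if (!img) return;          // allocation failed, nothing to save
  ++liveBuffers;
  // ... fill img and write it to the file here ...
  std::free(img);
  --liveBuffers;
}
```

After three calls the leaky version leaves three buffers outstanding, the static version leaves one, and the scoped version leaves none, which matches the "heap just keeps growing" observation above.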

mxxx · Mar 30, 2019

slightly OT, my apologies -- does anyone (or Paul) have the part-number for the pogo pins used on the beta breakout-board? i can't seem to find the right height on mouser ... thanks!

mjs513 · Mar 30, 2019

@mxxx
Don't know the mouser number but these are the pins that Paul used:

I believe they were this one, 12.5 mm height.

https://www.aliexpress.com/item/100p...910364279.html

mjs513 · Mar 30, 2019

@KurtE

Yep - that was it. Like you said it was broke. Put this in the global declares: " unsigned char *img = NULL; // image data" and then did the malloc before using "img" and just put "free" right before "file.close".

PaulStoffregen · Mar 30, 2019

mxxx said:
does anyone (or Paul) have the part-number for the pogo pins used on the beta breakout-board?

They came from this Chinese vendor.

https://www.aliexpress.com/item/100...ctor-9-5-10-0-11-0-11-5-12-0/32910364279.html

I believe most were the 12.5 mm height.

PaulStoffregen · Mar 30, 2019

Just a quick update - the contract manufacturer put the chips onto the 6 layer boards... and I've been soldering the rest of the parts. The boards look awesome! Robin and I keep having a "double take" when we glance at them. They kind of look like a Teensy 3.2 from a distance, but then you notice it's different. The arrangement of parts on the top side isn't changed much from the center region of the first beta, but still, seeing it now actually in the Teensy form factor looks awesome.

It's only been a few months since I brought up the first round of beta boards, but it kinda feels like a lifetime ago. And ~2090 messages ago! At first I had a terrible sinking feeling, when the first 2 boards weren't working at all. A few hours into troubleshooting, I found my saved notes about the test procedure, which involves the important first step of setting a couple of the OTP fuses. With all this focus on Arduino stuff, and then on the PCB layout, I'd mostly forgotten that process which is based on stuff from my earliest experiments with the chip (long before the bootloader).

I'm still working on a couple unknown issues involving startup & rebooting. The 1062 chip is so similar, but does have minor differences.

I'm also still waiting on PCBs for pogo pins to mate with the new boards. For these first 2 boards, I've got a little cable harness soldered to test points. My hope is for those first test fixture boards sometime next week. We have several of these boards set aside for everyone who's been highly active on this thread. I believe we'll be able to ship them in about 10 days. Another larger batch (not involving me hand soldering) will happen in about 4 weeks. At that time, we'll make 1062-based boards available to all beta testers.

Just to give you a realistic (and perhaps somewhat disappointing) expectation, over the next 3-4 weeks almost all of my time & attention is going to go into the bed of nails test fixture and a variety of small but critically important issues dealing with the final PCB panel shape and how our contract manufacturer is going to work with this new board. This stuff needs to happen now, if we're going to be able to actually release the product and ship in volume.

I know there are a lot of issues that have come up here. I know it's frustrating when I don't get to pull requests and replying on this thread promptly. Please try to be patient, and please understand this is going to get worse over the next several weeks as we enter the final stretch of manufacturing & testing stuff. Then it will get so much better, once we're up and running with a proper test fixture and all the little manufacturing details are finalized. Then I'll be able to pour all my time into the software & documentation side... and there is so much I want to do there. Please keep posting here about any issues. I will be reviewing *all* the messages on this mammoth thread.

mjs513 · Mar 30, 2019

@PaulStoffregen

PaulStoffregen said:
I know there are a lot of issues that have come up here. I know it's frustrating when I don't get to pull requests and replying on this thread promptly. Please try to be patient, and please understand this is going to get worse over the next several weeks as we enter the final stretch of manufacturing & testing stuff. Then it will get so much better, once we're up and running with a proper test fixture and all the little manufacturing details are finalized. Then I'll be able to pour all my time into the software & documentation side... and there is so much I want to do there. Please keep posting here about any issues. I will be reviewing *all* the messages on this mammoth thread.

There will always be issues and you get to them when you get them considering everything else you are doing at the same time. We are here to support you too! Think we are resting up now for round 2

manitou · Mar 30, 2019

KurtE said:
But again not sure if there is the ability to turn off a range?

All I've seen are commands to turn cache on/off, and to flush/delete address ranges in the cache. In T4 core startup.c, configure_cache() uses MPU commands to enable cache for various memory regions.

Code:

          0-512k                     ITCM no cache   text  linker 128K
          0x00200000 | REGION(1); // Boot ROM  128K  WT cache
          0x20000000 | REGION(2); // DTCM  no cache 512k   NOEXEC  linker 128K
          0x20200000 | REGION(3); // OCRAM  WBWA  256K   malloc  DMAMEM   NOEXEC
          0x40000000 | REGION(4); // Peripherals  64MB no cache   NOEXEC
          0x60000000 | REGION(5); // QSPI Flash  64MB  WBWA

so ITCM, DTCM, and peripheral memory have cache disabled. Maybe if you can figure out how to allocate memory for your data in ITCM and DTCM, then that data will have cache disabled.
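The table above can be turned into a quick lookup to check which region an address lands in, for instance the addresses KurtE printed earlier. This is a host-side sketch only: the region sizes are taken from the listing as posted, the names `Region`, `RegionInfo`, and `classify` are invented, and nothing here reads the actual MPU setup.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { ITCM, BootROM, DTCM, OCRAM, Peripherals, Flash, Unknown };

struct RegionInfo { Region region; bool cached; };

// Map an address to a region using the base addresses and sizes listed above.
RegionInfo classify(uint32_t addr) {
  if (addr < 0x00080000u)                        return { Region::ITCM, false };        // 0-512K
  if (addr >= 0x00200000u && addr < 0x00220000u) return { Region::BootROM, true };      // 128K, WT
  if (addr >= 0x20000000u && addr < 0x20080000u) return { Region::DTCM, false };        // 512K
  if (addr >= 0x20200000u && addr < 0x20240000u) return { Region::OCRAM, true };        // 256K, WBWA
  if (addr >= 0x40000000u && addr < 0x44000000u) return { Region::Peripherals, false }; // 64MB
  if (addr >= 0x60000000u && addr < 0x64000000u) return { Region::Flash, true };        // 64MB, WBWA
  return { Region::Unknown, false };
}
```

By this table the global array at 20001f94 sits in DTCM with no cache, while both the DMAMEM array at 20200000 and the malloc result at 20200008 sit in cached OCRAM.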

references ?
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0646b/BABIGDDC.html
https://www.nxp.com/docs/en/application-note/AN12042.pdf

KurtE · Mar 30, 2019

mjs513 said:
@PaulStoffregen
There will always be issues and you get to them when you get them considering everything else you are doing at the same time. We are here to support you too! Think we are resting up now for round 2

2nd that! I am always amazed at how much gets done by you (and Robin)!

defragster · Mar 30, 2019

Paul - great to hear the 1062 board is coming together - at least the T4 MCU won't have legs, so T_3.2 confusion should be easy to keep in check.

Have to agree with KurtE and mjs513 - keeping the finish line aligned where the hardware scheduling is the critical path - keep up the good work! Beta1 hardware is impressive and usable - glad it is happening soon enough you just have to remember prior build lessons and not relearn them in full

- I made the post #2074 BUGBUG note as it was a reproducible way of requiring the 15s reset. And to do that it took that line added from FrankB to allow LTO usage, where otherwise the build breaks when it drops out the __ASM jump for the HardFault Handler. That either points to a linker issue or something needing attention to avoid the 15s reset. That's all software and subject to change with the RAM doubling and any segment shift in the 1062

mjs513 said:
@defragster

Probably because of the datum setting being used in the set up:

Code:

tft.setTextDatum(BL_DATUM);

BL is bottom left. Options available to you are: ...

I didn't know how right I was … had not seen that Datum control in use. Glad I asked before wasting time in the code overlooking that treasure.

TL_DATUM matches my expectations for use and that fully explains my observation.

@KurtE - thanks for the pin notes - will look for those other signals by silkscreen and trust that - which I had not been doing.

@manitou - nice pointer to those prior computation numbers. When I had it in hand with Cycle Counts it made clear the added CPU improvement independent of clock speed - I was surprised the T_3.5 used the same count as I thought the 3.6 had improved cache or something that might have made a difference.

defragster · Mar 31, 2019

Has anyone noticed issues or tested the micros() change I made [to get 1 not 10 us res] based on CycleCounter for T4? I was wondering about porting that to T_3.5/3.6 - even the T_3.2 would take the same code IIRC when I did a quick test on the __LDREXW&__STREXW it uses to repeat when systick int happens, it would just require having the CycleCounter activated on startup like the T4 does. In my testing the overhead of calc against the counter versus ref against the clock stuff it does is less and should be more accurate rounding that last 'us' [given multiple 96-256 Hz of cycles to play with per us] - and does not require interrupt stop/start on each call. T_3.6_micros .vs. T_4 micros

@mjs513 - got the buddhaBrot with nice shifted display pixels clearing the text area, works well.

Compiling the T_3.6 fastest/LTO/mpure drops 460K cycles down to 405K.

If you wanted to add cycle cnt display I did this:

Code:

	// Two back-to-back reads return the same value only if the cycle counter is not running yet
	if ( ARM_DWT_CYCCNT == ARM_DWT_CYCCNT ) {
		ARM_DEMCR |= ARM_DEMCR_TRCENA; // T_3.x only needs this
		ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
		Serial.println("CycleCnt Started.");
	}
	// SD setup
	Serial.print("Initializing SD card...");
	if (!SD.begin(cardPin)) {
		Serial.println("initialization failed!");
		return;
	}
	Serial.println("initialization done.");
}

uint32_t CCdiff;
void loop() {
	plotPlots();
	static int Dexposures = 0;
	time++;
	if (time % 30 == 0) {
		// show progress every 2 seconds or so...
		CCdiff = ARM_DWT_CYCCNT;
		findMaxExposure();
		CCdiff = ARM_DWT_CYCCNT - CCdiff;
		renderBrot();
		saveBMP();
		// show exposure value
		tft.drawString("bailout:  ", 0, 0);
		tft.drawNumber(bailout, 0, 25);
		tft.drawString("exposures: ", 0, 40);
		tft.drawNumber(exposures, 0, 60);
		tft.drawNumber(exposures - Dexposures, 0, 80);
		tft.drawString("Cycles: ", 0, 100);
		tft.drawNumber(CCdiff, 0, 120);
		Dexposures = exposures;
	}
}

<edit> - Mike - it gets boring when it runs beyond some point - I added at loop() end this for a fresh random start:

Code:

	if ( exposures > 10000000 ) {
		exposures = 0;
		memset(exposure, 0, sizeof( exposure ));
	}

Nothing specific about 10M - though it does start to get muddy about then - maybe a snapshot BMP at 9M?
