NAND flash support in 1.54

defragster said:
Any understanding why the writes are so close to reads in speeds? Is writing that efficient - or reading somehow hobbled
Only thing I can think of is that it's reading and writing in blocks, so the times are probably going to be similar. Both are using QPI, by the way.

Interesting that all 128MB ( 0x8000000==134,217,728 Bytes ) is 100% error free? Either the chip maps out bad blocks - or these are all perfect?
From what I was reading about this chip that seems to be what everyone is saying. Have to read more on bad blocks :)

Would be cool - ( yeah, I should not say this if not prepared to do it myself ) - if that sketch could run the same whether NOR or NAND was detected?
Thought about that but it starts getting complicated to merge the two.

Think if @Paul decides to use SPIFFS for the "Official" file system for flash then I think it might be doable. Still have time to play.
 
According to the datasheet writing within the chip is approximately 10x slower than reading, but I'm not sure how much of the time is the transfer which is probably the same speed either way. I suspect a lot of the read/write overhead might just be that it's looping through everything 4 bytes at a time. Doing it with DMA would be a better comparison (and allows them to be sort of asynchronous as well.)

Think DMA would only help if you are using SPI for transfers, as opposed to QSPI, but I could be wrong here. Maybe somebody else has a better explanation.
 
I was assuming that it's possible to use TFDR0 and RFDR0 with DMA transfers, but I haven't actually investigated. I'll see if I can figure it out from the datasheet.

EDIT: Yeah, you can, per page 1653
 
According to the datasheet writing within the chip is approximately 10x slower than reading, but I'm not sure how much of the time is the transfer which is probably the same speed either way. I suspect a lot of the read/write overhead might just be that it's looping through everything 4 bytes at a time. Doing it with DMA would be a better comparison (and allows them to be sort of asynchronous as well.)

Wow 10X - something not perfect?

The check42() code does read in 2048 blocks - maybe the low level code does lesser parts:
Code:
w25n01g_readBytes(ii * 2048, x42, 2048);

...
 
I went to see about block read details ... and this code seems confusing?
Code:
int w25n01g_readBytes(uint32_t address, uint8_t *data, int length)
{
...

    if (length > W25N01G_PAGE_SIZE - column) {
        transferLength = W25N01G_PAGE_SIZE - column;
    } else {
        transferLength = length;
    }

   flexspi_ip_read(14, flashBaseAddr + column, data, length);

...
    return transferLength;
}

transferLength is calculated, but the full 'length' is always requested in the read, and then the function returns transferLength?
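A minimal sketch of what that calculation seems intended to do, assuming the clamped value (not the full 'length') is what should go to the read call. clampToPage and kPageSize are hypothetical names for illustration, not part of the driver:

```cpp
#include <cassert>
#include <cstdint>

// Assumed data-area size of one page, matching the 2048 used in this thread.
constexpr uint32_t kPageSize = 2048;

// Hypothetical helper: number of bytes that can be read starting at `column`
// without running past the end of the current page.
inline uint32_t clampToPage(uint32_t column, uint32_t length) {
    assert(column < kPageSize);  // column is an offset inside one page
    const uint32_t room = kPageSize - column;
    return (length > room) ? room : length;
}
```

With this, a 4096 byte request at column 1024 would transfer 1024 bytes, and the caller would loop on the returned count to pick up the rest from the following pages.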
 
Quick update to the sketch - goes spew nuts on errors if the 0x42's are missing
Check42() seems to work for starting on buffer boundary 'zero' and incrementing by 2048==W25N01G_PAGE_SIZE?

... will have to make a malleable check24 to shift and alter the block size ...
 
I was assuming that it's possible to use TFDR0 and RFDR0 with DMA transfers, but I haven't actually investigated. I'll see if I can figure it out from the datasheet.

EDIT: Yeah, you can, per page 1653

Got it now - not using DMA to transfer to the chip, but for the transfers to and from the FIFO.
 
Quick update to the sketch - goes spew nuts on errors if the 0x42's are missing
Check42() seems to work for starting on buffer boundary 'zero' and incrementing by 2048==W25N01G_PAGE_SIZE?

... will have to make a malleable check24 to shift and alter the block size ...

no more spew :) Fixed addressing: with ECC turned on, pageSize = dataBytes + eccBytes = 2048 + 64 = 2112 bytes. Made that adjustment and increased the x42 buffer to 4096:
Code:
Begin Init

Found W25N01G Flash Chip

    NAND ============================ check42() : COMPARE !!!!
  0=     255
	+++ NOT 42 Good Run of 1 {bad @ 0}
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 

DEADBEEFdeadbeef
0x44[D], 0x45[E], 0x41[A], 0x44[D], 0x42[B], 0x45[E], 0x45[E], 0x46[F], 0x64[d], 0x65[e], 0x61[a], 0x64[d], 0x62[b], 0x65[e], 0x65[e], 0x66[f], 0x0a[
], 0x00[




    NAND ========== memory map ======  ====== check42() : WRITE !!!!
		NAND length 0x8000000 element size of 1
	 took 4864655 elapsed us
    NAND ============================ check42() : COMPARE !!!!
	 took 2959809 elapsed us
Good, 	Found 42 in NAND 0x8000000 Times
a bit of improvement. Not sure of that one bad block :)
 
Nice - That is a better read write diff! Read is about 43.2 MB/sec and write about 26.3 MB/sec (128 MB over the times above).
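As a rough check on those rates, here is a small sketch that turns a byte count and an elapsed time in microseconds into MB/sec (counting 1 MB as 1024*1024 bytes). mibPerSecond is a made-up helper name:

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte count and an elapsed time in microseconds to MiB per second.
inline double mibPerSecond(uint64_t bytes, uint64_t elapsedMicros) {
    assert(elapsedMicros > 0);  // avoid dividing by zero
    const double seconds = static_cast<double>(elapsedMicros) / 1e6;
    const double mib = static_cast<double>(bytes) / (1024.0 * 1024.0);
    return mib / seconds;
}
```

Using the log values, mibPerSecond(0x8000000, 2959809) gives roughly 43.2 for the compare pass and mibPerSecond(0x8000000, 4864655) roughly 26.3 for the write pass.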

The test loop's counted stop on a 42 failure looks unclean, so just return after the first error; otherwise it would spew endlessly.
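One possible shape for a compare loop that keeps counting but caps the printing, as a sketch only. compareFill and CompareResult are hypothetical names, and std::printf stands in for whatever Serial printing the sketch actually uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct CompareResult {
    size_t mismatches;  // total number of bytes that differed
    size_t firstBad;    // index of the first mismatch (valid if mismatches > 0)
};

// Hypothetical helper: compare a buffer against one expected byte value,
// printing at most `maxReports` mismatch lines so the output stays bounded.
inline CompareResult compareFill(const uint8_t *buf, size_t len,
                                 uint8_t expected, size_t maxReports) {
    assert(buf != nullptr || len == 0);
    CompareResult r{0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == expected) continue;
        if (r.mismatches == 0) r.firstBad = i;
        if (r.mismatches < maxReports) {
            std::printf("bad @ %zu: 0x%02x\n", i, buf[i]);
        }
        r.mismatches++;
    }
    return r;
}
```

Passing maxReports = 1 gives the "report the first error only" behaviour while still returning the total count for a summary line.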

Bad Block ... : I think that startup error was detecting the Flash not left with 0x42's?
 
Actually found another issue!

Had to do with crossing page boundaries. Since I decided to test with 4096, which is 2 pages of data, you have to load 1 page and write it, and then load the second page and write. The reason is the use of buffered read/writes. Was actually only reading and writing every other page!

Going to do some rework and use BUF MODE = 0 and see what happens to read and writes. This chip is a pain.
 
Actually found another issue!

Had to do with crossing page boundaries. Since I decided to test with 4096, which is 2 pages of data, you have to load 1 page and write it, and then load the second page and write. The reason is the use of buffered read/writes. Was actually only reading and writing every other page!

Going to do some rework and use BUF MODE = 0 and see what happens to read and writes. This chip is a pain.

Yeah - post #80 and #81 were touching on that page crossing and page++ writes.

As noted the Check42() can be left perhaps safe and simple for reference, and add a version of check24() that crosses page boundaries with varying sizes.
 
Just did an update to the repository.

Created one function call for flash writes which should handle crossing page boundaries, including when that means 2 partial writes. New Function:
Code:
    w25n01g_programDataLoad(0, buffer, 16);

Have to think about readBytes :)
 
@mjs513 - how are you building? Have been using TSET and it just finds .INO without needing matching directory name like IDE requires - problem when wrong INO picked up :(

Updated github - saw rename of 'driver.ino'. Then CmdLine TSET finds the w25n01g_driver.INO as the SKETCH and fails build - had to rename to AAw25n01g_driver, now it builds.

This shrunk again and the compiler flagged it as too short with the loop count at 20 :: const uint8_t beefy[] = "DEADBEEFdeadbeef1234\n";
> perhaps it should be sizeof(beefy) not "20"?
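A small sketch of the sizeof idea, assuming the pattern stays a true array (sizeof on a pointer would not work this way). copyPattern and kBeefyLen are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const uint8_t beefy[] = "DEADBEEFdeadbeef1234\n";

// sizeof on the array itself gives the full byte count, including the
// terminating NUL that the string literal adds.
constexpr size_t kBeefyLen = sizeof(beefy);
static_assert(kBeefyLen == 22, "21 characters plus the terminating NUL");

// Copy the whole pattern into a destination buffer; the loop bound follows
// the array, so it stays correct if the string is edited later.
inline size_t copyPattern(uint8_t *dst, size_t dstLen) {
    assert(dstLen >= sizeof(beefy));
    for (size_t j = 0; j < sizeof(beefy); j++) dst[j] = beefy[j];
    return sizeof(beefy);
}
```

Note the count includes the trailing NUL; use sizeof(beefy) - 1 if only the visible characters should be written.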

Is this new UNUSED func >> w25n01g_pageProgramDataLoad() to be used for multi page crossing instead of w25n01g_programDataLoad() ?

That has this line to start that looks puzzling :: if((length-Address) > W25N01G_PAGE_SIZE) { //Determine Number of Pages
> why does Address [startAddress] relate to the length of the read?
> and again in following :: if(W25N01G_PAGE_SIZE - startAddress > 0) {
--> Seems more like it should refer to the offset within a whole page? :: Address % W25N01G_PAGE_SIZE
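A sketch of that offset arithmetic, assuming a 2048 byte data page. The helper names are made up for illustration and are not in the driver:

```cpp
#include <cassert>
#include <cstdint>

// Assumed data-area size of one page, matching W25N01G_PAGE_SIZE in the thread.
constexpr uint32_t kPageSize = 2048;

// Offset of an address within its page.
inline uint32_t pageOffset(uint32_t address) { return address % kPageSize; }

// Bytes from the address to the end of its page (a full page if aligned).
inline uint32_t bytesToPageEnd(uint32_t address) {
    return kPageSize - pageOffset(address);
}

// Number of pages a transfer of `length` bytes starting at `address` touches.
inline uint32_t pagesTouched(uint32_t address, uint32_t length) {
    assert(length > 0);
    const uint32_t first = address / kPageSize;
    const uint32_t last = (address + length - 1) / kPageSize;
    return last - first + 1;
}
```

For a start address of 4000 the offset is 1952, leaving 96 bytes before the page boundary, which depends only on the address and not on the length.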

Gotta run ...
 
@defragster
right now I'm using the IDE plus SublimeText as the editor instead of the IDE editor. Didn't want to spend time on figuring out TSET just yet. Too many new things get me confused :)

Using w25n01g_pageProgramDataLoad for all writes now and it seems to be working, but you are right about the test on the first page. Fixed that as I was working on the ReadByte function, which is all screwed up the way we are using it. Couldn't sleep last night so was writing code and figuring out the logic at 4am until I posted. Took a nap and am finding a few things that need fixing :) Am going to use your code snippet though - cleaner :)

Gotta run as well ..
 
@mjs513 - keep up the good work/effort.

Just noted that IDE won't build that without the FOLDER name matching sketch - so github could be re-organized? And as noted using TSET Frank_B's CmdLine searches for 'some'.INO - and depending on file name ALPHA sort it can find the wrong one ( so I added the 'AA' prefix and I can build now :( ) - so having multiple INO's is risky. Will have to see if I can add a way to CATCH the INO basename when creating and hardcode that into the batch file for first check.

Post #87 referred to 'w25n01g_programDataLoad()' which threw me off as it isn't new but is still in use.

I will stay tuned for updates.

If SublimeText is the editor - then with TSET files in a 'known' folder and the paths edited to your system as noted in readme, the only other step is to: find the Sublime BUILD TOOLS file that is invoked with "Ctrl+Shift+B". Somehow I found that and hopefully left postings or notes in readme.
This is linked in the readme :: pjrc.com/threads/38391-Use-Sublime-Text-as-an-Arduino-IDE-replacement
 
@mjs513 @defragster - Got my chips today. Ready to see if I can solder up and test. I have one more T4.1 and the T4.1 beta test board to play with. My humidity Indicator was blue - never seen this before but I guess that's a good thing:) Ordered from DigiKey.
 
@wwatson - My tag in the NAND's had blue on too - I didn't bother looking given they just got here double or triple wrapped. Are they usually pink?

Good luck with soldering! Mine went surprisingly well somehow - haven't tried to repeat again yet.

@mjs513 - looking into TSET edit so updating on that thread - to handle the dual INO case - that generally is working here
 
@wwatson - My tag in the NAND's had blue on too - I didn't bother looking given they just got here double or triple wrapped. Are they usually pink?

If they are using the typical humidity sensors, which are BLUE when dry & PINK when exposed to excessive humidity, then BLUE is good, right ??

Mark J Culross
KD5RXT
 
All
Just pushed a change to the driver. Should cover page crossings for any size array you want to use or any address start position. Made it so you don't have to do anything special except to call:
Code:
    w25n01g_pageProgramDataLoad(4000, buffer, sizeof(beefy));
to write data to the Flash and
Code:
    w25n01g_readBytes(4000, buffer, 20);
to read it back. Still have the problem that on power up, running the sketch gives the fail on Check42. Can't figure out why.
 
Got the latest just now ... Busy a couple hours here ...

I see the startup issue compare fail? Maybe something different here ?? Below - only on POWER UP? The Compare loop and err print need touching up.
I did TyComm RESET and got this?
Code:
ECC Error (addr, code): 8800, 2
	 took 5122940 elapsed us
Good, 	Found 42 in NAND 0x8000000 Times
...
Then TyComm UPLOAD and all is good? And GOOD again on the next 'RESET' ????
Code:
Begin Init
Found W25N01G Flash Chip
    NAND ============================ check42() : COMPARE !!!!
	 took 5122776 elapsed us
Good, 	Found 42 in NAND 0x8000000 Times
...
Then THIS on THREE in a ROW COLD Power ON:
Code:
Begin Init
Found W25N01G Flash Chip
    NAND ============================ check42() : COMPARE !!!!
  0=     255
	+++ NOT 42 Good Run of 1 {bad @ 0}

ODDLY :: WRITE up to 5.15 secs and COMPARE at 5.122 seconds?

line 58 should be :: for(uint8_t j = 0; j < sizeof(beefy); j++) buffer[j] = beefy[j];



Edit to TSET working here for the Multi INO folder :) Wonder now if I could pass folder_name and add '.ino' - and wonder if it would work with 'main.cpp' to build for "those folks"
 
defragster said:
ODDLY :: WRITE up to 5.15 secs and COMPARE at 5.122 seconds?
Actually not really. Corrected errors in data block transfers for multipage and fullpage writes/reads. Only about half or less of each block was being written or read before. Probably affects full chip reads/writes more than anything else right now.
 
Such a change from prior results though on the same chip? If code is now actually doing more complete read write perhaps?

Should add each block timing recording to track average and extreme?
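A minimal sketch of what such per-block tracking could look like. TimingStats is a hypothetical struct; on the Teensy each sample would presumably be the difference of two micros() readings taken around one block read or write:

```cpp
#include <cassert>
#include <cstdint>

// Running statistics over per-block timings in microseconds.
struct TimingStats {
    uint32_t count = 0;
    uint64_t totalMicros = 0;
    uint32_t minMicros = UINT32_MAX;
    uint32_t maxMicros = 0;

    // Record one block's elapsed time.
    void add(uint32_t micros) {
        count++;
        totalMicros += micros;
        if (micros < minMicros) minMicros = micros;
        if (micros > maxMicros) maxMicros = micros;
    }

    // Mean elapsed time per block; only meaningful after at least one add().
    double average() const {
        assert(count > 0);
        return static_cast<double>(totalMicros) / count;
    }
};
```

Keeping separate instances for writes and compares would show whether a few slow blocks or a uniform slowdown explains a change in the totals.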

Just in to eat and have access to 'open face wood chipper' on a track hoe arm and some forest to trim back so off I go ... oops, forgot to Post - back in now ...
 
@defragster
You went directly to the point. "code is now actually doing more complete read write" and reads.

For instance, let's take the simple case of writing an array of 4096 bytes. Before, it was actually only sending the first 2048, since the read/transmit buffer is only 2048 bytes wide with ECC turned on. Anyway, back to the 4096 byte array:
1. Send the array with the start address you want and a length of 4096; in the simple case the start address is 0.
2. Break the 4096 byte array into two 2048 byte arrays and transmit the 1st array at the start address and the 2nd 2048 byte array at startAddress + 2048.

The function to do that is as follows:
Code:
 for (ii = 0; ii < numberFullPages; ii++) {
    // https://stackoverflow.com/questions/11102029/how-can-i-copy-a-part-of-an-array-to-another-array-in-c
    std::copy(data + transferLength + W25N01G_PAGE_SIZE * (ii), data + transferLength + W25N01G_PAGE_SIZE * (ii + 1), dataTemp);
    w25n01g_programDataLoad(newStartAddr, dataTemp, W25N01G_PAGE_SIZE);
    w25n01g_programExecute(newStartAddr);
    remainingBytes = remainingBytes - W25N01G_PAGE_SIZE;
  }

Now what happens if you are doing a partial buffer (less than 2048) transfer or, say, 2 pages of data that go across three pages in the chip? Something like this call:
Code:
w25n01g_pageProgramDataLoad(1024, buffer, 4096)
which is saying: starting in the middle of the first page, write 2 pages of data (4096), so in effect you go from address 1024 to 5120. This equates to pages 0, 1 and 2 :) Then you are getting more complex:
Code:
  //Check if first page a full if not transfer it
  if (bufTest > 0) {
    if (length < bufTest) {
      transferLength = length;
    } else {
      transferLength = W25N01G_PAGE_SIZE * (startPage + 1) - startAddress;
    }
    w25n01g_randomProgramDataLoad(Address, data, transferLength);
    w25n01g_programExecute(Address);
    remainingBytes = remainingBytes - transferLength;
  }
  newStartAddr = Address + transferLength;

  for (ii = 0; ii < numberFullPages; ii++) {
    newStartAddr = newStartAddr + W25N01G_PAGE_SIZE * ii;
    // https://stackoverflow.com/questions/11102029/how-can-i-copy-a-part-of-an-array-to-another-array-in-c
    std::copy(data + transferLength + W25N01G_PAGE_SIZE * (ii), data + transferLength + W25N01G_PAGE_SIZE * (ii + 1), dataTemp);  //Had to add this so its probably eating more time
    w25n01g_programDataLoad(newStartAddr, dataTemp, W25N01G_PAGE_SIZE);
    w25n01g_programExecute(newStartAddr);
    remainingBytes = remainingBytes - W25N01G_PAGE_SIZE;
  }

  //check last page for any remainder
  if (remainingBytes > 0) {
    //transfer from begining
    std::copy(data + transferLength + W25N01G_PAGE_SIZE * numberFullPages , data + remainingBytes, dataTemp);
    w25n01g_randomProgramDataLoad(newStartAddr, dataTemp, remainingBytes);
    w25n01g_programExecute(newStartAddr);
  }
That is essentially the whole write function now! Did test fringe cases as well, so think it's correct.
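For comparison, a compact sketch of one way the same split (partial first page, full pages, partial last page) could be computed up front. This is not the repository's code: splitByPage, Chunk and kPageSize are hypothetical names, and each chunk would still need its own load and execute calls:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed data-area size of one page, matching W25N01G_PAGE_SIZE in the thread.
constexpr uint32_t kPageSize = 2048;

struct Chunk {
    uint32_t address;  // flash address where this chunk starts
    uint32_t offset;   // offset into the caller's data buffer
    uint32_t length;   // bytes in this chunk, never crossing a page boundary
};

// Split [address, address + length) into pieces that each stay in one page.
inline std::vector<Chunk> splitByPage(uint32_t address, uint32_t length) {
    assert(length <= UINT32_MAX - address);  // the range must not wrap around
    std::vector<Chunk> chunks;
    uint32_t offset = 0;
    while (offset < length) {
        const uint32_t addr = address + offset;
        const uint32_t room = kPageSize - (addr % kPageSize);
        const uint32_t remaining = length - offset;
        const uint32_t n = (remaining < room) ? remaining : room;
        chunks.push_back({addr, offset, n});
        offset += n;
    }
    return chunks;
}
```

For the (1024, 4096) call above this yields three chunks: 1024 bytes at address 1024, 2048 bytes at 2048, and 1024 bytes at 4096. Because the offset field indexes directly into the caller's buffer, a loop over the chunks would not need a temporary copy.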

Right now only doing buffered reads so logic is similar. Wanted to create a function so the user didn't have to figure it out within their code. Is this the most efficient way - knowing me probably not, but will leave that for others. Remember though you are reading and writing 128MB of data in roughly 5.1 seconds.

For reads there is another option and that is go to continuous read (BUF=0) as opposed to buffered read (buf = 1). Will test that tomorrow - needed a break and had a few other things to do.

Hope all this makes sense.
 
Now that the task is done ;)

IMG_1176.jpg

Now to go wash and dry it before I try it. And of course I broke my own typical rule of try the Teensy before I modify it ;)
 