I've been thinking a lot and tried something new. Reading 128 GPIOs with digitalRead used to take 740 uS in total. I have got that down to 71 uS with the same function. The trick is to read a local copy of the register instead of the actual port on each chip whenever that copy is recent enough.
Basically there are now 8 micros() counters in digitalRead, one per chip:
Code:
static uint32_t pinScan[8] = { 5000 }; /* last refresh time per chip; only pinScan[0] starts at 5000, the rest start at 0 */
What happens now is that when you read multiple pins, the micros() counter for the port of the calculated chip is checked. If it has expired, the port is read once and the counter is updated. When a second read of the same port happens within 50 uS, the local register is read instead. If a different chip's register is requested, its own counter is checked against the same 50 uS window before the local register is used; otherwise the new port is read into the local register, the pin state is returned, and the counter is reset:
Code:
if (micros() - pinScan[i] < 50) {
  return ( chipData[i][8] & (1UL << pin) );
}
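Here is a minimal sketch of the 50 uS window that runs off-target, so the logic can be checked without the hardware. PortCache, fakeMicros, fakePort, readPortFromBus and scanAll are made-up names for illustration, not part of the library. It also assumes all 8 chips are present and uses a valid flag instead of an initial counter value.

```cpp
#include <cstdint>

// Hypothetical stand-ins so the sketch runs on a PC. On hardware these would
// be micros() and the SPI read of the GPIO register pair.
static uint32_t fakeMicros = 1000;   // simulated clock in microseconds
static uint32_t busReads = 0;        // number of simulated SPI transactions
static uint16_t fakePort[8] = {0};   // simulated GPIO state, one word per chip

static uint16_t readPortFromBus(uint8_t chip) {
    ++busReads;
    return fakePort[chip];
}

struct PortCache {
    static constexpr uint32_t kWindowUs = 50;  // cache lifetime from the post
    uint16_t value[8] = {0};                   // local copy of each port
    uint32_t stamp[8] = {0};                   // time of the last refresh
    bool valid[8] = {false};                   // assumption: explicit valid flag

    bool readPin(uint8_t pin) {
        const uint8_t chip = pin / 16;         // assumes all 8 chips are present
        const uint8_t bit = pin % 16;
        // Unsigned subtraction keeps working when the clock rolls over.
        if (!valid[chip] || (fakeMicros - stamp[chip]) >= kWindowUs) {
            value[chip] = readPortFromBus(chip);
            stamp[chip] = fakeMicros;
            valid[chip] = true;
        }
        return (value[chip] >> bit) & 1U;
    }
};

// Reads all 128 pins and returns how many bus transactions that needed.
static uint32_t scanAll(PortCache& cache) {
    const uint32_t before = busReads;
    for (uint8_t pin = 0; pin < 128; ++pin) cache.readPin(pin);
    return busReads - before;
}
```

With the clock standing still, scanAll needs 8 bus reads for 128 pins, one per chip. The flip side is that a pin changing inside the window is not seen until the window expires, so a returned state can be up to 50 uS old.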
This was the demo code:
Code:
uint32_t tim = micros();
for ( uint8_t i = 0; i < 128; i++ ) {
  mcp.digitalRead(i);
}
/* micros() returns a 32-bit value, so print the difference with %lu */
Serial.printf("digitalRead 128 pins: %lu uS\n", (unsigned long)(micros() - tim));
Output:
Code:
digitalRead 128 pins: 71 uS
740 uS before optimisation
71 uS after
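To put those two totals in per-call terms, here is a small helper that does the arithmetic. ScanStats and summarize are hypothetical names for illustration only; the inputs are the numbers measured above.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the library: summarises one scan timing.
struct ScanStats {
    double perReadBeforeUs;  // average cost of one digitalRead before caching
    double perReadAfterUs;   // average cost of one digitalRead after caching
    double speedup;          // ratio of the two total scan times
};

inline ScanStats summarize(double beforeUs, double afterUs, uint32_t pins) {
    return ScanStats{ beforeUs / pins, afterUs / pins, beforeUs / afterUs };
}
```

summarize(740.0, 71.0, 128) works out to roughly 5.8 uS per read before, roughly 0.55 uS per read after, and a speedup of about 10.4x.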
What do you think? Any flaws in this or is it unique? 
The beauty is that the function is self-contained: no code bloat or register dumps needed elsewhere.
If this is good, I will make the per-register micros() counters members of the class instead of static locals, so that two different functions working on the same register validate against a common timer when updating and checking.
I might implement a circular array for the interrupt checks, since updating a single local register several times before events() gets to read it can throw off a toggle state. When the circular queue pushes an event, the local register should be checked to see whether that interrupt is enabled. This would allow many pins to be checked (via local registers only) before INTCAP rewrites itself, giving events() time to process the handlers.
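A rough sketch of what that circular queue could look like, assuming it is only touched from one context (no ISR locking) and drops new events when full. IntEvent, EventRing and queueCapture are made-up names, and the enabled mask stands in for a local copy of the interrupt-enable register.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical event record: one INTCAP snapshot for one chip.
struct IntEvent {
    uint8_t chip;     // chip index 0..7
    uint16_t intcap;  // captured port state at interrupt time
    uint16_t pins;    // flagged pins that are also enabled locally
};

// Fixed-size circular queue. Assumption: single context, so no locking.
template <size_t N>
class EventRing {
    IntEvent buf[N] = {};
    size_t head = 0, tail = 0, count = 0;
public:
    bool push(const IntEvent& e) {
        if (count == N) return false;  // full: drop the new event
        buf[head] = e;
        head = (head + 1) % N;
        ++count;
        return true;
    }
    bool pop(IntEvent& out) {
        if (count == 0) return false;  // nothing pending
        out = buf[tail];
        tail = (tail + 1) % N;
        --count;
        return true;
    }
    size_t size() const { return count; }
};

// Queues a capture only if at least one flagged pin is enabled in the local
// copy of the interrupt-enable register, so no bus access is needed here.
template <size_t N>
bool queueCapture(EventRing<N>& q, uint8_t chip, uint16_t intcap,
                  uint16_t flagged, uint16_t enabledLocal) {
    const uint16_t pins = flagged & enabledLocal;
    if (pins == 0) return false;
    return q.push(IntEvent{ chip, intcap, pins });
}
```

events() would then pop entries one at a time and run the handlers, so each capture is seen once even if INTCAP has been rewritten in the meantime.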
EDIT: Did the same to digitalWrite:
Output of back-to-back writes and reads:
Code:
digitalWrite 48 pins: 224 uS (previously, 359uS for 47 pins!)
digitalRead 128 pins: 64 uS
writeGPIO: 4 uS
digitalWrite 14: 9 uS
This is my method:
Code:
MCP23S17_FUNC bool MCP23S17_OPT::digitalRead(uint8_t pin) {
  if ( pin >= (__builtin_popcount(detectedChips) * 16U) ) return 0;
  for ( uint8_t i = 0; i < 8; i++ ) {
    if ( !(detectedChips & (1U << i)) ) continue; /* no chip at this address */
    if ( pin > 15 ) { /* pin belongs to a later chip */
      pin -= 16;
      continue;
    }
    if (micros() - counter_GPIO[i] < 50) { /* local copy is fresh, skip the bus */
      return ( chipData[i][9] & (1UL << pin) );
    }
    bus->beginTransaction(SPISettings(speed,MSBFIRST,SPI_MODE0)); /* sequential read starting at INTCAPA (0x10) */
    ::digitalWriteFast(chipSelect, LOW);
    bus->transfer16(((0x41 | (i << 1)) << 8) | 0x10);
    chipData[i][8] = __builtin_bswap16(bus->transfer16(0xFFFF)); /* Dump INTCAP */
    chipData[i][9] = __builtin_bswap16(bus->transfer16(0xFFFF)); /* read GPIOs */
    ::digitalWriteFast(chipSelect, HIGH);
    bus->endTransaction();
    counter_GPIO[i] = micros(); /* restart the 50 uS window */
    return (chipData[i][9] & (1UL << pin));
  }
  return 0;
}
MCP23S17_FUNC void MCP23S17_OPT::digitalWrite(uint8_t pin, bool level) {
  if ( pin >= (__builtin_popcount(detectedChips) * 16U) ) return;
  for ( uint8_t i = 0; i < 8; i++ ) {
    if ( !(detectedChips & (1U << i)) ) continue; /* no chip at this address */
    if ( pin > 15 ) { /* pin belongs to a later chip */
      pin -= 16;
      continue;
    }
    if (micros() - counter_GPIO[i] >= 50) { /* local copy expired (same window as digitalRead), refresh it first */
      bus->beginTransaction(SPISettings(speed,MSBFIRST,SPI_MODE0)); /* sequential read starting at INTCAPA (0x10) */
      ::digitalWriteFast(chipSelect, LOW);
      bus->transfer16(((0x41 | (i << 1)) << 8) | 0x10);
      chipData[i][8] = __builtin_bswap16(bus->transfer16(0xFFFF)); /* Dump INTCAP */
      chipData[i][9] = __builtin_bswap16(bus->transfer16(0xFFFF)); /* read GPIOs */
      ::digitalWriteFast(chipSelect, HIGH);
      bus->endTransaction();
      counter_GPIO[i] = micros();
    }
    chipData[i][9] = (chipData[i][9] & ~(1UL << pin)) | (level << pin); /* update one bit in the local copy */
    bus->beginTransaction(SPISettings(speed,MSBFIRST,SPI_MODE0)); /* write port register (GPIOA, 0x12) */
    ::digitalWriteFast(chipSelect, LOW);
    bus->transfer16(((0x40 | (i << 1)) << 8) | 0x12);
    bus->transfer16(__builtin_bswap16(chipData[i][9]));
    ::digitalWriteFast(chipSelect, HIGH);
    bus->endTransaction();
    break;
  }
}
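The three bit tricks used in both functions can be pulled out and checked on their own: the SPI header word, the walk through detectedChips to find the chip and bit for a pin, and the single-bit update of the local copy. This is only a sketch; spiHeader, locatePin and withBit are hypothetical helper names, not functions in the library. The register addresses 0x10 (INTCAPA) and 0x12 (GPIOA) assume the default register bank layout.

```cpp
#include <cstdint>

// MCP23S17 control byte is 0100 A2 A1 A0 R/W, followed by the register address.
// Returns the 16-bit header that is sent first in each transaction.
constexpr uint16_t spiHeader(uint8_t addr, bool read, uint8_t reg) {
    return static_cast<uint16_t>(
        ((0x40U | ((addr & 0x07U) << 1) | (read ? 1U : 0U)) << 8) | reg);
}

// Maps a global pin number onto the n-th detected chip, the same walk the
// loops above do. Returns the chip address, or -1 if the pin is out of range.
inline int locatePin(uint8_t detectedChips, uint8_t pin, uint8_t& bit) {
    for (uint8_t i = 0; i < 8; ++i) {
        if (!(detectedChips & (1U << i))) continue;  // no chip at this address
        if (pin > 15) { pin -= 16; continue; }       // belongs to a later chip
        bit = pin;
        return i;
    }
    return -1;
}

// Sets or clears one bit in a 16-bit port copy, leaving the others untouched.
constexpr uint16_t withBit(uint16_t port, uint8_t bit, bool level) {
    return static_cast<uint16_t>(
        (port & ~(1U << bit)) | (static_cast<unsigned>(level) << bit));
}
```

For example, with chips detected at addresses 0, 2 and 5 (detectedChips = 0x25), pin 20 lands on chip 2, bit 4, and a read starting at INTCAPA on chip 3 uses the header 0x4710.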