Woo, that looks somewhat tricky.
Here I ran another test, this time without printing the data. For some reason, returning GPIO6_DR directly is slower than creating a local variable and returning that.
That may make sense, because a local variable that is initialized at run time lives in RAM (or in a CPU register), so reading it might simply be faster than reading the GPIO6_DR peripheral register...
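For anyone who wants to repeat the comparison, here is a rough sketch of how the two cases could be timed. It is not the code from my test: `fake_gpio6_dr`, `read_register`, `read_local` and `time_calls_ns` are made-up names, and the volatile variable is only a stand-in so the sketch compiles off the board. On the real hardware you would read GPIO6_DR in `read_register` and probably count CPU cycles instead of using `std::chrono`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the hardware register. "volatile" forces a real
// memory read on every access, which is how a peripheral register behaves.
volatile uint32_t fake_gpio6_dr = 0x12345678u;

// Case 1: return the (stand-in) register directly.
uint32_t read_register() {
    return fake_gpio6_dr;
}

// Case 2: create a local variable and return it. The compiler is free to
// keep this in a CPU register or fold it into a constant.
uint32_t read_local() {
    uint32_t value = 0x12345678u;
    return value;
}

// Calls fn() `iterations` times and returns the elapsed time in nanoseconds.
// The results are summed into `sink` so the calls are not optimized away
// completely.
template <typename F>
long long time_calls_ns(F fn, uint32_t iterations, uint32_t &sink) {
    assert(iterations > 0);
    uint32_t acc = 0;
    auto start = std::chrono::steady_clock::now();
    for (uint32_t i = 0; i < iterations; ++i) {
        acc += fn();
    }
    auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `time_calls_ns(read_register, n, sink)` and `time_calls_ns(read_local, n, sink)` with the same `n` gives two numbers to compare. Keep in mind that with optimization on, the local-variable case may collapse to almost nothing, so part of the difference could come from the compiler rather than from where the data is stored.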