Sorry, I am not sure anyone has a simple full answer for you. You might have to do your own homework.
Sometimes the best answer is to look at the reference datasheet, which you can download from the PJRC website:
https://www.pjrc.com/teensy/datasheets.html
Timers are typically 32 bits wide, so how do you get 64 bits? You may have a couple of options. One is an overflow interrupt that increments a counter, which you then combine with the current timer value to get your 64 bits.
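To make the overflow-counter idea concrete, here is a minimal sketch that runs on a PC rather than on the Teensy. `SimulatedTimer`, `advance` and `read64` are made-up names for illustration, not part of any Teensy library. On real hardware `count` would be the timer's count register, `overflows` would be a `volatile` variable bumped by the overflow ISR, and you would have to check the datasheet for how the overflow flag is raised and cleared.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit free-running hardware counter.
struct SimulatedTimer {
    uint32_t count = 0;      // current counter value (the "hardware register")
    uint32_t overflows = 0;  // software counter, bumped by the "overflow ISR"

    // Advance the counter by `ticks`. If it wraps past 2^32, do what a real
    // overflow ISR would do: increment the overflow count.
    void advance(uint32_t ticks) {
        uint32_t before = count;
        count += ticks;  // unsigned arithmetic wraps modulo 2^32
        if (count < before) {
            ++overflows;
        }
    }
};

// Combine the overflow count (high word) with the counter (low word).
// The high word is compared before and after reading the low word; if it
// changed, an overflow slipped in between the reads and we try again.
// Assumption: on real hardware both fields are volatile and the ISR is
// allowed to preempt this function.
uint64_t read64(const SimulatedTimer& t) {
    uint32_t hi;
    uint32_t lo;
    do {
        hi = t.overflows;
        lo = t.count;
    } while (hi != t.overflows);
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

The re-check of the high word is the important part: without it, a wrap between the two reads can give you a value that is off by 2^32 ticks.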
Or, as mentioned in that other thread, you may be able to chain two timers together. I am not sure how many people have tried it, but I believe @manitou has an example sketch that may do it, as he mentioned there.
Likewise, @luni had some example code in that thread that might work.
As for how to get a hardware timer to run at 500 MHz, I am not sure.
If it were me, I might look at some of the stuff that @luni did for the library:
https://github.com/luni64/TeensyTimerTool
Otherwise you would need to go through the different timer subsystems and see if they can be configured in a way that works for you, without breaking other things you may also need.
For example, if you look at the GPT timer, you will find that a few different hardware clocks can be mapped into this subsystem. I usually go to the CCM chapter and its CCM clock tree, which in my PDF is at about page 1072. Not far from the top right of the diagram you will see where the GPT/PIT timer base clock is defined.
You will see there are at least 4 different clocks that can feed into this, which is also mentioned in the clocks section (51.4) of the GPT chapter, p. 3078.
Most of these have options for pre/post divisors, which are set in some of the CCM registers, and in this tree you can see how some of those settings will affect other systems as well.
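As a rough illustration of what the divisors do to the count rate, here is a small PC-side sketch of the arithmetic. The function names `tickHz` and `secondsToWrap` are made up, and any frequency you plug in is an assumption until you have checked which clock source and divider settings your configuration actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Tick rate seen by the counter after two integer dividers in series,
// e.g. a divider in the clock tree followed by the timer's own prescaler.
// Both dividers are the effective divide values (1 means "no division").
uint64_t tickHz(uint64_t sourceHz, uint32_t rootDivider, uint32_t prescaler) {
    assert(rootDivider != 0 && prescaler != 0);
    return sourceHz / (static_cast<uint64_t>(rootDivider) * prescaler);
}

// Seconds until a counter of `bits` width wraps at the given tick rate.
double secondsToWrap(unsigned bits, uint64_t hz) {
    assert(hz != 0);
    return std::ldexp(1.0, static_cast<int>(bits)) / static_cast<double>(hz);
}
```

For instance, with a hypothetical 24 MHz source and no division, `secondsToWrap(32, tickHz(24000000, 1, 1))` comes out at roughly 179 seconds, which is why a 32-bit counter alone runs out quickly at high count rates.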
Now if you can configure one of these clocks to work for you, you can then configure the actual clock registers to do counting...
Or more likely you might look at the PIT timers, which I believe use the same clock sources as the GPT, although maybe they can also count from the IP bus clock, by some divisor (1, 2, 4, ...).
The PIT also has cascade (chaining) support...
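If you do get two 32-bit timers cascaded, you still have to read the two halves as one consistent value. Below is a generic sketch of the usual high/low/high read, not PIT-specific code: `readChained64`, `readHigh` and `readLow` are hypothetical names, and it assumes up-counting registers. Check the datasheet for the count direction and for any dedicated 64-bit read registers before relying on it.

```cpp
#include <cstdint>

// Read a 64-bit value from two chained 32-bit up-counters without tearing.
// readHigh and readLow are hypothetical callables standing in for the two
// register reads (upper and lower 32 bits).
template <typename ReadHigh, typename ReadLow>
uint64_t readChained64(ReadHigh readHigh, ReadLow readLow) {
    uint32_t hi = readHigh();
    for (;;) {
        uint32_t lo = readLow();
        uint32_t hiAgain = readHigh();
        if (hiAgain == hi) {
            // High word did not change, so hi and lo belong together.
            return (static_cast<uint64_t>(hi) << 32) | lo;
        }
        hi = hiAgain;  // low word wrapped between the reads; try again
    }
}
```

Passing the register reads in as callables keeps the sketch testable on a PC. On the Teensy they would be small lambdas or functions that read the actual timer registers.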
So as I mentioned, I am not sure anyone has a simple answer for you. Again, I would probably start by looking at the stuff @luni and @manitou have done, which was mentioned in that other thread.
But good luck, and hopefully you will find a good answer and post it.