To my thinking, your approach has these advantages: less CPU demand (avoid continually scanning); simplicity (...there must be a simple way...); and letting the hardware do the heavy lifting (...like a discrete switching signal...).
The only piece of the puzzle that you needed was the "sound to MCU" thingy. First, I must say how I admire the skills/wisdom of the posters who gave advice: "...biasing and signal conditioning..." and "...FFT object..." are topics that make my head spin! I'm not doubting them; they are simply over my head! But sometimes less is more. So, being a "less" kind of poster, let me focus on this aspect: "...latch the sound to a digital output..." That is all you want to do, so avoid anything analog, such as a MEMS microphone, or really audio processing in general. I'm in love with the Teensy Audio Library (really!), but you don't need it here. All you want (IIUC) is to know whether the trigger sound is heard or not heard (latching to digital). For this task the hardware already exists. Google "arduino sound detection module" and the first pictured device (when I googled it on 2018-02-11 @ 6:11PM) will find its way to your snail-mailbox for less than a buck (USD $1).
I'm only going by googled info, not from experience. Presumably this thing can be made to work; if so, you are golden. Meaning that the issue of "...defining what constitutes a valid trigger..." is eliminated, or rather, is replaced by a precision trim-pot (dial in your threshold the easy way). What I can say based on experience is that the ESP8266 is reliable, Arduino IDE friendly, and has snazzy connection range. My project ran for 8 months without crashing before I re-purposed it.
Also, googling "arduino sound detection module" gives a link to a tutorial on using the module, which includes oh-so-short-and-sweet code that keeps a GPIO in real-time agreement with the detection module. (One very minor and trivial gripe with the tutorial is where it says, "Sound is detected via a microphone and fed into an LM393 op amp." Ahem, the LM393 is a dual differential comparator, NOT an op amp! ;+D) This makes your "...hoping to not have it continually scanning..." a matter of attaching an interrupt to a pin (I've never done this, but I think this is right!).
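For what it's worth, here is a rough sketch of what "attaching an interrupt to a pin" could look like on the ESP8266 with the Arduino core. It is not taken from the tutorial and I have not run it on hardware. The pin number (GPIO5), the RISING edge, the names SOUND_PIN, soundLatched, onSoundEdge and consumeLatch are all my assumptions; some of these modules pull their output LOW on detection, in which case FALLING would be the edge to use. Older ESP8266 cores spell the attribute ICACHE_RAM_ATTR instead of IRAM_ATTR.

```cpp
#include <stdint.h>

// Returns true once per latched event and re-arms the latch.
// Meant to be called from the main loop, never from the interrupt handler.
bool consumeLatch(volatile bool &latched) {
  if (!latched) return false;
  latched = false;
  return true;
}

#ifdef ARDUINO
#include <Arduino.h>

// Assumption: the module's digital output is wired to GPIO5 and the module
// runs from 3.3 V so its output level suits the ESP8266.
const uint8_t SOUND_PIN = 5;

// Set by the interrupt handler, cleared by loop().
volatile bool soundLatched = false;

// Interrupt handlers on the ESP8266 have to live in IRAM and stay short.
void IRAM_ATTR onSoundEdge() {
  soundLatched = true;
}

void setup() {
  Serial.begin(115200);
  pinMode(SOUND_PIN, INPUT);
  // Assumption: output goes HIGH on detection; use FALLING if yours goes LOW.
  attachInterrupt(digitalPinToInterrupt(SOUND_PIN), onSoundEdge, RISING);
}

void loop() {
  // No scanning of the pin here: we only look at the flag the hardware set.
  if (consumeLatch(soundLatched)) {
    Serial.println("trigger sound detected");
  }
}
#endif
```

The point of the latch is that the handler only sets a flag; anything slow (WiFi, logging) happens later in loop(), and the trim-pot on the module still decides what counts as a trigger.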
So my (remedial) opinion is: an arduino sound detection module attached to a GPIO on an ESP8266 for the detect end. Then another ESP8266 to make the Teensy aware of the trigger sound (using a GPIO). (Although the ESP8266 is a WiFi device, linking 2 ESPs doesn't require a router if the ad-hoc mode is used.) Then the Teensy can use its USB and the Keyboard Library to send the space bar key-press to your PC. Time/date stamping could be done at any point in the chain; the Teensy would be my choice. Use EEPROM or SD depending on how much data is logged. A breadboard power supply lets you use a junk-drawer power adapter (6.5 Volts min, 12 Volts max). Total cost (not including power adapters, SD card, or PC) would be less than 24 bucks (delivered), using a Teensy LC, 2 ESP8266-12E, 2 power supply modules, and the detection module.
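For the Teensy end of the chain, a minimal sketch of "GPIO goes high, space bar goes to the PC" might look like the following. Again untested on hardware and full of assumptions: pin 2, the 500 ms hold-off, and the names TriggerEdge, TRIGGER_PIN and HOLD_OFF_MS are mine, and the receiving ESP8266 is assumed to drive the pin HIGH when the trigger arrives (with the two boards sharing a ground). The Teensy has to be built with a USB Type that includes Keyboard for Keyboard.print() to exist.

```cpp
#include <stdint.h>

// Reports a LOW-to-HIGH transition once, and ignores further transitions
// that arrive within holdOffMs of the last reported one.
struct TriggerEdge {
  bool lastLevel = false;
  bool hasFired = false;
  uint32_t lastFireMs = 0;

  bool update(bool level, uint32_t nowMs, uint32_t holdOffMs) {
    bool rising = level && !lastLevel;
    lastLevel = level;
    if (!rising) return false;
    if (hasFired && (nowMs - lastFireMs) < holdOffMs) return false;
    hasFired = true;
    lastFireMs = nowMs;
    return true;
  }
};

#ifdef ARDUINO
#include <Arduino.h>

const uint8_t TRIGGER_PIN = 2;      // assumption: wired to a GPIO on the second ESP8266
const uint32_t HOLD_OFF_MS = 500;   // assumption: at most one key-press per half second

TriggerEdge trigger;

void setup() {
  pinMode(TRIGGER_PIN, INPUT);
}

void loop() {
  bool level = (digitalRead(TRIGGER_PIN) == HIGH);
  if (trigger.update(level, millis(), HOLD_OFF_MS)) {
    Keyboard.print(" ");  // space bar key-press to the PC over USB
  }
}
#endif
```

The hold-off is only there so that one noisy trigger doesn't turn into a burst of space bars; drop it or shorten it if your sounds come in quick succession. This is also a natural spot to time-stamp the event if the Teensy is doing the logging.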