I understand the capacitor impact to stabilize readings, however I am puzzled by the difference in raw values! Average values should be similar with or without capacitor I think.
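Not necessarily. Consider a set-up where the ADC is fed with a constant DC voltage plus a high frequency sine-wave signal. A pure sine wave averages to zero, so in an ideal linear system the mean reading would still be the DC voltage. But remember that these are single-ended signals between GND and Vcc. If the input responds nonlinearly to the ripple, for example by clipping the part that would swing below GND or above Vcc, the average reading is shifted by an amount that depends on the ripple amplitude. It's not the same as a twin-rail +Vcc, GND, -Vcc set-up where you can assume high frequency signals have no DC component.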
Add a capacitor that shunts this frequency to ground, and not only does the variation reduce but the mean value changes also.
Not saying this is what happens in your case, but it is at least an explanation of how it is possible.
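To put a number on how the mean can move, here is a small simulation sketch. It assumes the shift comes from the input clipping at the rails, which is only one possible mechanism; the function name and the voltages are made up for illustration and are not taken from your circuit.

```cpp
#include <assert.h>
#include <math.h>

// Mean of (dc + amplitude * sin) over one full period, after clamping each
// sample to the range [vMin, vMax] that a single-ended input can represent.
// Hypothetical helper for illustration only.
double meanOfClippedSine(double dc, double amplitude,
                         double vMin, double vMax, int samples) {
    assert(samples > 0);
    const double kTwoPi = 6.283185307179586;
    double sum = 0.0;
    for (int i = 0; i < samples; ++i) {
        double v = dc + amplitude * sin(kTwoPi * i / samples);
        if (v < vMin) v = vMin;   // cannot swing below GND
        if (v > vMax) v = vMax;   // cannot swing above Vcc
        sum += v;
    }
    return sum / samples;
}
```

With 0.2 V DC, 0.5 V of ripple amplitude and rails at 0 V and 3.3 V, the mean comes out at roughly 0.27 V rather than 0.2 V. Drop the amplitude to 0.1 V so nothing clips and the mean goes back to 0.2 V, which is the effect a filter capacitor would have in this scenario.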
A second possibility: suppose you are switching between multiplexed ADC inputs, the ADC has a single sample-and-hold capacitor (rather than one per multiplexed input), and the output impedance of the sensor is high (which is likely for a thermistor). Then the sample-and-hold capacitor may not have time to fully charge before being read. Adding an external capacitor means the sample-and-hold capacitor can charge more quickly and get the right reading. For slowly varying signals, a way around this is to discard the first reading after switching inputs, then average the following ones. A better way is to use a unity-gain, rail-to-rail op-amp buffer between the sensor and the ADC. The ADC then sees a low impedance source and can charge the sample-and-hold capacitor rapidly.
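The discard-then-average workaround could look something like the sketch below. The AdcReader type and readSettled name are my own; on an Arduino or Teensy you would pass a small wrapper around analogRead() as the reader. The number of readings to average is left to the caller since the right value depends on your noise.

```cpp
#include <assert.h>

// Hypothetical reader type: takes a channel number, returns a raw ADC count.
// On an Arduino or Teensy this role would be played by analogRead().
typedef int (*AdcReader)(int channel);

// Read one channel just after a multiplexer switch: throw away the first
// conversion, then return the rounded mean of the next `count` conversions.
// Assumes non-negative ADC counts.
long readSettled(AdcReader readAdc, int channel, int count) {
    assert(count > 0);
    // Discarded: the sample-and-hold capacitor may still carry charge
    // from the previously selected channel.
    (void)readAdc(channel);
    long sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += readAdc(channel);
    }
    return (sum + count / 2) / count;
}
```

This only helps when the signal changes slowly compared with the time taken for the extra conversions; it does nothing about a source impedance that is too high for the capacitor to settle within a single conversion.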
And of course if the sensor is (part of) a potential divider between some voltage and ground, that voltage needs to be stable. Vcc, especially if it comes from the USB power of a computer and is not being regulated, can be very noisy. Using a separate, clean analog Vcc power supply is probably the best way to reduce noise and variability. On Teensy 3.0, the USB voltage goes through a 3.3V regulator, so it will be cleaner than using the USB voltage directly as Teensy 2.0 and Teensy++ 2.0 do. If you use a separate power supply, be sure to tie its ground to the Teensy ground (either directly or through an inductor).
These approaches can be combined. Some experimentation is in order to see what is the largest source of variation.
When I did this for a multi-analog-input project (using an Arduino Mega2560, so 5V Vcc) the largest single improvement was using a simple regulated power supply instead of Vcc from the computer USB. Adding buffers gave comparatively minor gains. In that set-up I did not need additional capacitance on the ADC inputs. In contrast, even the most elaborate software averaging schemes (discard first two readings, discard highest and lowest of the next 16 and average the 14 remaining) had little impact, as the main problem was noisy Vcc.
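For reference, the averaging scheme in the parentheses can be sketched as follows. This is a reconstruction from the description, not the code I actually ran; the AdcReader type and function name are hypothetical, and on real hardware the reader would wrap analogRead().

```cpp
#include <assert.h>

// Hypothetical reader type: takes a channel number, returns a raw ADC count.
typedef int (*AdcReader)(int channel);

// Discard the first two readings, take 16 more, drop the single highest and
// single lowest of those, and return the rounded mean of the remaining 14.
// Assumes non-negative ADC counts.
long readTrimmedMean(AdcReader readAdc, int channel) {
    const int kDiscard = 2;
    const int kSamples = 16;
    assert(readAdc != 0);

    for (int i = 0; i < kDiscard; ++i) {
        (void)readAdc(channel);      // settling readings, thrown away
    }

    int first = readAdc(channel);
    int lowest = first;
    int highest = first;
    long sum = first;
    for (int i = 1; i < kSamples; ++i) {
        int v = readAdc(channel);
        sum += v;
        if (v < lowest) lowest = v;
        if (v > highest) highest = v;
    }

    const int kept = kSamples - 2;   // 14 readings remain
    sum -= (long)lowest + highest;   // remove the two extremes
    return (sum + kept / 2) / kept;
}
```

A scheme like this removes isolated spikes, but it cannot remove noise that is common to every reading, which is why it made so little difference while the reference voltage itself was noisy.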