I've been told some audio testing gear doesn't cope well with the moderate amount of ultrasonic content that is often left over on the output of modern oversampling DACs, like the one in the SGTL5000. If your measurements come out really bad, that is one possible cause.
If ulaw encoding will be part of this process, and if you care about whether the results honestly measure real-world performance, you might consider how ulaw affects sounds. It will of course add some harmonic distortion, but that's rarely the big issue. The main problem with ulaw is how it affects multiple non-harmonically related frequencies, especially when one is lower in amplitude than the other. The quantization step grows with the instantaneous level of the whole signal, so while the louder component is near its peaks the steps are coarse and the smaller signal is treated quite horribly. This is the reason ulaw is mostly used for telephone voice, where the content is a single vocal source. If you try using it for music containing vocals and multiple instruments, the results are usually terrible. It really only sounds acceptable when the source is a single voice, a single instrument, or something else that emits primarily one sound.
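
To put a rough number on that, here is a minimal Python sketch. It uses the continuous textbook mu-law curve with mu = 255 and 8-bit codes, not the segmented G.711 tables or whatever encoder you actually end up using, so treat it as an approximation. The 44.1 kHz sample rate, the 440 Hz and 1 kHz tones, and their amplitudes are all just assumptions I picked for illustration. It compares the round-trip error for a quiet tone on its own against the same quiet tone mixed with a loud, non-harmonically related one.

```python
import math

MU = 255.0   # usual mu-law constant
FS = 44100   # assumed sample rate in Hz (illustrative)
N = 4410     # 0.1 second of samples

def mulaw_compress(x):
    """Continuous mu-law curve, x in [-1, 1] -> y in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mulaw_roundtrip(x, bits=8):
    """Compress, quantize to a signed code of the given width, expand."""
    levels = 2 ** (bits - 1) - 1          # 127 steps per polarity for 8 bits
    code = round(mulaw_compress(x) * levels)
    return mulaw_expand(code / levels)

def tone(freq, amp):
    """Sine wave of N samples at the assumed sample rate."""
    return [amp * math.sin(2 * math.pi * freq * n / FS) for n in range(N)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def roundtrip_error_rms(signal):
    """RMS of the difference between the decoded and original samples."""
    return rms([mulaw_roundtrip(s) - s for s in signal])

# Hypothetical test signals: a quiet 1 kHz tone and a loud 440 Hz tone.
small = tone(1000.0, 0.01)
large = tone(440.0, 0.8)
mixed = [a + b for a, b in zip(small, large)]

err_alone = roundtrip_error_rms(small)
err_mixed = roundtrip_error_rms(mixed)

print(f"quiet tone RMS level:        {rms(small):.6f}")
print(f"error, quiet tone alone:     {err_alone:.6f}")
print(f"error, quiet + loud tone:    {err_mixed:.6f}")
print(f"error grew by a factor of:   {err_mixed / err_alone:.1f}")
```

When the quiet tone is alone, the error should come out small compared to the tone itself. Once the loud tone is added, the error should land in the same ballpark as the quiet tone's own level, which is the "treated quite horribly" effect. A real measurement would look at the spectrum rather than a single RMS number, but this is enough to show why a single-tone THD test can make ulaw look better than it sounds on real material.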