FWIW: the physiological aspect of this phenomenon is often referred to as "bone conduction". Basically, what your ears pick up is somewhat distorted by the reverberations of your jaw, teeth, etc., and the cavities in your noggin.
A good study in HCI will set you straight on the issue.