Quote:
Originally Posted by
Dirac28
But that's not just an opinion from Floyd Toole; these problems exist. You can measure them with a "dummy head" microphone, which reproduces the crosstalk and HRTF effects humans have when they listen to sounds, by comparing its response to a real source in front of it with its response to the phantom source produced by a stereo setup in an anechoic chamber. In an anechoic chamber, a recording of a single voice reproduced from a center speaker will produce a different response than the same recording reproduced by a stereo system as a phantom source.
The question is how relevant that is from a listener's perspective, and whether reflections that make these problems less obvious cause more harm than good by introducing other problems. And maybe a single floor reflection can mask these stereo problems so that all other reflections aren't necessary anymore...
Yes, it will show a difference, but you have to look at the parameters closely and correlate them with statistically relevant data from test subjects to know whether it has an effect at all. And if it does, is it relevant?
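As a rough, hypothetical sketch of what such a dummy-head comparison would show (all numbers here are assumptions for illustration, not measured data: a ~0.26 ms crosstalk delay for +/-30° speakers and a head-shadow gain of 0.7): at each ear the phantom source is the direct speaker's signal plus a delayed, attenuated copy from the opposite speaker, which comb-filters the response relative to a single center speaker.

```python
import numpy as np

# Assumed illustrative values (not from the post, not measured):
delay = 0.26e-3   # s, crosstalk delay from the opposite speaker at +/-30 deg
shadow = 0.7      # linear gain of the crosstalk path (head shadow)

# First comb-filter notch sits near 1/(2*delay) ~ 1.9 kHz.
freqs = np.array([500.0, 1923.0, 4000.0])  # Hz

# Magnitude at one ear: |1 + shadow * exp(-j*2*pi*f*delay)|
mag = np.abs(1 + shadow * np.exp(-2j * np.pi * freqs * delay))
db = 20 * np.log10(mag)
for f, level in zip(freqs, db):
    print(f"{f:7.0f} Hz: {level:+5.1f} dB re. a single center speaker")
```

The dip around 2 kHz is the classic phantom-center coloration a dummy head would pick up that a real center speaker does not have.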
The brain actually relies on synthetic data: not on the many direct values acquired by the auditory system 1:1, but on their global interpretation at a secondary level. Depending on the conditions, it will prioritize one subset over another.
We basically have 3 main "sensors" for localization:
- ITD (Inter-aural Time Difference): the ears are physically separated, so there is a timing difference between them, usable within a certain limited bandwidth
- ILD (Inter-aural Level Difference): the difference in loudness and frequency distribution between the two ears
- Pinna effect: the filtering by the ear's pinna, mostly allowing better elevation perception
And a few secondary sensors that contribute to the synthetic data, among others:
- ITDG (Initial Time Delay Gap)
- Distance-to-source cues (e.g. loudness, bandwidth, HF content, movement, etc.)
- Body/torso/head movement as a disambiguation tool
All other things being equal, if you have a single source right in front of you in real life, ITD + ILD + Pinna + head shadow etc will have a certain set of values.
ITD = 0, ILD = 0, and the pinna cues have your specific values for the given source elevation.
Mainly, the time and level alignment will be 'perfect' between the two ears, so the source will be perceived as coming from right in front of you.
Stereo reproduction of a single source can approximate that very well for the two main factors, ITD and ILD.
Even though the crosstalk does add its own subset of information (each ear also receives the opposite speaker's signal, head-shadowed and delayed by roughly 0.5 ms, because there are two sources at +/-30° instead of a single source at 0°), the general timing remains the same between the left and right ear.
ITD(t=0) = 0 and ITD(t=+0.5 ms) = 0.
ILD(t=0) = 0 and ILD(t=+0.5 ms) = 0.
Hence there is no delta in time or level. There is no image shift.
The synthetic data gathered by the brain still tells you the sound comes from right in front of you.
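The symmetry argument above can be sketched numerically (toy values assumed for illustration: the post's ~0.5 ms crosstalk delay and a head-shadow gain of 0.7): with both speakers playing the identical mono signal and the listener on the median plane, each ear's sum of direct path plus crosstalk is the same, so the interaural deltas come out zero.

```python
import numpy as np

fs = 48000                            # sample rate (Hz)
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs // 10)  # 100 ms of a mono "voice" signal

d = int(0.5e-3 * fs)   # crosstalk delay in samples (~0.5 ms, as in the post)
shadow = 0.7           # assumed head-shadow attenuation of the crosstalk path

def ear(direct, cross):
    """Ear signal = direct-path speaker + delayed, attenuated opposite speaker."""
    out = np.zeros(len(direct) + d)
    out[:len(direct)] += direct
    out[d:d + len(cross)] += shadow * cross
    return out

# Both speakers carry the same mono signal (a centered phantom source).
left_ear = ear(mono, mono)   # L speaker direct, R speaker via crosstalk
right_ear = ear(mono, mono)  # R speaker direct, L speaker via crosstalk

# Cross-correlation peak at lag 0 => no interaural time delta.
itd_samples = np.argmax(np.correlate(left_ear, right_ear, "full")) - (len(left_ear) - 1)
print("Ear signals identical:", np.allclose(left_ear, right_ear))
print("Interaural delta (samples):", itd_samples)
```

Break the symmetry (e.g. pan the signal, or move the listener off-center) and the deltas reappear, which is exactly how stereo steers the phantom image.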
It can also work well for quite a few of the secondary sensors (recording-technique dependent):
- ITDG
- Distance to source
- Body/torso/head movement (it responds quite well to it)
Depth perception can be influenced by the addition of the ~0.5 ms delayed crosstalk, though. But as it is not the only factor at play, it is quite easy to compensate for if needed.
Adding a center speaker (LCR system) can solve some of these minor issues, but it adds a number of others.
Adding room ER (early reflections) to that set of data does, imho, only add confusion and masking; it provides no further relevant information. It only gives the brain info about your direct environment and superimposes that over the intended presentation of the recording. That is distortion of the intended presentation of the recording.
Using a dummy head to record content meant to be played back on a stereo speaker system makes little sense if you look at it closely. It has artistic value for sure, but as a coloring/effect tool. If you play back such a recording on a pair of stereo speakers, you're just adding a layer of distortion.
I don't think we can ever create a true reproduction of a source "as if we were there". But by understanding how mics and rooms behave, the recording process, the audio processing tools, and by recording/mixing/mastering in an environment that lets you hear these variables, we can get pretty darn close.
That's where all these high level audio engineers have such tremendous value to me. They know how to do that. It's fascinating to see them work.
That's also why it's good to always process a recording thinking many steps ahead, understanding how further processing down the line will influence it all.
Just like when mixing analog you listen through the whole chain; these days that chain is DAW -> analog console & FX -> master ADC -> master DAC -> monitoring. If you don't, you almost always end up having to rework your mix.
But then, 99.9% of music is not intended to be particularly true to the original source. It's an artistic take on the sum of sources, meant to convey an emotion. To me, that's in the ether between the real original source and the intent.