[EDIT: Oh, heavens, I REALLY STEPPED INTO THE STINKY STUFF HERE. Ignore all this nonsense, I really did NOT know what I was talking about!]
Here's my best attempt to explain the issues around bit depth and "uneven" sample rate conversion:
__________________________________ Bit Depth (digital word length)
Definitely work at the greatest bit depth feasible during production.
Even though you'll eventually be going to 16 bit for CDs or MP3s, doing all your production, processing, etc., at the highest bit depth practical (typically 24 bit for most of us) helps preserve as much of each track's individual accuracy and detail as possible. Mixing, EQ, compression, and other processing are also best performed at the highest bit depth.
Look up bit depth on Google and pay attention to the fact that each bit added to the word length of a digitally stored value doubles the number of possible values.
So a 16 bit word can store 65,536 possible values, a 20 bit word can store over 1,000,000 (1,048,576), and a 24 bit word can store over 16 million (16,777,216). That's a 256-fold increase (eight extra bits, each doubling the count) in potential dynamic resolution from using 24 bit instead of 16 bit audio.
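To make the doubling concrete, here's a small Python sketch that counts the levels available at each word length. It also prints the simple theoretical dynamic-range figure, 20*log10(2**bits), which works out to about 6 dB per bit; that figure is added here for illustration only and is a theoretical ceiling, not what a real converter achieves.

```python
import math

def levels(bits):
    """Number of distinct values a word of `bits` bits can hold."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical ratio of full scale to one quantization step, in dB (~6.02 dB per bit)."""
    return 20 * math.log10(levels(bits))

for bits in (16, 20, 24):
    print(f"{bits}-bit: {levels(bits):,} levels, ~{dynamic_range_db(bits):.1f} dB")

# Eight extra bits means 2**8 times as many levels.
print("24-bit vs 16-bit:", levels(24) // levels(16), "x")
```

Running it should list 65,536, 1,048,576 and 16,777,216 levels (roughly 96, 120 and 144 dB), followed by the 256x ratio.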
Lowering bit depth can be as easy as simply truncating bits off the end of the digital word.
But we typically add a very, very small amount of noise (dither) before truncation to soften any sonic "jaggies" (quantization distortion) that the truncation might otherwise reveal.
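Here's a rough Python sketch of the difference between plain truncation and dithered requantization when going from 24 bit to 16 bit. The post doesn't say which kind of dither it means; this assumes TPDF (triangular) dither, one common choice, and the function names are made up for illustration. Real mastering tools often add noise shaping on top of this.

```python
import math
import random

SHIFT = 8            # 24 bits minus 16 bits = 8 low bits to drop
SCALE = 1 << SHIFT   # 256 steps of 24-bit per single 16-bit step

def truncate_24_to_16(sample_24):
    """Plain truncation: drop the low 8 bits (arithmetic shift, rounds toward -infinity)."""
    return sample_24 >> SHIFT

def dither_24_to_16(sample_24, rng):
    """Hypothetical helper: add TPDF dither (+/- one 16-bit step), then round to nearest."""
    noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)  # triangular distribution
    value = sample_24 / SCALE + noise
    return max(-32768, min(32767, math.floor(value + 0.5)))  # clamp to the 16-bit range

# A steady signal of 100 (in 24-bit units) is only ~0.39 of one 16-bit step.
quiet = 100
rng = random.Random(1)
dithered = [dither_24_to_16(quiet, rng) for _ in range(20000)]

print("truncated:", truncate_24_to_16(quiet))
print("dithered average:", sum(dithered) / len(dithered), "target:", quiet / SCALE)
```

Truncation turns that very quiet signal into a flat 0, while the dithered output jumps between neighbouring values and averages out to roughly the original 0.39 step: the detail survives as a little noise instead of vanishing.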
Interestingly, this aspect of audio closely parallels working with bitmapped images, particularly resizing them.
Because of the nature of what we're doing with bit depth, truncating it is analogous to resizing a picture by an even amount. Going, say, from 16,000 x 16,000 pixels to 8,000 x 8,000 pixels is as easy as throwing out every other column and every other row of pixels. The resolution is lower, but you haven't remapped any pixels or the 'shapes' they tend to form.
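A minimal Python sketch of that even downsize, assuming the image is just a list of rows (a simplification; real image editors usually filter as well). Every pixel that survives is an original pixel; nothing new is computed.

```python
def downsize_by_half(image):
    """Keep every other row and every other column; no new pixel values are created."""
    return [row[::2] for row in image[::2]]

# A tiny 4 x 4 "image" with labelled pixels so you can see which ones survive.
image = [[f"r{r}c{c}" for c in range(4)] for r in range(4)]

for row in downsize_by_half(image):
    print(row)
```

This prints `['r0c0', 'r0c2']` and `['r2c0', 'r2c2']`: half the resolution in each direction, built entirely from untouched original pixels.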
________________________ Sample Rate Conversion
Downsampling to a target sample rate from a rate that is an even (integer) multiple of it, e.g. from 192 kHz to 96 kHz or 48 kHz, is parallel to this as well: we lose resolution, but we don't have to remap any values. In essence, we simply discard every other sample (or, going from 192 kHz to 48 kHz, three out of every four). The same applies to, say, 88.2 kHz down to 44.1 kHz. BUT when we downsample by a non-integer ratio, say from 96 kHz down to 44.1 kHz, we end up having to remap ALL our values, guessing (interpolating) what those values MIGHT have been. That can introduce far greater alias error (distortion, perhaps subtle, perhaps not) than if we had started at 88.2 kHz, or at 44.1 kHz in the first place and not downsampled at all.
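The "even vs. uneven" distinction can be checked with a few lines of exact arithmetic. This Python sketch only works out *where* each output sample falls on the input's sample grid; it doesn't perform a conversion. Worth keeping in mind: real converters low-pass filter before decimating even in the 2:1 case (so nothing above half the new rate aliases), and for non-integer ratios good converters compute the in-between values with band-limited interpolation filters rather than crude guesses. The function names are made up for illustration.

```python
from fractions import Fraction

def output_positions(in_rate, out_rate, count):
    """Where each output sample falls, measured in input-sample units."""
    step = Fraction(in_rate, out_rate)  # input samples per output sample
    return [k * step for k in range(count)]

def share_between_samples(in_rate, out_rate):
    """Fraction of output samples landing between input samples, over one repeating cycle."""
    step = Fraction(in_rate, out_rate)
    cycle = step.denominator  # the fractional pattern repeats every `cycle` outputs
    off_grid = sum(1 for k in range(cycle) if (k * step).denominator != 1)
    return Fraction(off_grid, cycle)

for src, dst in [(192000, 96000), (192000, 48000), (88200, 44100), (96000, 44100)]:
    positions = [str(p) for p in output_positions(src, dst, 3)]
    print(f"{src} -> {dst}: first positions {positions}, "
          f"off-grid share {share_between_samples(src, dst)}")
```

For the integer ratios every output sample lands exactly on an input sample (off-grid share 0). For 96 kHz to 44.1 kHz the step is 320/147 input samples, so 146 out of every 147 output samples fall between input samples and have to be computed.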
To go back to the graphics example:
Img 1 - The image above was originally created as a 100 x 100 pixel graphic.
Img 2 - This image is the result of downsizing that same image to 50 x 50 pixels. Note that resolution goes down, but the general shape is retained fairly well... or at least comparatively well. Now take a look at this:
Img 3 - This image is what happened when we downsized the image from 100 x 100 to 57 x 57 pixels.
Even though Img 3 has 14% more resolution than Img 2 (57 vs. 50 pixels per side, about 30% more pixels overall), Img 2 is a much more faithful representation of the original image.
[You may have to back away from your monitor quite a bit to see what I mean.
I apologize for the size of these; they're already up on my server. But I think they clearly show what happens with an 'uneven' downsample, whether it's a bitmap or a PCM recording.]
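We don't know which resize algorithm produced those images, so this Python sketch assumes the simplest one, nearest-neighbour with top-left alignment (an assumption for illustration). It lists which source column (or row) each output column picks when shrinking 100 pixels to 50 versus 100 pixels to 57.

```python
def nearest_source_indices(src_size, dst_size):
    """Source pixel picked for each output pixel along one axis (simple nearest-neighbour)."""
    return [i * src_size // dst_size for i in range(dst_size)]

def distinct_steps(indices):
    """Sorted set of gaps between consecutive chosen source pixels."""
    return sorted({b - a for a, b in zip(indices, indices[1:])})

for dst in (50, 57):
    idx = nearest_source_indices(100, dst)
    print(dst, idx[:8], "gaps:", distinct_steps(idx))
```

At 50 pixels every gap is exactly 2, so the original pattern is sampled evenly. At 57 pixels the gaps alternate irregularly between 1 and 2 (the list starts 0, 1, 3, 5, 7, 8, ...), so some neighbouring pixels are kept together while others are skipped, which is one way an uneven resize can distort shapes.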
And audio's just like that...
Or at least close enough for us to analogize...
Anyhow, the increase in dynamic resolution offered by a greater bit depth is as close to win-win as we'll get here. That 256-fold increase in dynamic resolution costs us only about 50% more in processing and storage overhead (24 bits per sample instead of 16).
OTOH, doubling our sample rate (or quadrupling it) doubles (or quadruples) the overhead.
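The storage side of that trade-off is simple arithmetic. This Python sketch assumes uncompressed, tightly packed stereo PCM (3 bytes per 24-bit sample); many DAWs actually store or process audio as 32-bit floats, which would change the absolute numbers.

```python
def pcm_bytes_per_minute(sample_rate, bit_depth, channels=2):
    """Uncompressed, tightly packed PCM storage for one minute of audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * 60

base = pcm_bytes_per_minute(44100, 16)  # CD-style stereo as the reference point
for rate, bits in [(44100, 16), (44100, 24), (88200, 24), (176400, 24)]:
    size = pcm_bytes_per_minute(rate, bits)
    print(f"{rate} Hz / {bits}-bit: {size / 1e6:.1f} MB/min ({size / base:.1f}x)")
```

Going from 16 to 24 bit costs 1.5x, while each doubling of the sample rate doubles the cost again (3.0x at 88.2 kHz/24-bit, 6.0x at 176.4 kHz/24-bit, relative to 44.1 kHz/16-bit).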
And while most folks can fairly easily tell 24 bit sound from 16 bit sound, listening tests have been considerably less persuasive that higher sample rates produce the same kind of perceived improvement.
It's obviously far too complex to discuss in depth here, but I have found over the last 10 years of working with computer-based recording that these seemingly parallel but very different issues continue to confuse large numbers of people.
So, to summarize:
- use the greatest bit depth practical for production, and reduce the bit depth (with dither) only on your finished mix for output
- if you want to work at a higher sample rate, it's best to work at an even (integer) multiple of your target rate, e.g. 88.2 kHz for a 44.1 kHz CD master
- this only applies if you are 'mixing in the box' (ITB), keeping your audio in the digital realm. For that reason, you may actually get better results running your (for instance) 96 kHz mix out into the analog realm and then back into another digital interface running at 44.1 kHz. Try it that way and compare it with a full ITB downsample.