Gearslutz.com
Old 15th January 2008, 12:08 AM   #1
Axon
Gear interested
 
Join Date: Jan 2008
Location: Austin, TX
Posts: 15
pfpf: An experimental dynamic range estimator for music

Hello all. I had already posted this on HydrogenAudio, but I noticed this forum and figured I would post here too (the people here certainly seem more clued in than I am). Apologies in advance for tooting my own horn on my first post.

Audio engineering is an amateur hobby of mine, and I was sufficiently enthused with the whole topic of dynamic range that I went out and wrote an algorithm to estimate it. The TurnMeUp website mentioned that a dynamics estimator was in the works, and while I will say that my contribution is by no means authoritative, hopefully it's superior to most other schemes out there (notably ReplayGain and peak-to-average measurements). So I humbly submit it for commentary and experimentation. This is a free app, but for now I'm licensing it for non-commercial use only; lemme know if this causes any trouble.

Paper: pfpf: An Experimental Estimator of Dynamic Range in Music. A link to a prototype .exe is included.

I'd like to make clear that I'm not trying to stomp on TMU's toes with this. Up until less than an hour ago I was oblivious to TMU's recent programming efforts. My app is largely trying to solve a different problem than TMU: pfpf is about estimating dynamic range through loudness variations, and I'm hoping to use it to make accurate mastering comparisons between media (CD, vinyl, high-res, radio, etc). It's also fairly rough around the edges.

Anyways here's the abstract:

Quote:
The dynamic range of a selection of music is dependent on both estimating the time-varying loudness of the music and the timescale used for loudness evaluation. I propose a numerical method of estimating dynamic range that satisfies those dependencies using a modified ITU-R 1770 loudness filter and three moving windows to estimate loudness across three different timescales. The goal is to more accurately measure and compare dynamic range between different music genres and different masterings and processing techniques for the same music.

Summary of algorithm:
  1. Apply ITU-R 1770 filters to convert amplitude to instantaneous loudness.
  2. Estimate loudness across three different timescales by computing 10ms ("short term"), 200ms ("medium term") and 3000ms ("long term") windowed RMS power.
  3. Decouple timescales by scaling 10ms loudness by 200ms loudness, and 200ms loudness by 3000ms loudness.
  4. Threshold loudness at each timescale to remove silence (optional)
  5. Compute histogram for each loudness estimate
  6. Dynamic range = range between 50th and 97.7th percentile, for each timescale
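
Steps 4 to 6 can be sketched in a few lines of Python. This is only an illustration, not the pfpf code. It assumes the loudness values for one timescale are already available in dB, and it sorts the values and interpolates rather than building a binned histogram. The names `percentile` and `dynamic_range_db` are made up for the example.

```python
import math

def percentile(sorted_vals, p):
    """Linearly interpolated p-th percentile (0-100) of an ascending list."""
    if not sorted_vals:
        raise ValueError("no values left to measure")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def dynamic_range_db(levels_db, low_pct=50.0, high_pct=97.7, threshold_db=None):
    """Range between two percentiles of the loudness values at one timescale.

    Values below threshold_db (if given) are treated as silence and dropped
    before the percentiles are taken.
    """
    kept = sorted(v for v in levels_db
                  if threshold_db is None or v >= threshold_db)
    return percentile(kept, high_pct) - percentile(kept, low_pct)
```

With a threshold set (say a hypothetical -70 dB), prepending a run of digital silence to the input leaves the result unchanged. Without one, the silence drags the 50th percentile down and inflates the number.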
Quote:
This is a better way to measure dynamic range, for the following reasons:
  • It measures dynamic range as a ratio of loudnesses. Peak-to-average cannot claim this (it is fundamentally a comparison of two different units). ReplayGain comparisons cannot claim this.
  • It uses a real loudness model (flawed though it is) for the basis of loudness estimation. Waveform comparisons (especially for loudness-war-related discussions) are fundamentally flawed for this reason - what you get out of Audacity has a relatively tenuous connection to real perceived loudness.
  • Dynamic range is estimated across three different timescales - 3000ms, 200ms, and 10ms - and each scale is fully decorrelated from the others. So pfpf can tell the difference between a quiet passage with a loud transient and a loud passage with a sudden pause. The timescales are configurable.
  • It uses a percentile approach on a histogram for estimating dynamic range, instead of min/max/avg. This makes the technique much more resilient to differences in mastering and medium; pops and ticks should not affect results, nor should small bits of digital silence. The percentiles are configurable.
  • Background noise (when no music is playing) can be masked with a fixed threshold, so that silence won't pile up on one side of the histogram distorting the numbers, and the results should be invariant of any extra silence padding before/after music (this should make CD/vinyl comparisons a lot easier). The threshold is configurable.
Comments and testing welcome.
Old 15th January 2008, 12:57 AM   #2
jamsmith
Lives for gear
 
Join Date: Dec 2007
Location: Atlanta, GA
Posts: 1,044
Never mind. I just read up on ITU-R BS.1770 standard.
__________________
Screamin' Michael Jamsmith - www.jamsmith.com

"You CAN polish a turd, but you just end up with a shiny turd."

Last edited by jamsmith; 15th January 2008 at 01:07 AM. Reason: Did some research!
Old 15th January 2008, 01:09 AM   #3
Axon
EDIT: Replied before you edited :) But my reply should still help.

I'm using Leq(RLB), because of its excellent performance in the tests I've seen published. (It's obviously not ideal, but since HEIMDAL didn't test much better, why bother with anything more complicated until it is provably better?) After the filtering I'm computing 10ms RMS blocks and using those to create sliding windows at 200ms and 3000ms lengths. Each of those lengths represents a loudness estimate at a different timescale. 10ms is roughly the lower limit of transient loudness detection, 200ms is roughly the upper limit, and 3000ms is roughly the lower limit for relatively "steady-state" changes in loudness (i.e. how fast a listener might reasonably turn a volume knob).
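
A rough sketch of that windowing, for anyone who wants to experiment. It is not the pfpf source. It assumes the samples have already been through the loudness filter, a 48 kHz sample rate, and trailing (not centered) sliding windows; all function names are invented for the example.

```python
import math

def block_power(samples, block_len):
    """Mean-square power of consecutive non-overlapping blocks."""
    n_blocks = len(samples) // block_len
    return [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]

def sliding_mean(values, width):
    """Mean over a trailing window of `width` values (shorter at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= width:
            running -= values[i - width]   # drop the value leaving the window
        out.append(running / min(i + 1, width))
    return out

def to_db(power, floor=1e-12):
    """Power to dB, with a small floor so digital silence does not hit log(0)."""
    return 10.0 * math.log10(max(power, floor))

def timescale_levels_db(samples, rate=48000):
    """Return (short, medium, long) level lists in dB, one value per 10 ms."""
    block_len = rate // 100               # 10 ms blocks
    short = block_power(samples, block_len)
    medium = sliding_mean(short, 20)      # 20 x 10 ms = 200 ms
    long_ = sliding_mean(short, 300)      # 300 x 10 ms = 3000 ms
    return tuple([to_db(p) for p in seq] for seq in (short, medium, long_))
```

Averaging the 10ms block powers is equivalent to taking the mean-square over the longer window directly, which is why the longer windows can be built from the short blocks.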

I haven't calculated what a "reference" value should be, and I'm not sure what the best way to do that is. In theory, any sine wave at any fixed frequency and amplitude should evaluate to a dynamic range of 0, and any white/pink noise should evaluate to a dynamic range that's fairly close to 0. Because the dynamic range computation is self-referred - it's subtracting one loudness from another - no reference is required to ascribe meaning to the results. They aren't easily comparable to other metrics.
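
The sine and noise cases are easy to check numerically. The sketch below is a simplified stand-in, not pfpf itself: there is no loudness filter, only the 10ms timescale is measured, the percentile is taken by nearest rank, and the 48 kHz rate, 1 kHz tone, and noise level are arbitrary choices.

```python
import math
import random

RATE = 48000          # assumed sample rate
BLOCK = RATE // 100   # 10 ms blocks

def block_levels_db(samples, block_len=BLOCK):
    """RMS level in dB of each consecutive 10 ms block (no loudness filter)."""
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        power = sum(x * x for x in block) / block_len
        levels.append(10.0 * math.log10(power))  # assumes no all-zero block
    return levels

def median_to_high_db(levels, high_pct=97.7):
    """Distance from the median to a high percentile, by nearest rank."""
    ordered = sorted(levels)
    last = len(ordered) - 1
    return ordered[round(last * high_pct / 100.0)] - ordered[last // 2]

# One second of a 1 kHz sine: exactly 10 cycles fit in each block.
sine = [0.5 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]
sine_range = median_to_high_db(block_levels_db(sine))

# One second of Gaussian white noise, seeded so the run is repeatable.
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in range(RATE)]
noise_range = median_to_high_db(block_levels_db(noise))

print("sine range (dB):", sine_range)
print("noise range (dB):", noise_range)
```

One caveat with this simplified version: a tone whose period does not divide evenly into the 10ms block will show some ripple at the shortest timescale, so the result is only exactly 0 for tones that fit a whole number of cycles into each block.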
Old 19th January 2008, 04:37 AM   #4
Axon
Bump? Is anybody interested in this?
Old 19th January 2008, 05:17 PM   #5
bob katz
Mastering
 
Join Date: Mar 2006
Posts: 1,826
Quote:
Originally Posted by Axon View Post
Bump? Is anybody interested in this?
Yes, of course. Just no time to absorb right now. But I do not think the day when artificial intelligence can make an accurate dynamic range judgment is even close to arriving.

The problem is the emulation of the psychoacoustics of the ear itself.

The business of the histogram doesn't tell us, for example, that a song with a soft beginning may sound too soft after it follows a song with a loud ending because of the ear's "accommodation". The histogram doesn't tell us that the same soft passage in the middle of a song may be perfectly acceptable.

The best measure I've seen of dynamic range and loudness and center of gravity is the as-yet-unreleased TC Electronic loudness meter with a history circle built in.

BK
__________________
Bob Katz DIGITAL DOMAIN http://www.digido.com
"There are two kinds of fools. One says-this is old and therefore good. The other says-this is new and therefore better."

No trees were killed in the sending of this message. However a large number of electrons were terribly inconvenienced.
Old 20th January 2008, 10:24 PM   #6
Axon
Thank you very much for the reply.

Quote:
Originally Posted by bob katz View Post
Yes, of course. Just no time to absorb right now. But I do not think the day when artificial intelligence can make an accurate dynamic range judgment is even close to arriving. The problem is the emulation of the psychoacoustics of the ear itself.
Yes, I fully agree that there is a very long way to go with estimating dynamic range. The (simpler) problem of loudness estimation, of course, is still an extremely active subject of research - if we really knew how to estimate loudness, would we be using Leq(RLB) for ITU-R 1770? And one cannot estimate dynamic range without first estimating loudness.

But I do not believe that this should preclude us from taking advantage of what we already have available, because that is still a worthwhile improvement. People are already using dynamic range estimators - they just happen to be extremely poor. Every time somebody throws up a brickwalled PCM waveform plot as evidence that the loudness war is killing dynamic range, they are implicitly using waveform peaks from an audio editor as an estimate for the actual variation in loudness in the music.

Clearly, whatever scheme that is discussed must submit to some sort of testing process, and developing one is on my agenda.

Quote:
The business of the histogram doesn't tell us, for example, that a song with a soft beginning may sound too soft after it follows a song with a loud ending because of the ear's "accommodation". The histogram doesn't tell us that the same soft passage in the middle of a song may be perfectly acceptable.
Actually I do try to take exactly that effect into account. I mentioned earlier that I calculate RMS power at 10ms, 200ms and 3000ms windows as loudness measures at three different timescales. After that, I attempt to "decouple" the loudness variations at different timescales by dividing shorter-term loudness measures by longer-term loudness measures. So I divide the 10ms loudness by the 200ms loudness and the 200ms loudness by the 3000ms loudness.

In your example, a quiet song following a loud ending would lead to a gradual reduction in 3000ms loudness, and a transient negative spike in 200ms loudness, slowly tending towards 0 as the long-term loudness tends towards the new level - hopefully, this behaves just like how the ear accommodates to long term changes in loudness while remaining sensitive to shorter term relative changes.
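
That behavior can be reproduced with made-up numbers. The sketch below is an illustration only: the block powers are hypothetical (6 seconds of a loud ending followed by 6 seconds that are 20 dB quieter), the windows are trailing means, and the helper names are invented.

```python
import math

def trailing_mean(values, width):
    """Mean of the last `width` values at each position (shorter at the start)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def decoupled_db(shorter, longer):
    """Shorter-term power divided by longer-term power, expressed in dB."""
    return [10.0 * math.log10(s / l) for s, l in zip(shorter, longer)]

# Hypothetical 10 ms block powers: loud ending, then a song 20 dB quieter.
blocks = [1.0] * 600 + [0.01] * 600
medium = trailing_mean(blocks, 20)     # 200 ms
long_ = trailing_mean(blocks, 300)     # 3000 ms
relative = decoupled_db(medium, long_)

print("before the change:", relative[599])
print("deepest dip:", min(relative))
print("3 s and more after the change:", relative[-1])
```

The 200ms-relative-to-3000ms value sits at 0 dB during the loud ending, dips by nearly the full 20 dB just after the transition, and climbs back to 0 dB once the 3000ms window contains only the quiet song.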

I'll admit the timescales are subject to considerable revision, and I'll gladly take any input on potential flaws with this scheme.

Quote:
The best measure I've seen of dynamic range and loudness and center of gravity is the as-yet-unreleased TC Electronic loudness meter with a history circle built in.

BK
Um... wonderful! ;) Knowing that I wasn't the first one to come up with this idea is a good sign I'm not totally off my rocker.
Old 21st January 2008, 09:06 AM   #7
Bobro
Gear Head
 
Join Date: Sep 2002
Posts: 30
Hey, that's great, Axon! Let us know if you do make an .exe of this.

Strangely enough, I've done a fair amount of thinking and some crude tinkering with related things, working on interactive music using Csound, i.e. how can a program "listen" to sound with some degree of intelligence?

A couple of thoughts - as far as I know and have read (can't remember the source), the window of comparison and analysis when it comes to music varies immensely from person to person, with "trained" listeners having a much bigger field of view/memory. As a side note, I believe that the Leitmotif is so little found in film music today for the simple reason that many people no longer "get" it because the listening window is too short to appreciate the fact that the same melody was playing twelve minutes ago when the same character appeared.

Another idea is to scale the whole works to an appropriate (moving) Fletcher-Munson curve.

Anyway, just to let you know that others appreciate what you're doing- and how hard it is!

-Cameron Bobro
Old 21st January 2008, 06:20 PM   #8
jamsmith
I was thinking about this over the weekend. If we could truly arrive at a usable figure for dynamic range, I would like to do a different type of normalization. Rather than just finding the peak and adding the difference between that and zero (or -x dB), I would like to scale the normalization number by the dynamic range. Sort of normalization with expansion - not just maximizing the bits, but increasing the dynamic range in the process.
Old 21st January 2008, 08:14 PM   #9
Axon
Quote:
Originally Posted by Bobro View Post
Hey, that's great, Axon! Let us know if you do make an .exe of this.
I already did! ;) Check the article.

Quote:
A couple of thoughts - as far as I know and have read (can't remember the source), the window of comparison and analysis when it comes to music varies immensely from person to person, with "trained" listeners having a much bigger field of view/memory. As a side note, I believe that the Leitmotif is so little found in film music today for the simple reason that many people no longer "get" it because the listening window is too short to appreciate the fact that the same melody was playing twelve minutes ago when the same character appeared.
Listening perception does vary based on the person, and based on the listening environment, but I believe that in terms of the physiological aspects of hearing, the listening environment matters much more. That's reflected in the attempt to capture the dynamic range of the listening environment with a masking threshold - so silence in the recording does not cause the dynamic range to increase beyond what the environment allows.

Quote:
Another idea is to scale the whole works to an appropriate (moving) Fletcher-Munson curve.
Adaptive loudness filtering! Touchy ;)

The equal loudness contours are tuned for steady-state tones; dynamic range estimation requires accuracy for transient loudness, which the F-M curves don't accurately estimate (due to the loudness contributions of spectral spreading across critical bands). The FFT-based loudness estimators like HEIMDAL are believed to handle those situations better.

Quote:
Anyway, just to let you know that others appreciate what you're doing- and how hard it is!

-Cameron Bobro
Thanks for the kind words.

Quote:
Originally Posted by jamsmith View Post
I was thinking about this over the weekend. If we could truly arrive at a usable figure for dynamic range, I would like to do a different type of normalization. Rather than just finding the peak and adding the difference between that and zero (or -x dB), I would like to scale the normalization number by the dynamic range. Sort of normalization with expansion - not just maximizing the bits, but increasing the dynamic range in the process.
Well, normalization still wouldn't increase dynamic range - a simple gain would never do that. Is that what you meant?

A couple of other notes. I've been plugging away at a couple of bugs. Notably, the app chokes on mono wavs, and it also happens to almost completely ignore the long term dynamic range configuration parameter - it's really set to the short term number x 100 (or 1000ms). Whoops. I hope to get a v0.2 out with fixes like that in place.

-- Rich
Old 22nd January 2008, 03:27 AM   #10
jamsmith
Quote:
Originally Posted by Axon View Post

Well, normalization still wouldn't increase dynamic range - a simple gain would never do that. Is that what you meant?
No, when you normalize, you are applying the same gain to every sample. In this scheme we would be applying the maximum gain to the peak of the file and zero gain to the sample at the bottom of the "estimated" dynamic range. Of course this is just a starting point for my experiment. I would have to decide how linear or logarithmic the scale is and would probably want to make the lower bound adjustable. Plus figure out what to do with samples that fall below the lower bound. The outcome is similar to using an expander where the maximum output is 0 dB. However rather than using instantaneous RMS, we are using a preset scale based on the "estimated" dynamic range of the entire file.

The idea is to fully maximize the dynamic range. Now I am sure people reading this might consider this a bad idea, and I am sure in many cases it would be, especially where they want to decrease the dynamic range. But of course the same scheme could be adapted to that end also. But right now it's just a thought experiment, and for all I know someone is already doing this!
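
As a starting point, the gain rule might look something like the sketch below. It picks one of the open options on purpose: the scale is linear in dB, and anything below the lower bound simply gets zero gain. The function name and parameters are hypothetical, and `floor_db` stands for the bottom of the estimated dynamic range.

```python
def range_scaled_gain_db(level_db, peak_db, floor_db, target_db=0.0):
    """Gain in dB: full normalization gain at the peak, none at the floor.

    level_db  - level of the sample (or block) being processed
    peak_db   - peak level of the whole file
    floor_db  - bottom of the estimated dynamic range (adjustable lower bound)
    target_db - where the peak should end up (0 dB by default)
    """
    if peak_db <= floor_db:
        raise ValueError("peak_db must be above floor_db")
    max_gain = target_db - peak_db
    position = (level_db - floor_db) / (peak_db - floor_db)
    position = min(1.0, max(0.0, position))   # clamp outside the range
    return max_gain * position
```

For a file peaking at -6 dB with the floor at -30 dB, the peak gets +6 dB, a -18 dB passage gets +3 dB, and the floor stays put, so the 24 dB span becomes 30 dB.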
Old 22nd January 2008, 07:35 AM   #11
Axon
Quote:
Originally Posted by jamsmith View Post
No, when you normalize, you are applying the same gain to every sample. In this scheme we would be applying the maximum gain to the peak of the file and zero gain to the sample at the bottom of the "estimated" dynamic range. Of course this is just a starting point for my experiment. I would have to decide how linear or logarithmic the scale is and would probably want to make the lower bound adjustable. Plus figure out what to do with samples that fall below the lower bound. The outcome is similar to using an expander where the maximum output is 0 dB. However rather than using instantaneous RMS, we are using a preset scale based on the "estimated" dynamic range of the entire file.

The idea is to fully maximize the dynamic range. Now I am sure people reading this might consider this a bad idea, and I am sure in many cases it would be, especially where they want to decrease the dynamic range. But of course the same scheme could be adapted to that end also. But right now it's just a thought experiment, and for all I know someone is already doing this!
Ah. I think I know what you're talking about, and I had a very similar brainwave today. I think I can take the idea to its logical conclusion.

What you're basically describing, I believe, is level/curve control for loudness. It's analogous to how histogram level control (and color curves) work in Photoshop - you make a histogram, and adjust the quietest/avg/loudest points on the loudness histogram just like you'd adjust the dark/mid/light points on an image histogram, then you scale the short term loudness by a lookup table.

That is, um, a very powerful tool, if it could be constructed. It could, in theory, yield a completely reversible dynamic range compression (or expansion). It could be used as an acid test for the loudness estimation model of the dynamic range estimator - if you squash everything to exactly the same loudness, any audible loudness deviations ought to be bugs in the estimator. It could be used as a particularly strange artistic effect, if the loudness curve is drawn to be of negative slope in places. It could, as you mention, be used as a slightly more intelligent noise gate.

The biggest difficulties I see with the scheme, besides finding a good loudness model, are that the rapid gain changes necessary may have an extremely audible effect, and that particular care would need to be taken to not introduce significant delay into the gain operation - gain reduction during transient spikes shouldn't wait until after the delay has passed. More abstractly, it's not strictly true that applying 10 dB of gain to a signal raises the loudness by 10 dB, because of the aforementioned change in equal loudness curves with changing loudness.
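
To make the lookup-table idea concrete, here is a toy version of the curve itself. It only maps a measured loudness to a target loudness and derives the gain; it says nothing about how to apply rapidly changing gain without artifacts. The control points and function names are hypothetical.

```python
def curve_db(points, level_db):
    """Piecewise-linear lookup of an output level for an input level (dB).

    `points` is an ascending list of (input dB, output dB) control points.
    Levels outside the covered range are clamped to the end points.
    """
    if level_db <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if level_db <= x1:
            return y0 + (y1 - y0) * (level_db - x0) / (x1 - x0)
    return points[-1][1]

def gain_db(points, level_db):
    """Gain needed to move a measured level onto the curve."""
    return curve_db(points, level_db) - level_db

# Hypothetical quiet / average / loud control points: (input dB, output dB).
squash = [(-60.0, -60.0), (-30.0, -20.0), (0.0, 0.0)]
# Swapping the columns inverts the curve, provided it is strictly increasing.
unsquash = [(y, x) for x, y in squash]
```

With these points, a passage measured at -30 dB is lifted by 10 dB, and running the result through `unsquash` returns the original level. That round trip is what makes the mapping reversible in principle, at least within the range covered by the control points and as long as the curve never flattens or turns negative.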

This is such an eminently plausible idea that I wonder if anybody else has come up with it first?
__________________
http://www.audiamorous.net
Old 22nd January 2008, 04:17 PM   #12
jamsmith
Quote:
Originally Posted by Axon View Post
This is such an eminently plausible idea that I wonder if anybody else has come up with it first?
Probably, I am the Elisha Gray of audio. I have long given up even pursuing any of my ideas commercially because by the time I get a prototype done, someone shows up at NAMM with whatever I am working on! I do have one item up my sleeve (a guitar footpedal) that I have had for years and am surprised no one else has done.

And if you are successful creating that, at least let me have a copy!