Quote:

Originally Posted by

**bogosort**
Excellent, then let's make even more progress.

(…)

Anyway, I've run out of energy. I fear I haven't been clear enough on this admittedly eye-glazing subject. Hopefully there's enough to at least get us on the same page.

Wow. *applause* We ARE making progress! That's a really thorough outline of your terms, and it's much easier to understand where you're coming from. Thank you.

We might need to get into your variance concept, because the thing about very large sample sets is that subsets cease to matter. We don't care if there are 100 heads in a row in an infinite series, BECAUSE we have an infinite series and it soaks up all such departures from the norm. Indeed, any particular pattern (including heads-tails-heads-tails-heads-tails) is equally likely and unlikely. The subsets don't matter, only the total results.

The key point here is that your model for a statistical distribution is a coin flip. This gives us our results, which converge either to 50% (for a normal coin) or to 100% (for, say, a coin with two heads). So, working from that, I can say that your model for human perception is indeed a situation where a result is either there, or it's not.

Let's try to incorporate some of the lossiness of human perception into this, but in a way that's still mechanical, with no humans involved. Suppose the coin flip is being recorded by a camera, so that if the camera sees heads, it returns a result of heads.

We can model human fallibility by doing our coin-flip, but then interposing something like a spinning fan between camera and coin. In this way we can introduce an element of data loss that we know is the case for human perception, and begin to simulate humans' failure to return reliable results: but the model's still purely mechanical. All it does is ensure that sometimes, the camera does not get good data. The coin's obscured, in part or entirely, by a fan blade, and the data isn't there (neither heads nor tails, though the coin still exists and hasn't gone anywhere).
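To make the rig concrete, here's a minimal sketch of it in Python. The 30% obscuring rate is an assumption of mine, not a figure from the discussion; the point is only that the coin is still a fair 50/50 coin, and the fan merely deletes data:

```python
import random

def flip_with_fan(p_obscured=0.3, rng=random):
    """One trial of the coin-fan-camera rig.

    Returns 'heads', 'tails', or None when a fan blade happens to
    block the camera's view. None is data loss, not a coin change:
    the coin still landed on something, we just didn't see it.
    """
    outcome = 'heads' if rng.random() < 0.5 else 'tails'
    if rng.random() < p_obscured:
        return None  # the coin exists and hasn't gone anywhere
    return outcome

trials = [flip_with_fan() for _ in range(100_000)]
lost = sum(1 for t in trials if t is None)
print(f"data lost on {lost / len(trials):.1%} of trials")
```

Note that among the trials the camera *does* see, heads and tails still come out near 50/50: the fan corrupts the record, not the coin.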

I'm going to suggest the behavior of humans isn't so unlike this obscured-by-fan condition: any number of things can happen to damage what would otherwise be pure, robot-like perception. In fact, to perform well on ABX we have to calm ourselves, focus, and try to free ourselves of distraction. (As one who has done well on ABX testing, I can assure you this isn't optional: you MUST chill and just take in the data calmly and attentively.)

The thing is, with this coin-fan-camera rig, we can draw conclusions over extended trials of the combined coin-and-fan system, and we can expect an infinite series to converge on a value that is neither 50% nor 100%, EVEN THOUGH the base test is and always will be a 50% test. The introduction of a partially obscuring factor (and anything, even environmental background noise in a room that's not an anechoic chamber, can act as such an obscuring factor) leads to distinct, valid results you wouldn't otherwise expect to converge. Purely mechanically, you can work out what it does to the statistics: essentially, it forces some fraction of the trials to be pure guesses, even under conditions where you could otherwise identify the coin flip every time.
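You can work out that convergence mechanically. Here's a sketch under assumed numbers of my own choosing: perception is perfect when the view is clear, the fan obscures 60% of trials, and an obscured trial is a pure coin-toss guess. Long-run accuracy then converges to 0.4 × 1.0 + 0.6 × 0.5 = 0.70, neither 50% nor 100%:

```python
import random

def abx_trial(p_obscured=0.6, rng=random):
    """One forced-choice trial: perfect identification when the
    view is clear, a pure 50/50 guess when the 'fan' obscures it."""
    if rng.random() < p_obscured:
        return rng.random() < 0.5   # obscured: guess, right half the time
    return True                     # clear view: always right

rng = random.Random(42)
n = 1_000_000
correct = sum(abx_trial(0.6, rng) for _ in range(n))
print(f"accuracy: {correct / n:.3f}")   # converges near 0.70
```

The base test never stopped being binary per trial; it's the obscuring factor that produces the intermediate long-run value.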

In audio ABX testing, the result is also binary. Either you correctly heard a difference and identified it, just as if you were calling 'heads!', or you called it wrong, establishing that on that iteration of the test you did not perceive the difference correctly. On the surface, it looks like it should therefore always converge to either 50% or 100%.

But in the real world, with real humans, we ALWAYS have this 'obscuring fan' in some form, whether it's attention, or background sounds, or feeling challenged upon missing a guess for the first time (they'll make fun of meee! oh noes!). Part of the purpose of gathering a larger statistical sample is to show what the real value is: this is why we talk of confidence levels. And it's why I'm raising the idea of infinite series: if there's a condition where an infinite series on a binary test returns an intermediate value such as 70%, that establishes it's not a wholly binary test!

And that means: statistically, the larger the sample set, the more likely it is that you're being led not towards 'variance' but towards an accurate representation of the underlying condition. In fact, it becomes likely that you can quantify the condition. You can SAY, 'According to this test, I can only hear this thing that bugs me one time out of ten. I've done a billion trials and the probability of me hearing any given occurrence of the event is one time in ten, no more, no less. AND IT STILL BUGS ME.'
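A back-of-the-envelope way to do that quantifying, under my own simplifying assumption that every missed trial is a pure 50% guess: long-run accuracy = p_hear + (1 − p_hear) × 0.5, which you can invert to recover the underlying rate.

```python
def detection_rate(observed_accuracy):
    """Recover the underlying 'actually heard it' rate from long-run
    ABX accuracy, assuming a pure 50% guess on every missed trial:
        accuracy = p_hear + (1 - p_hear) * 0.5
    so  p_hear   = 2 * observed_accuracy - 1
    """
    return 2 * observed_accuracy - 1

# 55% correct over a huge sample -> hears it about one time in ten
print(round(detection_rate(0.55), 2))  # -> 0.1
```

So a score of 55% over a billion trials isn't 'basically chance': under this model it pins the audible-one-time-in-ten condition down quite precisely.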

In short: there are times you can hear the truth of a sound, and times that it gets by you. Might be only one time in ten that you can hear the 'grunge' or unpleasant artifact, but if it exists, every ten times or so, there it is again. Can you pin it down, can you prove it? Nope. But you complain vociferously, all the more when told you are imagining things!

Those who fuss over things they can hear only one time in a hundred are getting awfully picky, but they are legion. You'll see them all over, insisting things like 'no digital format can ever capture a cymbal,' while ignoring huge and obvious errors in whatever format they like: they've singled out something in the format they dislike that bugs them, and they will not forgive even a one-in-a-hundred incidence of the problem. The instant they pick up on an artifact, it's all they hear, and they imagine ten times as much of it as there is.

Such is human nature.

Can we agree that a human hearing 'is X like B, or like A?' and picking one is not really a binary test, because extraneous factors will spoil the human's otherwise legitimate perception some of the time? Or must we assume that every human on every ABX test is infallible right up to the point where they're deaf to further information?

Further reading:

http://en.wikipedia.org/wiki/Monte_Carlo_method
http://en.wikipedia.org/wiki/Probabi...isk_assessment
http://en.wikipedia.org/wiki/Common_...e_(statistics)
http://en.wikipedia.org/wiki/Stochas...ng_(insurance)
http://en.wikipedia.org/wiki/Human_reliability