Quote:
Originally Posted by
Rumi
I would like to know!
I am not into spending time for flawed tests.
First of all, you have to ask yourself: "Do I want this test to be scientific or not?" And what exactly is "scientific"? Is it worth putting in the extra effort?
The general consensus is that a test is not deemed scientific unless it applies statistical significance testing: known probabilities, a null hypothesis, a minimum number of trials (10-25), and a minimum confidence level of 95%. Of course, significance testing goes far beyond the field of audio.
A very brief description of ABX testing can be found here, which touches on what I'm referring to. Large charts are available that list the confidence level achieved for a given number of trials and correct answers.
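As a rough sketch of where those charts come from, the numbers can be computed directly from the binomial distribution. This is my own illustration, not from any particular chart; it assumes a fair 50/50 chance of a correct answer on each trial when the subject is purely guessing:

```python
from math import comb

def p_value(n, k):
    """One-tailed probability of getting k or more correct out of n by pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def min_correct(n, alpha=0.05):
    """Smallest number of correct answers that reaches at least 1 - alpha confidence."""
    for k in range(n + 1):
        if p_value(n, k) <= alpha:
            return k

# Print the 95% threshold for the 10-25 trial range mentioned above
for n in range(10, 26):
    k = min_correct(n)
    print(f"{n} trials: need {k}/{n} correct (p = {p_value(n, k):.4f})")
```

For ten trials this gives 9/10 as the minimum score for 95% confidence, which matches the figures discussed below.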
IMO, if a test produces an outcome different from its scientific equivalent, then it is flawed. If you disagree with this, then you're effectively arguing against science.
Is a test that implements fewer than ten trials, for example, flawed? Based on scientific reasoning, yes. And what if a subject performed ten trials and chose preamp 'A' seven times and preamp 'B' three times? A person not familiar with statistical significance would likely say that preamp 'A' is definitely their preference. But according to science, it is not! In fact, at 7/10 they have not demonstrated a preference to the minimum degree of certainty. If someone says otherwise, they should back up their reasoning with science.
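To see why 7/10 falls short, here is the same binomial calculation applied to that score (my own sketch, again assuming a 50/50 guess per trial):

```python
from math import comb

# Probability of getting 7 or more out of 10 correct by chance alone
p = sum(comb(10, k) for k in range(7, 11)) / 2 ** 10  # = 176/1024
print(f"p = {p:.4f}")               # 0.1719
print(f"confidence = {1 - p:.1%}")  # 82.8%
```

A roughly one-in-six chance of scoring 7/10 by pure guessing, so the result is well short of the 95% confidence level.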
In significance testing, we want to be reasonably certain that the subject is not guessing, using a minimum confidence level of 95%. In other words, if a subject chose preamp 'A' 9/10 times in a randomized scientific test, then we can be highly confident that they are indeed not guessing (9/10 trials corresponds to a confidence level of over 95%).
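The 9/10 figure can be checked the same way (my own sketch, assuming a 50/50 guess per trial):

```python
from math import comb

# Probability of getting 9 or more out of 10 correct by chance alone
p = sum(comb(10, k) for k in range(9, 11)) / 2 ** 10  # = 11/1024
print(f"p = {p:.4f}")               # 0.0107
print(f"confidence = {1 - p:.1%}")  # 98.9%
```

Roughly a 1% chance of guessing that well, comfortably above the 95% confidence level.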
Not following the above guidelines is just one of many ways a test can be flawed, but it's not that hard to come up with a test that is reasonably flaw-free.
As this thread goes on, you'll start to see other flaws called out and/or questioned. But how many tests even follow the above guidelines? Maybe one, two, or three on this entire board?