
That's shocking, indeed. People really really suck at statistics.
reply
To be fair, the question didn't state its own assumptions very clearly. It didn't say that a person is randomly sampled from the population and given the test.
Typically, when a test is administered, it's because a patient has requested it, or come in with symptoms.
So to really answer the question, we'd need to know the rate at which people with the disease receive a test and the rate at which people without the disease receive a test.
Under real world scenarios, the real answer is probably closer to 95% than it is to whatever the "correct" answer is, which I guess was supposed to be:
$$\frac{\frac{1}{1000} \times 95\%}{\frac{999}{1000} \times 5\% + \frac{1}{1000} \times 95\%}$$
However, the real answer is:
$$\frac{\frac{1}{1000} \times \alpha \times (1-\mathrm{FNR})}{\frac{999}{1000} \times \beta \times 5\% + \frac{1}{1000} \times \alpha \times (1-\mathrm{FNR})}$$
where $\alpha$ is the rate at which diseased people take the test, $\beta$ is the rate at which non-diseased people take the test, and $\mathrm{FNR}$ is the false negative rate, which we also weren't told!
So... the badness at statistics goes all the way around it seems
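For anyone who wants to play with the numbers, here is a minimal Python sketch of the formula above. The function name and the example values of alpha and beta are made up for illustration; the 1/1000 prevalence and 5% false positive rate come from the question, and the 5% false negative rate is an assumption since it wasn't given.

```python
def posterior_given_positive(prevalence, fpr, fnr, alpha=1.0, beta=1.0):
    """P(disease | took the test and tested positive).

    alpha: rate at which diseased people take the test
    beta:  rate at which non-diseased people take the test
    """
    true_pos = prevalence * alpha * (1.0 - fnr)
    false_pos = (1.0 - prevalence) * beta * fpr
    return true_pos / (true_pos + false_pos)


# Random sampling: alpha == beta, so the testing rates cancel out.
# The 5% false negative rate is an assumption, not part of the question.
baseline = posterior_given_positive(0.001, fpr=0.05, fnr=0.05)

# Hypothetical selection effect: diseased people are 100x more likely
# to end up taking the test (alpha and beta are invented values).
selected = posterior_given_positive(0.001, fpr=0.05, fnr=0.05, alpha=0.5, beta=0.005)

print(f"random sampling: {baseline:.4f}")
print(f"with selection:  {selected:.4f}")
```

Only the ratio alpha/beta matters, so the answer moves a lot as soon as sick people are more likely to get tested than healthy ones.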
reply
Thanks for writing it all out. I probably would have contributed to the bad look this test gave~~
reply
Statistics is genuinely hard, and I didn't realize all the nuances until I started working with real life data generated by real life people.
reply
Probability is not easy
reply
I'm too stupid to do it the correct way
my way would have been simple and wrong but...
false positive is 5 percent or 1/20
actual prevalence is 1/1000
(1/1000) / (1/20) = 20/1000 = 0.02, or 2 percent, which is close to the actual answer, sort of
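A quick Python check of why the shortcut lands so close, as a sketch only. It assumes 100% sensitivity, which the question never stated, and the variable names are just for illustration.

```python
prevalence = 1 / 1000
fpr = 1 / 20

# The shortcut: divide prevalence by the false positive rate.
shortcut = prevalence / fpr

# Full Bayes calculation, assuming 100% sensitivity (an assumption).
sensitivity = 1.0
exact = (sensitivity * prevalence) / (
    sensitivity * prevalence + fpr * (1 - prevalence)
)

print(f"shortcut: {shortcut:.4f}, exact: {exact:.4f}")
```

The shortcut works here because the disease is rare: almost every positive is a false positive, so the denominator is very nearly the false positive rate on its own. It would drift further off for a common disease.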
reply
one of the comments to the article (substack) addresses your question/assumptions
P(T|D) = probability of testing positive given the disease (sensitivity)
(I'll assume this is 100% since it wasn't specified)
update: Step 3: Calculate each component:
  • P(¬D) = 1 - 0.001 = 0.999
  • P(T) = 1 × 0.001 + 0.05 × 0.999 = 0.001 + 0.04995 = 0.05095
Step 4: Calculate the final probability:
P(D|T) = (1 × 0.001) / 0.05095 ≈ 0.0196 or about 1.96%
reply
yeah you need P(test|disease) to solve the problem.
The one that even fewer people appreciate is that you also need the false negative rate, which is not necessarily calculable from the false positive rate.
FNR = P(test=negative | disease=true)
FPR = P(test=positive | disease=false)
They aren't the 1-minus of each other!
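To make that concrete, here is a small Python sketch with invented confusion-matrix counts (the numbers are hypothetical, picked only so the two rates differ). Each rate is computed over a different group of people, which is why one can't be derived from the other.

```python
# Hypothetical counts, not from any real test.
true_pos, false_neg = 80, 20     # the 100 people who have the disease
false_pos, true_neg = 50, 950    # the 1000 people who don't

# FNR only looks at people WITH the disease.
fnr = false_neg / (true_pos + false_neg)

# FPR only looks at people WITHOUT the disease.
fpr = false_pos / (false_pos + true_neg)

print(f"FNR = {fnr:.2f}, FPR = {fpr:.2f}, 1 - FPR = {1 - fpr:.2f}")
```

Here 1 - FPR is the specificity (the true negative rate), not the FNR. The complement of the FNR is the sensitivity, 1 - FNR.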
reply
My friend wrote to me...
Turns out it’s not too hard to find the source. The bad news is twofold:
  1. The paper dates from 1978, so it's almost half a century old.
  2. It only involved a sample of 20 students.
The secondary source gets it wrong, reporting that there were only 10 students.
reply
haha, another good example of how urban legends spread.
I mean, I guess the overall message is true: people are bad at statistics. But the problem seems to run even deeper than what's implied... including how the original progenitors of this trial didn't give proper instructions, and how the results of the trial got propagated in incorrect ways, and then how it developed into this myth.
Statistics or probability, which are related but not the same thing imo.
reply
Bayes Theorem
I can never remember the formula
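For reference, the formula written with the same symbols used earlier in the thread (D = has the disease, T = tests positive), with the denominator expanded by the law of total probability:

```latex
P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T)}
            = \frac{P(T \mid D)\, P(D)}{P(T \mid D)\, P(D) + P(T \mid \neg D)\, P(\neg D)}
```

Here P(T | D) is the sensitivity, P(T | ¬D) is the false positive rate, and P(D) is the prevalence.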
reply