
That's shocking, indeed. People really really suck at statistics.

To be fair, the question didn't state its own assumptions very clearly. It didn't say that a person is randomly sampled from the population and given the test.

Typically, when a test is administered, it's because a patient has requested it, or come in with symptoms.

So to really answer the question, we'd need to know the rate at which people with the disease receive a test and the rate at which people without the disease receive a test.

Under real world scenarios, the real answer is probably closer to 95% than it is to whatever the "correct" answer is, which I guess was supposed to be:

P(D|T) = P(T|D) × P(D) / [P(T|D) × P(D) + P(T|¬D) × P(¬D)] ≈ 2%

However, the real answer is:

P(D|T, tested) = [r_D × (1 − FNR) × P(D)] / [r_D × (1 − FNR) × P(D) + r_N × FPR × P(¬D)]

where r_D is the rate at which diseased people take the test, r_N is the rate at which non-diseased people take the test, and FNR is the false negative rate, which we also weren't told!

So... the badness at statistics goes all the way around it seems
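
To make that concrete, here's a minimal sketch of the two calculations. The prevalence and false positive rate come from the puzzle; the testing rates and the zero false negative rate are made-up assumptions purely for illustration.

```python
# Minimal sketch: P(D | positive test) with and without modelling who gets tested.
# prevalence and fpr come from the puzzle; r_d, r_n and fnr are assumed values.

prevalence = 0.001   # P(D): 1 in 1000
fpr = 0.05           # P(T|¬D): 5% false positive rate
fnr = 0.0            # false negative rate: not given, assumed 0
r_d = 0.8            # assumed: fraction of diseased (symptomatic) people who get tested
r_n = 0.001          # assumed: fraction of healthy people who get tested

sensitivity = 1 - fnr

# Textbook answer: a person is sampled at random and tested
textbook = (sensitivity * prevalence) / (
    sensitivity * prevalence + fpr * (1 - prevalence)
)

# "Real world" answer: weight each group by how often it actually takes the test
real_world = (r_d * sensitivity * prevalence) / (
    r_d * sensitivity * prevalence + r_n * fpr * (1 - prevalence)
)

print(f"textbook:   {textbook:.3f}")    # ~0.020
print(f"real world: {real_world:.3f}")  # ~0.94 with these assumed rates
```

With those (entirely assumed) testing rates, the positive predictive value lands in the ~95% ballpark rather than ~2%, which is the whole point about selection effects.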

reply

Thanks for writing it all out. I probably would have contributed to the bad look this test gave.

reply

Statistics is genuinely hard, and I didn't realize all the nuances until I started working with real life data generated by real life people.

reply

Probability is not easy

reply

one of the comments to the article (substack) addresses your question/assumptions

P(T|D) = probability of testing positive given the disease (sensitivity)

(I'll assume this is 100% since it wasn't specified)

https://open.substack.com/pub/boriquagato/p/why-jay-is-the-right-guy-for-nih?r=2slf6a&utm_campaign=comment-list-share-cta&utm_medium=web&comments=true&commentId=99767215

update: Step 3: Calculate each component:

  • P(¬D) = 1 - 0.001 = 0.999
  • P(T) = 1 × 0.001 + 0.05 × 0.999 = 0.001 + 0.04995 = 0.05095

Step 4: Calculate the final probability:

P(D|T) = (1 × 0.001) / 0.05095 ≈ 0.0196 or about 1.96%
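
Quick sanity check of those numbers in a few lines of Python (sensitivity assumed to be 100%, as stated above, since the puzzle didn't specify it):

```python
# Bayes' theorem with the values used in the comment above.
p_d = 0.001              # P(D): prevalence
p_t_given_d = 1.0        # P(T|D): sensitivity, assumed 100%
p_t_given_not_d = 0.05   # P(T|¬D): false positive rate

p_not_d = 1 - p_d                                    # 0.999
p_t = p_t_given_d * p_d + p_t_given_not_d * p_not_d  # 0.05095

p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 4))  # 0.0196 -> about 1.96%
```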

reply

yeah you need P(test|disease) to solve the problem.

The one that even fewer people appreciate is that you also need the false negative rate, which is not necessarily calculable from the false positive rate.

FNR = P(test=negative | disease=true)
FPR = P(test=positive | disease=false)

They aren't the 1-minus of each other!
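
A tiny sketch of that point: the false negative rate is a separate knob from the false positive rate, and you need it (via sensitivity = 1 − FNR) to compute the posterior. The 10% FNR below is an arbitrary assumption, since the puzzle never gave one.

```python
# FNR and FPR are independent test parameters; changing one doesn't change the other.
def p_disease_given_positive(prevalence, fpr, fnr):
    sens = 1 - fnr                       # P(test=positive | disease=true)
    num = sens * prevalence
    den = num + fpr * (1 - prevalence)   # P(test=positive) overall
    return num / den

print(p_disease_given_positive(0.001, 0.05, 0.0))   # ~0.0196 (perfect sensitivity)
print(p_disease_given_positive(0.001, 0.05, 0.10))  # ~0.0177 (10% FNR, same 5% FPR)
```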

reply

My friend wrote to me...

Turns out it’s not too hard to find the source. The bad news is twofold:

  1. The paper dates from 1978, so it's almost half a century old.
  2. It only involved a sample of 20 students.

Primary source: https://sci-hub.ru/https://www.nejm.org/doi/full/10.1056/NEJM197811022991808

The secondary source gets it wrong, reporting that there were only 10 students.

https://www.sciencenews.org/blog/context/doctors-flunk-quiz-screening-test-math

reply

haha, another good example of how urban legends spread.

I mean, I guess the overall message is true: people are bad at statistics. But the problem seems to run even deeper than what's implied... including how the authors of the original study didn't give proper instructions, how the results got propagated incorrectly, and then how it developed into this myth.

reply

I'm too stupid to do it the correct way

my way would have been simple and wrong but...

false positive is 5 percent or 1/20

actual prevalence is 1/1000

20/1000 = .02 or 2 percent which is close to the actual answer, sort of
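
For what it's worth, that shortcut is basically the posterior odds (prevalence divided by the false positive rate), which is why it lands so close to the exact Bayes answer. A quick comparison, assuming 100% sensitivity like the rest of the thread:

```python
# Back-of-envelope shortcut vs. exact Bayes (sensitivity assumed to be 100%).
prevalence = 1 / 1000   # actual prevalence
fpr = 1 / 20            # false positive rate, 5%

shortcut = prevalence / fpr  # 20/1000 = 0.02 -- really the posterior odds
exact = prevalence / (prevalence + fpr * (1 - prevalence))

print(shortcut)          # 0.02
print(round(exact, 4))   # 0.0196
```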

reply

statistics or probability which are related but not the same thing imo

reply

Bayes Theorem

I can never remember the formula

reply