Some statistical insight: I notice these are all time series. The pairs with high spurious correlations are going to be the ones where both series have enough Y-axis variation, and that variation just happens to occur around the same time in both.
Different timing in the Y-axis variation = low measured correlation.
Not enough Y-axis variation = harder for the two series to line up in a way that produces a high R-squared.
That being said, it's often the case that even if the correlation isn't causal, there's some underlying driver which affects both series.
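Here's a rough sketch of the timing and variation points in Python with NumPy. Everything in it is made up for illustration: the `bump` and `r` helpers, the hump positions and heights, the noise level, and the seed are my own assumptions, not data from the charts being discussed.

```python
import numpy as np

t = np.arange(100)  # 100 hypothetical time steps

def bump(center, width=5.0, height=1.0):
    """A smooth hump in the Y-axis centered at time `center`."""
    return height * np.exp(-0.5 * ((t - center) / width) ** 2)

def r(x, y):
    """Pearson correlation between two equal-length series."""
    return np.corrcoef(x, y)[0, 1]

# Y-axis variation at the same time (different widths): high correlation.
r_same_time = r(bump(50, width=5.0), bump(50, width=8.0))

# Same shape, but the variation happens at different times: low correlation.
r_shifted = r(bump(30), bump(70))

# Independent noise (standard deviation 1) for each series.
rng = np.random.default_rng(0)
noise_a, noise_b = rng.normal(size=(2, t.size))

# Big hump relative to the noise: the two series still line up.
r_big = r(bump(50, height=10.0) + noise_a, bump(50, height=10.0) + noise_b)

# Tiny hump relative to the noise: there is little left to line up.
r_small = r(bump(50, height=0.1) + noise_a, bump(50, height=0.1) + noise_b)

for name, value in [("same time", r_same_time), ("shifted", r_shifted),
                    ("big variation", r_big), ("small variation", r_small)]:
    print(f"{name}: r = {value:.2f}, r^2 = {value ** 2:.2f}")
```

One caveat on the "not enough variation" case: correlation doesn't care about the scale of a series on its own, so in this sketch "not enough" means small relative to the noise in the series.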
Belongs in ~math and ~econ too!