I do think some assumptions about the missing data are needed for the procedure to be valid, though. If the data are missing completely at random (MCAR), then I think the procedure would work fine. If missingness is non-random but depends only on observed variables (what the literature calls missing at random, MAR), then the procedure could also work.
But if missingness is non-random and also depends on unobserved variables (missing not at random, MNAR), especially unobserved correlates of the outcome variable, then I think the procedure is likely to yield biased results.
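To make the three cases concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration (the variable names `z` and `x`, the coefficients, the logistic missingness rules), and it is not the procedure under discussion. It estimates the mean of a partly missing `x` two ways: the raw mean of the observed values, and a mean adjusted by regressing `x` on an always-observed `z`.

```python
import math
import random
import statistics


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares fit of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=50_000, seed=0):
    """Return {mechanism: (raw observed mean of x, z-adjusted mean of x)}."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]          # always observed
    x = [0.8 * zi + rng.gauss(0, 0.6) for zi in z]   # sometimes missing, true mean 0
    # Probability that x is observed under each (hypothetical) mechanism.
    keep_prob = {
        "MCAR": lambda zi, xi: 0.5,                  # ignores everything
        "MAR": lambda zi, xi: sigmoid(-1.5 * zi),    # depends on observed z only
        "MNAR": lambda zi, xi: sigmoid(-1.5 * xi),   # depends on x itself
    }
    results = {}
    for name, p in keep_prob.items():
        obs = [(zi, xi) for zi, xi in zip(z, x) if rng.random() < p(zi, xi)]
        z_obs = [o[0] for o in obs]
        x_obs = [o[1] for o in obs]
        a, b = ols(z_obs, x_obs)                     # fit on observed rows only
        adjusted = statistics.fmean([a + b * zi for zi in z])
        results[name] = (statistics.fmean(x_obs), adjusted)
    return results


if __name__ == "__main__":
    for name, (raw, adjusted) in simulate().items():
        print(f"{name}: raw mean = {raw:.3f}, adjusted mean = {adjusted:.3f}")
```

In this setup the raw mean should be roughly unbiased only under MCAR, the adjustment on `z` should repair the MAR case, and neither should recover the true mean of 0 under MNAR.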
In the first two scenarios, where it seems OK, are you introducing measurement error?
If so, you're going to have attenuation bias.
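For reference, the attenuation result being invoked is the textbook one for classical measurement error: if the regressor is observed with independent, mean-zero noise, the OLS slope shrinks toward zero by the factor var(x) / (var(x) + var(noise)). A minimal sketch, with made-up parameter values (`beta`, `sd_x`, `sd_u` are all assumptions). Whether imputation error actually behaves like classical measurement error depends on the imputation method, and this sketch does not settle that.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def attenuation_demo(n=50_000, beta=2.0, sd_x=1.0, sd_u=1.0, seed=0):
    """Return (slope on true x, slope on noisy x, predicted attenuated slope)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, sd_x) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x_true]
    # Classical measurement error: noise independent of x_true and of y's error.
    x_noisy = [xi + rng.gauss(0, sd_u) for xi in x_true]
    reliability = sd_x**2 / (sd_x**2 + sd_u**2)
    return ols_slope(x_true, y), ols_slope(x_noisy, y), beta * reliability


if __name__ == "__main__":
    clean, noisy, predicted = attenuation_demo()
    print(f"slope with true x:  {clean:.3f}")
    print(f"slope with noisy x: {noisy:.3f} (predicted {predicted:.3f})")
```

With equal variances for `x` and the noise, the reliability ratio is 0.5, so the noisy-regressor slope should land near half the true slope.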
Hmm, good point. I’ll admit I haven’t thought carefully about imputation. Why wouldn’t any procedure that imputes data without the outcome variable lead to attenuation bias, and why wouldn’t any procedure that uses the outcome lead to endogeneity?
I’m assuming there’s a good answer if I read the literature. But it’s possible I’d be disappointed as well.
I hadn't thought about imputation leading to attenuation bias until just now, but it seems like it would (if I understand why measurement error has that effect).
I'm also sure this has been discussed at length in the literature. It surprises me a little that none of my advisors or econometrics professors mentioned it, though.