67 sats \ 2 replies \ @Undisciplined 29 Nov \ on: Research in Public #08: Diamonds in the Rough (Analysis of Post Quality) econ
Why don't you include the extraneous things, like time of day or where it appeared, in the model? Are you just assuming that they're uncorrelated noise?
I could have included those factors in the model, and then used only the content-related variables to rate quality.
Y'know, now that you mention it, I realize it didn't really occur to me to go that route once I settled on the XGBoost approach. It's almost like I have an economist mode, focused primarily on linear models with a strong concern for endogeneity, omitted variables, and the interpretation of the error term, and a machine-learning mode, which doesn't worry about any of those things.
I remember back when I first started working with industry ML folks, I wanted to include company fixed effects in a model, and a computer-science-trained ML guy got mad at me and told me it was inappropriate. I tried explaining what the purpose was, but he couldn't really understand. They're not trained to think about endogeneity much, it seems.
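For anyone curious what "include the extraneous factors, then rate quality with only the content variables" looks like in practice, here's a minimal sketch. It swaps in plain least squares (numpy) for XGBoost to keep the idea visible, and it runs on simulated data: the names `content_only_scores`, `length`, `links`, and `morning` are hypothetical stand-ins, not variables from the actual analysis. The control (posted before noon) is deliberately correlated with a content feature, which is exactly the case where leaving it out would bias the content coefficients.

```python
import numpy as np

def content_only_scores(X_content, X_controls, y):
    """Fit y on content + control variables jointly (OLS), then score posts
    using only the content coefficients, so the controls are held out of the rating."""
    n = len(y)
    X = np.column_stack([np.ones(n), X_content, X_controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    k = X_content.shape[1]
    beta_content = coef[1:1 + k]  # skip the intercept, keep content terms only
    return X_content @ beta_content, coef

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(0, 24, size=n)
morning = (hour < 12).astype(float)          # hypothetical control: posted before noon
length = rng.normal(size=n) + 0.5 * morning  # content feature correlated with the control
links = rng.normal(size=n)                   # another hypothetical content feature
y = 1.0 * length + 0.5 * links + 2.0 * morning + rng.normal(scale=0.5, size=n)

X_content = np.column_stack([length, links])
X_controls = morning[:, None]
scores, coef = content_only_scores(X_content, X_controls, y)
print(np.round(coef, 2))  # intercept, length, links, morning
```

With the control in the regression, the content coefficients land near their simulated values (about 1.0 and 0.5), and the resulting scores rank posts without crediting them for when they happened to be posted.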
I don't even have a story in mind about what my suggestion would address. It just felt like it might give you a cleaner estimate to take out whatever contribution those things are making.
Not entirely relatedly, I was wondering if reader tastes are different at different times throughout the day. Maybe the same post isn't as good later in the day as it is earlier.
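One way to check whether the same post "plays" differently later in the day is an interaction term between a content measure and a time-of-day indicator. Below is a minimal sketch on simulated data; `quality` and `late` are hypothetical variables, and the simulated drop in the late-day slope is an assumption built into the fake data, not a finding.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
late = rng.integers(0, 2, size=n).astype(float)  # hypothetical: 1 if posted late in the day
quality = rng.normal(size=n)                     # hypothetical content-quality measure
# Simulated assumption: the same content earns less engagement when posted late.
y = 1.0 * quality - 0.4 * quality * late + 0.3 * late + rng.normal(scale=0.5, size=n)

# Interaction model: y = b0 + b1*quality + b2*late + b3*(quality*late)
X = np.column_stack([np.ones(n), quality, late, quality * late])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
effect_early = coef[1]            # slope of quality for early posts
effect_late = coef[1] + coef[3]   # slope of quality for late posts
print(f"slope early: {effect_early:.2f}, slope late: {effect_late:.2f}")
```

If the interaction coefficient is meaningfully different from zero, reader tastes (or at least reader responsiveness to content) shift with time of day; a tree model like XGBoost can pick up such interactions on its own, but the linear version makes the comparison explicit.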