10 sats \ 0 replies \ @mudbloodvonfrei 31 May 2023 \ on: A Mechanistic Interpretability Analysis of Grokking tech
It's an interesting article. I didn't understand most of it, but still. It seems that the author believes interpretability can lead to better alignment of superintelligent AIs, but my question is this: if you can interpret the behavior of a system, wouldn't that mean the system is not superintelligent compared to humans? We have many experts who try to interpret human behavior, or to manipulate it (i.e. align it) toward certain goals, yet we still don't know all that much about the human brain, and people can't always interpret their own behavior, much less someone else's.