
This is awesome! I really like the favorite word analysis. If it's not too time-consuming, it would be pretty fun to expand that to the top 20 stackers.
Very interesting, too, to see the hottest discussion stats. I wonder what kind of stats could be pulled around comments (longest average reply chain, highest average words per comment...)
And stats about lurkers (can we create a top lurker category? Highest number of zaps by lowest number of items or something).
It seems like the stackers really like talking about stackers...
Always open to suggestions on new metrics! Lurker is a great idea, I'll think about adding it next time.
It seems like the stackers really like talking about stackers...
"stacker" is also the proxy for "stacker news", since "news" was a generic term that I left out. If I didn't leave it out, both "stacker" and "news" would show up too frequently; I have to figure out how to stop the algos from doing that.
30 sats \ 2 replies \ @optimism 6h
Do you use something like spaCy? I think you can force it to see "stacker news" as a single token.
I'm using sklearn's CountVectorizer, which supports bigrams. I didn't like the results with full bigrams, so I need to figure out how to make "stacker news" the only bigram in the vocabulary.
30 sats \ 0 replies \ @optimism 6h
lazy solution: s/stacker news/stackernews/gi lol
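The sed-style substitution above translates to a few lines of Python. This sketch uses only the standard library (`merge_phrase` and `word_counts` are hypothetical helper names) and adds word boundaries and `\s+` so that "stackers" is left alone and "stacker  news" with extra whitespace still matches.

```python
import re
from collections import Counter

# Case-insensitive equivalent of s/stacker news/stackernews/gi,
# with word boundaries and flexible whitespace between the two words
PHRASE = re.compile(r"\bstacker\s+news\b", re.IGNORECASE)


def merge_phrase(text: str) -> str:
    """Collapse every 'stacker news' variant into the single token 'stackernews'."""
    return PHRASE.sub("stackernews", text)


def word_counts(texts):
    """Count lowercase word tokens of 2+ characters after merging the phrase."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w\w+", merge_phrase(text).lower()))
    return counts


counts = word_counts(["Stacker News rocks", "stackers on stacker  news", "the news"])
print(counts.most_common(3))
```

The same `merge_phrase` function could be passed as CountVectorizer's `preprocessor` argument; a custom preprocessor replaces the default lowercasing step, so it would need to lowercase the text as well.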