20 sats \ 3 replies \ @nikicat 6 Dec 2022 \ parent \ on: Building a new Web Search Engine. Just for you, Stackers! bitcoin
As far as I know, Google and other search engines stopped using PageRank about 10 years ago (or significantly lowered its weight), because it's pretty easy to fake.
They don't use the exact PageRank formula, that's for sure, and they stopped publishing PageRank values, but links are still the major factor. And they're not that easy to game: SEO link farms are trivial to detect, and real impact is impossible without actual content marketing and promotion. The whole 'we use 200 ranking factors' line is mostly a public relations story, an attempt to mislead the SEO industry and make search look more defensible as a business. The number of factors doesn't matter, their weights matter, and links are still at the top.
reply
This is a bit strange, because both the click factor and the link factor are based on user behavior. The difference is that users post links to webpages much more rarely than they search for those same webpages. Moreover, the click factor provides information relating the search query to the page, and links do not.
So, in my opinion, PageRank is a good tool for a first draft of a ranking algorithm, and as soon as you have a large enough audience you should use user behavior data as much as possible. But it's a classic chicken-and-egg problem.
What do you think about buying user data from companies that collect it or directly from users willing to sell it?
reply
Links have anchors (link text), and the anchor text relates the target page to a query.
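To make the anchor point concrete, here is a minimal sketch of how link text could be turned into a query signal. The `(source, target, anchor_text)` tuple format, the function names, and the example domains are all made up for illustration; a real engine would also weight each link by the trust of its source.

```python
from collections import defaultdict

def build_anchor_index(links):
    """Map each anchor-text term to the target pages it points at, with counts.

    `links` is an iterable of (source, target, anchor_text) tuples
    (a hypothetical format chosen for this sketch).
    """
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor_text in links:
        for term in anchor_text.lower().split():
            index[term][target] += 1
    return index

def anchor_score(index, query, page):
    """Count how many inbound anchors mention each query term for `page`."""
    return sum(index.get(term, {}).get(page, 0) for term in query.lower().split())

links = [
    ("a.example", "wallet.example", "bitcoin wallet"),
    ("b.example", "wallet.example", "best bitcoin wallet"),
    ("c.example", "news.example", "bitcoin news"),
]
index = build_anchor_index(links)
print(anchor_score(index, "bitcoin wallet", "wallet.example"))
print(anchor_score(index, "bitcoin wallet", "news.example"))
```

So the link itself carries the query-to-page relation, just written by the linking author rather than by a searcher.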
I think the catch is cost: it's very easy to abuse the click factor, and the only way to fight that is to track and filter very aggressively, and even then you can't be sure how much you were 'gamed'. The link factor is much costlier to fake: you have to buy a domain and a server and get your own site promoted first, before you can signal anything. Link farms that don't have trusted in-links are worthless, no matter how big.
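The "no trusted in-links" point can be sketched with a trust-seeded variant of PageRank, where the random jump only ever lands on a hand-picked set of trusted pages. This is an assumption about one way to get that behavior, not a description of what any real search engine runs; the function name, the damping value, and the toy graph are illustrative.

```python
def trust_rank(graph, trusted, damping=0.85, iterations=50):
    """Power iteration where teleportation goes only to `trusted` pages.

    `graph` maps each page to the list of pages it links to.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets} | set(trusted)
    seed = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    rank = dict(seed)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * seed[n] for n in nodes}
        for src in nodes:
            out = graph.get(src, [])
            if out:
                share = damping * rank[src] / len(out)
                for dst in out:
                    nxt[dst] += share
            else:
                # Dangling page: hand its rank back to the trusted seeds.
                for t in trusted:
                    nxt[t] += damping * rank[src] / len(trusted)
        rank = nxt
    return rank

graph = {
    "trusted.example": ["blog.example"],
    "blog.example": ["trusted.example"],
    # A link farm that only links to itself and its target.
    "farm1.example": ["farm2.example", "spam.example"],
    "farm2.example": ["farm1.example", "spam.example"],
    "spam.example": ["farm1.example"],
}
ranks = trust_rank(graph, trusted={"trusted.example"})
print(ranks["blog.example"], ranks["spam.example"])
```

In this toy graph the farm never receives any rank, however many pages it adds, because nothing reachable from the trusted seed links into it.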
In general, 'user data' might be costly and thus higher quality, or cheap and thus noise, so whether it's worth buying depends on that. Anonymous free click streams are noise. If we get to scale with an anonymous click stream paid for over LN, then we might get a good signal, but I'm not sure about that yet.
Google probably started paying attention to click streams only once they had managed to get many people logged in to their Gmail accounts in their browsers. At that point they could at least filter out anonymous traffic and also normalize the effects of 'heavy clickers'. Privacy suffered, though. I hope that an anonymous paid click stream could work too, but we'll see.
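As a rough illustration of "filter out anonymous stuff and normalize heavy clickers", the sketch below drops clicks without a user id and gives every remaining user a total vote of 1, split evenly across their clicks. The `(user_id, query, page)` format and the equal-vote rule are assumptions made for this example, not anything Google has documented.

```python
from collections import Counter, defaultdict

def click_scores(clicks):
    """Score (query, page) pairs from identified users only.

    `clicks` is an iterable of (user_id, query, page); user_id is None
    for anonymous clicks. Each user contributes a total weight of 1.
    """
    identified = [c for c in clicks if c[0] is not None]
    per_user = Counter(user for user, _, _ in identified)
    scores = defaultdict(float)
    for user, query, page in identified:
        scores[(query, page)] += 1.0 / per_user[user]
    return dict(scores)

clicks = (
    [("bot", "wallet", "spam.example")] * 100   # one heavy clicker
    + [("alice", "wallet", "good.example"),
       ("bob", "wallet", "good.example"),
       (None, "wallet", "spam.example")]        # anonymous, ignored
)
scores = click_scores(clicks)
print(scores[("wallet", "good.example")] > scores[("wallet", "spam.example")])
```

With raw counts the heavy clicker would win 101 to 2; with one vote per identified user, two ordinary users outweigh it. A paid anonymous click stream would need some other cost to play the role that the login plays here.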
reply