Good intention, but web search works well only if you already have A LOT OF users to train its ranking algorithm on their data. So I don't believe that general web search is possible this way. To make it possible we need, as a community, to first start sharing our browsing histories (anonymized of course) with some public service, so that projects like this could use them to build a good algorithm.
Thank you for your feedback @nikicat! I will disagree though. There are at least two proxies for 'browsing histories' - web links and social mentions (the more external links a page has, the more 'popular' it is). Google used this from the very beginning (PageRank), and I already use it too. In fact, without the link data, ranking the web is plain impossible - way too much spam. Of course, Google now has access to huge volumes of clickstream data, but having worked in the SEO industry for many years I can tell you that PageRank is still by far the biggest ranking factor.
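To make the link-popularity idea concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not the ranking code of this project or of Google; the `pagerank` function, the `toy_web` graph, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
```

In this toy graph `c` ends up with the highest rank because three pages link to it, and `a` outranks `b` and `d` only because the well-linked `c` points at it. Pages with no in-links stay at the baseline.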
As far as I know, Google as well as other search engines stopped using PageRank about 10 years ago (or significantly lowered the weight of this factor), because it's pretty easy to fake.
They don't use the exact PageRank formula, that's for sure, and they stopped publishing PageRank values, but links are still the major factor. And they're not that easy to game: all the SEO link farms are trivial to detect, and real impact is impossible without actual content marketing and promotion. The whole 'we use 200 ranking factors' line is mostly a public relations story - an attempt to mislead the SEO industry and make search look more defensible as a business. The number of factors doesn't matter, their weight matters, and links are still at the top.
This is a bit strange, because both the click factor and the link factor are based on user behavior. The difference is that users post links to webpages much more rarely than they search for the same webpages and, moreover, the click factor provides information relating the search query to the page, and links do not. So, in my opinion, PageRank is a good tool for making an initial draft of a ranking algorithm, and as soon as you have a big enough audience you should use user behavior data as much as possible, but it's a classic chicken-and-egg problem. What do you think about buying user data from companies that collect it, or directly from users willing to sell it?
Links have anchors (link text), and the anchor text relates the target page to a query.
I think the catch is with cost - it's very easy to abuse click-factor, and the only way to fight that is to track and filter very aggressively and still not be sure how much you were 'gamed'. Link-factor is much costlier - you have to buy a domain, a server, get your own site promoted first, before you can signal anything. Link farms that don't have trusted in-links are worthless, no matter how big.
In general, 'user data' might be costly to produce and thus higher quality, or cheap and thus noise, so whether it's worth buying depends on that. Anonymous free click streams are noise. If we get to scale with an anonymous click stream paid for with LN then maybe we might get a good signal, but I'm not sure about that yet.
Google probably started paying attention to the click stream only once they had managed to get many people to log in to their Gmail accounts in their browsers. At least from then on they could filter out anonymous traffic and also normalize the effects of 'heavy clickers'. Privacy suffered though. I hope that an anonymous paid click stream could work too, but we'll see.
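As a rough illustration of what 'filter out anonymous stuff and normalize heavy clickers' could mean, here is a small sketch. It is purely an assumption about one possible scheme, not a description of what Google or this project actually does. The function name `normalized_click_scores` and the `(user_id, query, url)` record layout are hypothetical.

```python
from collections import Counter, defaultdict


def normalized_click_scores(clicks):
    """Score (query, url) pairs from a click log (illustrative sketch).

    clicks: iterable of (user_id, query, url) tuples, where a user_id of
    None marks an anonymous click.
    Anonymous clicks are dropped, and every remaining user contributes a
    total weight of 1.0, split evenly across all of that user's clicks.
    """
    # Keep only clicks that can be tied to an identified user.
    identified = [(user, query, url) for user, query, url in clicks
                  if user is not None]

    # Count clicks per user so one user cannot outweigh everyone else.
    clicks_per_user = Counter(user for user, _, _ in identified)

    scores = defaultdict(float)
    for user, query, url in identified:
        scores[(query, url)] += 1.0 / clicks_per_user[user]
    return dict(scores)


# Hypothetical log: one heavy clicker pushes "spam" 100 times, two
# ordinary users each click "good" once, plus one anonymous click.
example_clicks = (
    [("heavy", "q", "spam")] * 100
    + [("u1", "q", "good"), ("u2", "q", "good"), (None, "q", "spam")]
)
example_scores = normalized_click_scores(example_clicks)
```

With this weighting the 100 clicks from the single heavy clicker count as one vote in total, so `good` (two distinct users) outscores `spam`. It does nothing against many fake accounts, which is where the cost argument above comes in.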