275 sats \ 0 replies \ @cointastical 2 Oct 2022 \ parent \ on: Meta SN: Should we raise posting costs? bitcoin
The AMP duplicate issue is easy to fix. There's already Issue #138 in SN's GitHub repo for that. The case where a Description post contains a link that was already shared in a prior Link post was acknowledged as a possible problem; see Issue #150 on that. There are also some other issues on SN's GitHub related to improvements to dup detection (e.g., ignoring upper/lower case in the URL, ignoring the www. subdomain, ignoring the m. subdomain (e.g., for YouTube videos), etc.).
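To make those URL tweaks concrete, here's a rough sketch of a comparison key, assuming Python's standard `urllib.parse`. The function name `normalize_url` and the prefix list are hypothetical, not what SN actually does, and it doesn't attempt the AMP case.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of subdomain prefixes to ignore; not SN's actual rule set.
IGNORED_PREFIXES = ("www.", "m.")


def normalize_url(url: str) -> str:
    """Return a key that is equal for URLs differing only in trivial ways."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()  # port and userinfo are dropped
    for prefix in IGNORED_PREFIXES:
        rest = host[len(prefix):]
        # Only strip when a real domain remains (keeps a host like "m.co").
        if host.startswith(prefix) and "." in rest:
            host = rest
            break
    # Assumption: case is ignored in host and path only. The query string is
    # left alone because values such as YouTube video IDs are case-sensitive.
    path = parts.path.rstrip("/").lower()
    # Assumption: http and https are treated as the same link.
    return urlunsplit(("https", host, path, parts.query, ""))
```

With that, `https://www.YouTube.com/watch?v=AbC` and `http://m.youtube.com/Watch?v=AbC` produce the same key, while a different `v=` value does not.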
That could be done ... detecting an exact duplicate is easy. Simply store a hash of the response. If the hash for a new post matches the hash for a prior post, flag the earlier post as a potential dupe. That solves the case where two URLs respond with identical content. It doesn't solve something like ZeroHedge reprinting a Bitcoin Magazine article. In that instance the responses have different hashes but are essentially the same content.
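The hash idea could look something like this. It's a minimal sketch, assuming the response body has already been fetched as bytes; `DupeIndex` and `check_and_add` are made-up names, and a real version would keep the hashes in the database rather than in memory.

```python
import hashlib
from collections import defaultdict


class DupeIndex:
    """Maps a hash of each response body to the posts that produced it."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def check_and_add(self, post_id, body: bytes) -> list:
        """Return earlier posts with an identical body, then record this one."""
        digest = hashlib.sha256(body).hexdigest()
        earlier = list(self._by_hash[digest])
        self._by_hash[digest].append(post_id)
        return earlier
```

Any post IDs returned are the potential dupes to show the user. As noted, this only catches byte-for-byte identical responses, so a reprint on another site, or even the same page served with a different timestamp in it, would slip through.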
There are anti-plagiarism services and duplicate content detection tools that could identify occurrences of that. But I don't think that a duplicate could be detected in real-time -- as SN doesn't know what prior post(s) should be looked at when there is a new post. Maybe doing the anti-plagiarism search against prior posts with similar words from the title would find duplicates but that would take longer than is reasonable to expect the user entering a post to wait.
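As a rough illustration of the "similar words from the title" idea, here's a sketch that shortlists prior posts by word overlap (Jaccard similarity) between titles. The function names and the 0.5 threshold are assumptions for illustration only. It just narrows down which prior posts to look at; the slow content comparison itself isn't shown.

```python
import re


def title_words(title: str) -> set:
    """Lowercased alphanumeric words in a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def shortlist(new_title: str, prior_titles: dict, threshold: float = 0.5) -> list:
    """Return IDs of prior posts whose titles overlap enough, best match first.

    prior_titles maps post ID -> title. The threshold is an arbitrary guess.
    """
    new_words = title_words(new_title)
    scored = [
        (jaccard(new_words, title_words(title)), post_id)
        for post_id, title in prior_titles.items()
    ]
    return [post_id for score, post_id in sorted(scored, reverse=True)
            if score >= threshold]
```

This scans every prior title, so it has the same problem at scale; it would need an index on title words to be fast enough to run while the user waits.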
Or there could simply be a mechanism for users to flag a post as a duplicate, with some handling done for that. Reddit does this, but their approach involves moderators who investigate -- something SN is designed to not need.
Duplicate posts are a minor inconvenience that I suspect bothers only a small number of heavy SN users. I can imagine there are other things to be done that are at a higher priority.