Here are a couple of interesting posts by Giacomo Zucco. He talks about the history of embedding data in Bitcoin, particularly about whether that data counts as "encoded" by design or not, and how he understands the distinction.
So, about the abuse/design distinction.
Encoding some information in hex, which a Bitcoin node would not display but an external tool could, has been possible "by design" since the famous "Chancellor" reference by Satoshi in the very first coinbase: https://mempool.space/tx/4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b. Now, it's important to say that this was not spam: it was a small, one-off amount (spam by definition has to be repeatable "in bulk"), it was hardcoded (not a message consuming any additional bandwidth), it was hardly "unsolicited" (it was inserted by the very creator of the system), and it was not wasteful (the field necessarily had to include some data, and including a newspaper headline was a way to prove a fair mining launch).
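To make "a node does not view it, but an external tool can" concrete, here is a minimal Python sketch of such an external tool. It rebuilds the genesis coinbase input script from its parts (two small numeric pushes followed by the headline) and then walks the data pushes, printing the ones that are plain ASCII. The helper names `iter_pushes` and `printable_pushes` are made up for this illustration, and the parser only handles direct pushes and OP_PUSHDATA1, which is an assumption that holds for this script but not for scripts in general.

```python
# The 69-byte headline Satoshi placed in the genesis coinbase.
HEADLINE = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"

# Genesis coinbase input script: a 4-byte push, a 1-byte push, then the headline push.
# Assembled from parts here instead of being pasted as one long hex string.
SCRIPT_SIG = bytes.fromhex("04ffff001d" "0104") + bytes([len(HEADLINE)]) + HEADLINE


def iter_pushes(script: bytes):
    """Yield the payload of each direct push (0x01..0x4b) or OP_PUSHDATA1 push."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:      # opcode value is the payload length
            size = op
        elif op == 0x4C:            # OP_PUSHDATA1: next byte is the payload length
            if i >= len(script):
                return
            size = script[i]
            i += 1
        else:
            continue                # simplification: skip every other opcode
        yield script[i:i + size]
        i += size


def printable_pushes(script: bytes, min_len: int = 8):
    """Return pushes that are at least min_len bytes of printable ASCII."""
    found = []
    for data in iter_pushes(script):
        if len(data) >= min_len and all(32 <= b < 127 for b in data):
            found.append(data.decode("ascii"))
    return found


print(SCRIPT_SIG.hex())              # what is actually stored in the block
print(printable_pushes(SCRIPT_SIG))  # what an external tool chooses to show
```

The node only ever stores and validates the bytes in the first print; the "message" exists only because a tool outside the node decides to interpret some of those bytes as text.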
With this, some level of legal risk, in jurisdictions where speech is not free, was arguably already possible by design: just as the author had encoded some text in hex in the first coinbase, other users could encode other information (indeed, mining pools eventually ended up doing exactly that, routinely, to claim blocks, and most tools and explorers show that text decoded). Since the very beginning, people could encode insults towards the Chinese Communist Party, increasing legal risk in China, or copyrighted books, or transcriptions of NDA-covered documents. Direct CSAM was hardly an issue, since very few jurisdictions would jail people for ASCII drawings, but many impose takedowns for onion or magnet links to similar material. It was always possible, though, to encode other media, up to the rate permitted by consensus rules (maxblocksize every 10 minutes on average). It's just that the way to decode them didn't have the same "by design" precedent.
By the time of the famous Len Sassaman tribute in 2011, people had started using other techniques, in that case PUSH_BYTES: https://bitcoin.stackexchange.com/questions/3370/in-which-block-was-len-sassaman-memorialised. It's noteworthy that this was less "by design" than coinbase hex decoding, and even now most tools and block explorers will not show you Len's and Bernanke's faces. The legal risk was still there, by the way, and it has never gone away: on average, every 10 minutes a block could carry up to 1 MB of illegal media to be decoded in a "non-sanctioned" way, or up to 1 MB of illegal hex text (including CSAM onion and magnet links, or other illegal things) to be decoded in a "sanctioned" way.
An additional "sanctioning" of data encoding by design was accepted by most devs, including Luke, who committed the very change, with the introduction of the customizable datacarriersize option: https://github.com/bitcoin/bitcoin/commit/2aa6329. The rationale was, explicitly, to let nodes set their own strategy for mitigating harm from spam (one theory was that allowing a limited "venting valve" in a prunable output would limit the abuse of non-prunable ones). Now some level of encoding was allowed by design not only for miners, but also for transacting users: up to 40 bytes per transaction by default via p2p mempool broadcast, or up to 1 MB via direct non-standard submission to miners (or via sub-networks with datacarriersize increased from the default). Still, the only "sanctioned" way to decode it was hex text (still possibly illegal in some jurisdictions, as explained), not more flexible multimedia formats.
The default standard size was increased to 80 bytes a few months later (with Luke's ACK, even if he kept the previous default of 40 bytes in Knots): https://github.com/bitcoin/bitcoin/pull/5286. Still the very same legal profile:
  • up to 1 MB per 10 minutes of arbitrary data can make it into blocks by design (but now it's realistic that somewhat more op_returns above 40 bytes could, due to reduced p2p friction and maybe a cultural signal of legitimization),
  • the Core (or Knots) node itself will not decode anything at all, by design: Bitcoin is just money,
  • some widespread practice that may even be connected with Satoshi would "sanction" HEX decoding, including "ASCII art", not other multimedia formats,
  • the decoded HEX data could still be illegal in many jurisdictions (and include links to censorship-resistant media distribution systems, like onion or magnet links).
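The difference between "consensus allows it" and "default relay policy allows it" can be sketched in a few lines of Python. The snippet builds an op_return output script for a payload and checks the payload against a configurable limit, with 40 and 80 bytes as the two defaults mentioned above. `op_return_script` and `within_relay_policy` are hypothetical helpers, and comparing only the payload length is a simplification: a real node's accounting of datacarriersize also involves the script overhead bytes, so treat the exact comparison as an assumption.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C


def op_return_script(payload: bytes) -> bytes:
    """Build OP_RETURN <payload> using the shortest push that fits."""
    if 1 <= len(payload) <= 75:
        push = bytes([len(payload)])                # direct push opcode
    elif len(payload) <= 255:
        push = bytes([OP_PUSHDATA1, len(payload)])  # one extra length byte
    else:
        raise ValueError("payload size not handled by this sketch")
    return bytes([OP_RETURN]) + push + payload


def within_relay_policy(payload: bytes, max_payload_bytes: int) -> bool:
    """Simplified policy check: payload length against a node-local limit."""
    return len(payload) <= max_payload_bytes


for size in (40, 41, 80, 81):
    payload = b"x" * size
    script = op_return_script(payload)
    print(
        size,
        len(script),
        within_relay_policy(payload, 40),
        within_relay_policy(payload, 80),
    )
```

Nothing in this check affects block validity: a payload rejected by `within_relay_policy` is simply not relayed by that node, and it can still reach a block through direct submission to a miner, which is the point the list above makes.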
With the block-size increase via witness discount (which overwhelmingly privileges spam over real transactions), the hype wave around the "inscription envelope" exploit and the spam attack that followed (using fake op_if constructs in the inputs rather than op_returns in the outputs), many tools that can actually decode multimedia became widespread. Such tools are not very common for op_return. One could argue that this common practice may be legally interpreted as a "by design" sanction? Maybe. Core devs closed Luke's PR to fix the exploit, considering it controversial: https://github.com/bitcoin/bitcoin/issues/29187. That would make the use of the exploit as much "by design" as the current big op_return, with the difference that there are many well-maintained and up-to-date tools to view multimedia content in "inscriptions", and not in op_return.
Now, what changes, under this aspect, for Core V30? Literally nothing:
  • up to 4 MB per 10 minutes of arbitrary data can make it into blocks by design (only up to 1 MB via op_return, the full 4 MB only via the "inscription envelope"),
  • the Core V30 node itself will not decode anything at all, by design: Bitcoin is just money,
  • some widespread practice that may even be connected with Satoshi would "sanction" HEX decoding, including "ASCII art", not other multimedia formats,
  • the decoded HEX data could still be illegal in many jurisdictions (and include links to censorship-resistant media distribution systems, like onion or magnet links),
  • some widespread practice focused on inscriptions will allow decoding multimedia files.
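The 1 MB versus 4 MB figures in the first bullet follow from the block weight rule in BIP141: each non-witness byte costs 4 weight units, each witness byte costs 1, and a block may not exceed 4,000,000 weight units. A small Python sketch of that arithmetic, as a rough upper bound that ignores the block header, the coinbase and all per-transaction overhead (`max_payload_bytes` is a made-up helper name):

```python
MAX_BLOCK_WEIGHT = 4_000_000   # weight units per block (BIP141)
WITNESS_SCALE_FACTOR = 4       # weight of one non-witness byte
BLOCKS_PER_DAY = 144           # one block per 10 minutes on average


def block_weight(non_witness_bytes: int, witness_bytes: int) -> int:
    """Weight = 4 * non-witness bytes + 1 * witness bytes."""
    return non_witness_bytes * WITNESS_SCALE_FACTOR + witness_bytes


def max_payload_bytes(in_witness: bool) -> int:
    """Theoretical ceiling for data placed entirely in one kind of byte."""
    cost_per_byte = 1 if in_witness else WITNESS_SCALE_FACTOR
    return MAX_BLOCK_WEIGHT // cost_per_byte


via_output = max_payload_bytes(in_witness=False)   # e.g. op_return data
via_witness = max_payload_bytes(in_witness=True)   # e.g. "inscription envelope"

print(via_output, via_witness)
print(via_output * BLOCKS_PER_DAY, via_witness * BLOCKS_PER_DAY)  # bytes per day
```

Real blocks never reach either ceiling, since every transaction needs some non-witness bytes, but the 4:1 ratio between the two paths is exactly the witness discount the text refers to.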
Now, about BIP444 being poorly designed, in order of importance:
  1. it has clearly no consensus, so if enforced via flag-day (or via content-trigger) by a reckless sub-network, it would split Bitcoin consensus, creating chaos and potential fund losses. This is very bad.
  2. it is an "emergency" change to address a potential legal risk that was always there and that is not meaningfully increased by V30 (see the breakdown above), with all the negative consequences on consensus-creation and incentives to split that these "emergency" narratives create (it also consists of rushed and long-term-unsustainable rules that, if not temporary, would take a hard fork to undo, so it moves into "temporary soft fork" territory, which, while possible and legit in itself, is clearly suboptimal in terms of communication and education: many users don't understand it),
  3. it "wastes" something as scarce as consensus-change coordination effort, which is very hard and slow to establish, to tackle something as changeable and dynamic as data encoding (mempool policies are a better place for spam mitigation, for example), and it "steals" mindshare from other more organic forks that could serve Bitcoin better (CISA+blocksize reduction, LNHance, cleanup, etc.),
  4. it directly concedes that legal risk minimization should drive cypherpunk software development, which could attract unwanted legal attention in itself and could set the precedent of illiberal laws being passed just to drive change into Bitcoin (and even accepting this debatable premise, it includes dubious legal claims about running it or not, which are not supported by strong legal opinions and seem more like a "scare tactic" to increase adoption),
  5. it may interact non-trivially, in ways that I am not yet confident about, with potentially legitimate uses of Tapscript, that may find interesting applications in inheritance use-cases, etc. (this objection could be dispelled by more research on my side, and I may be convinced to drop it, but I'm not convinced yet as of now).