pull down to refresh

The sudden concern with illegal data in blocks and mempools feels like pearl clutching to me (#1263156), but if it is of concern to folks, here is an interesting proposal by Roasbeef to use fountain codes to create a new kind of pruned node that can still help other nodes with IBD.

What are fountain codes?

Fountain codes provide a way for data to be transmitted across a lossy network connection from a single source to many users. The name "fountain code" arises because fountain codes behave analogously to a fountain. A file being transmitted is analogous to a glass of water from the fountain. To fill the glass (or reconstruct the file), you need to catch enough droplets from the fountain such that your glass becomes full. It isn't important which droplets of water you catch in your glass; once you have enough, the glass is full (and the file can be reconstructed). Fountain codes work in the same way, but with information instead of water. source

How could they work in Bitcoin?

In a Bitcoin context this means a node could specify a buffer number of the most recent blocks that it will always keep. For blocks prior to these, the node could assemble them into chunks and encode these blocks into "droplets." After encoding the node could delete the actual blocks, only keeping the encoded droplets. It would repeat this all the way back to genesis.
droplet nodes are the nodes that can save space by storing the encoded chain, while still being able to help nodes with IBD
droplet nodes use a random seed to select d of the k blocks in each epoch to encode, this is a decentralized process
bucket nodes are new nodes that join the network, which are able to recover the original chain by sampling K public nodes with overwhelming probability
γ is a system param, which represents the fraction of blocks that nodes actually keep
if we set γ to 100, then we reduce the required storage from ~700 GB to ~7GB, but we need to contact 110 nodes to do IBD
if we set γ to 1000, then each node stores ~700 MB, but you need 1k nodes to sync
γ can be tuned to navigate a tradeoff between storage savings, and the number of bootstrapping nodes required
There is a paper about this from 2019 and a GMax post on the Bitcoin Wiki from 2015.
I don't know if such a scheme is very practicable, but perhaps the pearl clutchers would do well to spend some time on things like this rather than screaming about things on X.