well this looks neat. Need to read the paper. I wonder what kinds of lookup semantics this technique supports. Is it (the equivalent of) hashkey-lookups? I wonder if the encryption preserves things like range lookups, etc.
Would be KILLER if we had a successor to electrum's protocol that used this or something similar.
Hey, creator here! Happy to answer any questions.
Yes, today this is (essentially) private hashkey-lookups. Supporting range or batch queries (for example, to fetch data on all the addresses in your wallet) is something we are still working on.
We've had a few people suggest doing something like "private RPC". Do you know what metadata in particular (which RPC calls) is sensitive? Is the idea that miners and other full node operators would like to hide what blocks they look at etc? Or is it more for light clients that want to hide metadata about what they look at? Would people be willing to pay for a "private" (via homomorphic encryption) RPC endpoint?
Cool to hear about the range queries! I need to read the paper but was just generally curious about the shapes of database queries you could make with this encryption scheme.
In terms of bitcoin endpoints, the use case I had in mind was privacy-preserving light clients. Right now if you want to build a light client, you basically have two options for fetching block data: electrum servers or compact block filters (aka neutrino filters). Fetching from electrum servers is really efficient but you leak the things you care about to the electrum server operator. Compact block filters are way more private, but take a lot more bandwidth, so they are a tough tradeoff for people on metered mobile data plans. From the abstract of the paper, it sounds like this scheme could take 1.9x the request bandwidth of doing it electrum-style but be way more private (what about response bandwidth? Again, haven’t read the whole paper yet). That would be super cool. Would people pay for that? Maybe, but you’d want to really think about metering dimensions/pricing model. Personally, if I could pay a couple cents a month over lightning to have a bandwidth-efficient, privacy-preserving endpoint for BlueWallet (or something), I’d do it in a second.
Hey @blintz, question for you: I'm reading the paper and there are the query-independent public parameters that in SPIRAL are larger than in other PIR schemes. Can those parameters be shared among clients, or do they need to be client-specific? I'm looking at the 125MB parameter size for the StreamPack construction and thinking it would be nice to just bake that into the client software. I'm only on the first section of the paper, so maybe that's explained later.
No, one downside of our scheme is that the public parameters are larger, and they must be per-client and stored on the server.
For reference, both our Bitcoin and Wikipedia demos actually use the SpiralPack scheme, which has much smaller public parameters (~8 MB for btc.usespiral.com).
This is a really helpful summary, thanks. I didn't realize people were already trying to achieve privacy using a kind of costly solution (block filters). We achieve better privacy at much lower bandwidth, and we would also allow you to query addresses other than your own.
The paper outlines a lot of schemes with various tradeoffs, since it was published at an academic conference, so not all the numbers apply to this particular application. I would estimate that the response bandwidth overhead if we tried to make a 'private Electrum' would be about 2-5x over Electrum (still orders of magnitude less than the neutrino way). The request bandwidth would be negligible, I think. Probably the largest roadblock will be server cost, since the server needs to do an expensive computation; of course, if users pay for the service, then this isn't an issue.
If you're looking for a higher-level summary of the paper, there is also a conference talk I gave available online: https://www.youtube.com/watch?v=bI7lmKCAmA0.
Would people be willing to pay for a "private" (via homomorphic encryption) RPC endpoint?
I think so. When/if it is possible to homomorphically and privately interface with a bitcoin-core node, I can imagine light clients paying to interface with a set of such nodes.
Great idea. How will the average user know that you aren't eavesdropping on their queries?
I run my own node and Mempool.Space instance so I haven't looked up any of my own tx info on a block explorer in quite some time but that's not easy for the average user.
It might be a good idea to add a test address like the Genesis address or maybe some famous addresses with large balances just for fun and to test using the platform.
Keep building. We need more innovative tools like this.
We will have to do some good communication and marketing :)
If you don’t mind me asking, do you run your own full node in your own house? Is the bandwidth reasonable to your ISP etc?
Adding a test address is a good idea. Some of the exchange hot wallets make good examples.
I run both MyNode and Umbrel on 4GB RAM Ras Pis. I have no idea if my ISP is ok with the bandwidth but I haven't noticed any spikes in the price of my bill or anything.
I think exchange hot wallets would be a great idea. There are also some TxIDs here on https://kycp.org/. You might be able to get some good address ideas from that list.
Keep it up.
Here's the idiot version for anyone who isn't too clear: when you ask for the balance at your address on a block explorer online, they can correlate your IP address with the bitcoin address, so it's really bad for your privacy.
Instead, you could download the entire blockchain (which of course is what you do when you run a node). That keeps your privacy, since whoever you got the blockchain from can't tell from the download which addresses are yours.
The obvious downside of that is, while private, it scales horribly.
Here, advanced cryptography is used to find some of the best of both worlds: the privacy is perfect, and the scalability is not perfect (it's going to be a bit slower/more bandwidth than an ordinary query), but way better than just downloading the whole database (this field of applied cryptography is called "Private Information Retrieval").
Question to the authors ( @blintz ?): concretely, how much bandwidth is used in the simplest query, let's say a single address query (or tx query), compared with a non-PIR-based explorer?
This is a great summary! We really should write something like that in a whitepaper. Thanks for this.
Great question - this takes about 14 KB upload and 128 KB download per query. This is much more than for a normal address query, but on an absolute basis, it’s really small (less than just loading a page with a photo, for example). There is also a one-time upload for the first query of about 8 MB. As far as bandwidth, it would be usable even on a slow 3G connection. The main cost is really on the server side, since we have to maintain a fairly large server to answer queries. This is why we’re interested in building a service people would be willing to pay for, so we can cover these server costs.
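To put those numbers side by side, here is a small back-of-the-envelope sketch. It only uses the figures quoted above (about 14 KB up, 128 KB down, and a one-time upload of about 8 MB); the 1 MB = 1000 KB conversion and the 1 Mbit/s link speed in the usage note are assumptions for illustration, not measurements.

```python
# Figures quoted in the reply above (all approximate).
UPLOAD_KB_PER_QUERY = 14
DOWNLOAD_KB_PER_QUERY = 128
SETUP_UPLOAD_KB = 8 * 1000  # "about 8 MB"; assumes 1 MB = 1000 KB


def total_kb(num_queries):
    """Total traffic in KB for a client that makes num_queries lookups,
    including the one-time upload that accompanies the first query."""
    if num_queries <= 0:
        return 0
    per_query = UPLOAD_KB_PER_QUERY + DOWNLOAD_KB_PER_QUERY
    return SETUP_UPLOAD_KB + num_queries * per_query


def seconds_on_link(kilobytes, link_kbit_per_s):
    """Transfer time for `kilobytes` of traffic on a link of the given
    speed, ignoring latency and protocol overhead."""
    return kilobytes * 8 / link_kbit_per_s


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "queries:", total_kb(n), "KB")
```

For example, on a hypothetical 1 Mbit/s connection, `seconds_on_link(14 + 128, 1000)` gives roughly 1.1 seconds of transfer time per query once the one-time upload is done.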
The problem with any non-self-hosted website like this, though, is that it ultimately still relies on trusting that neither the maintainer, nor anyone who has broken in without the maintainer knowing, is logging your queries.
You're absolutely right! For now, the best way to defend against this is to just save the webpage (lol), and use that local page as your client. We will also be releasing some kind of Electron app or Chrome extension (open to suggestions) to mitigate this kind of attack. That way, we can do code signing, have people audit the code, etc. A mobile app will also be a good way to ensure that you are running a secure client.
Not to be overly negative, I agree, but it raises a question. If a user has to download and run specific code to make queries more safely, why not have that code be the more general Tor Browser? It has more eyes on it still, which is useful since running any extra code in your browser or otherwise adds many other risks.
The user can then visit any explorer, disregarding whether it logs or not, though care should be taken not to make multiple queries using the same identity to avoid linkage. It's a neat thing and all, but what's the actual value over visiting this or another explorer via an anonymizing tool?
That’s a totally reasonable question. The biggest reason is that Tor is just not making any kind of cryptographic guarantee of your privacy; it’s just kind of ‘statistically mixing’ your behavior with others.
The privacy guarantee we provide is categorically stronger. It’s a cryptographic guarantee, like the one that underlies ECDSA signatures or SNARKs. Tor is more analogous to going to a library and using their WiFi to make queries, whereas Spiral truly cannot learn your queries. You could of course always use both, if you’d like.
As you point out, in both cases, you need to run code on your machine. We hope that, over time, we get lots of eyeballs on our client code, and in fact, it would be cool to get it integrated into Brave, Tor, or as an extension for Chrome or Firefox.
Awesome and mind-blowing. @blintz could you expand on the possibilities?
Could a web server serve pages without learning which ones it is serving?
(EDIT: https://spiralwiki.com/ ooooh!)
Can a distributed storage system like Arweave serve content chunks in this way?
Can a video host stream video without learning which video it is streaming?
What are the performance hits?
Yeah, the possibilities are extensive! This is what makes me really excited :)
Yup, our https://spiralwiki.com demo shows how to do this for webpages (Wikipedia).
We’ve looked into using Spiral to make distributed protocols like IPFS more private. Especially for content-addressed systems like IPFS, privacy is a huge roadblock to adoption: who wants to announce the exact content they are looking for to (almost) the entire network? Spiral could definitely help, though ultimately, the IP layer will still leak your IP to the final target server. At the very least, using Spiral to do Kademlia for IPFS, your IP and desired content will not get broadcast to every node on the network. It seems cool, but we’d like to figure out if people would pay for that (so we could grow / pay our server costs).
Yeah, it works surprisingly well for video. Something like ‘private Netflix’ is pretty feasible.
The performance hit is significant, since to answer a single query, the server needs to do a computation involving every element of the database. Still, as you can see from the demo, this can be pretty darn fast.
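For anyone wondering why the server has to touch every element, here is a toy sketch of single-server PIR. To be clear, this is not Spiral's construction: it swaps in the textbook Paillier scheme (additively homomorphic) because it fits in a few lines, and the primes are far too small to be secure. All function names are made up for illustration. The client sends one ciphertext per record, encrypting 1 at the wanted index and 0 everywhere else; the server combines every ciphertext with every record and returns a single ciphertext.

```python
import math
import secrets

# Toy Paillier key. P and Q are Mersenne primes, chosen only so the
# example is easy to check; a key this small offers no real security.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # secret: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # secret: valid because the generator is N + 1


def encrypt(m):
    """Client side: encrypt the integer m (0 <= m < N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m mod N^2 equals 1 + m*N, which avoids one exponentiation.
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Client side: recover the plaintext using the secret LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def make_query(index, size):
    """Client side: encrypted one-hot vector selecting `index`."""
    return [encrypt(1 if i == index else 0) for i in range(size)]


def server_answer(database, query):
    """Server side: a pass over the WHOLE database.

    Multiplying ciphertexts adds plaintexts, and raising a ciphertext to
    the power `record` multiplies its plaintext by `record`, so the result
    encrypts sum(record_i * bit_i). The server never sees the bits.
    """
    acc = 1  # 1 is a valid encryption of 0
    for record, ciphertext in zip(database, query):
        acc = acc * pow(ciphertext, record, N2) % N2
    return acc


if __name__ == "__main__":
    balances = [50, 0, 1200, 7]          # hypothetical records
    query = make_query(2, len(balances))
    print(decrypt(server_answer(balances, query)))
```

If the server skipped any record, it would reveal that the client was not asking for that record, which is why the work is linear in the database size. Note that in this toy the query also grows with the database; practical schemes compress the query instead of sending one ciphertext per record.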
Suppose you have a list of key-value pairs and the hash of this list is well-known. Is it possible to not just fetch the value but also prove that the value comes from the list?
A weaker version of this question, is it possible to scramble the database in such a way that the server can still serve queries but can't delete a row without discarding the whole thing?
The case I'm thinking about is DNS. The "Unstoppable" domains, for example, are very much stoppable as all Polygon queries go through polygon-rpc.com (this is exactly the sort of thing that makes Bitcoiners roll their eyes and claim that Bitcoin isn't crypto). It would be super cool to resolve domains without revealing which domain you're looking for, but the problem of censorship isn't immediately solved by Spiral yet.
This definitely gets more into the realm of zero-knowledge. Basically, as it stands, it is not easy to prove that the server is not censoring or tampering with the database.
DNS is a great application, especially for various decentralized naming systems, like ENS or Unstoppable. Certainly we can at least mitigate the privacy disaster that is the status quo today (people just using eth.xyz, and broadcasting every .eth domain they resolve to a random third party). Censorship-resistance is, like you said, a whole new can of worms.
Cool! @blintz, it would be great if you would add a quick intro with diagrams about how this works. E.g. take Figure 1 from the paper and update it specifically to Bitcoin :)
Does the blockchain need to be indexed in a special way ahead of time? Or is this somehow accessing the usual blockchain data?
FYI: There was a delay ~7 minutes for the transaction to show up after the block was mined.
We are working on a whitepaper that's specific to the Bitcoin application, written in a less academic style that explains our threat model etc.
Nothing too special - we dump all the UTXOs from a full node that we run, and then concatenate and gzip data about each address (balance and recent txns).
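As a rough illustration of that pipeline, here is a minimal sketch of turning per-address data into an indexable list of gzipped records. Everything beyond "concatenate and gzip data about each address" is an assumption made for the example: the SHA-256 bucketing, the JSON encoding, the bucket count, and all the names are hypothetical, not a description of the real indexer.

```python
import gzip
import hashlib
import json

NUM_BUCKETS = 8  # hypothetical; a real database would have far more


def bucket_index(address, num_buckets=NUM_BUCKETS):
    """Map an address to a record index (a hash-key lookup).

    The client can compute this locally, so the only thing it has to
    fetch privately is the record at this index.
    """
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_database(address_data):
    """Group per-address data into buckets and gzip each bucket.

    address_data maps address -> {"balance": ..., "txns": [...]}.
    """
    buckets = [{} for _ in range(NUM_BUCKETS)]
    for address, info in address_data.items():
        buckets[bucket_index(address)][address] = info
    return [
        gzip.compress(json.dumps(bucket, sort_keys=True).encode("utf-8"))
        for bucket in buckets
    ]


def read_record(blob, address):
    """Client side: decompress a fetched bucket and pick out one address.

    Returns None if the address has no entry in the bucket.
    """
    return json.loads(gzip.decompress(blob)).get(address)


if __name__ == "__main__":
    dump = {  # made-up data standing in for a UTXO dump
        "addr-one": {"balance": 5000, "txns": ["aa11"]},
        "addr-two": {"balance": 0, "txns": []},
    }
    database = build_database(dump)
    print(read_record(database[bucket_index("addr-one")], "addr-one"))
```

A real deployment would presumably also pad every record to the same length, since record sizes that differ could leak which one was fetched; that step is left out here.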
Yeah, the delay is because the dumping actually takes quite some time. It's not as easy as you'd think to quickly dump UTXOs from bitcoind. I'm sure we can reduce the delay, especially if we're able to fundraise, hire folks, etc.
Sounds great! Let me know if you need help reviewing whether the paper/post is dumbed down enough so even people like me can understand it :)
This is compelling as an idea but it seems complicated to understand relative to the “complications” of running your own node and block explorer.
I could see how this might be useful outside of consumer contexts. Like for internal services at a large company or something.
Communicating the threat model (especially to non-technical folks) is indeed tough. We'd like to explain how our server never learns anything about the addresses you look up; they are always encrypted under a key that is only on your client.
One thing to highlight is that the security of our system is really analogous to hosting a node in your home, on your own hardware. When you run a full node in the cloud, you are still leaking data (like your address) to the cloud provider itself. Even if you use Tor and pay for the cloud service anonymously, that is still a weaker guarantee than the one we offer.