This is an updated version of my original post "Highly available Lightning node cluster setup guide" #652690
Running a Lightning node, like running any other server, always carries the risk of hardware failure. However, unlike simple web or file servers, a Lightning node has much stricter requirements: it is particularly sensitive to data integrity, and restoring even a slightly outdated backup can result in irreversible loss of funds. It is therefore critical to preserve the most recent state of the database at all times.

How do we prevent loss of funds when the hardware fails?

This can be partly solved with static channel backups, but restoring a static channel backup requires a certain level of trust in your channel partners. That requirement of trust can of course be avoided by additionally running a watchtower.
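For reference, lnd keeps a channel.backup file up to date in its data directory, and a static channel backup of all channels can also be exported manually with lncli (the output path below is just an example):

    # export a static channel backup of all channels to a file of your choice
    lncli exportchanbackup --all --output_file /home/lnd/channel.backup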
There is still one remaining case in which you would not be able to recover your funds: when a channel partner is permanently offline.
In this situation you need your node's channel database to force-close the channel. To protect this database against disk failure, you can replicate it across multiple disks with RAID.
This is a viable solution, but what if the entire system fails?

How do we keep the node online?

To keep the Lightning node operational even when an entire server goes down, you can run a cluster. A well-configured 3-node cluster remains operational even if one node fails (the remaining two still form a quorum), minimizing downtime and achieving high availability. Additionally, the replicated database safeguards against data loss.
To ensure high availability and prevent data loss, I've created a repository with detailed guides for setting up a highly available LND cluster, along with benchmarks to evaluate the performance of various configurations.
In the repository I provide guides for two example setups:
  • LND cluster with etcd backend
  • LND cluster with PostgreSQL backend
And I benchmark six different LND cluster setups:
  • LND with a bbolt database on Ceph RBD
  • LND with an SQLite database (without the db.use-native-sql flag) on Ceph RBD
  • LND with an SQLite database (with the db.use-native-sql flag) on Ceph RBD
  • LND cluster with an etcd database and etcd leader election
  • LND cluster with a PostgreSQL+Patroni database (without the db.use-native-sql flag) and etcd leader election
  • LND cluster with a PostgreSQL+Patroni database (with the db.use-native-sql flag) and etcd leader election

Example setup 1: LND cluster with etcd backend
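The full walkthrough is in the repository; at its core, each cluster member's lnd.conf contains roughly the following options (hostnames, TLS file paths and the cluster ID are placeholders you would adapt to your environment):

    # Sketch of the relevant lnd.conf options for an etcd-backed cluster member:
    # the etcd cluster serves both as the channel database and for leader election.
    db.backend=etcd
    db.etcd.host=etcd-node-1:2379
    db.etcd.certfile=/etc/lnd/etcd-client.crt
    db.etcd.keyfile=/etc/lnd/etcd-client.key

    # Leader election: only the elected lnd instance is active at any given time.
    cluster.enable-leader-election=true
    cluster.leader-elector=etcd
    cluster.etcd.host=etcd-node-1:2379
    cluster.etcd.certfile=/etc/lnd/etcd-client.crt
    cluster.etcd.keyfile=/etc/lnd/etcd-client.key
    cluster.id=lnd-node-1   # unique per cluster member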

Example setup 2: LND cluster with PostgreSQL backend
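Again, the repository has the detailed guide; the essential lnd.conf options look roughly like this (the DSN, hosts and cluster ID are placeholders, and in a Patroni setup the DSN should point at whatever endpoint routes to the current primary):

    # Sketch of the relevant lnd.conf options for a PostgreSQL-backed cluster member.
    db.backend=postgres
    db.postgres.dsn=postgres://lnd:changeme@postgres-primary:5432/lnd?sslmode=require
    db.postgres.timeout=0   # 0 disables the database query timeout
    # Optional: use native SQL schemas where supported (benchmarked separately below).
    # db.use-native-sql=true

    # Leader election is still handled by etcd.
    cluster.enable-leader-election=true
    cluster.leader-elector=etcd
    cluster.etcd.host=etcd-node-1:2379
    cluster.id=lnd-node-1   # unique per cluster member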

Benchmark: Receiving a 100-shard multi-part payment

The first and most important benchmark focuses on receiving payments. In this test, an invoice issued by the clustered node is paid by the single LND node as a multi-part payment, split into 100 HTLCs along a predefined route. The benchmark measures the time from the initiation of the payment until the settlement of all HTLCs. Notably, the time the single LND node spends writing the preimages to its own database is excluded from the measurements, since it is not relevant to the performance of the clustered node. The test was repeated several times to obtain an average result.
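Purely as an illustration (not the exact benchmark commands, which are in the repository), a payment can be forced to split into many HTLCs by raising the shard limit and capping the shard size on the paying node, along these lines (flag availability depends on your lnd version):

    # run on the paying (single) node: allow up to 100 shards and cap each
    # shard's size so the amount is actually split into many HTLCs
    lncli payinvoice --max_parts 100 --max_shard_size_sat 10000 <invoice>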
The benchmark results indicate that setups using a PostgreSQL database backend provide the fastest performance for receiving payments.
Regarding storage consumption, setups using Ceph consumed significantly more than the other setups. Receiving payments on Ceph-based setups caused the metadata of the Ceph OSDs to grow rapidly, quickly filling up the disks. This is likely due to unnecessary replication of uncompacted bbolt/SQLite database file contents.
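lnd does ship an option to compact the bbolt database on startup, which may or may not mitigate this growth:

    # lnd.conf options for bbolt auto-compaction (bolt backend only)
    db.bolt.auto-compact=true
    db.bolt.auto-compact-min-age=168h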

Benchmark: Sending a 100-shard multi-part payment

The second benchmark involves paying an invoice issued by the single LND node from the clustered node as a multi-part payment split into 100 HTLCs along a predefined route. The measured time spans from the initiation of the payment until all preimages have been successfully written into the database of the clustered node. This process was repeated several times to obtain an average result.
The benchmark results indicate that sending payments is currently fastest with an etcd database backend. The slow performance of SQL-based databases like SQLite and PostgreSQL is expected to improve in the future as more parts of the database migrate from serialized key-value pairs to native SQL schemas.
The storage consumption of the Ceph-based setups is also very high when sending payments. What is really remarkable, though, is the very low storage usage of the PostgreSQL-based setups.

Benchmark conclusion

The benchmarks clearly indicate that a database on top of a Ceph RBD is not suited for operating an LND node due to its high storage consumption.
Currently, etcd serves as a reasonable middle ground. While it doesn't excel in any specific area, it performs adequately across the board.
PostgreSQL has the lowest storage footprint of all the tested options. Its performance for receiving payments is unmatched, and although sending payments is currently quite slow, this will likely improve as more parts of the LND database migrate to native SQL schemas. With this outlook in mind, using a PostgreSQL database in an LND cluster appears to be a sensible choice.
Additionally, PostgreSQL benefits from optimizations enabled by its architecture, where it is writable only on the primary node and read-only on replicas. In contrast, etcd can be written to from any node, facilitating faster failover times since the LND leader does not need to be on the same instance as the etcd leader.
| etcd | PostgreSQL | Ceph |
| --- | --- | --- |
| medium storage consumption | ✅ low storage consumption | ❌ high storage consumption |
| ✅ easier setup | manageable setup | ❌ complicated setup |
| key-value pair database | ✅ relational database with outlook for performance improvements | ❌ replication of everything (also irrelevant parts of the database file) |
| ✅ fast failover | slower failover | (failover times not evaluated) |
If you're interested in setting up a highly available Lightning node cluster or want to dig deeper into the benchmarks, check out the GitHub repository linked above.