In the last post, we learnt how to extract basic statistics from the Lightning Network graph in a few lines of Python.
We can now begin our journey to build a tool that will advise us which channels would be useful to open. Let's be clear: you will not have a fully functional channel advisor after reading this. This is more of an incremental, exploratory analysis to learn about the network.

First node selection strategy

For this exercise, let's say we want our Lightning node to be a routing node, and we want to find which channels would create good 'shortcuts', that is, new or cheaper paths.
In this article, we will first check whether there are groups of nodes that are not yet connected to the main part of the network.
There are already 50,000+ channels on mainnet, but the network is still relatively new, and there is no central organization that sees all the traffic to optimally allocate satoshis to channels. So there might be an opportunity to find nodes that are not connected at all yet. Even before creating shortcut channels that are 'cheaper' than the existing paths, if you could create paths that simply don't exist yet between nodes, then all the traffic between those 'islands' of nodes would go through your channels!

Looking for the islands

We first build a basic script that loops over all the channels, forming groups of nodes that can already reach each other.
We start by reusing the code that reads graph.json:
```python
import json

# Read the channel graph saved in the previous post
with open('c:/graph.json', encoding='utf-8') as f:
    graph = json.load(f)

nodes = graph['nodes']
edges = graph['edges']
```
Each island is saved as a list of the pubkeys of the nodes it contains. Islands are stored in a dictionary, with a unique name for each island (for simplicity, we use the pubkey of the first node of the island as its name). For each node, we also store the island it belongs to in a second dictionary.
```python
# Islands: island id -> list of the pubkeys of the nodes on that island
islands = dict()
# For each node pubkey, the id of the island it is on
node_island = dict()
```
This function merges two islands when we find a channel between a node of each:
```python
def merge_islands(island1, island2):
    # Move every node of the 2nd island into the first one
    for node in islands[island2]:
        node_island[node] = island1
        islands[island1].append(node)
    # The 2nd island no longer exists on its own
    del islands[island2]
```
This is the main loop over all the channels, which populates the islands data:
```python
for edge in edges:
    n1 = edge['node1_pub']
    n2 = edge['node2_pub']
    # Find the island of each node before this channel (None if not seen yet)
    n1_island = node_island.get(n1)
    n2_island = node_island.get(n2)
    if n1_island == n2_island and n1_island is not None:
        # Both nodes are already on the same island
        continue
    elif n1_island is None and n2_island is None:
        # Neither node is known yet: create a new island for both
        island_id = n1  # use the first node's pubkey as the island id
        islands[island_id] = [n1, n2]
        node_island[n1] = island_id
        node_island[n2] = island_id
    elif n1_island is None:
        # Only n2 is known: add n1 to n2's island
        islands[n2_island].append(n1)
        node_island[n1] = n2_island
    elif n2_island is None:
        # Only n1 is known: add n2 to n1's island
        islands[n1_island].append(n2)
        node_island[n2] = n1_island
    else:
        # The nodes are on two different islands: this channel links them
        merge_islands(n1_island, n2_island)
```
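In graph terms, what this loop computes are the connected components of the channel graph. If you want an alternative, a standard technique for this is a union-find (disjoint-set) structure, which avoids moving a whole island's node list every time two islands merge. Here is a minimal sketch of that swapped-in technique, not the method used in this post; it assumes the same `edges` list with `node1_pub`/`node2_pub` keys, the function names are hypothetical, and each island is keyed by its union-find root rather than by the first node seen:

```python
from collections import defaultdict

def find(parent, node):
    # Walk up to the root, halving the path as we go (path halving)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def union(parent, size, a, b):
    # Attach the smaller tree under the larger one (union by size)
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return
    if size[root_a] < size[root_b]:
        root_a, root_b = root_b, root_a
    parent[root_b] = root_a
    size[root_a] += size[root_b]

def islands_union_find(edges):
    parent, size = {}, {}
    for edge in edges:
        for node in (edge['node1_pub'], edge['node2_pub']):
            if node not in parent:
                parent[node] = node
                size[node] = 1
        union(parent, size, edge['node1_pub'], edge['node2_pub'])
    # Group every node under the root of its tree
    groups = defaultdict(list)
    for node in parent:
        groups[find(parent, node)].append(node)
    return dict(groups)

# Toy graph with made-up pubkeys: A-B-C form one island, D-E another
toy_edges = [
    {'node1_pub': 'A', 'node2_pub': 'B'},
    {'node1_pub': 'B', 'node2_pub': 'C'},
    {'node1_pub': 'D', 'node2_pub': 'E'},
]
print(sorted(len(members) for members in islands_union_find(toy_edges).values()))
# [2, 3]
```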
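Finally, we print the result. We only list the islands that contain at least 10 nodes, to avoid having too many results (there are many tiny islands; conceptually, every new node that has not opened any channel yet is an island of its own, although this loop only sees nodes that appear in at least one channel).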
```python
for island, members in islands.items():
    island_size = len(members)
    if island_size < 10:
        continue  # skip the many tiny islands
    print(f"{island}: {island_size} nodes")
```
The first result is as follows:
```
0298906458987af756e2a43b208c03499c4d2bde630d4868dda0ea6a184f87c62a: 152 nodes
022e87e1ed372a69df284aa97f720ca260d5d748599552481e7369cbf2dbebc4ce: 15 nodes
020b3fcaca400d03dfbc1f92b9ce081fc84d367c3b9c3c85695670ea52ebab6798: 16109 nodes
029fd1c2f1a309ddb0d2258d58a1e110059b6a0e5a2330104f39a5d020b472178f: 26 nodes
```
OK... Here we can see that there are actually not many islands that are not yet connected to the main part of the network. In a sense, that is logical. Every new node will at some point want to connect to the nodes whose services it wants to use, or will connect to a big, well-known node to reach many other nodes more easily. As the network grows, every node ends up connected to the main group of the graph; very few nodes would use the Lightning Network while only connecting within a separate group of peers.
Maybe that island of 152 interconnected nodes could be the exception: an interesting group of peers to connect to the main network? Let's add the calculation of the capacity of each island to the previous script, to see how many BTC are in the channels of that island. We can do this by also reading the 'capacity' attribute of each edge and adding it to the total capacity of the island that edge belongs to (I will let you write the modifications, they are simple):
```
0298906458987af756e2a43b208c03499c4d2bde630d4868dda0ea6a184f87c62a: 152 nodes, 0.09092302999999985Btc
022e87e1ed372a69df284aa97f720ca260d5d748599552481e7369cbf2dbebc4ce: 15 nodes, 0.02176696Btc
020b3fcaca400d03dfbc1f92b9ce081fc84d367c3b9c3c85695670ea52ebab6798: 16109 nodes, 4951.325717620041Btc
029fd1c2f1a309ddb0d2258d58a1e110059b6a0e5a2330104f39a5d020b472178f: 26 nodes, 0.005000000000000001Btc
```
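For reference, here is one possible version of that modification, as a sketch. It assumes `capacity` is given in satoshis (possibly encoded as a string, hence the `int()`) and that `node_island` has already been filled by the main loop; `island_capacities` is a hypothetical helper name, and the toy data below is made up. Summing integer satoshis and converting to BTC only once avoids the floating-point residue visible in the output above (such as 0.005000000000000001).

```python
SATS_PER_BTC = 100_000_000

def island_capacities(edges, node_island):
    """Return island id -> total capacity of its channels, in BTC."""
    total_sats = {}
    for edge in edges:
        # After the main loop, both ends of a channel are on the same island
        island = node_island[edge['node1_pub']]
        total_sats[island] = total_sats.get(island, 0) + int(edge['capacity'])
    # Convert once at the end to avoid accumulating float rounding errors
    return {island: sats / SATS_PER_BTC for island, sats in total_sats.items()}

# Toy data: island 'A' holds A, B, C and island 'D' holds D, E
toy_node_island = {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'D', 'E': 'D'}
toy_edges = [
    {'node1_pub': 'A', 'node2_pub': 'B', 'capacity': '1000000'},
    {'node1_pub': 'B', 'node2_pub': 'C', 'capacity': '500000'},
    {'node1_pub': 'D', 'node2_pub': 'E', 'capacity': 20000},
]
print(island_capacities(toy_edges, toy_node_island))
# {'A': 0.015, 'D': 0.0002}
```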
Well, it does not seem so. Those 152 nodes together only hold 0.09 BTC (about 9.09M sats). Since at least 151 channels are needed to connect 152 nodes, that makes an average channel size of at most about 60k sats. It is probably someone who needed the LN for a specific application. And since they reached a size of 150+ interconnected nodes without needing to connect to the rest of the network, they probably do NOT need to be connected to it.
This confirms that there is hardly any island of nodes that lacks at least one channel path to the nodes it needs to reach.

So what now?

One obvious caveat of the previous results is that all channels are treated as equal. Even looking only at total capacity, and not at the liquidity currently available in each direction, not all paths can route payments of every size. There might be islands of 'big' channels connected to each other only through a few smaller channels. If you were the one opening the bigger channel that links them, wouldn't you be the one routing ALL the big payments between them?
You can easily adjust the previous code to only consider channels above a certain size when building the list of islands: in the loop on the edges, simply skip the channels whose capacity is below the minimum payment size you want to test.
Try it, and run it for various minimum channel sizes between 1M and 10M sats...
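If you want a starting point, here is a sketch of that change: the main loop wrapped in a hypothetical `build_islands` function that skips channels below `min_capacity_sats`. It assumes, as in lnd's `describegraph` output, that `capacity` is in satoshis and may be encoded as a string (hence the `int()`); the toy pubkeys are made up. The toy graph shows the effect we are looking for: one small channel in the middle splits an island in two once the threshold rises above its size.

```python
def build_islands(edges, min_capacity_sats=0):
    """Group nodes into islands, ignoring channels below min_capacity_sats."""
    islands = {}      # island id -> list of node pubkeys
    node_island = {}  # node pubkey -> island id
    for edge in edges:
        if int(edge['capacity']) < min_capacity_sats:
            continue  # channel too small to route the payments we care about
        n1, n2 = edge['node1_pub'], edge['node2_pub']
        i1, i2 = node_island.get(n1), node_island.get(n2)
        if i1 is not None and i1 == i2:
            continue  # already on the same island
        if i1 is None and i2 is None:
            islands[n1] = [n1, n2]
            node_island[n1] = node_island[n2] = n1
        elif i1 is None:
            islands[i2].append(n1)
            node_island[n1] = i2
        elif i2 is None:
            islands[i1].append(n2)
            node_island[n2] = i1
        else:
            # Merge the 2nd island into the first one
            for node in islands[i2]:
                node_island[node] = i1
            islands[i1].extend(islands.pop(i2))
    return islands

# Toy graph: A-B and C-D are big channels, B-C is a small one in between
toy_edges = [
    {'node1_pub': 'A', 'node2_pub': 'B', 'capacity': '5000000'},
    {'node1_pub': 'B', 'node2_pub': 'C', 'capacity': '100000'},
    {'node1_pub': 'C', 'node2_pub': 'D', 'capacity': '5000000'},
]
for min_sats in (0, 1_000_000):
    sizes = sorted(len(m) for m in build_islands(toy_edges, min_sats).values())
    print(min_sats, sizes)
# 0 [4]
# 1000000 [2, 2]
```

On the real data, you would pass the `edges` list read from graph.json, for example for each threshold in `range(1_000_000, 10_000_001, 1_000_000)`, and print the islands of 10+ nodes as before.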
Did it work? OK, the result is still disappointing: for any minimum channel size of 2M sats or more, there is already only a single island.
This also makes sense when you think about it. A 'service' has to be part of the 'main network' to be usable, and you will not open a channel bigger than the ones the service provider has, or the extra capacity would be wasted. So every new channel whose size is above average will connect to the main part of the graph, to nodes that already have channels at least as big.

Can we actually do something?

Personally, I stopped pursuing this track as a meaningful way of selecting peers to connect to. But who knows, maybe you will find your own 'secret sauce' criterion for spotting unconnected islands?
Some random ideas:
  • Looking only at Tor nodes, or only at clearnet nodes?
  • Considering the max_htlc_msat parameter of the channels, to try to account for the actual liquidity on each side?
  • Considering the channel fees, to try to guess whether some channels only flow in one direction?
  • Islands that don't have at least N bridge channels with each other?
  • Something else?
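For the 'bridge channels' idea, the extreme case N = 1 is a classic graph problem: a bridge is an edge whose removal disconnects the graph. As a hedged starting point (a swapped-in technique, not something explored in this post), here is an iterative version of Tarjan's bridge-finding algorithm over the same `edges` list; `find_bridges` is a hypothetical helper name and the toy data is made up. The iterative form avoids Python's recursion limit on a graph with thousands of nodes, and tracking the index of the channel used to reach each node, rather than the parent node, means two parallel channels between the same pair of nodes are correctly not reported as bridges.

```python
from collections import defaultdict

def find_bridges(edges):
    """Return the channels whose removal would split an island in two."""
    adjacency = defaultdict(list)  # node -> list of (neighbour, edge index)
    for index, edge in enumerate(edges):
        n1, n2 = edge['node1_pub'], edge['node2_pub']
        adjacency[n1].append((n2, index))
        adjacency[n2].append((n1, index))

    discovery, low = {}, {}  # DFS discovery time and low-link value per node
    bridges = []
    timer = 0
    for root in list(adjacency):
        if root in discovery:
            continue
        discovery[root] = low[root] = timer
        timer += 1
        # Stack frames: (node, index of the edge used to reach it, neighbour iterator)
        stack = [(root, None, iter(adjacency[root]))]
        while stack:
            node, parent_edge, neighbours = stack[-1]
            advanced = False
            for neighbour, edge_index in neighbours:
                if edge_index == parent_edge:
                    continue  # don't walk back along the channel we came from
                if neighbour in discovery:
                    low[node] = min(low[node], discovery[neighbour])
                else:
                    discovery[neighbour] = low[neighbour] = timer
                    timer += 1
                    stack.append((neighbour, edge_index, iter(adjacency[neighbour])))
                    advanced = True
                    break
            if not advanced:
                # All neighbours done: report to the parent, then check the edge
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    if low[node] > discovery[parent]:
                        bridges.append(edges[parent_edge])
    return bridges

# Toy graph: triangle A-B-C, a single channel C-D, two parallel channels D-E
toy_edges = [
    {'node1_pub': 'A', 'node2_pub': 'B'},
    {'node1_pub': 'B', 'node2_pub': 'C'},
    {'node1_pub': 'C', 'node2_pub': 'A'},
    {'node1_pub': 'C', 'node2_pub': 'D'},
    {'node1_pub': 'D', 'node2_pub': 'E'},
    {'node1_pub': 'D', 'node2_pub': 'E'},
]
print(find_bridges(toy_edges))
# [{'node1_pub': 'C', 'node2_pub': 'D'}]
```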

What's next?

In the next article, I will talk about the next direction I took for a Lightning channel advisor, the questions it raised, and what else I learnt from the results.