I wasted way too much time on this, but my second attempt works—I just needed a fifth transaction.

In the charts, arrows point from child to parent. A dotted line with a ball and socket marks a replacement; the socket is on the side of the replacement.
You have two confirmed UTXOs C1 and C2. Let’s say 20 s/vB is the bottom of the first block.
  1. You create a large low-feerate transaction tx_LL with 100,000 vB at 1 s/vB (fee: 100,000 s). It spends the confirmed output C1 and has an output tx_LL:0.
  2. You attach a small low-feerate transaction tx_LS as a child with 100 vB at 1 s/vB (fee: 100 s) by spending tx_LL:0.
  3. You RBF tx_LS with a new high-feerate transaction tx_HS that spends C2 and tx_LL:0. tx_HS has 5000 vB and pays 21 s/vB (fee: 105,000 s), but since it spends an output from a low-feerate parent, its mining score is only 1.95 s/vB.
  4. You RBF tx_LL and tx_HS with tx_LM that has 100,000 vB and pays 3.05 s/vB (fee: 305,000 s) by spending the outputs C1 and C2. This is permitted, since only tx_LL is a direct conflict, so the feerate of tx_HS does not have to be beaten directly.
  5. You use the new RBFr rules to replace tx_LM with a small high-feerate transaction tx_RBFr with 100 vB paying 20 s/vB (fee: 2000 s) that spends C2 and makes it into the top block of the mempool. tx_LM was not going to be in the next block, and tx_RBFr pays more than 1.25× the feerate of tx_LM, so this is permitted under the new rules.
  6. You then rebroadcast tx_LL and tx_LS because C1 is no longer being spent.
  7. You immediately replace both tx_LS and tx_RBFr with tx_HS. tx_HS has a feerate of 21 s/vB, which is higher than that of tx_RBFr (20 s/vB) and tx_LS (1 s/vB), and it pays more absolute fees than both combined (105,000 s vs 2000 s + 100 s). But since it’s a child of tx_LL, it only has a mining score of 1.95 s/vB.
Repeat 4.–7. to make every node on the network cycle the same five transactions ad nauseam. Roll the locktimes or sequences to give the transactions new TXIDs in each iteration while spending the same UTXOs. The only transaction that is ever in any danger of getting mined is tx_RBFr, which costs you 2000 s. If it does get included in a block, just start over with a new confirmed UTXO as your C2'.
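The feerates quoted in the steps above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it treats the mining score of tx_HS as the combined feerate of tx_HS and its parent tx_LL, and it takes the 1.25× threshold from the description of step 5. The helper name `package_feerate` is made up for illustration and is not part of any Bitcoin software.

```python
def package_feerate(txs):
    """Feerate of transactions mined together: total fee / total vsize."""
    total_fee = sum(fee for fee, _ in txs)
    total_vsize = sum(vsize for _, vsize in txs)
    return total_fee / total_vsize

# (fee in sats, vsize in vB), numbers taken from the steps above
tx_LL = (100_000, 100_000)
tx_HS = (105_000, 5_000)
tx_LM = (305_000, 100_000)
tx_RBFr = (2_000, 100)

hs_feerate = package_feerate([tx_HS])       # tx_HS on its own
hs_score = package_feerate([tx_LL, tx_HS])  # tx_HS together with its parent
lm_feerate = package_feerate([tx_LM])
rbfr_feerate = package_feerate([tx_RBFr])

# Step 5 (assumed form of the RBFr check): at least 1.25x the replaced feerate
rbfr_allowed = rbfr_feerate >= 1.25 * lm_feerate

print(hs_feerate)           # 21.0
print(round(hs_score, 2))   # 1.95
print(rbfr_allowed)         # True
```

The gap between 21 s/vB and 1.95 s/vB is what the cycle relies on: the replacement rules see the first number, while a miner sees the second.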
Is replacing tx_LS and tx_RBFr with tx_HS at step 7 really allowed? To me it's clear that it should not be allowed. tx_RBFr has an effective fee rate of 20 s/vB, while tx_HS has an effective fee rate of 1.95 s/vB. If current RBF code allows this, that is IMO obviously a bug that should be fixed regardless of RBFr.
I tried to look into how RBF currently works. This comment from the original RBF pull request seems relevant:
Don't allow the replacement to reduce the feerate of the mempool.
We usually don't want to accept replacements with lower feerates than what they replaced as that would lower the feerate of the next block. Requiring that the feerate always be increased is also an easy-to-reason about way to prevent DoS attacks via replacements.
The mining code doesn't (currently) take children into account (CPFP) so we only consider the feerates of transactions being directly replaced, not their indirect descendants. While that does mean high feerate children are ignored when deciding whether or not to replace, we do require the replacement to pay more overall fees too, mitigating most cases.
reply
Is replacing tx_LS and tx_RBFr with tx_HS at step 7 really allowed?
Personally I consider it a design flaw of RBF. The whole point of the BIP 125 Rule #2 unconfirmed inputs rule is to avoid this type of situation where an unconfirmed input causes the replacement to be less valuable. As this example shows, the rule didn't go far enough - limiting unconfirmed inputs to coming from the same replaced transaction would fix this I believe.
reply
Sure, the current RBF rules have numerous issues. Still, an improvement proposal should be based on the status-quo and take note of related work.
reply
RBF as it is currently implemented compares the feerates of transactions, not their mining scores, and the replacement does increase the feerate: tx_HS has a higher feerate than tx_RBFr. The overall fees increased in the mempool, as well as the overall feerate in the mempool. While you expect that the mining score (or effective feerate) of all transactions should increase, that’s not how it works at this time. It turns out that calculating a transaction’s mining score is a non-trivial amount of computational work that requires the entire cluster of the transaction as context. See e.g. the work Gloria Zhao and I did here last year: Implement Mini version of BlockAssembler to calculate mining scores. Also, you will probably really like the ClusterMempool project, since it will allow us to only accept replacements that strictly improve the mempool, including in the manner that you describe.
As far as I can tell, tx_HS is a valid replacement of tx_RBFr and tx_LS as it:
  1. Only includes unconfirmed inputs that were included by one of the directly conflicting transactions:
    Yes, tx_HS only uses the unconfirmed output of tx_LL that was previously spent by tx_LS.
  2. The replacement transaction pays an absolute fee of at least the sum paid by the replaced transactions:
    Yes, 105,000 sats ≥ 100 sats + 2000 s.
  3. The additional fees (difference between absolute fee paid by replacement and originals) pays for the transaction’s bandwidth at or above incremental relay feerate:
    Yes, 105,000 sats – (2000 sats + 100 sats) = 102,900 sats ≥ 5000 vB × 1 s/vB.
  4. The number of original transactions does not exceed 100: Yes.
  5. The replacement transaction’s feerate is greater than the feerate of all directly conflicting transactions: Yes, 21 s/vB > 20 s/vB > 1 s/vB.
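The five checks above can be written out as a small Python sketch. It is a simplified model of the list in this post, not Bitcoin Core's code: the `Tx` class and `replacement_checks` function are hypothetical, the incremental relay feerate is assumed to be 1 s/vB as in the arithmetic above, and the eviction limit counts only the direct conflicts because neither of them has descendants here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    name: str
    fee: int                                   # sats
    vsize: int                                 # vB
    unconfirmed_inputs: frozenset = frozenset()

    @property
    def feerate(self):
        return self.fee / self.vsize

def replacement_checks(new, conflicts, incremental_feerate=1, max_evicted=100):
    """Evaluate the five conditions listed above for `new` replacing `conflicts`."""
    replaced_fee = sum(c.fee for c in conflicts)
    spent_by_conflicts = frozenset().union(*(c.unconfirmed_inputs for c in conflicts))
    return {
        "no_new_unconfirmed_inputs": new.unconfirmed_inputs <= spent_by_conflicts,
        "absolute_fee": new.fee >= replaced_fee,
        "pays_for_bandwidth": new.fee - replaced_fee >= new.vsize * incremental_feerate,
        "eviction_limit": len(conflicts) <= max_evicted,
        "feerate_beats_conflicts": all(new.feerate > c.feerate for c in conflicts),
    }

tx_LS = Tx("tx_LS", fee=100, vsize=100, unconfirmed_inputs=frozenset({"tx_LL:0"}))
tx_RBFr = Tx("tx_RBFr", fee=2_000, vsize=100)
tx_HS = Tx("tx_HS", fee=105_000, vsize=5_000, unconfirmed_inputs=frozenset({"tx_LL:0"}))

result = replacement_checks(tx_HS, [tx_LS, tx_RBFr])
for rule, passed in result.items():
    print(rule, passed)
```

Every entry comes out `True` for these numbers. None of the checks looks at tx_LL, which is why the low mining score of tx_HS never enters the decision.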
reply
RBF as it is currently implemented compares the feerates of transactions, not their mining scores, and the replacement does increase the feerate
That's not really true though. It's increasing the fee-rate of a transaction in isolation. But from the point of view of miners it is not increasing the fee-rate of the immediately mineable mempool, because you have to mine a low fee-rate transaction first to access the high fee-rate.
In my replace-by-fee-rate post, I called this kind of concept the highest minable fee-rate. The situation itself is similar to the argument as to why replace-by-fee-rate is miner incentive compatible in the first place: we'd almost always rather mine a high fee-rate transaction now, than have a higher total fee transaction that can't be mined any time soon.
As I mentioned on bitcoin-dev, IIUC Suhas identified this issue with RBF and has a draft pull-req intended to fix it. An even simpler fix could be to require that all unconfirmed inputs to a replacement come from the same transaction; currently they're allowed to come from different transactions. The whole point of the unconfirmed input rule was to avoid this type of situation in the first place; turns out we needed a stronger version of that rule.
reply
In my replace-by-fee-rate post, I called this kind of concept the highest minable fee-rate.
Yes, and the people that have been working on this topic in the past few years speak of "mining score" and "feerate diagram comparison" in this context.
It's increasing the fee-rate of a transaction in isolation. But from the point of view of miners it is not increasing the fee-rate of the immediately mineable mempool, because you have to mine a low fee-rate transaction first to access the high fee-rate.
While that might have been the intention, it is not how RBF is implemented today. I would expect you to know this, given your involvement in the original work and your recent advocacy for mempoolfullrbf. If you have not read them yet, you may find Suhas’s overview of Cluster Mempool and Gloria’s numerous write-ups for RBF, package relay, v3 transactions, etc. interesting.
reply
Yes, and the people that have been working on this topic in the past few years speak of "mining score" and "feerate diagram comparison" in this context.
I know. If you'd gone to the link I provided you'd see that I used a different term because, for use in replace-by-fee-rate, I proposed a simpler algorithm that is not identical to mining score.
While that might have been the intention, it is not how RBF is implemented today. I would expect you to know this
I'm aware. I was responding to your confusing way of explaining this, stating that "the replacement does increase the feerate", by explaining how from a miner's perspective the feerate has not been increased. Your reply was glossing over the fact that RBF as implemented right now is clearly broken in the circumstance that your replacement cycle exploits, as @shafemtol pointed out.
reply
Yes, it doesn’t improve the best mining score among the transactions in the example. It does increase the overall fees per vsize across the set of transactions we are examining. Which of those two is more attractive to miners depends on the size of the mempool and the position of the examined set of transactions within it. If there are few transactions available for mining, increasing the total fees may be preferable over a higher feerate transaction with a lower total fee.
What you are describing in the context of "highest minable fee-rate" is equivalent to the concept of mining score. If the (ancestorless) parent has a higher feerate, the parent’s mining score matches its own feerate, and the child’s mining score matches the child’s feerate. If the child has a higher feerate, they share a mining score that falls between the parent’s feerate and the child’s feerate.
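For a single parent with a single child, the two cases described above can be sketched as follows. The function name and the example numbers are illustrative assumptions; real mempools have larger clusters, where the calculation needs the whole cluster as context.

```python
def pair_mining_scores(parent, child):
    """Mining scores for an ancestorless parent and its only child.

    Each transaction is a (fee in sats, vsize in vB) tuple.
    Returns (parent_score, child_score) in s/vB.
    """
    p_fee, p_vsize = parent
    c_fee, c_vsize = child
    p_rate = p_fee / p_vsize
    c_rate = c_fee / c_vsize
    if c_rate > p_rate:
        # The child pulls the parent along: both are mined as one package.
        shared = (p_fee + c_fee) / (p_vsize + c_vsize)
        return shared, shared
    # The parent is mined on its own merits, and the child afterwards on its own.
    return p_rate, c_rate

# Child pays more: shared score between 5 s/vB and 30 s/vB
low_parent = pair_mining_scores(parent=(1_000, 200), child=(3_000, 100))

# Parent pays more: each keeps its own feerate
high_parent = pair_mining_scores(parent=(2_000, 100), child=(100, 100))

print(low_parent)
print(high_parent)  # (20.0, 1.0)
```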
reply
Note to the reader: this exploit doesn't actually work due to RBF Rule #6.
tx_HS is considered to be a direct conflict, and its raw fee-rate does have to be beaten directly. While tx_HS does spend an unconfirmed output, it appears that the fee-rate PaysMoreThanConflicts uses to calculate whether tx_HS can be beaten is tx_HS's raw fee-rate. So it looks like your understanding was incorrect on these two points.
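A rough sketch of why step 4 fails under this reading, using the feerates from the original post. The helper `beats_direct_conflicts` is a hypothetical stand-in for the comparison described here, not the actual PaysMoreThanConflicts implementation.

```python
def beats_direct_conflicts(replacement_feerate, conflict_feerates):
    """True if the replacement's feerate exceeds every direct conflict's raw feerate."""
    return all(replacement_feerate > f for f in conflict_feerates)

tx_LM_feerate = 3.05                              # s/vB
raw_feerates = {"tx_LL": 1.0, "tx_HS": 21.0}      # raw feerates, not mining scores

# tx_LM spends C1 and C2, so it directly conflicts with both tx_LL and tx_HS.
step4_allowed = beats_direct_conflicts(tx_LM_feerate, raw_feerates.values())

# If tx_HS were only an indirect conflict, only tx_LL would be compared.
only_ll_direct = beats_direct_conflicts(tx_LM_feerate, [raw_feerates["tx_LL"]])

print(step4_allowed)   # False
print(only_ll_direct)  # True
```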
reply
Note: I had an unintended direct conflict in step 4. tx_HS needs to be evicted indirectly as a child of tx_LL. As I described it above, tx_LM would be a direct conflict with tx_HS and therefore would need to beat the individual feerate of tx_HS. It can be easily fixed by introducing a third confirmed input c3 that is spent in addition by tx_RBFr and by tx_LM instead of c2.
I’ve updated the charts here: https://bitcoin.stackexchange.com/a/121542/5406
reply