
found quite a few novel bugs/behaviors

Let's see how much of my preliminary list gets solved ❤️


What I find interesting is that it takes the expensive 1M context while all jobs are < 200k tokens (and the patch is also < 1M tokens; it's 700kB). Was that something you selected, or did it do that by itself?

It chose the context window by itself. I think that's the default of workflows: xhigh with 1M. I haven't tried to change the workflow params yet and I'm not sure if it'll listen.

122 sats \ 1 reply \ @optimism 5 Jun

Interesting. I've multi-job'd the same diff all within 200k (opus high - the only difference between high and xhigh is the context window, iirc)

It's good that it is not offloading to Sonnet though.


@k00b I was wrong about high vs xhigh for 4.8. The "context-window only" thing was a 4.6 decision that has changed once more. The reason I am seeing massive regressions is that what used to be high is now max.

:-/

Let's see how much of my preliminary list gets solved

According to bot analysis of last night's patch:

  • 4 concerns partially addressed
  • 4 concerns fully resolved
  • 36 unresolved, of which 4 widened
  • (4 resolved in the last run)

I don't like the recurrence of a 4 count, so it's probably bullshitting.


PS: Claude Code seems to auto-trigger dynamic workflows when I omit subtask decomposition specs for large requests.

Stats are fun:

  42% of your usage came from subagent-heavy sessions
   Each subagent runs its own requests. Be deliberate about spawning them — and
   consider configuring a cheaper model for simpler subagents.

  36% of your usage came from subagents under "forgejo"
   If this runs frequently, consider configuring its subagents with a cheaper
   model or tightening their prompts.

  61% of your usage came from /forgejo
   Heavy skills can be scoped down or run with a cheaper model via skill
   frontmatter.

  Skills                  % of usage
  /forgejo                       61%

  Subagents               % of usage
  forgejo                        36%

No, Anthropic, I am not going to use Sonnet. I know you wanna save me credz but I actually read all that, #noyolo


I have 43 after my unpushed msats/sats and description truncation work. Of the 43, 3 are high and about key rotation, 9 are medium (some out of scope), and the rest is a long tail of low.

89 sats \ 1 reply \ @optimism 7 Jun

Bots told me there were 4 high-severity issues, but after manual validation yesterday I only have maybe one left that I have not fully repro'd yet. The rest of what was flagged high is at best low.

The maybe-high one is sitting in createBolt11FromWalletProtocols and I have a couple that could be worth fixing, but repro is slow af and I don't trust the bots for one second. They also keep disagreeing with themselves (including Claude and GPT disagreeing with their own prior analyses - I fuzz who wrote what to take out any bias)
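A minimal sketch of the "fuzz who wrote what" step, assuming each analysis is available as an (author, text) pair. The function name `blind_reports` and the `reviewer-N` labels are hypothetical, made up for illustration, and not anything the tools provide:

```python
import random


def blind_reports(reports, seed=None):
    """Shuffle analyses and swap author names for neutral labels.

    reports: list of (author, text) tuples.
    Returns (blinded, key), where blinded is a list of (label, text)
    and key maps each label back to the real author for later unblinding.
    """
    rng = random.Random(seed)
    shuffled = list(reports)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)     # order should not hint at authorship

    blinded = []
    key = {}
    for i, (author, text) in enumerate(shuffled):
        label = f"reviewer-{i + 1}"  # hypothetical neutral label
        key[label] = author
        blinded.append((label, text))
    return blinded, key
```

Only `blinded` would go to whichever model does the cross-review; `key` stays local until the verdicts are in. This only hides the labels, so an analysis whose wording names its own author would still leak.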

createBolt11FromWalletProtocols

I've had this flagged twice for different reasons, and so far each flag stems from assumptions about UX that are wrong.
