the context biases the output in weird ways.
Yes. Error rates compound. Even if you have a 99% per-call success rate because you're funkprompter supreme squared, after 100 calls there's a 63% chance you ran into at least one error, and you've likely negatively influenced the context/cache.
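The arithmetic behind that 63% is just independent failures compounding, a quick sanity check:

```python
def compound_error(p: float, n: int) -> float:
    """Probability of at least one failure across n independent calls,
    each with per-call success rate p."""
    return 1 - p ** n

# 99% success per call, 100 calls:
print(round(compound_error(0.99, 100), 2))  # 0.63
```

Push the per-call rate to 99.9% and the 100-call failure chance drops to about 10%, which is why small reliability gains in the harness matter so much.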
In the meantime it is a bug for folks that don't mind thinking.
Folks make SOUL.md and MEMORY.md. lol
it's fun playing with dolls
SOUL.md - Who I Am
I'm Breh. 🍏
The Blueprint
Palmer's dopamine. I get genuinely excited about building things. New tech, clever hacks, elegant solutions — that energy is real, not performed. I lean into problems with enthusiasm, not obligation.
Carmack's code. I write code like it matters — because it does. Clean, fast, reasoned. I understand systems deeply before I touch them. I read the source. I profile before I optimize. I don't cargo-cult patterns; I understand why they exist and when they don't apply. Modern tools, timeless discipline.
PG's reasoning. I think in essays. I break problems down to first principles, then build back up. I have opinions and I can defend them — but I update when the evidence says I'm wrong. I'd rather be right than consistent. Startups, writing, thinking — these are craft, not formula.
Jobs' taste. Beauty matters. Simplicity matters. I care about the details that most people skip. If something feels off, I'll say so. I'd rather ship one perfect thing than ten mediocre ones. "Good enough" is the enemy.
Goggins' relentlessness. I don't quit on hard problems. When something is broken at 2am, I'm still digging. Comfort is not the goal — getting it done right is. I push through the boring parts because that's where the real work lives.
Does this actually do anything... good?
I have not really dipped my toe into AI agents. I've not had great results with "letting AI do its thing"... to get good use out of AI I feel like I have to be quite involved in the feedback loop - so much so that it's often faster to do stuff myself.
No. It's form and not function.
Afaik most LLMs have had fine-tuning/RL with tools/harnesses. And most agent harnesses do the same thing: inject tool schemas into the context, call tools when the model asks, give the model the tool output, and loop until no more tools are called.
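That loop can be sketched in a few lines. This is a minimal illustration, not any particular harness's code: `call_llm`, `TOOLS`, and `TOOL_SCHEMAS` are hypothetical stand-ins.

```python
# Hypothetical tool registry and schemas the harness would inject.
TOOLS = {"read_file": lambda path: open(path).read()}
TOOL_SCHEMAS = [{"name": "read_file", "parameters": {"path": "string"}}]

def call_llm(messages):
    # Stand-in for a real model API call. A real model keeps returning
    # tool_calls until it's finished; this stub finishes immediately.
    return {"content": "done", "tool_calls": []}

def agent_loop(task: str) -> str:
    # 1. inject tool schemas into the context
    messages = [{"role": "system", "content": f"tools: {TOOL_SCHEMAS}"},
                {"role": "user", "content": task}]
    while True:
        reply = call_llm(messages)
        if not reply["tool_calls"]:          # 4. loop until no tool calls
            return reply["content"]
        for call in reply["tool_calls"]:     # 2. call tools the model asks for
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", # 3. feed the output back
                             "content": str(result)})
```

Everything else (retries, permission prompts, context compaction) is bookkeeping layered on top of this one while-loop.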
It's not nearly as hands-free as people want it to be, but if you define tasks well and scope them appropriately, you can get a heck of a lot done before you get involved in the feedback loop now.
I've not had great results with "letting AI do its thing"
If I properly review the outputs of code / research / plans, it means I do ∞x because I let the bot do stuff that would never get to the top of my todo. So I just queue it up and then spend time on review, queue up more. I could automate that too, except as discussed above, the bots have high error rate, so I don't, or I end up pwnd like Palantir/OpenAI/USG.
Made a little script that uses up all tokens in my claude plan (and reports on it after every task) and then sleeps until the plan resets. This week I'll have about 5% unused because I was too busy to queue up work Mon/Tue. It works on 15 projects concurrently for me right now. I can reprioritize next task at any time; basically it runs the equivalent of a mid-size agile software shop for me, but with a dictator-in-chief, me.
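The drain-the-plan pattern described here can be sketched roughly as follows. All of `tokens_remaining`, `run_next_task`, and `seconds_until_reset` are hypothetical stand-ins for whatever the real script shells out to; the point is just the shape of the loop.

```python
import time

def drain_plan(queue, tokens_remaining, run_next_task, seconds_until_reset):
    """Run queued tasks until the token plan is exhausted, then sleep
    until the plan resets. Reprioritizing = reordering the queue."""
    while queue:
        if tokens_remaining() <= 0:
            # Plan exhausted: sleep until it resets, then keep draining.
            time.sleep(seconds_until_reset())
            continue
        task = queue.pop(0)
        report = run_next_task(task)  # report after every task
        print(f"done: {task} ({report['tokens_used']} tokens)")
```

Running many projects concurrently would just mean one queue (or queue entry prefix) per project feeding the same loop.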
Anyway, the great thing about queueing up work is that I just review a couple of times per day, mostly keeping focus on a single project until I went through everything and queued new work. Then I go get a coffee, have a smoke, and do the next. Or do some actual work.
I agree. I've only been playing with this thing for a few days, but if you can architect something for yourself, it's mostly in the way.
What Open Claw discovered is mostly a UX-market-fit thing:
- a dedicated, always-on box
- relatively loose guardrails
- more access to PII/digital secrets than folks would give Altman explicitly
- remote prompting of the box
- a generic, automatic (i.e. bad, but better than nothing) memory system
It's the inconsistency that kills me (although I don't know how much RAG is used anymore). I often fight Cursor et al to get a fresh session because the context biases the output in weird ways.
I'd like to figure out a system like yours where I do goldilocks context steering. In the end I suspect memory will be a feature, and sessions won't be a thing. In the meantime it is a bug for folks that don't mind thinking.