Why Not?
SECURITY
- Claw got cute and wrote a python script
- I didn't review said python script (it also didn't ask me to review it)
- Unreviewed scripts, especially when some bot decided to code it, are a liability
- I run this bot in a distroless docker image + chroot - and no, python is not installed, of course
- The bot's first suggestion was for me to give it python
- Nice try, clanker. REQUEST DENIED
Did you install your claw with access to python or nodejs or perl, fren? Have you reviewed the scripts it built?
Are you in the know? Or are you just yolo-ing away your life so that even your morning mumblings to your bot can get forever stored in some Palantir database?
First take-away from my botched experiment with it was to just DIY something purpose built
Stumbled across this yesterday and it does a good job articulating why
So I have a whole bespoke coding system that is fit for purpose and nothing more (and doesn't have any claw stuff). All this was just caused by me saying "yo, do self-improvement" by (a) reading all traces (which claude, in my real workflow, bespoke developed) and (b) either tune your prompt files or put shit on a wishlist to inform me of daily.
After some investigation I found that it developed the python script because it only has read_file(), not cat, so it can only read one trace at a time, and it thought it would be more efficient to analyze multiple files at once. That is already cute thinking that should be guarded against, because it undermines the goal. But I put that one on me for being too lazy to write a 200-line instruction; I didn't write enough "IMPORTANT: NEVER WRITE CODE" in there. It did create a wishlist, and it did do some prompt tuning.
FWIW, the point in the video about token overhead is real. All I do is let it analyze traces, and the only thing I used was a little "tell me the weather" skill I built for it, to have something to trace. Yet today alone it used some 6.5M tokens. That'd be over $20 on Sonnet, which is what everyone recommends using, with the day only 3/4 over (because UTC). 20 bucks a day for nothing is awful.
However, the self-improvement process, where I HitL between it and coding tasks on my main pipeline and the deployment, does work well. So that part is great and I think it only speaks more for what you're saying, and the vid too: just build your own agent. Don't even try to reuse any of the sloppy products out there. From scratch, design event/comms channels, an event loop (I literally started out with running
while true; do ... sleep 5; done
on my main pipeline) and a way to do prompt templating. And improve it.
I somehow spent $6 on gemini flash just fucking with it for a few hours that day... madness
I was thinking... if I were to design an agent (I may, for me) I'd probably design an agent-making agent.
So you teach the top level thing to pick the right agent for the task at hand, and you start with a single agent that can build agents. Like skills but execute the skill(set) straight into the system prompt without the overhead of souls and identities and what not.
Instead of all the bloat, expose a local knowledge base that every agent can get access to (with RBAC if you plan on doing nasty private thoughts shit) and do proper process isolation. And if I stay with Go (which is actually fun), I may have finally found a great use for go-plugin. Never thought I'd get to say that, lol. An agent is then an isolated plugin (easier to secure than a dll, and let's not speak of a .md file in userspace) and you just expose an interface over gRPC, and done-ish.
🤔 imma burn so many tokens and be obsolete before I'm even done... lol
I was wondering the same thing yesterday: what is the minimal "bootstrappable" agent I can use to build a bespoke swarm?
Then I came to the same conclusion: this is all bound to be obsolete in eight weeks. It's still, likely, a worthwhile study of how these systems operate before LLMs internalize it all.
Also, I overheard someone in the lab (I'd tag them but they didn't know I was involuntarily eavesdropping) describe achieving their desired bespokeness by wrapping Codex and adding a few more tool schemas, auxiliary memory, and some multi-agent collaboration primitives. And TIL Codex is open source.
We really just need removal of the file system, save for maybe tmpfs if you need to stream in large amounts of remote bytes. Remove the multitool, build the real tool. Yes, a swiss army knife has a saw. Now go cut a tree with it.
I think we have the tools. Most importantly, structured output: the most underappreciated precision scalpel we've had since forever. Using that instead of text output is what allows us to move
from
func execute(prompt string) (string, error) {}
to
func execute[T any](prompt string) (T, error) {}
That means
MEMORY.md will become the faint memory (hah) of an old nightmare it is supposed to have been since 1984. After all, I can confidently state that if the "heart and soul" of your system is markdown, it means you have no system.
Wrapping is the key word though. It's what I did with Claude Code and what I intended to do with a claw base. Because I really don't want to use broken 3rd party tooling that I have to fix. Unfortunately the design principles of the claw software - all of the ones I've looked at thus far, as they've just copied each other - are awful, and this makes it, bottom line, unusable because of the amount of tokens that I have to burn on fixing bugs and debt [1]. 14k github stars in a week... lmao. nope, you're not getting my star of approval. I have jotted a note (somewhere, lol) to look at CoPaw (#1446238) - maybe that is more structured.
With Claude Code, Anthropic has become a victim of their own success though... I'm 99.99% sure that they didn't expect having to block software (openclaw) from their subscription tier. I suspect that they're not capable of making enough money off their model-as-a-service offering if the majority of the usage budget is constrained by running a toy (and they're thus not selling API credz). With that comes disruption for me. I don't want to move away from Opus 4.6 at the moment though, simply because the results are good and the friction is just in the integration with their inference service. If I have to though, I can without too much trouble. My entire agentic loop that interfaces with Claude is hand-crafted, I own it 100%, and I would only need to replace that single component - maybe re-tune some prompt templates.
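To make the move from the untyped execute to the generic one concrete, here is a minimal sketch of a typed wrapper around a model call. The Model function type, the WishlistItem schema, and fakeModel are assumptions added for the example (the signature in the post takes only a prompt), and the "model" here is a canned JSON string, not a real inference client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Model is a hypothetical stand-in for whatever inference client is wrapped.
type Model func(prompt string) (string, error)

// execute asks the model for JSON and decodes it into T.
// Unknown fields are rejected so schema drift fails loudly instead of
// being silently dropped.
func execute[T any](m Model, prompt string) (T, error) {
	var zero T
	raw, err := m(prompt)
	if err != nil {
		return zero, fmt.Errorf("model call: %w", err)
	}
	var out T
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		return zero, fmt.Errorf("decode structured output: %w", err)
	}
	return out, nil
}

// WishlistItem is an illustrative schema, not one from the original post.
type WishlistItem struct {
	Title    string `json:"title"`
	Priority int    `json:"priority"`
}

// fakeModel returns a canned reply so the sketch runs without a network.
func fakeModel(string) (string, error) {
	return `{"title":"batch trace reader","priority":2}`, nil
}

func main() {
	item, err := execute[WishlistItem](fakeModel, "Propose one wishlist item as JSON.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s (priority %d)\n", item.Title, item.Priority)
}
```

The caller gets a value of a known type or an error, never a blob of prose to regex through, which is what makes a markdown "memory" file unnecessary as an interchange format.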
[1] On picoclaw alone I found a week's worth of Claude credits in bugs and logic errors that were negatively affecting outcomes - prompt template errors, context poisoning as a feature, tooling logic issues, retry/error handling poverty, missing feedback in agent spawning, no permission management, no good logging, default skill installs that are extremely poor and incompatible with the proposed framework... to name just yesterday, lol. The yolo factor is awful.
Thank youuuu