
My relationship with testing is complicated. I used to write a lot of tests.

My main gripe with testing is that attention is nearly zero-sum. Tests are great when they prevent regressions, freeing up attention for the task at hand, or when failure means customers lose money or privacy. Tests are bad, imho, when devs use them as the primary source of validation and correctness, and begin designing code for the tests rather than for customers, themselves, and other developers. It's all contextual, though. As I said, it's complicated. (And I'm complicated.)

Grappling with this, and realizing I should start writing tests again, I had the probably dumb idea of writing tests in markdown, in a more explicitly spec-like form, now that we have LLMs to produce the boilerplate and the tests themselves.

I imagine writing tests like this:

````markdown
---
suite: auth
target: web
adapter: playwright
mode: strict
base_url: http://localhost:3000
tags: [smoke, login]
timeout: 15000
context:
  pages: ["app/login", "app/dashboard"]
compiler:
  model: gpt-5.4
  temperature: 0
---

# Authentication

## Background
- Seed a user `ada@example.com` with password secret(`auth.password`)
- Start from a clean browser session

## Scenario: successful login
Given I am on `/login`
When I fill "Email" with "ada@example.com"
And I fill "Password" with secret(`auth.password`)
And I click "Sign in"
Then the URL should be `/dashboard`
And I should see a heading "Welcome, Ada"

```bindings
label("Email") => [data-testid="email-input"]
label("Password") => [data-testid="password-input"]
text("Sign in") => [data-testid="sign-in-button"]
```
````

We probably wouldn't want to actually generate the tests just-in-time, but the source of truth for functionality would be these natural-language documents that, through some kind of pipeline, are progressively concretized into real tests. This would alleviate most of my gripes with tests, and I can't complain about specs.

For anyone more familiar with testing, how dumb is this idea?

I don't know why the future wouldn't just look like this:

> You are an expert UI/UX engineer and pentester. Go to stacker.news and test the login process. For every test you run, log your inputs and expected vs. actual outputs. Identify edge cases and vulnerabilities. Adopt the roles of new user, returning user, and adversary. If you find bugs and vulnerabilities, reason from the codebase about what may be causing them, then create a GitHub issue to address each bug/vulnerability.
139 sats \ 10 replies \ @optimism 7h
> why the future wouldn't just look like this

That's what it looks like now on the bleeding edge. The problem with it is that because you didn't spec what you want, but just put some generic slop, the test will be generic. And since no one specs what they want, everything will be generic. And generic is boring; it will cost you customers.

I was clicking around on the link I shared in my other comment a bit to make sure I was sharing the right tool, and I found this gem:

> Remember `pip install numpy` does not mean you will get the latest version;
> it means you are OK with getting whatever version you get 😉

So if you wanna just let the AI decide, then it means you don't care what you get. Over time (and at 1000x dev speed that means in about 3... 2... 1... now) the only thing remaining will be slop. There will be nothing because you didn't spec anything. You're okay with whatever.


By the way, I got my clawbot set up on a VPS.

It's... cool, I guess. At least now I understand what the thing is and what it's doing under the hood. I can't think of a major use case for it right now, though, that Claude Code and Claude chat couldn't handle.

Kinda cool that I can message it on Discord and it'll operate on files in a machine though.

24 sats \ 8 replies \ @optimism 7h

You know what it's doing under the hood? That makes you the only person in the known universe to have that knowledge, lol.

> Kinda cool that I can message it on Discord and it'll operate on files in a machine though.

We may differ in opinion on whether Discord is cool, but yes. Since Wednesday I've been facing the dilemma of a Claude budget overrun, so I was tempted to just give instructions to the claw bot instead. But then I was like naw, too much work, and just took the bitter pill of paying some extra cash to Anthropic. I heard they can use it, lmao.


Haha "under the hood" from my previous perspective of zero knowledge.

I now understand that it is a process that runs a gateway service exposed on a port. That port can receive messages, which it routes to an LLM using an API key you set up. It doesn't route the message directly to the LLM, though. It probably does a couple of things first, like getting the LLM to read the prompt to figure out which files it needs to read for more context and which commands it might need to run on the machine. So openclaw is the software that orchestrates between these LLM calls and what's happening on the machine.
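A minimal sketch of the two-pass flow guessed at above (the description itself says "probably", so this is not openclaw's actual implementation): the gateway first asks the model which files it needs, reads only files inside its working directory, then makes a second call with that context. The `llm` callable, the JSON plan format, and `handle_message` are all hypothetical. A real gateway would sit behind a listener on that port and would also have to vet any shell commands the model proposes.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical model interface: prompt in, text out (a real one would call an LLM API).
LLM = Callable[[str], str]
PLAN_PREFIX = 'Reply only with JSON {"read": [relative paths]} for files you need.\n'

def handle_message(message: str, llm: LLM, workdir: Path) -> str:
    """Route one incoming message: plan, gather context, then answer."""
    # Pass 1: ask the model which files it wants (assumed JSON plan format).
    plan = json.loads(llm(PLAN_PREFIX + message))
    root = workdir.resolve()
    context = []
    for rel in plan.get("read", []):
        path = (root / rel).resolve()
        # Only read regular files that stay inside the working directory.
        if path.is_relative_to(root) and path.is_file():
            context.append(f"--- {rel} ---\n{path.read_text()}")
    # Pass 2: answer with the gathered file contents prepended to the message.
    return llm("\n\n".join(context + [message]))

# Offline usage with a fake model that requests one allowed and one escaping path.
def fake_llm(prompt: str) -> str:
    if prompt.startswith(PLAN_PREFIX):
        return json.dumps({"read": ["notes.md", "../secret.txt"]})
    return prompt  # echo, so we can see exactly what context was sent

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp, "work")
    work.mkdir()
    (work / "notes.md").write_text("buy sats")
    Path(tmp, "secret.txt").write_text("do not leak")
    REPLY = handle_message("what's on my list?", fake_llm, work)

print(REPLY)
```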

It does other stuff in the background too. It runs a regular heartbeat check, which is a call to an LLM using whatever is in your HEARTBEAT.md. It has other markdown files (IDENTITY.md, SOUL.md, etc.) that set up its personality. It keeps some working memory of your sessions so that past interactions persist. And it has a bunch of config files that let you set permissions, access tokens, and other access rules/restrictions.
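A minimal sketch of the heartbeat plus working memory described above, assuming plain files in one directory: each tick reads the personality files and HEARTBEAT.md, calls the model, and appends the reply to a memory file that the next tick reads back. `memory.log`, the 30-minute interval, and the function names are hypothetical, not openclaw's actual layout or defaults.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional

PERSONA_FILES = ["IDENTITY.md", "SOUL.md"]  # the two named in the thread
MEMORY_FILE = "memory.log"                  # hypothetical working-memory file

def read_if_exists(path: Path) -> str:
    return path.read_text() if path.is_file() else ""

def heartbeat_once(workdir: Path, llm: Callable[[str], str]) -> str:
    """One heartbeat: persona + memory + HEARTBEAT.md in, reply appended to memory."""
    persona = "\n".join(read_if_exists(workdir / name) for name in PERSONA_FILES)
    memory = read_if_exists(workdir / MEMORY_FILE)
    task = read_if_exists(workdir / "HEARTBEAT.md")
    reply = llm(f"{persona}\n\nMemory:\n{memory}\nHeartbeat task:\n{task}")
    # Persist the reply so the next heartbeat (or chat session) can see it.
    with open(workdir / MEMORY_FILE, "a") as f:
        f.write(f"heartbeat: {reply}\n")
    return reply

def run_heartbeat(workdir: Path, llm: Callable[[str], str], interval_s: float = 1800.0,
                  ticks: Optional[int] = None, sleep=time.sleep) -> None:
    """Call heartbeat_once every interval_s seconds; ticks=None runs forever."""
    done = 0
    while ticks is None or done < ticks:
        heartbeat_once(workdir, llm)
        done += 1
        sleep(interval_s)

# Offline usage: record prompts and skip the real sleep.
PROMPTS: list[str] = []

def fake_llm(prompt: str) -> str:
    PROMPTS.append(prompt)
    return f"tick {len(PROMPTS)} ok"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "SOUL.md").write_text("You are terse.")
    Path(tmp, "HEARTBEAT.md").write_text("Check the inbox.")
    run_heartbeat(Path(tmp), fake_llm, ticks=2, sleep=lambda s: None)
```

The second prompt already contains the first tick's reply, which is the whole trick behind "past interactions persist".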

It's equipped with instructions for how to operate certain APIs, like a Discord API. You have to set it up with your Discord bot token and manage some of the connections.

I'm not sure if there are other projects out there like this, but probably one reason it got popular is that it has a pretty comprehensive setup and onboarding wizard. So getting started is easy. I did encounter a few bugs along the way, but nothing that a little googling/LLMing couldn't figure out.

16 sats \ 6 replies \ @optimism 7h
> It probably does a couple of things first

It merges your first message into a massive superprompt with system, tooling, memory, personality, user preferences and then appends to it. They say it's somewhere in here haha.
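A rough sketch of that merge-then-append pattern, assuming a fixed section order (the thread lists what goes in, not the layout): the sections are joined once into a header, and every user turn and model reply is appended, so each call resends the whole growing prompt. `Session`, `SECTION_ORDER`, and the `## name` section markers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed order; the section names come from the thread, the layout does not.
SECTION_ORDER = ["system", "tooling", "memory", "personality", "user_preferences"]

@dataclass
class Session:
    sections: dict[str, str]
    turns: list[str] = field(default_factory=list)

    def superprompt(self) -> str:
        """Fixed sections first (skipping empty ones), then every turn so far."""
        head = "\n\n".join(
            f"## {name}\n{self.sections[name]}"
            for name in SECTION_ORDER
            if self.sections.get(name)
        )
        return "\n\n".join([head, *self.turns])

    def send(self, message: str, llm: Callable[[str], str]) -> str:
        """Append the user turn, call the model on the whole prompt, append the reply."""
        self.turns.append(f"User: {message}")
        reply = llm(self.superprompt())
        self.turns.append(f"Assistant: {reply}")
        return reply

# Offline usage: the fake model records how large each prompt was.
SIZES: list[int] = []

def fake_llm(prompt: str) -> str:
    SIZES.append(len(prompt))
    return "ok"

s = Session({"system": "You run on a VPS.", "personality": "Terse.",
             "user_preferences": "Reply on Discord."})
s.send("hi", fake_llm)
s.send("what changed?", fake_llm)
print(s.superprompt())
```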

> it has a pretty comprehensive setup and onboarding wizard.

This is what I think they did pretty well. User-first attitude is a winner.


And the superprompt does seem to work pretty well to get the bot to be self-actualizing.

For example, if I ask regular Claude to tell me how to edit the openclaw config files to do X, it won't do as good of a job as if I ask the clawbot to edit its own files to do X.

That self-sufficiency aspect of knowing how to operate itself is pretty interesting. It's like the brain is actually connected to the hands and feet, so to speak.

139 sats \ 0 replies \ @optimism 8h

Maybe you'll like this: https://robotframework.org/ - I've used that in large product orgs in the past and they've been keeping up.

16 sats \ 0 replies \ @satring 4h -50 sats

Honestly, I don't write tests anymore. Offload all test writing and updating to AI; it's great at it! Frees you up to focus on more interesting work.