
I was watching this, which led to watching this, and now I'm interested in alternative approaches to testing.

In practice, most software systems have state spaces so large they’re effectively infinite. But if autonomous testing generates end-to-end tests that exercise the whole system to explore the state space, a lot of interesting behaviors can be discovered in a reasonable span of time.

This also highlights another difference between autonomous testing and LLM-driven testing frameworks. Current LLM-driven testing frameworks tend to focus on generating example-based tests to ensure that a particular piece of code functions as intended, whereas autonomous testing generates tests to see if the system ever doesn’t work.

I read through their docs a bit and it seems they run autonomous testing based on hints you leave in your code in the form of properties, aka assertions. I gave up early when trying to find information about how their autonomous testing product works under the hood, but I'd guess: an agentic model with very good orchestration and crazy environmental controls.
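
For a rough feel of the difference between an example-based test and a property checked over generated runs, here's a toy sketch. It is not how their product works (I don't know that); `BoundedQueue`, its deliberate off-by-one bug, and the seed-driven `explore` loop are all invented for illustration.

```python
import random


class BoundedQueue:
    """Hypothetical system under test: should never hold more than `capacity` items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        # Deliberate bug: `<=` lets the queue grow to capacity + 1.
        if len(self.items) <= self.capacity:
            self.items.append(x)

    def pop(self):
        return self.items.pop(0) if self.items else None


def invariant_holds(q):
    # The "property": an assertion that must hold in every reachable state.
    return len(q.items) <= q.capacity


def example_based_test():
    # A typical hand-written example; it never fills the queue, so it passes.
    q = BoundedQueue(capacity=3)
    q.push(1)
    q.push(2)
    return q.pop() == 1


def explore(seed, steps=50):
    """Run one random operation sequence; return the failing trace or None."""
    rng = random.Random(seed)  # seeded, so every run is reproducible
    q = BoundedQueue(capacity=3)
    trace = []
    for _ in range(steps):
        op = rng.choice(["push", "pop"])
        trace.append(op)
        if op == "push":
            q.push(0)
        else:
            q.pop()
        if not invariant_holds(q):
            return trace
    return None


def find_counterexample(max_seeds=200):
    """Try seeds in order until one run violates the property."""
    for seed in range(max_seeds):
        trace = explore(seed)
        if trace is not None:
            return seed, trace
    return None
```

The hand-written example passes, while the seeded search turns up a push/pop sequence that violates the property. Calling `explore` again with the same seed reproduces the same trace, which is the small-scale version of why determinism matters here.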

In practice, most software systems have state spaces so large they’re effectively infinite.

They're naturally bounded by a cartesian product, and then (hopefully) you have constraints to lower it. As for the "effectively" part: yeah, it's not fun to write down a state transition table for even a small webapp. It's not even fun to let an LLM generate one and then read it.
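
To make the cartesian-product bound concrete, here's a minimal sketch with a made-up three-variable webapp model. The variables, their domains, and the constraint are hypothetical, picked only to show how the bound is computed and how a constraint shrinks it.

```python
import itertools
import math

# Hypothetical state variables and their (tiny) domains.
DOMAINS = {
    "auth": ["anon", "user", "admin"],
    "cart_items": [0, 1, 2, 3],
    "checkout_step": ["none", "address", "payment", "done"],
}


def upper_bound(domains):
    # Size of the cartesian product: multiply the domain sizes.
    return math.prod(len(values) for values in domains.values())


def is_valid(state):
    # Assumed constraint: checkout requires a logged-in user and a non-empty cart.
    if state["checkout_step"] != "none":
        return state["auth"] != "anon" and state["cart_items"] > 0
    return True


def valid_states(domains):
    # Enumerate the full product and keep only states that satisfy the constraint.
    names = list(domains)
    for combo in itertools.product(*(domains[n] for n in names)):
        state = dict(zip(names, combo))
        if is_valid(state):
            yield state


print(upper_bound(DOMAINS))              # 48
print(len(list(valid_states(DOMAINS))))  # 30
```

Three small variables already give 48 combinations, and one constraint trims that to 30. Every extra variable multiplies the bound, which is why enumerating it by hand stops being practical so quickly.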

What I like about their deterministic hypervisor is that this is less brute force than fuzzing. But it still sounds expensive (in terms of time / compute), so I wonder: is this a gate (like a CI process) or a separate continuous process (like fuzzing)?

If the former, cool, but at what cost, especially when you need an LLM due to the volume[1]? If the latter, you'll still want to test for the most important regressions in your CI (and are still writing unit and integration tests.)

  1. Note #1492941, so I'd posit that if it doesn't run on Gemma4/Qwen3.6 or the next couple of iterations, it doesn't run.

(hopefully) you have constraints to lower it

In the Jane Street interview (which I think was fantastic), they describe this as the other half of their product - narrowing the search space intelligently, finding the right probability distribution of input/environment sequences to break a piece of software, using genetic algorithms to evolve pathological input.
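
As a toy illustration of "evolving pathological input" (my own sketch, not their algorithm, which I haven't seen), here's a tiny genetic algorithm that searches for lists that make a naive insertion sort do as many comparisons as possible. The fitness function, mutation, crossover, and all parameters are assumptions chosen for brevity.

```python
import random


def insertion_sort_cost(values):
    """Count comparisons made by a naive insertion sort (the toy 'system under test')."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comparisons


def mutate(values, rng):
    # Replace one random position with a fresh random value.
    child = list(values)
    child[rng.randrange(len(child))] = rng.randrange(100)
    return child


def crossover(a, b, rng):
    # Single-point crossover: prefix of one parent, suffix of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(length=12, population_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randrange(100) for _ in range(length)] for _ in range(population_size)
    ]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        population.sort(key=insertion_sort_cost, reverse=True)
        history.append(insertion_sort_cost(population[0]))
        survivors = population[: population_size // 2]  # keep the costliest half
        children = []
        while len(survivors) + len(children) < population_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    best = max(population, key=insertion_sort_cost)
    return best, history
```

Because the costliest half always survives, the best cost never drops from one generation to the next, and the search drifts toward descending-ish lists, which are the worst case for insertion sort. A real system would presumably score inputs by something richer than a comparison count, but the loop has the same shape.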

is this a gate (like a CI process) or a separate continuous process (like fuzzing)?

Based on my 2am research, they're designing for more of a CI process. Early versions would turn around results in a day, but whether they can broaden their customer base will depend on whether they can shorten that turnaround.

it still sounds expensive (in terms of time / compute)

And whether they can get costs under control.

you'll still want to test for the most important regressions in your CI (and are still writing unit and integration tests.)

The CEO states that even when testing their own work, there are some circumstances where a normal test suffices and deterministic/autonomous testing is overkill.

you should participate in my bitcoin math puzzles, it's all about state space and state transitions right now

I'm too slow to participate - am reading though.
