Self-generated skills don't do much for AI agents, study finds, but human-curated skills do

Teach an AI agent how to fish for information and it can feed itself with data. Tell an AI agent to figure things out on its own and it may make things worse.

AI agents are machine learning models (e.g. Claude Opus 4.6) that have access to other software through a CLI harness (e.g. Claude Code) and operate in an iterative loop. These agents can be instructed to handle various tasks, some of which may not be covered in their training data.

When lacking the appropriate training, software agents can be given access to new "skills," which are essentially added reference material to impart domain-specific capabilities. "Skills" in this context refer to instructions, metadata, and other resources like scripts and templates that agents load to obtain procedural knowledge.

For example, an AI agent could be instructed how to process PDFs with a skill that consists of markdown text, code, libraries, and reference material about APIs. While the agent might have some idea how to do this from its training data, it should perform better with more specific guidance.
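To make the idea concrete, here is a minimal sketch of a skill as a bundle of metadata, instructions, and resource files that gets prepended to a task. Everything in it is an assumption for illustration: the `Skill` class, the `select_skills` and `build_context` helpers, and the keyword matching are hypothetical, and they are not how Claude Code or any other harness actually loads skills.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical container for a skill: instructions plus supporting resources."""
    name: str
    description: str   # metadata an agent could scan before loading the skill
    instructions: str  # procedural guidance, e.g. markdown text
    resources: dict[str, str] = field(default_factory=dict)  # filename -> contents


def select_skills(task: str, skills: list[Skill]) -> list[Skill]:
    """Naive selection (an assumption): a skill matches if its name is a word in the task."""
    task_words = set(task.lower().split())
    return [s for s in skills if s.name.lower() in task_words]


def build_context(task: str, skills: list[Skill]) -> str:
    """Prepend the instructions of every matching skill to the task text."""
    parts = [f"# Skill: {s.name}\n{s.instructions}" for s in select_skills(task, skills)]
    parts.append(f"# Task\n{task}")
    return "\n\n".join(parts)


# Illustrative skill; the script name and its contents are placeholders.
pdf_skill = Skill(
    name="pdf",
    description="Extract text and tables from PDF files",
    instructions="Use the bundled extract.py script rather than parsing bytes by hand.",
    resources={"extract.py": "# placeholder script body"},
)

context = build_context("Summarize this pdf report", [pdf_skill])
```

A task that mentions "pdf" picks up the skill's instructions ahead of the task text, while an unrelated task gets only the task itself.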

Yet according to a recent study, "SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks," asking an agent to develop that skill on its own will end in disappointment. The "intelligence" part of artificial intelligence is somewhat overstated.

At least that's the case with large language models (LLMs) at inference time – when the trained model is being used as opposed to during the training process.

...read more at theregister.com
"asking an agent to develop that skill on its own will end in disappointment."

What I do is:

  1. Bot builds skill
  2. I review it
  3. Install it
  4. Use it interactively a couple of times
  5. Shit goes very wrong
  6. Suggest improvement
  7. Cleanup
  8. Review
  9. Spank the bot, more cleanup
  10. Install the new version
  11. Goto 4.

Continuous improvement, and this is why you want to have bespoke tools.