pull down to refresh

First some details about the study:

To understand what was going on, we conducted a study of 1,488 full-time U.S.-based workers (48% male, 51% female; 58% individual contributors, 41% leaders) at large companies across industries, roles, and levels. We asked them about their patterns and quantity of AI use, their work experiences, and their cognition and emotions.

And about their findings:

the most mentally taxing form of AI engagement was oversight, or the extent to which the AI tools required the worker’s direct monitoring.
As employees go from using one AI tool to two simultaneously, they experience a significant increase in productivity. As they incorporate a third tool, productivity again increases, but at a lower rate. After three tools, though, productivity scores dipped.

we found consistent predictive relationships between AI brain fry and self-reports of both major and minor errors at work. We defined minor errors as “small errors that are easy to catch or correct, such as coding or formatting errors” and major errors as “errors with more serious consequences, such as those that could affect safety, outcomes, or important decisions.” Among participants using AI at work, those experiencing brain fry reported making mistakes significantly more often—scoring 11% and 39% higher on the minor and major error frequency measures, respectively—than those who did not.
Contrary to the promise of having more time to focus on meaningful work, juggling and multitasking can become the definitive features of working with AI.

It sounds like AI tools are suddenly putting people who were never managers into the position of managing a potentially very large number of reports. It's not news that people who have skills in one realm are not necessarily going to be great managers of teams working in that realm (or any other).

For those of you who use many agents, how do you feel working with a team of agents is different from managing a team of people?

217 sats \ 4 replies \ @fourrules 4h

I have 5-10 Claude sessions open at any moment, and another 5 Codex sessions, and then Gemini and Codex conduct code reviews automatically on each PR.

I have it structured so that there are a couple of sessions that work at a high level, then each sprint gets an orchestrator, and each task in a sprint gets an implementation session.

I find my biggest problem is working on something complex, then waiting for it to finish, moving to another sprint to move it along and losing my focus.

I have ADHD, and moving between unrelated complex tasks is difficult; it's easy to forget what the sprint was about entirely.

I have multiple staging environments and protocols for testing and deployment; different sessions can use Chrome and SSH, so they can test autonomously.

Right now I'm struggling with regression testing and making sure my implementation sessions don't duplicate code we already have. I'm not a developer so it's not natural for me to go snooping around the code looking for architectural issues.

I think these problems are solvable with some reasonable UI principles, e.g. recognition over recall. Although I'm working in the terminal, I use descriptive branch names that match dedicated worktrees.

If I could configure my setup so that each Claude Code session in the terminal got labelled "Ready" or "Working", using status definitions that I can set, then I'd be able to keep track of the state and progress of agents with less cognitive overhead on the operational side and more attention to the details of the tasks.
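The labelling idea above can be sketched as a tiny status registry; everything here (the `SessionBoard` class, the status names) is hypothetical, and in practice you'd feed the label into something like `tmux rename-window` so each terminal pane shows its state at a glance.

```python
# Minimal sketch of a per-session status board with user-defined
# status definitions. All names here are made up for illustration.

class SessionBoard:
    """Tracks statuses like 'Ready' / 'Working' for each agent session."""

    def __init__(self, allowed=("Ready", "Working", "Blocked")):
        self.allowed = set(allowed)   # the status definitions you set
        self.statuses = {}

    def set_status(self, session, status):
        if status not in self.allowed:
            raise ValueError(f"unknown status: {status}")
        self.statuses[session] = status

    def label(self, session):
        # Recognition over recall: the label carries the state for you.
        return f"{session} [{self.statuses.get(session, 'Unknown')}]"

board = SessionBoard()
board.set_status("sprint-7-impl", "Working")
print(board.label("sprint-7-impl"))   # sprint-7-impl [Working]
print(board.label("sprint-8-impl"))   # sprint-8-impl [Unknown]
```

Each session (or a wrapper script around it) would call `set_status` when it starts or finishes a task; the hard part in practice is getting the agent CLI to emit that signal.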

reply
178 sats \ 3 replies \ @optimism 4h
I'm not a developer so it's not natural for me to go snooping around the code looking for architectural issues.

Fix this first thing. For every implementation PR, trigger Gemini to compare against the design specs; at a threshold of more than a couple of deviations, adjust the task design to highlight what not to do, and throw away the original. It's always cheaper to catch things before merge.
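The gate described here is just a threshold check; a sketch, with the reviewer stubbed out (a real version would prompt Gemini to diff the PR against the spec and parse its findings — every name below is hypothetical):

```python
# Sketch of the "more than a couple deviations" gate on each PR.

DEVIATION_THRESHOLD = 2  # "more than a couple"

def review_deviations(pr_diff, design_spec):
    # Stub: a real reviewer would be a model call. Here we fake a
    # deterministic finding list so the gate logic is visible.
    return [d for d in ("naming", "layering", "dup-logic") if d in pr_diff]

def gate(pr_diff, design_spec):
    deviations = review_deviations(pr_diff, design_spec)
    if len(deviations) > DEVIATION_THRESHOLD:
        # Throw away the PR and tighten the task design instead.
        return ("reject", deviations)
    return ("merge", deviations)

print(gate("naming layering dup-logic", "spec"))  # ('reject', ['naming', 'layering', 'dup-logic'])
print(gate("naming", "spec"))                     # ('merge', ['naming'])
```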

Although I'm working in the terminal, I use descriptive branch names that match dedicated worktrees.

I use feature branch trees too. Even though I HitL (bots don't get to have merge rights), this works pretty well. I do review everything tho. If some bot gives me a massive PR I just reject it and tell it to not be a sloppy biatch.

reply
217 sats \ 2 replies \ @fourrules 4h
Fix this first thing

I don't have design specs. I just keep sprints as tight as possible and constantly smoke test my way towards what I am trying to build. I don't know what design specs would look like to be honest, as in how extensive and detailed, whether diagrammatic or markdown.

I did try to be spec-driven, but after a few sprints it became apparent that the generated specs were directionally correct but woefully inadequate, and that I just had to feel my way forward. It would be nice to start with a lot of boilerplate that covered every possible design pattern, but I'm using WordPress and don't want to spend time figuring out how to do it right rather than just getting it done as a proof of concept. I can worry about perfect architecture when it's working end to end with real traction.

reply
138 sats \ 1 reply \ @optimism 4h
I don't have design specs.

Then how do you know if something is architecturally right or wrong?

I don't know what design specs would look like to be honest, as in how extensive and detailed, whether diagrammatic or markdown.

I most often just let it plan to implementation, so it's like:

  1. I want something, I write down a bunch of acceptance criteria
  2. I let a bot identify gaps and confusing things in my criteria, I clarify
  3. When I'm happy with that spec (can literally just be a 2-level markdown list) I let a bot plan the design at high level, break up into stages
  4. For each stage, detailed design -> stage design doc
  5. When I'm happy with that, "execute the plan"
  6. On PR, dispatch a context-free bot for each of:
    • compliance with acceptance
    • Analyzing CI - which includes lint, dep tracking, unit tests + coverage, integration tests (if possible at that point) and security audits
    • ---- synchronize, make decision to keep or throw ----
    • robustness review
    • completeness review
    • analyze all newly written tests for omissions
    • integration issues
    • logic errors
    • ---- synchronize, at this point there are on average 50-80 findings ----
    • Prioritize, build new task list, goto 1.
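The fan-out/synchronize shape of step 6 can be sketched as plain code; each "bot" here is a stub function, where in reality each would be a fresh, context-free model session reviewing the PR:

```python
# Sketch: dispatch context-free reviewers in parallel, then synchronize
# their findings into one prioritized list. Reviewer names mirror the
# list above; the review bodies are stubs.
from concurrent.futures import ThreadPoolExecutor

def make_reviewer(name):
    def review(pr):
        # Stub: a real reviewer returns its own findings for this PR.
        return [f"{name}: finding in {pr}"]
    return review

reviewers = [make_reviewer(n) for n in
             ("acceptance", "robustness", "completeness",
              "test-omissions", "integration", "logic")]

def review_round(pr):
    # Fan out in parallel, then synchronize (the ---- lines above).
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda r: r(pr), reviewers))
    findings = [f for batch in batches for f in batch]
    # Prioritize into a fresh task list, then goto 1.
    return sorted(findings)

print(len(review_round("PR-42")))  # 6
```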

woefully inadequate

k00b reminded me to not be impatient on spec phase. I spend about 50% of my time in spec, 40% in review, 10% having a beer

I can worry about perfect architecture when it's working end to end with real traction.

I get that btw. I don't understand why you'd need so many agents for a PoC though. Just throw it away.

reply
115 sats \ 0 replies \ @fourrules 3h

It's a very complex proof of concept. I try to break everything down to the narrowest detail so that it can do its tasks without compaction.

reply

What does it mean to use 2, 3, 4, 5 AI tools "simultaneously"?

I use a variety of AI powered things for various tasks. I don't know what counts as simultaneous though.

reply

They did not define it. I often have a gemini tab open as well as a chatgpt tab, and sometimes a claude tab. I mostly use gemini for image generation (I like it a lot better than chatgpt's), but also sometimes for quick answers to something. I use chatgpt for longer research tasks, and sometimes when it has produced an output, I don't want to get distracted with a sidequest, so I'll use claude or gemini to contain the sidequest. I assume this counts as simultaneous in a very naive way.

But I imagine most of us are using AI tools simultaneously in a variety of ways.

reply
117 sats \ 1 reply \ @optimism 5h
I don't want to get distracted with a sidequest, so I'll use claude or gemini to contain the sidequest

This sounds very interesting. Mostly because I wonder, how do these contain it?

reply

It is sadly not that interesting: the actual AI client doesn't contain it at all, other than by virtue of being different.

It resides in a separate tab, and I can be highly confident that none of the context will leak to the other AI tool I'm using.

I think this is really more about not wanting to disturb the context of the tool I'm using to do longer trajectory research.

reply
don't want to get distracted with a sidequest, so I'll use claude or gemini to contain the sidequest

A tool I wish the chatbots had was letting you highlight a portion of text, opening a new popup, and letting you ask specifically about that piece of text, without adding context to the main chat.

Sort of like a little, explain this to me on the side thing.
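The wished-for feature is really just a fresh context seeded with only the highlighted text; a sketch, with the model call stubbed (`ask_model` and `side_quest` are made-up names, standing in for whatever chat API the client uses):

```python
# Sketch of an "explain this on the side" popup: the side-quest chat
# sees only the selection and the question, never the main-chat history,
# so nothing leaks in either direction.

def ask_model(messages):
    # Stub: a real version would call an LLM API with these messages.
    return f"(answer based on {len(messages)} messages)"

def side_quest(highlighted_text, question):
    # Fresh context: just the selection plus the question.
    messages = [
        {"role": "user", "content": f"Context: {highlighted_text}"},
        {"role": "user", "content": question},
    ]
    return ask_model(messages)

print(side_quest("brain fry", "Explain this term to me."))
```

The design point is that the popup's message list is built from scratch rather than appended to the main conversation, which is exactly the containment described above.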

reply
117 sats \ 0 replies \ @optimism 5h

I think of it like this: you're the team lead. 1 AI tool = 1 team. You can manage 1 if you have basic skills, 2 if you are very experienced, and maybe 3 if you're super skilled and tireless. I haven't met anyone in my life who could manage 4 teams. At that point you need another layer.

So the art is in either letting your agent fleet sit idle, or having an orchestrating AI on top, so that you're back to only 1 simultaneous tool. I personally do this through algorithmic orchestration rather than adding another layer of fuzzy logic, but there are definite cases I can think of where another layer might help, like picking quick wins from a list and turning those into assignments.
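"Algorithmic orchestration" here can be as dumb as a queue: a deterministic rule picks the next task, so the human only ever faces one active tool. A sketch (all names hypothetical):

```python
# Sketch: a plain FIFO queue decides which agent runs next, instead of
# a fuzzy-logic orchestrator layered on top.
from collections import deque

def orchestrate(tasks, run_agent):
    queue = deque(tasks)
    done = []
    while queue:
        task = queue.popleft()        # deterministic pick, no extra AI layer
        done.append(run_agent(task))  # one agent active at a time
    return done

results = orchestrate(["quick-win-1", "quick-win-2"],
                      lambda t: f"done: {t}")
print(results)  # ['done: quick-win-1', 'done: quick-win-2']
```

Swapping the `deque` for a priority queue would cover the "pick quick wins from a list" case without adding another model to supervise.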

reply