Okay, so I checked that. I gave both Claude and Qwen3.6 the same task concurrently.

Claude Code on one hand, OpenCode on the other.

I took a really simple one: extract some templates out of an issue comment and
turn them into YAML templates. (I already have the templates, but I thought this
would be a good simple test to start with.)

Same prompt. Same input. Different outcome though:

Prompt

/forgejo (context: internal/ideas) Implement the feature request from issue 37 
in a new feature branch. when done, push to our fork and open a PR upstream.

Issue content

### Objective

Convert the three generic templates from Issue 20, issuecomment 2205 into forgejo issue templates in yaml format, store the new templates in a new directory `templates/generic`.

### Context

- templates: Issue 20, issuecomment 2205
- target: `templates/generic` (new subdirectory)

### Constraints

- Must preserve all text as `value`
- Must preserve all comments as `description` and `placeholder`

### Acceptance Criteria

- 3 new yaml files in `templates/generic` that can be manually copied into forgejo repositories

### Agent Instructions

Scope: only add new files under `templates/generic`

Comparison between Qwen3.6-36B and Opus 4.7

I picked one of the files to show the difference. In the diff, lines starting with - are Qwen's output and lines starting with + are Opus's:

diff --git a/templates/generic/task.yaml b/templates/generic/task.yaml
index 20c8b51..284b160 100644
--- a/templates/generic/task.yaml
+++ b/templates/generic/task.yaml
@@ -1,47 +1,50 @@
 name: Structured Task
-description: Template for implementing a well-scoped task
+about: Well-scoped task for an agent to implement
+title: "[Task]: "
+labels: ["bot/implement"]
 body:
   - type: textarea
     id: objective
     attributes:
       label: Objective
       description: "One sentence: what should be true when this is done?"
-      placeholder: "Convert the three generic templates from Issue 20 into forgejo issue templates in yaml format"
+      placeholder: "One sentence: what should be true when this is done?"
     validations:
       required: true
-
   - type: textarea
     id: context
     attributes:
       label: Context
-      description: Links to related issues, files, or docs. The agent needs these to avoid hallucinating context.
-      placeholder: "templates: Issue 20, issuecomment 2205\ntarget: `templates/generic` (new subdirectory)"
+      description: "Links to related issues, files, or docs. The agent needs these to avoid hallucinating context."
+      placeholder: "Links to related issues, files, or docs. The agent needs these to avoid hallucinating context."
     validations:
       required: true
-
   - type: textarea
     id: constraints
     attributes:
       label: Constraints
-      description: Limitations and requirements that must be met
-      placeholder: "- Must preserve all text as `value`\n- Must preserve all comments as `description` and `placeholder`"
+      value: |
+        - Must not break existing tests
+        - Target repo: `owner/repo`
+        - Target branch: `main`
     validations:
       required: true
-
   - type: textarea
     id: acceptance-criteria
     attributes:
       label: Acceptance Criteria
-      description: Conditions that must be met for the task to be considered complete
-      placeholder: "- 3 new yaml files in `templates/generic` that can be manually copied into forgejo repositories"
+      value: |
+        - Criterion 1
+        - Criterion 2
     validations:
       required: true
-
   - type: textarea
     id: agent-instructions
     attributes:
       label: Agent Instructions
       description: "Optional: override default behavior"
-      placeholder: "Scope: only add new files under `templates/generic`"
+      placeholder: "Optional: override default behavior"
+      value: |
+        Scope: only modify files under `src/`
     validations:
       required: false

As the diff shows:

  1. Qwen confused the issue it was solving with the template it was implementing: it filled the placeholders with text from issue 37 itself
  2. Qwen did not follow the instruction to create value attributes; its version of the file has none
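
For anyone who wants to repeat this comparison without eyeballing diffs, the two points above can be checked mechanically. What follows is only a rough sketch, not something I ran as part of this test: the function names are made up, it scans the YAML as plain text instead of parsing it, and the two snippets are cut down from the constraints field in the diff.

```python
import re


def fields_with_value(template_text):
    """Map each field id to True if a `value:` key follows it before the next id."""
    found = {}
    current = None
    for line in template_text.splitlines():
        id_match = re.match(r"\s*id:\s*(\S+)", line)
        if id_match:
            current = id_match.group(1)
            found[current] = False
        elif current is not None and re.match(r"\s*value:", line):
            found[current] = True
    return found


def leaked_issue_lines(template_text, issue_text, min_len=20):
    """Return issue lines (list markers stripped) that reappear verbatim in the template."""
    leaks = []
    for raw in issue_text.splitlines():
        line = raw.strip().lstrip("-").strip()
        # Skip short lines such as headings, which would match by accident.
        if len(line) >= min_len and line in template_text:
            leaks.append(line)
    return leaks


# Excerpt of the issue body (hypothetical test input, copied from the issue).
ISSUE = r"""### Constraints

- Must preserve all text as `value`
- Must preserve all comments as `description` and `placeholder`
"""

# Constraints field as written by Qwen (the - side of the diff).
QWEN_FIELD = r"""    id: constraints
    attributes:
      label: Constraints
      description: Limitations and requirements that must be met
      placeholder: "- Must preserve all text as `value`\n- Must preserve all comments as `description` and `placeholder`"
"""

# Constraints field as written by Opus (the + side of the diff).
OPUS_FIELD = r"""    id: constraints
    attributes:
      label: Constraints
      value: |
        - Must not break existing tests
"""

if __name__ == "__main__":
    for name, field in (("qwen", QWEN_FIELD), ("opus", OPUS_FIELD)):
        print(name, fields_with_value(field), leaked_issue_lines(field, ISSUE))
```

A text scan like this is crude (a real check would parse the YAML), but it is enough to flag both failure modes: fields without a value key, and issue text copied into the template.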

Conclusion

Qwen's result is of unacceptable quality for me, and that on a really easy job.
That is why I asked whether you'd been using anyone's finetunes: this needs tuning
for instruction following and for separating concerns.