Okay, so I checked that. I gave both Claude and Qwen3.6 the same task concurrently:
Claude Code on one hand, OpenCode on the other.
I took a really simple one: extract some templates from an issue comment and
turn them into YAML templates. (I already have the templates, but I thought this
would be a good simple test to start with.)
Same prompt. Same input. Different outcome though:
Prompt
/forgejo (context: internal/ideas) Implement the feature request from issue 37
in a new feature branch. when done, push to our fork and open a PR upstream.
Issue content
### Objective
Convert the three generic templates from Issue 20, issuecomment 2205 into forgejo issue templates in yaml format, store the new templates in a new directory `templates/generic`.
### Context
- templates: Issue 20, issuecomment 2205
- target: `templates/generic` (new subdirectory)
### Constraints
- Must preserve all text as `value`
- Must preserve all comments as `description` and `placeholder`
### Acceptance Criteria
- 3 new yaml files in `templates/generic` that can be manually copied into forgejo repositories
### Agent Instructions
Scope: only add new files under `templates/generic`
Comparison between Qwen3.6-36B and Opus 4.7
I just took one of the files to show the difference. In the diff, `-` is Qwen and `+` is Opus:
diff --git a/templates/generic/task.yaml b/templates/generic/task.yaml
index 20c8b51..284b160 100644
--- a/templates/generic/task.yaml
+++ b/templates/generic/task.yaml
@@ -1,47 +1,50 @@
 name: Structured Task
-description: Template for implementing a well-scoped task
+about: Well-scoped task for an agent to implement
+title: "[Task]: "
+labels: ["bot/implement"]
 body:
   - type: textarea
     id: objective
     attributes:
       label: Objective
       description: "One sentence: what should be true when this is done?"
-      placeholder: "Convert the three generic templates from Issue 20 into forgejo issue templates in yaml format"
+      placeholder: "One sentence: what should be true when this is done?"
     validations:
       required: true
-
   - type: textarea
     id: context
     attributes:
       label: Context
-      description: Links to related issues, files, or docs. The agent needs these to avoid hallucinating context.
-      placeholder: "templates: Issue 20, issuecomment 2205\ntarget: `templates/generic` (new subdirectory)"
+      description: "Links to related issues, files, or docs. The agent needs these to avoid hallucinating context."
+      placeholder: "Links to related issues, files, or docs. The agent needs these to avoid hallucinating context."
     validations:
       required: true
-
   - type: textarea
     id: constraints
     attributes:
       label: Constraints
-      description: Limitations and requirements that must be met
-      placeholder: "- Must preserve all text as `value`\n- Must preserve all comments as `description` and `placeholder`"
+      value: |
+        - Must not break existing tests
+        - Target repo: `owner/repo`
+        - Target branch: `main`
     validations:
       required: true
-
   - type: textarea
     id: acceptance-criteria
     attributes:
       label: Acceptance Criteria
-      description: Conditions that must be met for the task to be considered complete
-      placeholder: "- 3 new yaml files in `templates/generic` that can be manually copied into forgejo repositories"
+      value: |
+        - Criterion 1
+        - Criterion 2
     validations:
       required: true
-
   - type: textarea
     id: agent-instructions
     attributes:
       label: Agent Instructions
       description: "Optional: override default behavior"
-      placeholder: "Scope: only add new files under `templates/generic`"
+      placeholder: "Optional: override default behavior"
+      value: |
+        Scope: only modify files under `src/`
     validations:
       required: false
As clearly visible:
- Qwen got confused between the issue it was solving and the template it was implementing
- Qwen was unable to follow the instruction to create `value` attributes
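If you want to check the second point mechanically instead of eyeballing the diff, here is a minimal Python sketch. It splits one hunk into its `-` (Qwen) and `+` (Opus) sides and counts how often a key such as `value` appears on each. The helper names `split_sides` and `count_key` are made up for illustration, the excerpt is the Acceptance Criteria field from the diff above, and the YAML indentation inside it is my assumption.

```python
# Minimal sketch: count template keys on each side of a unified-diff hunk.
# split_sides / count_key are hypothetical helper names, not part of any tool.

# Acceptance Criteria field from the task.yaml diff ("-" is Qwen, "+" is Opus).
# The indentation is assumed; the text of each line is taken from the diff.
HUNK = [
    "   - type: textarea",
    "     id: acceptance-criteria",
    "     attributes:",
    "       label: Acceptance Criteria",
    "-      description: Conditions that must be met for the task to be considered complete",
    '-      placeholder: "- 3 new yaml files in `templates/generic` that can be manually copied into forgejo repositories"',
    "+      value: |",
    "+        - Criterion 1",
    "+        - Criterion 2",
    "     validations:",
    "       required: true",
]


def split_sides(hunk_lines):
    """Return (old_lines, new_lines) for the body of one hunk.

    The first character of each line is the diff marker: "-" goes to the
    old side, "+" to the new side, anything else is context for both.
    """
    old, new = [], []
    for line in hunk_lines:
        marker, text = line[:1], line[1:]
        if marker == "-":
            old.append(text)
        elif marker == "+":
            new.append(text)
        else:
            old.append(text)
            new.append(text)
    return old, new


def count_key(lines, key):
    """Count lines whose stripped text starts with '<key>:'."""
    return sum(1 for text in lines if text.strip().startswith(key + ":"))


qwen_side, opus_side = split_sides(HUNK)
for name, side in (("qwen", qwen_side), ("opus", opus_side)):
    print(name, "value =", count_key(side, "value"),
          "placeholder =", count_key(side, "placeholder"))
```

This is a plain line scan, not a YAML parser, so it only works on hunks where each key sits on its own line, and only if the context lines still carry their leading space.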
Conclusion
Qwen's result is not of acceptable quality for me, and that on a really easy job.
That's why I asked whether you'd been using anyone's finetunes: this needs tuning for
instruction following and separation of concerns.
reply
I don't understand what that means. I find Gemma/Qwen work fine for most programming tasks.