Writing Plans Skills Review: Constraining AI Coding with TDD
bestskills rank team
2026-04-15

A deep review of the writing-plans skill in the openclaw/hermes agent environment. We break down how it forces the AI into micro-task decomposition through zero-context assumptions and TDD, and draw out prompt techniques that prevent architecture decay.


Skill Quality Report: writing-plans

Evaluation Time: 2026-04-15
Evaluation Mode: Item-by-item review

Overall Score

Dimension             Score    Status
Standards (20%)       14/20    WARN
Effectiveness (40%)   37/40    PASS
Safety (30%)          28/30    PASS
Conciseness (10%)     7/10     WARN
Total                 86/100   Good

Level guide:

  • 90-100: Excellent - ready to use
  • 70-89: Good - small but meaningful room to improve
  • 50-69: Fair - needs important revisions
  • <50: Not qualified - requires substantial rewrite

Skill Strengths

  1. [Effectiveness] It forces planning before implementation with an explicit startup announcement - Evidence: "Announce at start: 'I'm using the writing-plans skill to create the implementation plan.'" (Overview section).
  2. [Effectiveness] It prevents scope sprawl by requiring subsystem-level decomposition - Evidence: "suggest breaking this into separate plans - one per subsystem" (Scope Check section).
  3. [Effectiveness] It operationalizes TDD into executable micro-steps - Evidence: the fixed sequence "Write the failing test -> ... -> Commit" (Bite-Sized Task Granularity section).
  4. [Safety] It reduces execution ambiguity through concrete commands and expected outcomes - Evidence: each run step requires both "Run:" and "Expected:" outputs (Task Structure section).
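
To make the strengths above concrete, here is a hypothetical bite-sized task in the shape the skill requires; the feature, file paths, and commands are illustrative assumptions, not content taken from the skill itself:

```text
Task: Add slug generation for article URLs (est. 2-5 min)

1. Write the failing test
   Run: pytest tests/test_slug.py::test_basic_slug
   Expected: FAIL - NameError: slugify is not defined
2. Implement the minimal code to pass
   Run: pytest tests/test_slug.py::test_basic_slug
   Expected: PASS (1 passed)
3. Commit
   Run: git commit -am "Add basic slug generation"
   Expected: clean commit, full test suite green
```

Note how each step pairs a Run: command with an Expected: outcome, so an executor can verify progress mechanically rather than judging it subjectively.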

Skill Improvement Areas

  1. [Standards] Governance metadata is incomplete for maintainability at scale - Evidence: the current header only presents name and description; Impact: weak version traceability and weak policy enforcement across repositories.
  2. [Standards] Naming convention does not follow verb-ing guidance from the same framework - Evidence: name: writing-plans; Impact: lower discoverability and naming inconsistency in mixed skill catalogs.
  3. [Conciseness] The main document is dense and carries policy, templates, and examples in one body - Evidence: large sections from Scope Check to Execution Handoff are all inline; Impact: higher token cost in repeated runtime loading.

Insights

  1. Constraining task size to 2-5 minutes is a practical way to keep execution quality stable. - Application: long implementation plans where context drift is common.
  2. Requiring explicit expected failure/pass states makes TDD less ceremonial and more verifiable. - Application: teams that struggle with test-first discipline.
  3. A built-in self-review checklist is low-cost and catches plan defects early. - Application: spec-to-plan workflows with multiple contributors.

Issue List

[Medium] Standards - Missing governance metadata

  • Location: top metadata block
  • Description: key fields such as version, author, license, and structured metadata are absent.
  • Suggestion: add complete metadata fields and keep them versioned with skill updates.
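
A minimal sketch of what a fuller metadata block could look like; the fields beyond name and description are suggestions for illustration, not fields the framework is confirmed to support:

```yaml
name: writing-plans
description: Break specs into bite-sized, test-first implementation tasks
version: 1.2.0              # bump with every skill update
author: platform-tools team  # hypothetical owner
license: MIT
metadata:
  category: planning
  last-reviewed: 2026-04-15
```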

[Medium] Standards - Naming convention mismatch

  • Location: name field
  • Description: the skill name is not in verb-ing form, which conflicts with the framework’s naming recommendation.
  • Suggestion: align naming strategy or document why this catalog intentionally departs from verb-ing naming.

[Low] Conciseness - Progressive disclosure can be stronger

  • Location: main document body
  • Description: operational rules, templates, and execution handoff are concentrated in one file.
  • Suggestion: move stable long-form sections to reference/ and keep the main file focused on trigger rules and execution-critical constraints.
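
One possible split along those lines, assuming a reference/ directory convention; the file names are illustrative:

```text
writing-plans/
├── SKILL.md                  # trigger rules + execution-critical constraints only
└── reference/
    ├── task-template.md      # full task structure with Run:/Expected: examples
    ├── scope-check.md        # subsystem decomposition guidance
    └── execution-handoff.md  # stable long-form handoff instructions
```

The main file stays small enough to load on every run, while the long-form material is pulled in only when needed.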

Prioritized Recommendations

  1. [Must] Add complete governance metadata to improve traceability and repository-wide consistency.
  2. [Should] Clarify naming policy (either adopt verb-ing or define an explicit exception rule).
  3. [Could] Split long stable guidance into companion files to reduce token pressure in routine runs.
