ab-test-setup-review
Skill Quality Assessment Report: ab-test-setup
Assessment Time: 2026-04-15
Assessment Mode: Line-by-line Review
Overall Score
| Dimension | Score | Status |
|---|---|---|
| Standards (20%) | 12/20 | WARN |
| Effectiveness (40%) | 25/40 | WARN |
| Safety (30%) | 30/30 | PASS |
| Conciseness (10%) | 4/10 | FAIL |
| Total Score | 71/100 | Good |
Grade Scale:
- 70-89: Good — Usable but has room for targeted improvements.
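The total in the table can be checked with a few lines of arithmetic. The sketch below assumes the total is the plain sum of the four dimension scores, with each dimension's maximum already encoding its weight. Only the 70-89 band is checked, because it is the only band this report lists; the function names are illustrative.

```python
# Dimension scores as reported in the table above: (earned, maximum).
scores = {
    "Standards": (12, 20),
    "Effectiveness": (25, 40),
    "Safety": (30, 30),
    "Conciseness": (4, 10),
}


def total_score(scores):
    """Sum earned and maximum points; the maxima mirror the 20/40/30/10 weights."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


def is_good(total):
    """Check the one band this report lists (70-89: Good)."""
    return 70 <= total <= 89


earned, maximum = total_score(scores)
print(f"{earned}/{maximum}", "Good" if is_good(earned) else "outside the 70-89 band")
# prints: 71/100 Good
```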
Strengths
- [Effectiveness] The use of progressive disclosure is excellent. The skill separates detailed data tables and templates into independent reference files (e.g., `references/sample-size-guide.md`), keeping the main document clean. Reference: "For detailed sample size tables and duration calculations: See references/sample-size-guide.md"
- [Effectiveness] It provides clear initial assessment guidelines and task-specific questions. By requiring the agent to gather context before acting, it prevents blind, low-quality outputs. Reference: `## Initial Assessment` and `## Task-Specific Questions`
- [Safety] The content is completely safe. There are no risky operations or system-level destructive commands.
Areas for Improvement
- [Standards] The YAML frontmatter is incomplete. It’s missing `author`, `license`, and `metadata.hermes.tags`. Also, the skill name doesn’t use the recommended verb-ing format. Reference: The YAML block at the beginning. Impact: Reduces discoverability and doesn’t follow strict metadata standards.
- [Effectiveness] The skill lacks a structured, executable workflow for the agent. It reads more like a wiki article about A/B testing than an operational guide. Reference: The overall document structure. Impact: Agents might output inconsistent formats because they don’t have step-by-step instructions.
- [Conciseness] The document wastes tokens on basic concepts. Explaining what an A/B test is or defining statistical significance (p-value < 0.05) is redundant since LLMs already know this. Reference: Sections like `## Test Types` and `## Analyzing Results`. Impact: Burns context window space and increases response latency without adding actionable value.
Key Takeaways
- Context-First Strategy: The `Initial Assessment` section explicitly tells the agent to read `.agents/product-marketing-context.md` before asking questions. Application: Useful for any skill that relies heavily on project-specific business context.
- Structured Question Checklists: Grouping necessary user inputs under a dedicated `Task-Specific Questions` section is a smart pattern. Application: Any interactive skill that requires multi-turn dialogue to gather requirements.
Detailed Issue List
[Medium] Standards — Missing Metadata and Naming Convention
- Location: YAML Frontmatter
- Description: Missing `author`, `license`, and tag fields. The name `ab-test-setup` is a noun phrase instead of a verb phrase.
- Recommendation: Add `author`, `license`, and the complete `metadata.hermes` fields, and rename the skill to `setting-up-ab-tests` or `designing-ab-tests`.
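A check like this could be automated. The sketch below is a minimal illustration, assuming the frontmatter has already been parsed into a nested dict; the helper names, the placeholder values, and the "first token ends in -ing" test are all assumptions made for illustration, not part of any official validator.

```python
# Fields this review reports as missing, written as dotted paths.
REQUIRED_FIELDS = ["author", "license", "metadata.hermes.tags"]


def get_path(data, dotted):
    """Walk a dotted path through nested dicts; return None if any key is absent."""
    node = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_frontmatter(frontmatter):
    """Return a list of problems found in an already-parsed frontmatter dict."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if get_path(frontmatter, field) in (None, "", [])
    ]
    name = frontmatter.get("name", "")
    # Heuristic (an assumption): a verb-ing name starts with a token ending in "ing".
    if not name.split("-")[0].endswith("ing"):
        problems.append(f"name '{name}' does not start with a verb-ing token")
    return problems


# Hypothetical inputs: the current name, then a fixed version with placeholder values.
current = {"name": "ab-test-setup"}
fixed = {
    "name": "setting-up-ab-tests",
    "author": "example-author",  # placeholder value
    "license": "MIT",  # placeholder value
    "metadata": {"hermes": {"tags": ["ab-testing"]}},  # placeholder tag
}

for problem in check_frontmatter(current):
    print(problem)
print(check_frontmatter(fixed))
```

The `-ing` test is deliberately crude: it would accept a name such as `string-utils`, so treat it as a prompt for manual review, not a rule.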
[Medium] Effectiveness — Lacks Structured Workflow
- Location: Global
- Description: The text provides principles but no concrete execution steps or output templates.
- Recommendation: Add a `## Workflow` section with an ordered list so the agent knows exactly how to process a request (e.g., 1. Ask questions -> 2. Frame hypothesis -> 3. Calculate sample size -> 4. Output plan).
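To make the recommendation concrete, the sketch below renders the four example steps as an ordered markdown list under a `## Workflow` heading. The step wording comes from the example above; the function name and the exact layout are assumptions, and a real skill would likely expand each step with its own instructions.

```python
# The four example steps named in the recommendation, in order.
WORKFLOW_STEPS = [
    "Ask questions",
    "Frame hypothesis",
    "Calculate sample size",
    "Output plan",
]


def render_workflow(steps):
    """Render the steps as a markdown section with a numbered list."""
    lines = ["## Workflow", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


print(render_workflow(WORKFLOW_STEPS))
```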
[Severe] Conciseness — Heavy on Basic Explanations
- Location: `## Test Types`, `## Sample Size`, `## Analyzing Results`
- Description: Spends too much space explaining basic A/B testing concepts and math that the LLM inherently understands.
- Recommendation: Strip out the textbook definitions. Keep only the specific decision criteria, templates, and guardrails relevant to the task.
Improvement Recommendations (By Priority)
- [Required] Remove basic textbook explanations to substantially reduce token consumption.
- [Required] Add a concrete `Workflow` section to turn knowledge into actionable agent steps.
- [Recommended] Complete the YAML metadata and standardize the file name.
- [Recommended] Provide an explicit output template so the generated test plans have a consistent structure.