A/B Test Setup Skills Review: Forcing Statistical Rigor in Experiment Design
bestskills rank team
2026-04-15

A structured teardown of the ab-test-setup skill. We analyze how this skill in the openclaw/hermes agent helps growth teams avoid common testing pitfalls through forced structured hypotheses, a single-variable testing principle, and preset sample sizes.



Skill Quality Assessment Report: ab-test-setup

Assessment Time: 2026-04-15
Assessment Mode: Line-by-line Review

Overall Score

| Dimension | Score | Status |
| --- | --- | --- |
| Standards (20%) | 12/20 | WARN |
| Effectiveness (40%) | 25/40 | WARN |
| Safety (30%) | 30/30 | PASS |
| Conciseness (10%) | 4/10 | FAIL |
| **Total Score** | **71/100** | **Good** |

Grade Scale:

  • 70-89: Good — Usable but has room for targeted improvements.

Strengths

  1. [Effectiveness] The use of progressive disclosure is excellent. The skill separates detailed data tables and templates into independent reference files (e.g., references/sample-size-guide.md), keeping the main document clean. — Reference: For detailed sample size tables and duration calculations: See references/sample-size-guide.md
  2. [Effectiveness] It provides clear initial assessment guidelines and task-specific questions. By requiring the agent to gather context before acting, it prevents blind, low-quality outputs. — Reference: ## Initial Assessment and ## Task-Specific Questions
  3. [Safety] The content is completely safe. There are no risky operations or system-level destructive commands.

Areas for Improvement

  1. [Standards] The YAML frontmatter is incomplete. It’s missing author, license, and metadata.hermes.tags. Also, the skill name doesn’t use the recommended verb-ing format. — Reference: The YAML block at the beginning. Impact: Reduces discoverability and doesn’t follow strict metadata standards.
  2. [Effectiveness] The skill lacks a structured, executable workflow for the agent. It reads more like a wiki article about A/B testing rather than an operational guide. — Reference: The overall document structure. Impact: Agents might output inconsistent formats because they don’t have step-by-step instructions.
  3. [Conciseness] The document wastes tokens on basic concepts. Explaining what an A/B test is or defining statistical significance (p-value < 0.05) is redundant since LLMs already know this. — Reference: Sections like ## Test Types and ## Analyzing Results. Impact: Burns context window space and increases response latency without adding actionable value.

Key Takeaways

  1. Context-First Strategy: The Initial Assessment section explicitly tells the agent to read .agents/product-marketing-context.md before asking questions. — Application: Useful for any skill that relies heavily on project-specific business context.
  2. Structured Question Checklists: Grouping necessary user inputs under a dedicated Task-Specific Questions section is a smart pattern. — Application: Any interactive skill that requires multi-turn dialogue to gather requirements.

Detailed Issue List

[Medium] Standards — Missing Metadata and Naming Convention

  • Location: YAML Frontmatter
  • Description: Missing author, license, and tag fields. The name ab-test-setup is a noun phrase rather than the recommended gerund (verb-ing) form.
  • Recommendation: Add the complete metadata.hermes fields and rename the skill to setting-up-ab-tests or designing-ab-tests.
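As an illustration of this recommendation, the completed frontmatter might look like the sketch below. The exact field layout under metadata.hermes and the license value are assumptions, not taken from the skill itself:

```yaml
---
name: designing-ab-tests
description: Guides growth teams through designing statistically rigorous A/B tests.
author: openclaw            # assumed value; use the actual maintainer
license: MIT                # placeholder; use the project's actual license
metadata:
  hermes:
    tags: [experimentation, growth, statistics]   # illustrative tag values
---
```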

[Medium] Effectiveness — Lacks Structured Workflow

  • Location: Global
  • Description: The text provides principles but no concrete execution steps or output templates.
  • Recommendation: Add a ## Workflow section with an ordered list so the agent knows exactly how to process a request (e.g., 1. Ask questions -> 2. Frame hypothesis -> 3. Calculate sample size -> 4. Output plan).
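The "Calculate sample size" step in that workflow can be sketched as a standard two-proportion power calculation using only the Python standard library. The function name and default parameters below are illustrative, not part of the skill:

```python
import math
from statistics import NormalDist

def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided test of two proportions.

    baseline_rate: control conversion rate, e.g. 0.10
    mde_abs: minimum detectable effect in absolute terms, e.g. 0.02
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    pooled_var = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * pooled_var / (p2 - p1) ** 2
    return math.ceil(n)

# 10% baseline, +2pp absolute lift: roughly 3.8k users per variant
print(required_sample_size(0.10, 0.02))
```

Embedding a table of precomputed values like this in references/sample-size-guide.md (as the skill already does) is cheaper than having the agent re-derive the math each run.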

[Severe] Conciseness — Heavy on Basic Explanations

  • Location: ## Test Types, ## Sample Size, ## Analyzing Results
  • Description: Spends too much space explaining basic A/B testing concepts and math that the LLM inherently understands.
  • Recommendation: Strip out the textbook definitions. Keep only the specific decision criteria, templates, and guardrails relevant to the task.

Improvement Recommendations (By Priority)

  1. [Required] Remove basic textbook explanations to heavily reduce token consumption.
  2. [Required] Add a concrete Workflow section to turn knowledge into actionable agent steps.
  3. [Recommended] Complete the YAML metadata and standardize the file name.
  4. [Recommended] Provide an explicit output template so the generated test plans have a consistent structure.
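As an illustration of recommendation 4, an output template could look like the sketch below. The section names and fields are assumptions, not drawn from the skill:

```markdown
## Test Plan: <experiment name>
- **Hypothesis:** If we <change>, then <metric> will <direction> because <rationale>.
- **Variable under test:** <single variable changed>
- **Primary metric:** <metric> (current baseline: <value>)
- **Sample size:** <n per variant> (alpha = 0.05, power = 0.8, MDE = <value>)
- **Duration:** <days>, based on <traffic estimate>
- **Decision rule:** Ship if <criteria>; roll back if <criteria>.
```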
