I want to write a complex, smart test.
But first, let’s discuss this project.
Our end-to-end slow-build process is something like:
————————-
0) Create session
Input: user clicks “Generate Theme”
Action: POST /sessions → status=COLLECTING
Output: sessionId
Gate: none
1) Collect attributes (chat-first)
Input: mock user prompts
Action: triggers the extraction prompt
Commit: on user “Apply”, batch POST /chat/commit (with idempotencyKey)
Output: upserted session attributes, optional new Library entries/presets
Gate: re-compute threshold; stay in COLLECTING if < 50%
2) Collect special instruction (chat-first)
Input: mock user prompts
Action: triggers the extraction prompt
Commit: on user “Apply”, batch POST /chat/commit (with idempotencyKey)
Output: upserted session attributes, optional new Library entries/presets
Gate: re-compute threshold; stay in COLLECTING if < 50%
2B) How we’ll extract instructions from prompts
Extraction categories (UGC → directives)
Tone/voice: “credible”, “optimistic”, “compact spacing”.
Layout intents: “sticky navbar”, “3-column footer”, “avoid carousels”.
Content rules: “include legal disclaimer in footer”, “CTA wording”.
Imagery rules: “photographic”, placehold.co sizes.
SEO: keywords/phrases.
Accessibility preference: “AA minimum”, “visible focus”.
Performance stance: “no large hero videos on mobile”.
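As a rough sketch of how 2B could normalize UGC into machine-checkable directives, here is a keyword router over the categories above. The type names and keyword lists are assumptions; the real extraction would be LLM-driven, with this as a cheap pre-filter or fallback.

```typescript
// Hypothetical directive shape for the 2B extraction step (names are assumptions).
type DirectiveCategory =
  | "tone" | "layout" | "content" | "imagery" | "seo" | "a11y" | "perf";

interface Directive {
  category: DirectiveCategory;
  value: string;       // normalized phrase, e.g. "sticky navbar"
  sourceText: string;  // original UGC fragment, kept for audit
}

// Minimal keyword router mapping raw prompt fragments to categories.
const CATEGORY_KEYWORDS: Record<DirectiveCategory, string[]> = {
  tone: ["credible", "optimistic", "compact spacing"],
  layout: ["sticky navbar", "3-column footer", "avoid carousels"],
  content: ["disclaimer", "cta"],
  imagery: ["photographic", "placehold.co"],
  seo: ["keyword"],
  a11y: ["aa", "focus"],
  perf: ["hero video", "mobile"],
};

function categorize(fragment: string): Directive | null {
  const lower = fragment.toLowerCase();
  for (const [category, keywords] of Object.entries(CATEGORY_KEYWORDS)) {
    if (keywords.some((k) => lower.includes(k))) {
      return { category: category as DirectiveCategory, value: lower, sourceText: fragment };
    }
  }
  return null; // unmatched fragments fall through to the LLM extraction prompt
}
```

Unmatched fragments returning `null` keeps the router honest: it never guesses, it only routes phrases it recognizes.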
3) Live preview (pre-review)
Input: current session attributes
Action: render token swatches, spacing grid, pattern preview (placehold.co)
Output: deterministic preview (no arbitrary classes/values)
Gate: quick checks (schema present, token coverage %) but not strict yet
4) Pre-approval validation
Input: session attributes
Action: server validations:
schema completeness; threshold ≥ 90%
contrast AA (text/bg, primary/bg)
spacing/gutter/containerMax are multiples of spaceUnit
class API keys present (btn/card/section)
Output: pass/fail list + fix suggestions (chat can apply)
Gate: move to REVIEW only when pass + ≥ 90%
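The contrast and spacing gates in step 4 are purely mechanical. A minimal sketch, assuming hex token values and the WCAG 2.x AA threshold for normal text (function names are mine, not the project’s):

```typescript
// WCAG relative luminance for an sRGB hex color like "#AABBCC".
function luminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// AA check for normal text: contrast ratio must be >= 4.5.
function passesAA(fgHex: string, bgHex: string): boolean {
  const [hi, lo] = [luminance(fgHex), luminance(bgHex)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05) >= 4.5;
}

// spacing/gutter/containerMax must be whole multiples of spaceUnit.
function spacingOnScale(values: number[], spaceUnit: number): boolean {
  return values.every((v) => v % spaceUnit === 0);
}
```

The same two functions can back both the server-side gate here and the step-2 chat suggestions (“this palette fails AA, try…”).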
5) Human approval & lock
Input: user sees preview + green checks
Action: Approve & Lock → POST /sessions/:id/approve
Output: instructionsVersion created; attributes locked
Gate: status=APPROVED (builds can start)
6) Generate ThemeInstructions (single source of truth)
Input: locked session attributes
Action: build ThemeInstructions JSON:
designTokens, layoutSystem, breakpoints
approved snippet IDs (Navbar/Footer/etc.)
class API namespaces/variants
image placeholder policy/sizes (placehold.co)
a11y/perf budgets
Output: immutable version (e.g., 2025.08.22-001)
Gate: stored & referenced by every page build
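One possible shape for the ThemeInstructions snapshot, written as a TypeScript interface. Every field name here is an assumption inferred from the outputs listed above, not the project’s actual schema; the freeze helper illustrates the “immutable version” gate.

```typescript
// Hypothetical shape of the step-6 ThemeInstructions snapshot (field names are assumptions).
interface ThemeInstructions {
  version: string;                       // e.g. "2025.08.22-001", immutable once written
  designTokens: Record<string, string>;  // CSS-var-backed token map
  layoutSystem: { spaceUnit: number; containerMax: number; gutter: number };
  breakpoints: Record<string, number>;
  approvedSnippetIds: string[];          // Navbar/Footer/etc.
  classApi: Record<string, string[]>;    // namespace -> allowed variants (btn/card/section)
  placeholderPolicy: { host: "placehold.co"; sizes: string[] };
  budgets: { lcpMs: number; cls: number; tbtMs: number; axeSerious: 0 };
}

// Shallow-deep freeze so accidental post-approval edits throw in strict mode.
function freezeInstructions(i: ThemeInstructions): Readonly<ThemeInstructions> {
  Object.values(i).forEach((v) => {
    if (typeof v === "object" && v !== null) Object.freeze(v);
  });
  return Object.freeze(i);
}
```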
7) Queue slow-pass page builds (for each page)
Two inputs per page:
Attributes (tokens/layout numbers) → deterministic styling.
Special Instructions (UGC, tone) → page behavior & composition.
Library stays curated (site-controlled). UGC does not go into the Library.
instructionsVersion is extended to embed the user’s approved special instructions alongside attributes, so builds are reproducible.
Input: instructionsVersion, PageSpec (e.g. index.php, single.php, page.php, archive.php, header.php, footer.php, sidebar.php)
Action: set status=BUILDING; spawn 3 passes per page (below)
Output: per-page PageBundle
Gate: each pass must succeed before next
7.1 Structure pass (per page)
Do: semantic TSX only (landmarks/slots), no styling; must use approved components
Checks: minimal className, correct headings, ARIA roles in place
Fail examples: arbitrary CSS, missing semantic HTML
7.2 Style pass (per page)
Do: apply Tailwind utilities backed by tokens (CSS vars)
Checks: no raw hex/px; no arbitrary [value]; spacing from scale; token coverage 100%
Fail examples: text-[#AABBCC], p-[5px], ad-hoc classes
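The 7.2 fail cases reduce to a design-lint over `className` strings. A sketch of that check; the exact banned patterns are my assumption about the rule set:

```typescript
// Reject raw hex colors, raw px values, and Tailwind arbitrary-value utilities.
const BANNED_PATTERNS: RegExp[] = [
  /#[0-9a-fA-F]{3,8}\b/,            // raw hex, e.g. text-[#AABBCC]
  /\[\s*[^[\]]*\b\d+px\b[^[\]]*\]/, // raw px inside brackets, e.g. p-[5px]
  /\[[^[\]]+\]/,                    // any arbitrary [value] utility
];

// Returns the offending classes so the report can name each violation.
function lintClassName(className: string): string[] {
  return className
    .split(/\s+/)
    .filter((cls) => BANNED_PATTERNS.some((re) => re.test(cls)));
}
```

Running this over every generated TSX node gives the “no raw hex/px; no arbitrary [value]” check a concrete, testable form.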
7.3 Conformance pass (per page)
Do: auto-fix/validate a11y (axe), spacing rhythm, heading order; perf preflight
Checks: axe = 0 serious/critical; budgets: LCP < 2.5s, CLS < 0.1, TBT < 200ms
Output: preview .png, reports .json (a11y/design-lint/perf)
8) Assemble theme artifact
Input: all PageBundles
Action: create manifest (pages, checksums, reports) pinned to instructionsVersion
Output: build artifacts (TSX, previews, reports, manifest)
Gate: single instructionsVersion across all pages
9) Tests before “Built”
Unit: components render + props + a11y attrs
Integration: donate / endorse / vote flows
Visual: Playwright snapshots for Home/Donate/Campaign
Determinism: rerun with same inputs → identical checksums
Gate: all green → status=BUILT (else FAILED with reasons)
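The determinism gate is just checksum equality across two runs. A sketch using Node’s built-in crypto (the per-page artifact map shape is an assumption):

```typescript
import { createHash } from "node:crypto";

// A build is reproducible when the same artifact bytes hash identically on rerun.
function checksum(artifact: string | Buffer): string {
  return createHash("sha256").update(artifact).digest("hex");
}

// Compare per-page checksum maps from two runs; throw on the first mismatch.
function assertDeterministic(
  runA: Record<string, string>,
  runB: Record<string, string>,
): void {
  for (const [page, sum] of Object.entries(runA)) {
    if (runB[page] !== sum) {
      throw new Error(`Non-deterministic build for ${page}: ${sum} != ${runB[page]}`);
    }
  }
}
```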
10) Review → Publish (staging → prod)
Input: BUILT artifacts + reports
Action: human review (previews + scores), then publish
Output: deployed theme; manifest archived
Gate: optional “block on budgets” setting (must meet perf/a11y budgets to deploy)
11) Rollback
Input: previous instructionsVersion
Action: rebuild --version= (immutable)
Output: restored theme
Gate: smoke tests pass
12) Feedback → Library learning
Input: user acceptances/rejections, usage metrics
Action: increment/decrement preset scores; bias future chat suggestions
Output: better defaults next time
Gate: none
————————-
I want a smart test that covers all 10 steps of generating a theme, from session creation to prod.
ASSERTIONS PER STEP (CIRCUIT BREAKERS)
– Step 1 (Create session): expect 201; save sessionId.
– Step 2 (Collect via chat + commit): expect 200; verify `threshold` increases and `missingKeys` shrinks; validate `ThemeSessionAttribute` rows present with `source="chat"`.
– Step 3 (Preview): verify URLs match `https://placehold.co/{WxH}` and class API present in skeleton nodes.
– Step 4 (Pre-approval validation): threshold ≥ 0.90; AA contrast; spacing multiples of 4; class API keys present.
– Step 5 (Approve & lock): status becomes APPROVED; attributes locked; 409 if editing afterward is attempted (assert this).
– Step 6 (Instructions): GET by version; assert immutability and that placeholders + class API are recorded.
– Step 7 (Structure pass): TSX has landmarks, minimal className usage, no Tailwind color utilities yet.
– Step 8 (Style pass): no raw hex/px; no arbitrary [value]; spacing utilities are from the configured scale.
– Step 9 (Conformance & assemble): axe = 0 serious/critical; preview.png stored; manifest includes checksums and instructionsVersion; rerun same inputs → identical checksums.
– Step 10 (Publish): deployment returns success; GET manifest shows single instructionsVersion across all pages.
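The per-step assertions above can hang off a minimal fail-fast runner. A sketch where `save` stands in for writing the checkpoint file; the `Step`/`Checkpoint` shapes are assumptions:

```typescript
// Each step is a named callback; throwing anywhere trips the circuit breaker.
type Step = { name: string; run: () => void | Promise<void> };

interface Checkpoint { completed: string[]; lastError?: string }

async function runSteps(
  steps: Step[],
  save: (c: Checkpoint) => void, // in a real run: write the checkpoint JSON file
): Promise<Checkpoint> {
  const state: Checkpoint = { completed: [] };
  for (const step of steps) {
    try {
      await step.run();
      state.completed.push(step.name);
      save(state); // checkpoint after every successful step
    } catch (err) {
      state.lastError = `${step.name}: ${(err as Error).message}`;
      save(state); // record failing context, then fail fast
      throw err;
    }
  }
  return state;
}
```

Because `save` is injected, the runner itself stays unit-testable without touching the filesystem.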
I suggest something like:
TEST SCOPE (10 PHASES → PROD)
The suite must implement these **10 steps**, each with a circuit breaker (fail fast, write checkpoint, exit):
1) Create session → POST /sessions → returns {sessionId, status:"COLLECTING"}
2) Collect attributes (chat-first) → simulate chat turns, then on user “Apply” do POST /chat/commit (batched, idempotencyKey). Validate threshold recalculation.
3) Live preview pre-review → GET /sessions/:id/preview (or generate locally if API not available). Verify deterministic placeholders (placehold.co).
4) Pre-approval validation → GET /sessions/:id/threshold and GET /sessions/:id/validations → require threshold ≥ 0.90 and all checks (contrast, spacing multiples, class API).
5) Approve & lock → POST /sessions/:id/approve → expect {status:"APPROVED", instructionsVersion}.
6) Generate & fetch ThemeInstructions (single source of truth) → GET /instructions/:version → verify tokens, class API, placeholder policy.
7) Page build – Structure pass → POST /pages/generate {version, pageSpec, pass:"structure"} → verify semantic TSX only (no styling).
8) Page build – Style pass → pass:"style" → verify only token-mapped Tailwind utilities (no hex/px, no arbitrary [value]).
9) Page build – Conformance pass & assemble theme → pass:"conformance" → expect axe zero serious/critical; output previews + reports; assemble manifest pinned to instructionsVersion.
10) Publish → POST /publish {manifestId or version} → verify deployment status + immutable manifest.
CIRCUIT BREAKERS & RESUME
– After each step, write a JSON checkpoint to `./.test_state/theme_pipeline.checkpoint.json`
– The test must read this file at start and **resume from the next step** if it exists.
– If any API call fails (>=400) or a validation fails, **throw**, write the failing context into the checkpoint (`lastError`), and exit fast.
– All mutating calls must include an **idempotencyKey** (uuid v4) stored in checkpoint to prevent double writes on resume.
– Add a CLI/env flag `RESUME=true` to opt into resume mode; default is clean run.
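A sketch of the checkpoint/resume bookkeeping described above. The checkpoint shape and helper names are assumptions; the key points are that idempotency keys persist across resumes and that `RESUME=true` starts from the step after the last completed one:

```typescript
import { randomUUID } from "node:crypto";

// Assumed on-disk shape of ./.test_state/theme_pipeline.checkpoint.json.
interface PipelineCheckpoint {
  completedSteps: number[];
  idempotencyKeys: Record<string, string>; // step name -> uuid v4
  lastError?: string;
}

// Reuse the stored key on resume so retried POSTs never double-write.
function keyFor(step: string, cp: PipelineCheckpoint): string {
  return (cp.idempotencyKeys[step] ??= randomUUID());
}

// Clean run starts at 1; resume mode continues after the highest completed step.
function nextStep(cp: PipelineCheckpoint, resume: boolean): number {
  if (!resume || cp.completedSteps.length === 0) return 1;
  return Math.max(...cp.completedSteps) + 1;
}
```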
API CALL EFFICIENCY (BE THRIFTY)
– Never refetch the same resource if the checkpoint already has the canonical ID/version. Cache GET responses in memory during the run.
– Use `If-None-Match` (ETag) or `If-Modified-Since` for GETs if supported; otherwise avoid repeated GETs unless the previous step requires fresh data.
– Provide a `DRY_RUN_BUILD_PASSES` toggle: when true, mock the 3 page passes by creating minimal valid artifacts locally but **still** validate schema and version pinning. Default false for a real run of **one minimal page set** (`Home` and `Donate`) to keep cost/time low.
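The thrifty-GET idea can be sketched as a small in-memory cache with ETag revalidation. `fetchImpl` is injectable so it is testable; whether the API actually returns ETags is an assumption to verify:

```typescript
// Minimal surface of fetch that the helper needs (keeps it mockable in tests).
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<{
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

const cache = new Map<string, { etag?: string; body: unknown }>();

async function cachedGet(
  url: string,
  fetchImpl: FetchLike = globalThis.fetch as unknown as FetchLike,
): Promise<unknown> {
  const hit = cache.get(url);
  const headers: Record<string, string> = {};
  if (hit?.etag) headers["If-None-Match"] = hit.etag; // revalidate instead of refetch
  const res = await fetchImpl(url, { headers });
  if (res.status === 304 && hit) return hit.body;     // unchanged: reuse cached copy
  const body = await res.json();
  cache.set(url, { etag: res.headers.get("etag") ?? undefined, body });
  return body;
}
```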
ENV & HEADERS
– Load `.env` and require `OPENAI_API_KEY`.
Before we even begin, do a high-level check for any obviously missing code.
Run a sanity pre-check before actually starting this massive test.
——————————————————————————————-
The other thing I am thinking about, but don’t quite understand yet, is a prompt that picks from options we give it (based on our tokens and layouts) and chooses the best combination for the combined final user prompt.
For example we could:
Inputs: curated options (TokenSets, LayoutPresets, Snippet variants) + instructionsVersion.userDirectives/pageDirectives.
Process: hard constraints → lightweight heuristic scoring → LLM tie-breaker/rationale (on top-N only).
Outputs: a single SelectionPlan (chosen TokenSet + LayoutPreset + Snippet bundle), with reasons + all constraints satisfied. Store this plan in the build job, and mirror it into the ThemeInstructions snapshot (IDs + rationale), not into the Library.
so outputs are machine-parsable and reproducible.
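The hard-constraints-then-heuristic-score funnel could look like this; option fields and the scoring rule are illustrative assumptions, and only the surviving top-N would be handed to the LLM tie-breaker:

```typescript
// Curated option with matchable tags (TokenSet, LayoutPreset, or Snippet variant).
interface Option { id: string; tags: string[] }

// Count how many user directives an option satisfies (crude but cheap heuristic).
function score(option: Option, directives: string[]): number {
  return directives.filter((d) => option.tags.includes(d)).length;
}

// Hard constraints filter -> heuristic ranking -> top-N for the LLM tie-breaker.
function shortlist(
  options: Option[],
  directives: string[],
  required: string[],
  topN = 3,
): Option[] {
  return options
    .filter((o) => required.every((r) => o.tags.includes(r))) // hard constraints
    .map((o) => ({ o, s: score(o, directives) }))             // heuristic score
    .sort((a, b) => b.s - a.s)
    .slice(0, topN)                                           // only top-N reach the LLM
    .map((x) => x.o);
}
```

The LLM then only breaks ties and writes the rationale, which keeps the SelectionPlan cheap, machine-parsable, and reproducible.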
——————————————————————————————-
Finally, throughout testing we should save all results to the wordpress_generator database.
Let’s add an openai transactions table to the schema to keep track of our API calls.
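One possible row shape for that table, expressed as a TypeScript type. Column names and the per-1K token rates are placeholders to refine against actual pricing, not facts:

```typescript
// Assumed row shape for the proposed openai transactions table.
interface OpenAiTransactionRow {
  id: number;
  sessionId: string | null;
  instructionsVersion: string | null;
  endpoint: string;        // e.g. "/v1/chat/completions"
  model: string;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  status: "ok" | "error";
  createdAt: string;       // ISO-8601 timestamp
}

// Rough cost estimate from token counts (rates are placeholders, not real pricing).
function estimateCostUsd(
  promptTokens: number,
  completionTokens: number,
  inRatePer1K = 0.0005,
  outRatePer1K = 0.0015,
): number {
  return (promptTokens / 1000) * inRatePer1K + (completionTokens / 1000) * outRatePer1K;
}
```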
——————————————————————————————-
Create a test-implementation-plan.md to capture our short- and long-term goals. This document may change as we proceed. Prioritize our top priorities, and let’s begin.