
The Portfolio Site That Validates Its Own Content

Every post on this blog passes a 4-dimension quality gate before it goes live. The build fails if the slop score exceeds 30%. This post included.

I publish frequently. When I started, quality was inconsistent. Some posts were sharp. Others read like they were generated by a model trying to sound impressive — because parts of them were. I used AI assistance for drafting, and without a quality check, the AI’s worst habits leaked through.

So I made quality a build-time constraint. The same way TypeScript prevents type errors from shipping, my content validation prevents slop from publishing.


The Astro.js Foundation

The site runs on Astro.js. Astro is a static site generator that ships zero JavaScript by default and hydrates components only when needed. For a content-heavy portfolio site, it’s the right tool.

But Astro’s content collection system is what makes validation possible. Content collections enforce a schema on your markdown frontmatter. Every post must have a title, description, date, tags, and series designation. Missing a required field? The build fails.

I extended this beyond standard metadata:

series: "application-architecture"  # Must match a defined series
seriesOrder: 5                       # Must be unique within series
relatedPosts: ["post-slug-1"]       # Must reference existing posts

The schema validates referential integrity. If I reference a related post that doesn’t exist, the build breaks. If I duplicate a seriesOrder within a series, the build breaks. If I use a series name that isn’t defined, the build breaks.

This catches a class of errors that no linter would find. Broken cross-references, orphaned series entries, and duplicate ordering are content structure bugs, and they get caught at build time rather than discovered by a reader clicking a dead link.
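The referential checks described above can be sketched as a plain function. This is a minimal illustration, not the site's actual schema code: the `PostMeta` shape and the series registry are assumptions standing in for the real frontmatter definitions.

```typescript
// Hypothetical frontmatter shape mirroring the fields described above.
interface PostMeta {
  slug: string;
  series?: string;
  seriesOrder?: number;
  relatedPosts: string[];
}

// Assumed registry of defined series names.
const DEFINED_SERIES = new Set(["application-architecture"]);

function validateIntegrity(posts: PostMeta[]): string[] {
  const errors: string[] = [];
  const slugs = new Set(posts.map((p) => p.slug));
  const ordersBySeries = new Map<string, Set<number>>();

  for (const post of posts) {
    // Every related post must reference an existing slug.
    for (const ref of post.relatedPosts) {
      if (!slugs.has(ref)) {
        errors.push(`${post.slug}: relatedPosts references missing "${ref}"`);
      }
    }
    if (post.series !== undefined) {
      // Series names must come from the defined registry.
      if (!DEFINED_SERIES.has(post.series)) {
        errors.push(`${post.slug}: unknown series "${post.series}"`);
      }
      // seriesOrder must be unique within a series.
      const used = ordersBySeries.get(post.series) ?? new Set<number>();
      if (post.seriesOrder !== undefined) {
        if (used.has(post.seriesOrder)) {
          errors.push(`${post.slug}: duplicate seriesOrder ${post.seriesOrder}`);
        }
        used.add(post.seriesOrder);
      }
      ordersBySeries.set(post.series, used);
    }
  }
  return errors;
}
```

A build step that calls this across every post and throws on a non-empty error list gives exactly the fail-fast behavior described: broken references stop the compile instead of reaching a reader.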


Four-Dimension Slop Scoring

The slop detector runs as a build step. Every markdown file in the content collection gets scored across four dimensions.

Dimension 1: Lexical (0-25 points)

This is the banned phrase list. Twenty-six phrases that signal AI-generated filler:

“Delve” earns 3 points. “Paradigm shift” earns 3. “Cutting-edge” earns 2. “Seamless” earns 2. The full list targets words and phrases that AI models use reflexively and human experts avoid instinctively.

The scoring isn’t binary. “Navigate the complexities” earns more points than “navigate” alone, because the full phrase is more distinctly AI-generated. Context matters — “navigate” in a post about GPS is fine. “Navigate” in a post about business strategy is a flag.
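The weighted-phrase scoring can be sketched like this. The list below is an illustrative subset, and the exact weights beyond the four quoted above are assumptions:

```typescript
// Illustrative subset of the banned-phrase list; longer phrases carry
// higher weights than their component words (an assumed weighting).
const BANNED: [string, number][] = [
  ["navigate the complexities", 4],
  ["paradigm shift", 3],
  ["delve", 3],
  ["cutting-edge", 2],
  ["seamless", 2],
];

const LEXICAL_CAP = 25; // dimension maximum

function lexicalScore(text: string): number {
  const lower = text.toLowerCase();
  let score = 0;
  for (const [phrase, weight] of BANNED) {
    // Count non-overlapping occurrences of each phrase.
    const hits = lower.split(phrase).length - 1;
    score += hits * weight;
  }
  return Math.min(score, LEXICAL_CAP);
}
```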

Dimension 2: Statistical (0-40 points)

This dimension catches structural patterns that humans don’t produce.

Sentence length variance: human writing has high variance. Short punchy sentences. Then a longer one that takes its time building an argument across a subordinate clause or two. AI-generated text tends toward a narrow band — most sentences are 15-25 words.

Vocabulary diversity: the ratio of unique words to total words. Below 0.45 for a 1500-word post is a flag. It means the text is repeating the same words instead of using synonyms or restructuring.

Phrase repetition: when the same 3-gram appears more than twice in a post (excluding code snippets), the statistical score increases. AI models have favorite constructions they return to repeatedly within a single generation.

This dimension has the highest weight (40 points) because statistical patterns are the hardest to fake. You can manually remove banned phrases. You can’t easily fix uniform sentence length without rewriting entire paragraphs.
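The three statistical signals can each be computed in a few lines. These are sketches of the metrics described above; the tokenization and sentence splitting are simplifications:

```typescript
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Coefficient of variation of sentence lengths: low values mean
// uniform sentences, the pattern flagged above.
function sentenceLengthVariation(text: string): number {
  const lengths = text
    .split(/[.!?]+/)
    .map((s) => words(s).length)
    .filter((n) => n > 0);
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}

// Unique-word ratio; below ~0.45 on a long post suggests repetition.
function vocabularyDiversity(text: string): number {
  const w = words(text);
  return w.length === 0 ? 0 : new Set(w).size / w.length;
}

// Number of distinct 3-grams appearing more than twice.
function repeatedTrigrams(text: string): number {
  const w = words(text);
  const counts = new Map<string, number>();
  for (let i = 0; i + 2 < w.length; i++) {
    const gram = w.slice(i, i + 3).join(" ");
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return [...counts.values()].filter((c) => c > 2).length;
}
```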

Dimension 3: Structural (0-20 points)

AI text has a recognizable skeleton. Each section follows the same pattern: topic sentence, three elaborating sentences, summary sentence. Every paragraph is 4-5 sentences. Every section is 3-4 paragraphs.

The structural score penalizes this uniformity. It measures paragraph length variance (good writing mixes 1-sentence paragraphs with 4-sentence paragraphs), section length balance (sections should have different depths based on the complexity of their topic), and transition patterns (not every section should start with “However” or “Additionally”).
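Two of those structural checks can be sketched directly. The function names and the stock-opener list are hypothetical, not the site's actual code:

```typescript
// Sentence count per paragraph (paragraphs separated by blank lines).
function paragraphSentenceCounts(text: string): number[] {
  return text
    .split(/\n\s*\n/)
    .map((p) => (p.match(/[.!?]+/g) ?? []).length)
    .filter((n) => n > 0);
}

// True when every paragraph has the same sentence count -- the AI skeleton.
function uniformParagraphs(text: string): boolean {
  const counts = paragraphSentenceCounts(text);
  return counts.length > 1 && new Set(counts).size === 1;
}

// Fraction of paragraphs opening with a stock transition word.
const STOCK_OPENERS = /^(however|additionally|furthermore|moreover)\b/i;

function stockOpenerRatio(text: string): number {
  const paras = text.split(/\n\s*\n/).filter((p) => p.trim().length > 0);
  if (paras.length === 0) return 0;
  return paras.filter((p) => STOCK_OPENERS.test(p.trim())).length / paras.length;
}
```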

Dimension 4: Tonal (0-15 points)

This dimension catches voice problems rather than structural ones.

Hedging language: “it could be argued,” “one might consider,” “it’s worth noting.” These constructions exist because the model is uncertain but doesn’t want to commit. Expert writing states claims directly and provides evidence.

Corporate voice: “we are proud to,” “excited to share,” “committed to excellence.” This is marketing copy leaking into technical writing.

Passive construction density: some passive voice is fine. More than 20% passive constructions in a technical post is a flag. “The model was trained” is fine. “It was determined that the model was to be trained by the team” is a problem.
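The tonal checks reduce to phrase counting plus a passive-voice heuristic. The phrase lists below are the examples quoted above, and the passive detector is a deliberately crude approximation (a "to be" form followed by an -ed/-en word), not a real parser:

```typescript
const HEDGES = ["it could be argued", "one might consider", "it's worth noting"];
const CORPORATE = ["we are proud to", "excited to share", "committed to excellence"];

// Count total occurrences of any phrase in the list.
function countPhrases(text: string, phrases: string[]): number {
  const lower = text.toLowerCase();
  return phrases.reduce((sum, p) => sum + (lower.split(p).length - 1), 0);
}

// Crude passive detector: fraction of sentences containing a form of
// "to be" followed by a word ending in -ed or -en.
function passiveRatio(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  if (sentences.length === 0) return 0;
  const passive = sentences.filter((s) =>
    /\b(is|are|was|were|be|been|being)\s+\w+(ed|en)\b/i.test(s)
  ).length;
  return passive / sentences.length;
}
```

A real implementation would need to handle adjectives ending in -ed ("the model is sophisticated" is not passive), which is why this dimension carries the lowest weight.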


How Validation Works at Build Time

The validation runs as an npm script: npm run slop:validate. It’s wired into the build pipeline so it runs before the site compiles.

[PASS]  infrastructure-paradox.md           Score: 12/100
[PASS]  trading-system-13-capital-gates.md   Score: 18/100
[FAIL]  some-draft-post.md                   Score: 42/100
        - Lexical: 8/25 (banned: "delve", "robust")
        - Statistical: 22/40 (low sentence variance: 0.31)
        - Structural: 7/20 (uniform paragraph length)
        - Tonal: 5/15 (3 hedging constructions)

When a post fails, the validator outputs an improvement prompt. This is a structured instruction that can be fed back to the drafting model:

Revise the following post. Address these specific issues:
1. Replace "delve" (para 3) and "robust" (para 7) with concrete alternatives
2. Vary sentence length -- current variance is 0.31, target is 0.50+
3. Break the uniform 4-sentence paragraph pattern in sections 2 and 4
4. Remove hedging in: "it could be argued that..." (para 5), convert to direct claim

This feedback loop means the AI model that drafted the post gets specific, actionable feedback rather than a vague “write better.” The revision usually drops the score by 15-20 points, bringing most posts below the 30-point threshold.
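Generating that prompt is straightforward once each dimension reports its findings. The `Finding` shape and the wording below are assumptions about how such a generator might look, not the site's actual implementation:

```typescript
// One actionable finding from a scoring dimension.
interface Finding {
  dimension: "lexical" | "statistical" | "structural" | "tonal";
  instruction: string; // e.g. 'Replace "delve" (para 3) with a concrete alternative'
}

// Assemble the numbered revision prompt fed back to the drafting model.
function buildRevisionPrompt(slug: string, findings: Finding[]): string {
  const lines = findings.map((f, i) => `${i + 1}. [${f.dimension}] ${f.instruction}`);
  return [`Revise "${slug}". Address these specific issues:`, ...lines].join("\n");
}
```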


Content Collection Schema Details

Beyond slop validation, the content collection schema enforces structural rules that keep the site coherent as it grows.

Series Management

Posts belong to named series. Each series has a defined order. The schema enforces that every post in a series has a unique seriesOrder value and that the values form a contiguous sequence (no gaps).
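The unique-and-contiguous rule is a small check. A sketch, assuming series numbering starts at 1:

```typescript
// True when orders are exactly 1..n with no gaps and no duplicates.
function contiguousOrders(orders: number[]): boolean {
  const sorted = [...orders].sort((a, b) => a - b);
  return sorted.every((value, i) => value === i + 1);
}
```

Sorting and comparing against the index handles duplicates and gaps in one pass: a duplicate shifts every later value off its expected slot, and a gap does the same.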

On the rendered site, series posts display navigation to the previous and next post in the series. Readers can follow a complete thread — from the introductory post through the advanced material — without hunting through the archive.

Related Posts

The relatedPosts field is a list of post slugs. The schema validates that every referenced slug corresponds to an existing post. This creates a graph of content relationships that the site uses to suggest further reading.

The graph also surfaces orphaned posts — posts with no inbound or outbound relationships. An orphaned post is either genuinely standalone (rare) or a sign that I forgot to connect it to the rest of the content graph.

Coined Terms

Some posts introduce new terminology. The coinedTerms field tracks these. The site uses this metadata to build a glossary and to link term usage across posts. If Post A coins “context cascade” and Post B uses that phrase, the site can auto-link the usage back to the definition.
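The auto-linking step might look like the sketch below. The function name, the glossary shape, and the `/posts/` URL format are all assumptions for illustration:

```typescript
// Wrap each usage of a coined term in a link back to its defining post.
function linkCoinedTerms(
  html: string,
  glossary: Map<string, string> // term -> defining post slug
): string {
  let out = html;
  for (const [term, slug] of glossary) {
    // Escape regex metacharacters in the term before matching.
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(escaped, "gi");
    out = out.replace(pattern, (m) => `<a href="/posts/${slug}">${m}</a>`);
  }
  return out;
}
```

A production version would skip terms inside code blocks and existing links; this sketch links every occurrence.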


Why Build-Time Validation Matters

I could run slop detection as a manual step. Open a draft, paste it into the detector, read the report, fix the issues, repeat. That workflow works for one post.

It falls apart at scale. When you’re publishing 3-4 posts per week, the manual check becomes the bottleneck. You skip it once because you’re tired. You skip it again because the draft “felt right.” By the end of the month, two posts have slop scores above 40 and you don’t notice until a reader points it out.

Build-time enforcement removes the human from the quality loop. I can’t publish a sloppy post because the build literally won’t complete. There’s no “I’ll fix it later.” There’s no “this one’s fine.” The number decides.

This is the same principle as type checking, linting, and CI tests. If it matters, enforce it in the build. If it doesn’t matter enough to enforce, stop pretending it matters.


Results

Since implementing build-time validation, the average slop score across all published posts is 17/100. The highest-scoring published post is 28. Zero posts have shipped above 30.

The improvement prompt feedback loop means first drafts are getting better over time. The average first-draft slop score was 42 in January. It’s 35 now. The models are learning what I don’t want, because I feed them specific corrections instead of vague dissatisfaction.

Reader feedback has shifted too. Before validation, the most common criticism was “this reads like AI wrote it.” I haven’t received that comment since enabling the slop gate. The most common criticism now is “this is too opinionated.” I consider that a compliment.


Building content systems with quality enforcement? I help teams implement validation pipelines that catch problems before they ship. Book a call to discuss your content architecture.