How I Built a 47-Signal Website Audit Tool That Runs in 15 Seconds

I was running four tools to answer one question.

Every time I needed a complete picture of a site’s health, I would open an SEO scanner, then a security checker, then an AEO validator, then something else for GEO signals. Each tool gave me a slice. None of them talked to each other. And the gaps between those slices were exactly where the real problems lived.

The moment that made it undeniable was when I ran my own portfolio page through the stack I had been using. I expected a clean result. I had built the thing myself, I knew the codebase, and I had shipped plenty of production sites before. What I found was that I had missed basic header protections entirely. Not obscure edge cases. Foundational security headers that affect trust signals across every category I cared about. A site I had built with my own hands was failing checks I would have caught immediately if I had one complete view instead of four partial ones.

That is where The Canopy Guard started.

Most audit tools are built around a category. SEO tools look at meta tags, crawlability, and keyword signals. Security scanners check headers, SSL, and known vulnerability patterns. AEO (answer engine optimization) tools look at structured data and answer-engine readiness. GEO (generative engine optimization) tools, the newer category, look at how AI assistants and large language models are likely to interpret and cite a page.

Each of these categories matters. The problem is that a site does not exist in categories. A missing security header can suppress a page in AI-generated results. A schema gap can make an otherwise well-optimized page invisible to voice assistants. A slow time to first byte (TTFB) affects both user experience and how crawlers assess page reliability.

When you scan in silos, you get silo-shaped answers. The cross-category relationships disappear.

Deciding what to scan

Before writing a single line of code, I mapped the signal landscape. I needed to know which signals actually move outcomes, which ones are diagnostic, and which ones are noise.

I landed on 47 signals across four layers: SEO, AEO, GEO, and security.

SEO signals cover the fundamentals: title and meta description presence and length, canonical tags, Open Graph data, robots directives, sitemap accessibility, page speed indicators, and mobile readiness. Nothing exotic here, but completeness matters. A tool that checks only a dozen of these misses the rest, along with the ways they interact.

AEO signals focus on answer-engine readiness: structured data markup, FAQ schema, HowTo schema, breadcrumb implementation, and the clarity of the page’s primary topic. These signals determine whether a page gets surfaced in featured snippets, voice results, and zero-click answers.
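One way a schema-presence check could work is to collect the JSON-LD blocks on a page and read their declared types. The sketch below is an illustration under assumptions, not The Canopy Guard's actual code: the class and function names are hypothetical, it only reads top-level `@type` values (it does not walk `@graph`), and it ignores Microdata and RDFa. `FAQPage`, `HowTo`, and `BreadcrumbList` are the schema.org type names for the schemas mentioned above.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def schema_types(html: str) -> set[str]:
    """Return the top-level @type values declared in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types: set[str] = set()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is skipped here; a real check might flag it
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type")
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
    return types


if __name__ == "__main__":
    page = '<script type="application/ld+json">{"@type": "FAQPage"}</script>'
    found = schema_types(page)
    for wanted in ("FAQPage", "HowTo", "BreadcrumbList"):
        print(wanted, "present" if wanted in found else "missing")
```

Returning the set of types, rather than one boolean per schema, lets several AEO checks read from a single pass over the markup.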

Security signals cover the headers and configurations that affect both user safety and search engine trust: HTTPS enforcement, HSTS presence, X-Frame-Options, Content Security Policy headers, X-Content-Type-Options, and referrer policy. These are table stakes for production sites, but a surprising number of live sites are missing several of them.
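As a minimal sketch, the header portion of this layer can be expressed as a lookup over the response headers. The header names below are the standard ones for the protections listed above; the function name, signal keys, and result shape are assumptions for illustration. The sketch checks presence only and does not validate header values.

```python
# Hypothetical signal keys mapped to the standard HTTP response header names.
REQUIRED_HEADERS = {
    "hsts": "Strict-Transport-Security",
    "x_frame_options": "X-Frame-Options",
    "csp": "Content-Security-Policy",
    "x_content_type_options": "X-Content-Type-Options",
    "referrer_policy": "Referrer-Policy",
}


def check_security_headers(headers: dict[str, str]) -> dict[str, bool]:
    """Map each signal key to True if its header is present.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in headers}
    return {
        signal: header.lower() in present
        for signal, header in REQUIRED_HEADERS.items()
    }


if __name__ == "__main__":
    sample = {
        "strict-transport-security": "max-age=31536000",
        "X-Frame-Options": "DENY",
    }
    for signal, passed in check_security_headers(sample).items():
        print(f"{signal}: {'pass' if passed else 'fail'}")
```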

Why I went deep on GEO and AEO

I want to address the category that gets the most pushback, because I still hear it from legacy SEO writers: the idea that GEO is not important yet, that it is something to think about later, that traditional search signals are enough for now.

I disagree with that framing, and I built The Canopy Guard partly because of it.

AI is not gaining usage on a quarterly curve. It is gaining every hour of every day. The builders and founders who are treating GEO as a future problem are going to look up one morning and realize that the primary source of truth for their audience has already shifted, and their content is not visible in it. I have seen this pattern before. I watched businesses scramble to catch up on mobile optimization years after the signal was obvious. I watched the same thing happen with structured data. The pattern repeats.

Why wait until you are late?

GEO signals are the newest layer and the least understood by most builders. GEO, or Generative Engine Optimization, looks at how large language models are likely to interpret a page when generating answers. This includes entity clarity, citation-friendly content structure, and the presence of signals that help AI assistants attribute and trust a source. Building for GEO is not a separate discipline from building for SEO or AEO. It is an extension of the same discipline, applied with more awareness of where the web is actually going.

The builders who treat these three layers as one connected system are going to have a significant advantage over the ones still running siloed audits in 2026 and beyond.

Building the cross-reference engine

Listing 47 signals is not the hard part. The hard part is making them talk to each other.

A standard audit tool runs each check independently and returns a pass or fail. The Canopy Guard runs each check and then evaluates the relationships between results. That is what I call cross-reference intelligence, and it is the part of the architecture that took the most design work to get right.

Here is how it works. Every signal check writes its result into a shared state object, tagged by layer and severity. After all checks complete, a correlation pass runs across that state object looking for known failure combinations. These combinations are defined as rules, each one mapping a specific pattern of co-occurring failures to a compound insight that no individual check would surface on its own.

A simple example: a page might pass its meta description check, its general structured data check, and its HTTPS check individually. But if it has no canonical tag, no HSTS header, and no FAQ schema, those three gaps together signal an indexability and trust problem that goes deeper than any single missing element. A siloed report returns three separate low-priority flags. The cross-reference engine returns one high-priority compound finding with a clear explanation of why the combination matters.
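The canonical, HSTS, and FAQ schema example can be sketched as a rule that fires only when all of its named signals have failed. This is a hedged illustration: the `Rule` shape, the signal keys, and the insight wording are assumptions, not the tool's real rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    requires_failed: frozenset[str]  # signal keys that must ALL have failed
    priority: str
    insight: str


# One hypothetical rule, mirroring the example in the text above.
RULES = [
    Rule(
        name="indexability_and_trust",
        requires_failed=frozenset({"canonical", "hsts", "faq_schema"}),
        priority="high",
        insight="Missing canonical, HSTS, and FAQ schema together point to "
                "an indexability and trust problem.",
    ),
]


def correlate(results: dict[str, bool], rules: list[Rule] = RULES) -> list[Rule]:
    """Return every rule whose required signals all failed.

    `results` maps signal key -> passed. Signals that were never checked
    do not count as failures.
    """
    failed = {signal for signal, passed in results.items() if not passed}
    return [rule for rule in rules if rule.requires_failed <= failed]


if __name__ == "__main__":
    scan = {"meta_description": True, "https": True,
            "canonical": False, "hsts": False, "faq_schema": False}
    for finding in correlate(scan):
        print(f"[{finding.priority}] {finding.name}: {finding.insight}")
```

Because the rule is a subset test over failed signal keys, a new compound finding is a new entry in `RULES` rather than a change to any individual check.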

The rules are weighted by layer relationship, not just by individual signal severity. A security failure that intersects with an AEO gap gets elevated because the combination has a compounding effect on AI-engine trust scoring. An SEO gap that intersects with a GEO gap gets flagged differently than either would alone.
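One way to picture weighting by layer relationship is a multiplier keyed on the pair of layers a finding spans. The numbers and names below are invented purely for illustration; the article does not give the real weights or the scoring formula.

```python
from itertools import combinations

# Assumed multipliers, for illustration only. Pairs not listed default to 1.0.
LAYER_PAIR_WEIGHT = {
    frozenset({"security", "aeo"}): 2.0,
    frozenset({"seo", "geo"}): 1.5,
}


def compound_score(failures: list[tuple[str, int]]) -> float:
    """Score one compound finding.

    `failures` holds (layer, severity) for each failed signal in the finding.
    The base score is the summed severity; it is then scaled by the largest
    multiplier among the layer pairs the finding spans.
    """
    base = sum(severity for _, severity in failures)
    layers = sorted({layer for layer, _ in failures})
    multiplier = max(
        (LAYER_PAIR_WEIGHT.get(frozenset(pair), 1.0)
         for pair in combinations(layers, 2)),
        default=1.0,  # a single-layer finding has no pairs
    )
    return base * multiplier


if __name__ == "__main__":
    print(compound_score([("security", 2), ("aeo", 1)]))  # cross-layer, elevated
    print(compound_score([("security", 2), ("security", 1)]))  # same layer, not elevated
```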

The data model that made this possible was simple but intentional. Every check result is stored as a structured object with a layer key, a signal key, a binary pass or fail value, and a metadata field for any additional context the check surfaces. The correlation engine queries across layer keys, which means adding new cross-layer rules later is a matter of adding to the rules configuration rather than rewriting the check logic.
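The result object described above maps naturally onto a small dataclass. The field names follow the description (layer, signal, pass or fail, metadata); the class names and the `failed` query helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CheckResult:
    layer: str    # "seo", "aeo", "geo", or "security"
    signal: str   # e.g. "canonical"
    passed: bool  # binary pass or fail
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ScanState:
    """Shared state that every check writes into (hypothetical shape)."""
    results: list[CheckResult] = field(default_factory=list)

    def add(self, result: CheckResult) -> None:
        self.results.append(result)

    def failed(self, *layers: str) -> set[str]:
        """Signal keys that failed in the given layers (all layers if none given)."""
        return {
            r.signal
            for r in self.results
            if not r.passed and (not layers or r.layer in layers)
        }


if __name__ == "__main__":
    state = ScanState()
    state.add(CheckResult("security", "hsts", False, {"header": None}))
    state.add(CheckResult("seo", "canonical", False))
    state.add(CheckResult("seo", "title", True, {"length": 54}))
    print(state.failed("security"))
    print(state.failed())
```

Querying by layer key is what keeps cross-layer rules cheap to add: a rule asks the state for failures in two layers and never touches the check code.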

Solving for speed

Scanning 47 signals across four categories on an external URL has a natural latency problem. Making HTTP requests, parsing headers, evaluating DOM structure, and running correlation logic across all of it takes time. Done sequentially, a full scan runs well past 45 seconds. For a tool designed to give builders a real-time answer, that is not usable.

The solution was parallelization, but not naive parallelization. Firing 47 independent requests at once creates its own problems: race conditions on the shared state object, inconsistent timeout behavior, and no clean way to handle partial failures without corrupting the result.

The pattern that worked was parallel execution by layer, with a single shared page fetch at the top.

The first step is a single HTTP GET to the target URL, with headers captured separately from the DOM. That fetch is done once and the response is passed to all checks that need it, rather than each check fetching the page independently. This alone cuts the network overhead significantly, since most checks only need a fragment of what a full page fetch returns.
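A minimal sketch of the shared fetch, using only Python's standard library, might look like the following. The `FetchedPage` type, the function names, and the user agent string are hypothetical. Note that `urllib.request.urlopen` raises `HTTPError` on 4xx and 5xx responses, which a real scanner would need to catch and record.

```python
import urllib.request
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FetchedPage:
    url: str
    status: int
    headers: dict[str, str]  # lower-cased names, read by the header checks
    body: str                # decoded HTML, read by the DOM checks


def build_page(url: str, status: int,
               raw_headers: Iterable[tuple[str, str]], raw_body: bytes) -> FetchedPage:
    """Split one response into the two views the checks need."""
    headers = {name.lower(): value for name, value in raw_headers}
    return FetchedPage(url, status, headers, raw_body.decode("utf-8", errors="replace"))


def fetch_once(url: str, timeout: float = 10.0) -> FetchedPage:
    """Issue a single GET; the result is passed to every check that needs it."""
    request = urllib.request.Request(url, headers={"User-Agent": "audit-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return build_page(url, response.status, response.headers.items(), response.read())
```

Freezing the dataclass is a small guard against one check mutating the response that every other check reads.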

From there, checks are grouped by data source. Security header checks run as a batch because they all operate on the response headers from that single fetch, with no additional requests needed. SEO and AEO checks that require DOM parsing run as a separate parallel batch operating on the parsed response body. Schema checks, canonical checks, and Open Graph checks all read from the same parsed document object rather than each parsing the DOM independently.
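The "parse once, read many" idea for the DOM batch can be sketched with the standard library's `html.parser`. The class, the check names, and the choice of `og:title` as the Open Graph test are assumptions. For brevity the checks run in a simple loop here; the point of the sketch is the shared parsed document, not the scheduling.

```python
from html.parser import HTMLParser
from typing import Optional


class PageDocument(HTMLParser):
    """Parses the HTML once and keeps only the fields the DOM checks read."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.open_graph: dict[str, str] = {}
        self.has_json_ld = False

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalize them to "".
        attr = {name: value or "" for name, value in attrs}
        if tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href") or None
        elif tag == "meta" and attr.get("property", "").startswith("og:"):
            self.open_graph[attr["property"]] = attr.get("content", "")
        elif tag == "script" and attr.get("type", "").lower() == "application/ld+json":
            self.has_json_ld = True


def parse_once(html: str) -> PageDocument:
    doc = PageDocument()
    doc.feed(html)
    doc.close()
    return doc


# Each check is a plain function over the already-parsed document.
DOM_CHECKS = {
    "canonical": lambda doc: doc.canonical is not None,
    "open_graph": lambda doc: "og:title" in doc.open_graph,
    "structured_data": lambda doc: doc.has_json_ld,
}


def run_dom_checks(html: str) -> dict[str, bool]:
    doc = parse_once(html)  # parsed exactly once, shared by every check
    return {signal: check(doc) for signal, check in DOM_CHECKS.items()}
```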

The only checks that require additional outbound requests are the ones verifying external resources: sitemap accessibility, robots.txt retrieval, and a lightweight TTFB probe. These run as their own parallel batch with individual timeouts so that a slow or unreachable sitemap does not block the rest of the scan.
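A parallel batch with per-check timeouts can be sketched with `asyncio`. This assumes an async design, which the article does not confirm. The check names are hypothetical, and `simulated_check` is a stand-in that sleeps instead of making a network request.

```python
import asyncio
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[str]]


async def run_with_timeout(name: str, check: Check, timeout: float) -> tuple[str, str]:
    """Run one check; a timeout or error becomes a result, not an exception."""
    try:
        return name, await asyncio.wait_for(check(), timeout)
    except asyncio.TimeoutError:
        return name, "timeout"
    except Exception as exc:  # one failed check must not sink the batch
        return name, f"error: {exc}"


async def run_external_batch(checks: dict[str, Check], timeout: float = 5.0) -> dict[str, str]:
    """Run all checks concurrently, each under its own time limit."""
    pairs = await asyncio.gather(
        *(run_with_timeout(name, check, timeout) for name, check in checks.items())
    )
    return dict(pairs)


def simulated_check(delay: float) -> Check:
    """Stand-in for a real network probe: sleeps, then reports success."""
    async def check() -> str:
        await asyncio.sleep(delay)
        return "ok"
    return check


if __name__ == "__main__":
    batch = {"robots_txt": simulated_check(0.01), "sitemap": simulated_check(1.0)}
    print(asyncio.run(run_external_batch(batch, timeout=0.1)))
```

In the demo the slow "sitemap" stand-in is reported as a timeout while "robots_txt" still returns its result. Each check gets its own limit, so one unreachable resource costs at most that limit.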

With this structure, total scan time is roughly the shared fetch plus the slowest batch, not the sum of every individual check. The external resource batch is typically the longest running, and it completes well within the 15-second target because the checks within it are lightweight and time-boxed.

What this build taught me about founders

I built The Canopy Guard to solve my own problem, but what I did not anticipate was how it would change the conversations I have with founders.

I work with a lot of early-stage builders, people who are launching their first serious product, trying to get visibility, trying to understand why their site is not performing the way they expected. Before this tool, those conversations started with a long intake. I would ask about their stack, their current SEO setup, what tools they had used, what scores they had seen. It took time, and it still left gaps.

Now I run The Canopy Guard on their site at the start of the conversation. In 15 seconds I have a complete picture. Not to sell them anything. Not to pitch a service. To teach. To walk them through what the scores mean, why certain combinations of gaps matter more than others, and what to fix first. The tool became a teaching instrument, and that changed how I think about what it is for.

The next phase for The Canopy Guard is a learning course built directly around the audit results. The goal is to teach builders how to avoid poor scores from the beginning, before they launch, before the gaps compound. Not remediation. Prevention.

A pattern worth taking from this

If you are building diagnostic tooling of any kind, three principles came out of this build that apply broadly.

Start with categories, but design for relationships from day one. Even if your first version does not surface cross-reference insights, structure your data model so that correlations are possible later. Retrofitting cross-reference logic into a flat results object is painful.

Parallelize by data source, not by check. Group your work around what each check actually needs to read. Checks that share an input should share a fetch. This is the decision that keeps your scan fast as the signal count grows.

Build your output for the person acting on it. The temptation in diagnostic tooling is to surface everything you can measure. The discipline is deciding what a builder actually needs to act on at 11pm before a client presentation. Those are not the same list.

Where this is going

The Canopy Guard is live at thecanopyguard.com. It is free, built for developers and builders, and designed around the conviction that a complete picture of site health should not require four tools and twenty minutes.

The learning course is next. If you want to understand not just what your scores mean but how to build sites that score well from day one, that is what it will cover.

If you build tools that correlate multiple data sources into a single output, the architecture decisions here are worth studying. Not because this is the only way, but because the tension between depth, speed, and signal clarity is a problem every tool in this category faces eventually.

Adam McClarin is a full-stack AI engineer, CISSP, and founder of Meraki Is Love LLC. He builds production AI tools and writes about the decisions behind them. adammcclarin.com
