How do you know if AEO is working?

AEO (answer engine optimization) without measurement is just activity. This post walks through a framework for measuring it and for deciding what to do next.

Introduction

I’ve always had a complicated relationship with marketing metrics. The activities that are easiest to measure tend to be the lead-harvesting ones: their metrics sit closest to revenue, so stakeholders appreciate them most. The harder part of marketing is brand development, which builds the path that lets demand generation work. It’s a lot easier to generate leads when a customer already believes that your company solves a problem they are facing, but that belief is harder to measure and harder to connect to revenue. It is still important.

As I was digging into the measurement of AEO, I experienced the same tension. I started with a desire to measure outcomes: in this case, how well a company gets noticed. I call this the ‘outside-in’ perspective. Terms such as ‘share of answer’ and ‘citation rate’ emerged as key indicators. Like the traditional measures of brand strength, the outside-in perspective is not easy to measure. But companies need to figure it out if they want to gauge their AEO performance.

There is also foundational work required before AI tools can read your content at all. As with SEO, structuring content so that LLMs can digest it efficiently is table stakes. You can have the best, most authentic content in the world, but if it is not structured appropriately, AI will have a harder time hearing you. This is what I call the ‘inside-out’ perspective. It measures the actions, often technical, that a company can take to give its content a structure that AI can consume.

Both perspectives (outside-in and inside-out) are important. They require different actions and signal different needs. The AEO Maturity Model is a framework for measuring a company’s AEO readiness and effectiveness.

The model (or matrix) is a 10 x 10 grid, as shown below. The X-axis represents the inside-out score; the Y-axis represents the outside-in score. The analytics behind the model score a company on both measures and provide specific guidance on how to advance its AEO effectiveness.

Each axis runs from 0 to 10. Where your two scores intersect determines your quadrant and defines your starting point for the work ahead. The quadrants indicate a composite view of your readiness and impact.

Emerging: If you are in the lower left, you have a clean slate to work from. Many early-stage companies will find themselves here, and that’s perfectly fine. At this stage in a company’s growth, it may be more urgent to focus on outbound motions and getting the first few customers signed.

Technically Equipped: If you are in this quadrant, thank your technical marketing team. You have a solid foundation to build from. Schemas, structure, and information are presented to AI tools in a moderate to advanced manner. Your next focus should be building referenceable content, pursuing third-party mentions, and strengthening your authority signals.

Unstructured Authority: Companies in this quadrant already have a strong brand. They are getting noticed by media and external authorities and are already showing up in AI results on the strength of that brand alone. Now the focus should be on developing the technical foundation, schemas, and content formats that AI tools favor, which is what moves you into the Cited Authority quadrant. It’s almost purely a technical play (in addition to keeping the pedal down on branding).

Cited Authority: You are now showing up in AI searches not only as a viable company to partner with in comparative prompts, but also as an authority on the topics your ICP values. Your content and brand are solid, and your infrastructure is working.
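
To make the quadrant logic concrete, here is a minimal sketch of how two axis scores could map to a quadrant. The midpoint of 5.0 and the function name are my assumptions for illustration; the actual cut-off used in the model is not specified in this post.

```python
def classify_quadrant(inside_out: float, outside_in: float, midpoint: float = 5.0) -> str:
    """Map two 0-10 axis scores to a quadrant name.

    The midpoint (5.0) is an assumed boundary, not a documented one.
    """
    for name, score in (("inside_out", inside_out), ("outside_in", outside_in)):
        if not 0 <= score <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {score}")

    technically_ready = inside_out >= midpoint  # X-axis: technical readiness
    cited = outside_in >= midpoint              # Y-axis: citation strength

    if technically_ready and cited:
        return "Cited Authority"
    if technically_ready:
        return "Technically Equipped"
    if cited:
        return "Unstructured Authority"
    return "Emerging"


print(classify_quadrant(6.5, 3.0))
```

A company with strong technical readiness (6.5) but weak citation strength (3.0) lands in Technically Equipped under these assumptions.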

Figure: AEO Maturity Model. Four quadrants (Emerging, Technically Equipped, Unstructured Authority, and Cited Authority) plot Technical Readiness against Citation Strength, with a customer example overlay.

The Inside-Out Perspective

The inside-out perspective can be considered the technical side of AEO. The objectives are similar to SEO, and I have found that taking an AEO-first approach provides many SEO benefits. Conversely, if you have a strong SEO infrastructure, getting to AEO readiness is a much shorter journey. The name of the game here is communicating your content in a way that AI tools can readily consume.

The key dimensions evaluated in the inside-out assessment are:

  • Crawlability and indexability
  • Metadata quality and specificity
  • Heading and semantic structure
  • Schema presence and relevance
  • Entity clarity (company, product, audience, problem)
  • Answer-oriented content structure
  • Evidence and trust signals
  • Internal linking and topical depth
  • Page completeness

Each one of these dimensions is evaluated against a weighted scoring rubric that produces a score from 1-10. The multi-faceted model delivers clear guidance on what steps can be taken to improve your score and, with it, the ability of AI tools to consume your content. The two most heavily weighted dimensions are schema presence and answer-oriented content structure. These are the fundamentals that tell AI how to read and interpret your content. Get these wrong and the rest of the work is harder.
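
As an illustration of how a weighted rubric like this could roll up into one number, here is a minimal sketch. The specific weights are hypothetical; the only thing taken from the model is that schema presence and answer-oriented structure carry the most weight.

```python
# Hypothetical weights (they sum to 100). Only the ordering, with schema and
# answer structure weighted most heavily, reflects the model described above.
WEIGHTS = {
    "crawlability": 10,
    "metadata": 8,
    "headings": 8,
    "schema": 20,
    "entity_clarity": 10,
    "answer_structure": 20,
    "trust_signals": 8,
    "internal_linking": 8,
    "page_completeness": 8,
}


def inside_out_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, on the same scale as the inputs."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    weighted = sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)
    return round(weighted / sum(WEIGHTS.values()), 1)


def biggest_gaps(dimension_scores: dict[str, float], n: int = 3) -> list[str]:
    """Rank dimensions by weighted shortfall from a perfect 10."""
    gaps = {d: WEIGHTS[d] * (10 - dimension_scores[d]) for d in WEIGHTS}
    return sorted(gaps, key=gaps.get, reverse=True)[:n]


scores = {d: 5 for d in WEIGHTS}
scores["schema"] = 2  # e.g. schema missing from the page source
print(inside_out_score(scores), biggest_gaps(scores, n=2))
```

Ranking by weighted shortfall is one way to turn the score into guidance: a low score on a heavily weighted dimension rises to the top of the to-do list.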

The inside-out evaluation can be conducted using publicly available sources, so in an engagement this is usually the first step. Companies that have a strong SEO foundation usually score well here. But one key issue that I pointed out in the “Dude Where’s My Schema” blog is that schemas injected via Google Tag Manager (GTM) or a similar mechanism are not read by AI tools. This is a big gap that many companies may discover when they compare their AEO readiness to their SEO readiness. The good news is that it only requires a technical fix that can be implemented with minimal disruption.
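
One way to check for this gap is to look at the raw HTML your server returns, before any JavaScript runs, and see whether JSON-LD schema is already there. The sketch below uses only the Python standard library and assumes that a crawler which does not execute JavaScript sees exactly this raw markup. The two sample pages are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdFinder(HTMLParser):
    """Collects JSON-LD blocks present in the raw HTML (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than counted


def schema_types_in_raw_html(html: str) -> list[str]:
    """Return the @type of every top-level JSON-LD item found in the markup."""
    finder = JsonLdFinder()
    finder.feed(html)
    types = []
    for block in finder.blocks:
        for item in block if isinstance(block, list) else [block]:
            if isinstance(item, dict) and "@type" in item:
                types.append(str(item["@type"]))
    return types


# Made-up samples: schema in the page source vs. schema added later by a tag manager.
SERVER_RENDERED = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}'
    "</script></head></html>"
)
TAG_INJECTED = "<html><head><script>/* tag manager loader */</script></head></html>"

print(schema_types_in_raw_html(SERVER_RENDERED))
print(schema_types_in_raw_html(TAG_INJECTED))
```

To run this against a live page, you could fetch the markup with `urllib.request.urlopen(url).read().decode()` and pass the result to `schema_types_in_raw_html`. An empty list on a page you believe has schema is a hint that the schema is being injected client-side.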

Most companies that have not optimized for SEO or AEO will start in the 4-5 range.  Companies that have a solid SEO foundation can often start in the 6-7 range.

The Outside-In Perspective

The outside-in model shows how your company appears to someone seeking information through an LLM or through an AI preview in browser-based search.

When I started researching the best measurement for outside-in, the first advice I received was to simply tie web visits from AI sources to revenue. And you can indeed look at Google Analytics to see how many clicks you are getting from AI sources. But this seemed to be a lagging indicator, and it doesn’t capture the trust that comes from showing up in AI answers.

One of the biggest challenges is the zero-click phenomenon. In a zero-click world, information seekers are far less likely to click on your link, even when you show up at the top of the AI search or AI preview. There is value in being seen, and measuring AEO the same way we measure SEO shortchanges that value.

In traditional SEO, success was often measured as:

ranking → click → visit → conversion

But in AI search, the path is often:

being mentioned → being trusted → being remembered → being searched later → being shortlisted → conversion

The AI journey is a bit more challenging to quantify. The approach I landed on was prompt analysis.  This entails building a representative set of questions that simulate how a real buyer would use AI to find a company like yours, and then systematically tracking how often and how well your company shows up in the answers.

Prompt Analysis - Building the List

Prompt analysis was the best way I could find to test how often a company appears. You develop a representative prompt map, then check how many times the company appears in the answers and how often it is cited as a source.

The first step is to build a prompt map that represents a solid cross-section of what individuals are likely to type when looking for solutions from the company in focus. Building the map takes a little time, but it is worth the investment. Start with well-articulated and documented information about the company, including:

  • Messaging and positioning
  • ICP
  • Transcripts from customer calls and interactions
  • Existing SEO keywords (so long as they are current)

If you don’t have the first two (messaging and positioning, and your ICP) ready, I highly recommend pausing until you do. They are the core foundation for building a meaningful outside-in perspective.

From these, you can use AI to help you iterate through a series of prompts to test against your favorite LLMs. Don’t settle on the first list. Test them, challenge them, refine them. Make them yours. I can’t overstress how important it is to get this right up front. Your prompt map will need to remain stable so that you can retest periodically (I recommend monthly) and see how your company’s presence and citations change.

When I described my process to a colleague, he challenged the methodology: I can’t possibly guess how people will structure their prompts. He is right that this is not perfect. But it is important to note that when you type a prompt into an LLM tool, the tool interprets and rewrites it in its own terms before it produces an answer. For example, if I want to find the best hamburger in Budapest, I could ask:

  • “Where is the best hamburger in Budapest”
  • “Who makes the best hamburger in Budapest”
  • “I want the best hamburger in Budapest”

In this case, all three prompts pointed me to Smashy Burger on both Claude and ChatGPT (I actually prefer Simon Burger, so something to try soon…). This is because AI tools use their internal ‘translators’ to turn the prompt into a clean instruction: tokenizing (breaking it down into smaller pieces), interpreting, normalizing (smoothing over typos and differences in phrasing), and eventually rewriting it into something the model can work with.

The other important consideration (especially when it comes to scoring) is building out the categories so that high-buyer-intent questions are balanced against category discovery questions. Here’s what I use:

  • High Buyer Intent: The buyer is close to a purchase decision and names or implies the company directly. These prompts test whether AI will recommend you when someone is already in the market.
  • Comparative: The buyer is weighing options and wants to see vendors side by side. Tests whether AI includes you in the consideration set when alternatives are being evaluated.
  • Category With Intent: The buyer has a specific problem but hasn't named a vendor. Tests whether AI surfaces you when someone describes the pain you solve, without prompting the category by name.
  • Category Without Intent: Broader industry and trend questions where the buyer is still learning. Tests whether AI mentions you in educational, top-of-funnel contexts where no purchase decision is imminent.
  • Customer Eval: Queries that reference named customers, use cases, or case studies. Tests whether AI can connect your company to the outcomes you've already delivered.
  • Secret Shopper: A single prompt that puts an ICP-representative buyer in evaluation mode and asks AI to assess your company directly. Qualitatively rich; tests AI's overall picture of who you are and whether it would recommend you.

Each one of these carries a different weight in the overall score, and each one tells a different story.
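
A prompt map built around these categories can be kept as simple structured data, which makes it easier to hold stable between monthly runs and to spot thin categories. The sketch below is one possible layout; the field names and the example company ("Acme Analytics") are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# The six categories described above, as machine-friendly keys.
CATEGORIES = (
    "high_buyer_intent",
    "comparative",
    "category_with_intent",
    "category_without_intent",
    "customer_eval",
    "secret_shopper",
)


@dataclass(frozen=True)
class Prompt:
    prompt_id: str  # stable ID so monthly runs can be compared
    category: str
    text: str

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def category_coverage(prompts: list[Prompt]) -> dict[str, int]:
    """Count prompts per category, including categories with none."""
    counts = Counter(p.category for p in prompts)
    return {category: counts.get(category, 0) for category in CATEGORIES}


# Hypothetical prompt map for a made-up company.
PROMPT_MAP = [
    Prompt("hbi-01", "high_buyer_intent", "Is Acme Analytics a good fit for a mid-size retailer?"),
    Prompt("cmp-01", "comparative", "Compare the top retail analytics vendors."),
    Prompt("cwi-01", "category_with_intent", "How do I reduce stockouts across 40 stores?"),
    Prompt("cwo-01", "category_without_intent", "What are the trends in retail forecasting?"),
]

empty = [c for c, n in category_coverage(PROMPT_MAP).items() if n == 0]
print("Categories with no prompts yet:", empty)
```

Freezing the dataclass and giving each prompt a stable ID is a small guard against the map drifting between test runs.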

The Tedious Work - Testing the Prompts

Your goal should be to simulate, as closely as possible, how customers actually use AI tools. You can:

  1. Do this manually. Type in the prompt and capture the response.
  2. Automate
    1. Write Python (or something similar) to run prompts against the LLMs of interest.
    2. Use a tool like Perplexity Computer to work in agent mode and actually type in the prompts, which gives you a closer approximation of the real user.
    3. Use a commercial tool like Otterly.ai
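
For the Python route, a minimal sketch of the harness might look like the following. The `ask` callable is a placeholder you would replace with a real client for each LLM; `fake_ask`, the model names, and the company name are all made up so the example runs on its own. Keep in mind that API responses may not match what a user sees interactively.

```python
import csv
import io
from datetime import date
from typing import Callable, Optional


def run_prompts(
    prompts: dict[str, str],
    models: list[str],
    ask: Callable[[str, str], str],
    company: str,
    run_date: Optional[str] = None,
) -> list[dict]:
    """Send every prompt to every model and record whether the company is named."""
    run_date = run_date or date.today().isoformat()
    rows = []
    for model in models:
        for prompt_id, text in prompts.items():
            response = ask(model, text)  # placeholder for a real API call
            rows.append({
                "run_date": run_date,
                "model": model,
                "prompt_id": prompt_id,
                "mentioned": company.lower() in response.lower(),
                "response": response,
            })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize results so they can be pasted into a spreadsheet."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_ask(model: str, prompt: str) -> str:
    """Stand-in for a real LLM client; returns canned text."""
    if model == "model-a":
        return "Acme Analytics is one option worth evaluating."
    return "No specific vendors come to mind."


results = run_prompts({"cmp-01": "Compare retail analytics vendors."},
                      ["model-a", "model-b"], fake_ask, company="Acme")
print(to_csv(results))
```

Passing the client in as a function keeps the loop and the record format the same no matter which LLM or SDK sits behind it.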

Here are the pros and cons that I’ve observed. I am sure there are other ways to do it. These are just the ones that I have tried.

| Method | Pros | Cons |
| --- | --- | --- |
| Manual | Captures real user behavior. Intimate connection with responses gives you a much better feel for how your company is showing up. | Takes time. Tedious work. |
| Automate - Python Script | Faster. | Responses are not the same as one gets interactively when using LLM APIs. Vendor names do not show up at the same frequency, for example. |
| Automate - Perplexity Computer | Less human time. | Not fast. Prone to run-time errors. Expensive. |
| Automate - Otterly.ai | Less human time. | Some investment required, but the value seems reasonable. |

Which one am I using? I tried the automation, but I went back to manual. I capture the full response and sources in a spreadsheet and then evaluate them with a structured Claude analysis tool that I built.

The analysis tool builds an aggregate score, based on several weighted categories:

  1. Presence: how often the company appears organically in responses, weighted across the LLMs tested (with ChatGPT weighted highest).
  2. Source citation rate: how often the company in focus was used as a source.
  3. Secret shopper quality: accuracy of description and favorability towards the company.
  4. LinkedIn citation rate: this is becoming a strong citation source and winning companies in B2B need to be there.
  5. Editorial coverage: manually entered as part of the overall brand picture.
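
To show how these categories could combine, here is a minimal sketch of the aggregation. All of the numeric weights are hypothetical; the only details taken from the list above are the five components and the fact that ChatGPT counts for more in the presence score.

```python
# Hypothetical per-model weights; ChatGPT is weighted highest, as described above.
MODEL_WEIGHTS = {"chatgpt": 2.0, "claude": 1.0, "perplexity": 1.0}

# Hypothetical component weights (they sum to 100).
COMPONENT_WEIGHTS = {
    "presence": 35,
    "source_citation": 25,
    "secret_shopper": 15,
    "linkedin_citation": 15,
    "editorial_coverage": 10,
}


def weighted_presence(appearance_rates: dict[str, float]) -> float:
    """Combine per-model appearance rates (0.0-1.0) into a 0-10 presence score."""
    total_weight = sum(MODEL_WEIGHTS[m] for m in appearance_rates)
    weighted = sum(MODEL_WEIGHTS[m] * rate for m, rate in appearance_rates.items())
    return 10 * weighted / total_weight


def outside_in_score(components: dict[str, float]) -> float:
    """Weighted average of the five component scores, each on a 0-10 scale."""
    missing = COMPONENT_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    weighted = sum(COMPONENT_WEIGHTS[c] * components[c] for c in COMPONENT_WEIGHTS)
    return round(weighted / sum(COMPONENT_WEIGHTS.values()), 1)


presence = weighted_presence({"chatgpt": 0.5, "claude": 0.25, "perplexity": 0.25})
score = outside_in_score({
    "presence": presence,
    "source_citation": 2.0,
    "secret_shopper": 7.0,
    "linkedin_citation": 1.0,
    "editorial_coverage": 4.0,
})
print(presence, score)
```

Keeping the component scores visible alongside the aggregate matters: the single number tells you where you sit, and the components tell you why.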

The scoring model is not always linear. I use a logarithmic scale that awards more points for the first mentions in AI answers, because it is harder to break the ice than it is to add mentions incrementally.
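
Here is a small sketch of what a logarithmic scale does to the numbers compared with a linear one. The exact formula is my assumption for illustration (a log curve normalized so that appearing in every prompt earns the maximum score); the curve used in the actual tool may differ.

```python
import math


def log_presence_score(mentions: int, total_prompts: int, max_score: float = 10.0) -> float:
    """Score mentions on a log curve: early mentions earn more than later ones."""
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    if not 0 <= mentions <= total_prompts:
        raise ValueError("mentions must be between 0 and total_prompts")
    # log1p(x) = ln(1 + x), so zero mentions scores exactly zero.
    return max_score * math.log1p(mentions) / math.log1p(total_prompts)


def linear_presence_score(mentions: int, total_prompts: int, max_score: float = 10.0) -> float:
    """Plain proportional score, for comparison."""
    return max_score * mentions / total_prompts


for mentions in (0, 1, 5, 20):
    print(mentions,
          round(log_presence_score(mentions, 20), 2),
          round(linear_presence_score(mentions, 20), 2))
```

On a 20-prompt map, the first mention is worth far more under the log curve than the 0.5 points a linear scale would give it, which matches the intent of rewarding the hard first step.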

Citations are as important as the mentions. Checking citations gives you not only a share-of-voice perspective; it also tells you which content is working best, which media outlets and sources show up frequently, and which competitors are influencing the results.
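
If you capture the source URLs for each response, a short script can tally them. This is a minimal sketch using the standard library; the domains and URLs are invented, and the input format (one list of source URLs per response) is an assumption about how the spreadsheet data is exported.

```python
from collections import Counter
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.' ('' if the URL has no host)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def cited_domains(source_urls: list[str]) -> Counter:
    """Count how often each domain appears among the cited sources."""
    return Counter(h for h in map(_host, source_urls) if h)


def citation_rate(sources_per_response: list[list[str]], company_domain: str) -> float:
    """Share of responses that cite at least one URL on the company's domain."""
    if not sources_per_response:
        return 0.0

    def is_company(url: str) -> bool:
        host = _host(url)
        return host == company_domain or host.endswith("." + company_domain)

    hits = sum(1 for sources in sources_per_response if any(map(is_company, sources)))
    return hits / len(sources_per_response)


# Invented data: one list of source URLs per captured response.
responses = [
    ["https://www.acme.example/blog/a", "https://news.example.org/x"],
    ["https://news.example.org/y"],
    [],
    ["https://docs.acme.example/guide"],
]
all_sources = [url for sources in responses for url in sources]
print(citation_rate(responses, "acme.example"))
print(cited_domains(all_sources).most_common(3))
```

The same tally run against a competitor's domain shows which outlets and which rivals are shaping the answers.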

Testing Note:

Your objective should be to approximate what would happen if a user walked up to an LLM and typed in the question. How you get there varies by LLM, but disabling memory, starting a new chat thread each time, and even deleting old chats all help.

As with the inside-out analysis, one score is delivered, but it contains a multi-faceted aggregate of contributing measures.

What is the Value of the Maturity Model?

Coming back to where I started: marketing has always had a measurement problem, and AEO is no different. Without a repeatable way to track where you are and whether you're improving, it's hard to justify the investment and even harder to know what to do next.

The maturity model gives you that anchor: a score on each axis, a position in the matrix, and a clear picture of what's holding you back. Run it monthly and you start to see movement, or the absence of it, which is equally informative.
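
Tracking that movement can be as simple as comparing each month's pair of scores with the previous month's. The sketch below assumes scores are stored by "YYYY-MM" month key as (inside-out, outside-in) pairs; the numbers are invented.

```python
def monthly_movement(history: dict[str, tuple[float, float]]) -> list[dict]:
    """Month-over-month change in (inside_out, outside_in) scores.

    Keys are assumed to be 'YYYY-MM' strings, which sort chronologically.
    """
    months = sorted(history)
    changes = []
    for previous, current in zip(months, months[1:]):
        prev_in, prev_out = history[previous]
        cur_in, cur_out = history[current]
        changes.append({
            "month": current,
            "inside_out_delta": round(cur_in - prev_in, 1),
            "outside_in_delta": round(cur_out - prev_out, 1),
        })
    return changes


# Invented scores for three monthly runs.
history = {
    "2025-01": (6.0, 3.0),
    "2025-02": (6.5, 3.5),
    "2025-03": (6.5, 4.5),
}
for change in monthly_movement(history):
    print(change)
```

A run of zero deltas on one axis is a signal in itself: it tells you the work done that month did not register on that measure.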

One thing I haven't mentioned yet: you can run the same prompt study on your competitors. That turns a self-assessment into a competitive map. In a category where AI visibility is still up for grabs for most companies, knowing where you stand relative to the field is worth understanding early.

The model will keep evolving. The weights will shift as AI platforms change their behavior, and I'll update the framework as I learn more from actual engagements. If you see something I'm missing or have pushback on the methodology, I'd genuinely like to hear it.

AEO is moving fast and nobody has this completely solved yet. Glad to share my learnings and hope that you find them useful.
