Answer Engine Optimization (AEO): A Practical Playbook

Answer engines like ChatGPT, Perplexity, and Google's AI Overviews don't rank your page — they extract a piece of it and hand that piece to the user, often without a click. This guide covers what that means for how you write and structure pages, and a concrete set of steps you can run this week to make your content easier to pull out and cite.

11 min read · Updated 2026

In this guide

  1. What answer engines actually do differently
  2. How to structure a page so it can be extracted
  3. The technical basics that still matter
  4. A one-week AEO playbook
  5. Mistakes that make pages hard to cite

What answer engines actually do differently

A traditional search crawler indexes your page, computes a relevance score against a query, and returns a ranked list of links. Your job under that system was to rank as high as possible for the terms your buyers search. That job hasn't disappeared, but a second, different job has shown up next to it.

Answer engines don't return a list. The models behind ChatGPT's browsing mode, Perplexity's search, and Google's AI Overviews, along with the voice assistants that read a single result out loud, read a handful of pages, pull out the specific sentence or paragraph that answers the question, and synthesize an answer. Some of them cite the source; all of them decide, sentence by sentence, whether your page contained something extractable enough to use. Ranking tenth on a normal search results page might still get you clicked. Ranking tenth in an answer engine's source set usually gets you nothing, because the model quotes only from the passages it judged clearest and most directly relevant.

This changes the unit of competition. You're no longer competing to be the best overall page on a topic — you're competing to have the single best-formed answer to a specific question, sitting in a place the model can find and lift cleanly. A page can be comprehensive, well-designed, and full of good information, and still lose to a shorter, worse-written page because the worse page states its answer plainly in the first two sentences and the good page buries it in paragraph six.

The practical shift is this: write for extraction first, and persuasion second. A human reader will forgive throat-clearing, scene-setting, and a slow build to the point. A model doing extraction is looking for the most quotable, self-contained statement of fact on the page, and it will grab whichever passage qualifies fastest — yours or a competitor's.

How to structure a page so it can be extracted

Extractability is mostly a formatting discipline, not a writing-talent problem. The pattern that works is consistent across every answer engine we've looked at: state the question as a heading, answer it in the first one or two sentences that follow, then back the answer up with specifics.

Lead with the question, then the answer

Use the actual question as an h2 or h3 — phrased the way a person would type or ask it, not the way a marketer would title a section. "Pricing" is a bad heading. "How much does [X] cost?" is a good one, because it mirrors the query pattern the answer engine is trying to match. Directly beneath that heading, answer the question in plain language before you do anything else. Don't open with context, a definition of adjacent terms, or a caveat. State the answer, then support it.

A useful test: if you deleted every sentence on the page except the first sentence under each heading, would a stranger understand the core answer to each question? If the first sentence is throat-clearing ("There are many factors to consider when thinking about..."), the page fails the test, and so does the extraction.
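The test above can be roughly automated. The following is a minimal sketch using only the Python standard library; it assumes questions live in h2 or h3 tags and answers in p tags, and it uses a naive sentence splitter. The class and function names are illustrative, not part of any tool mentioned in this guide.

```python
import re
from html.parser import HTMLParser


class FirstSentences(HTMLParser):
    """Collect each h2/h3 heading and the first paragraph that follows it."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # [heading_text, first_paragraph_text or None]
        self._tag = None   # tag currently being captured, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._tag, self._buf = tag, []
        elif tag == "p" and self.pairs and self.pairs[-1][1] is None:
            # Only the first paragraph after a heading is captured.
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # e.g. a closing </a> inside the paragraph
        text = " ".join("".join(self._buf).split())
        if tag in ("h2", "h3"):
            self.pairs.append([text, None])
        else:
            self.pairs[-1][1] = text
        self._tag = None


def first_sentences(html):
    """Map each heading to the first sentence of the paragraph under it."""
    parser = FirstSentences()
    parser.feed(html)
    result = {}
    for heading, para in parser.pairs:
        # Naive split: shortest run of text ending in ., ! or ?
        match = re.match(r".+?[.!?](?=\s|$)", para or "")
        result[heading] = match.group(0) if match else (para or "")
    return result
```

Reading the returned mapping on its own, without the rest of the page, is the stranger's view: any heading whose value is throat-clearing or empty is a candidate for a rewrite.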

Follow the answer with specifics, not more generality

Once you've stated the direct answer, the next two or three sentences should add the specific detail that makes the answer credible and useful: numbers, conditions, exceptions, a short example. This is also where you differentiate from every other page answering the same question — most competitors state a vague version of the answer, so specificity is often the whole game. "It depends on your plan" is not useful. "Free plans cap at 3 seats; paid plans start at 10" is.

Keep facts consistent across every page that states them

Answer engines frequently cross-reference multiple pages on your own site (and elsewhere) before settling on an answer. If your pricing page says one thing and your FAQ says another, or your About page gives a different founding year than your press page, that inconsistency doesn't just look sloppy to a human — it makes a model less likely to treat any single page as authoritative, because it can't resolve the conflict. Before you publish anything meant to be cited, grep your own site for the numbers and claims you're stating and make sure they match everywhere.
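One way to run that check is a small script rather than a manual grep. This is a sketch under stated assumptions: the page texts are already extracted into a dictionary, and the two claim patterns (a free-plan seat cap and a founding year) are hypothetical examples you would replace with the claims your own site makes.

```python
import re
from collections import defaultdict

# Hypothetical claims to track: label -> regex with one capture group.
CLAIMS = {
    "free_seat_cap": re.compile(
        r"free plans? (?:cap|caps|is capped) at (\d+) seats", re.I
    ),
    "founding_year": re.compile(r"founded in (\d{4})", re.I),
}


def find_conflicts(pages):
    """pages maps a URL or path to that page's plain text.

    Returns only the claims for which the site states more than one value,
    with the pages that state each value.
    """
    seen = defaultdict(lambda: defaultdict(set))  # label -> value -> pages
    for path, text in pages.items():
        for label, pattern in CLAIMS.items():
            for value in pattern.findall(text):
                seen[label][value].add(path)
    return {
        label: {value: sorted(paths) for value, paths in values.items()}
        for label, values in seen.items()
        if len(values) > 1
    }
```

A claim that appears on several pages with the same value is left out of the result, so an empty dictionary means the tracked facts agree everywhere they were found. It says nothing about claims the patterns do not cover.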

The technical basics that still matter

None of the structural advice above works if the page can't be crawled and parsed in the first place. Answer engines generally rely on the same crawl infrastructure as search engines (some run their own bots, some piggyback on existing search indexes), so the fundamentals you'd apply for ordinary SEO still apply here — they're just non-negotiable now, because a model can't extract from a page it can't read.

  • Crawlability. Make sure your robots.txt isn't blocking the crawlers and tokens you want (GPTBot, PerplexityBot, Google-Extended, and similar; note that Google-Extended is a robots.txt control token rather than a separate crawler), your key pages return real content on first load rather than depending entirely on client-side JavaScript, and nothing you want cited sits behind a login wall.
  • Structured data. FAQPage and HowTo JSON-LD schema won't force a model to use your content, but they give it an unambiguous, pre-parsed question-and-answer pair to work with, which lowers the effort required to extract from your page versus a competitor's plain paragraph. If a page answers several distinct questions, mark each one up as its own FAQ entry rather than one long blob.
  • Clean HTML. Real heading tags (h1, h2, h3) in a logical order, not styled divs pretending to be headings. Answers in actual paragraph or list elements, not text baked into images or rendered only via JavaScript after the initial page load. A model parsing your DOM should be able to tell, structurally, where one question ends and the next begins.
  • One clear h1, and headings that match real questions. Don't reuse the same heading text on ten pages hoping one will rank. That makes it unclear which page a model should treat as the canonical source for that question.
  • Fast, stable pages. Slow or unstable pages get crawled less often and less deeply. This isn't unique to AEO, but it's still a filter you have to clear before any of the content-level work matters.
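To make the structured data point concrete, here is a minimal sketch that builds FAQPage JSON-LD with one entry per question, following the schema.org FAQPage, Question, and Answer types. The helper names are illustrative, and how you inject the resulting tag into a page depends on your CMS or templates.

```python
import json


def faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) tuples.

    Each pair becomes its own Question entry rather than one long blob.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def as_script_tag(data):
    """Serialize the object into a JSON-LD script element."""
    # Escape "<" so answer text cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Pass the literal question and answer text that appears on the page, so the markup and the visible content state the same thing.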

A one-week AEO playbook

This is a sequence you can actually run without hiring anyone. It assumes you already have a live site with some content on it.

  1. List the real questions your buyers ask. Pull these from sales calls, support tickets, Reddit and Quora threads in your space, and the "People also ask" boxes on Google for your core terms. Write them exactly as asked, not as you'd prefer to frame them. This list is your content roadmap — not "features," but questions.
  2. Audit your existing pages against that list. For each question, check: do you have a page or section that answers it? Is the answer in the first two sentences under a heading that matches the question? If not, that's your first batch of edits — often you don't need new pages, just better first sentences.
  3. Rewrite the top 10 answers for extraction. Pick your ten highest-intent questions and rewrite each one so the heading is the question, the first sentence is the direct answer, and the next two or three sentences are specific supporting detail. Cut any throat-clearing before the answer.
  4. Reconcile facts across the site. Search your own site for every number, claim, and date that shows up in more than one place (pricing, limits, integrations, team size, founding year) and make sure they match everywhere. Fix the ones that don't before you move on to the next step.
  5. Add FAQPage schema to your highest-value pages. Start with pricing, comparison pages, and your top support docs. Mark up the literal question and the literal answer text, not a summary of it.
  6. Check crawlability. Confirm robots.txt isn't blocking AI crawlers you want, fetch a few key pages with JavaScript disabled to see what a bot actually sees, and confirm your sitemap includes the pages you just edited.
  7. Ask the answer engines yourself and read what they cite. Query ChatGPT, Perplexity, and Google's AI Overviews with your target questions. If a competitor gets cited and you don't, open their page and compare its first two sentences to yours. Usually the gap is obvious once you look.
  8. Build out the gaps you found. If competitors are getting cited on questions you don't cover at all, that's a landing page or FAQ entry you're missing, not just a wording problem. This is also where a tool like Wally is useful — it can research the questions people are actually asking about your category, draft the page or FAQ answer in an extractable structure, and queue it for your review, so the research-and-drafting step doesn't eat a full day.
  9. Re-check monthly, not once. Answer engines resample sources over time, and competitors update their pages too. Put a recurring reminder on the calendar to re-run step 7 and catch drift before it costs you a citation you used to have.
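The robots.txt part of step 6 can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch: it takes the robots.txt text as a string (you would fetch it from your own site), and the agent list simply repeats the names used earlier in this guide. It says nothing about whether a page renders without JavaScript, which still needs a separate check.

```python
from urllib.robotparser import RobotFileParser

# Agent names and tokens mentioned in this guide; extend as needed.
AI_AGENTS = ["GPTBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the agents that robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

An empty list for each page you edited this week means robots.txt is not the thing keeping those crawlers out.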

Mistakes that make pages hard to cite

The most common failure is a buried answer. The page technically contains the right information, but it's the fourth sentence of the third paragraph, preceded by a story about why the question matters. A model extracting under time and token pressure will often grab a worse but more directly stated answer from a competitor over a better but buried one from you.

The second most common failure is inconsistency across your own pages. If your pricing page, your FAQ, and a blog post each state your free-tier limit slightly differently, you've made it harder for any model to trust a single page as the source of truth — and easier for it to skip citing you altogether in favor of a source that doesn't contradict itself.

The third is thin or generic content that answers the question technically but not usefully: "it varies," "many factors affect this," "contact us for details." These sentences are extractable in the sense that they're short and sit near the heading, but they carry no information. A model that grabs one produces a useless answer, which is bad for the user and earns you no lasting advantage even when you are the cited source. Specificity is what makes an extracted answer worth citing again the next time the question comes up.
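A crude screen for this third failure can be sketched in a few lines. It is a heuristic only: the phrase list is a hypothetical starter set drawn from the examples in this guide, and "contains a digit" is a rough stand-in for specificity, so treat the result as a prompt to re-read an answer, not a verdict.

```python
import re

# Hypothetical starter list of filler phrases; extend it for your own content.
FILLER = [
    r"\bit varies\b",
    r"\bit depends\b",
    r"\bmany factors\b",
    r"\bcontact us for details\b",
]
FILLER_RE = re.compile("|".join(FILLER), re.I)


def is_thin(answer):
    """Flag an answer that uses a filler phrase and states no number."""
    has_filler = bool(FILLER_RE.search(answer))
    has_number = bool(re.search(r"\d", answer))
    return has_filler and not has_number
```

Run it over the first sentence under each question heading; anything flagged is worth rewriting with a concrete number, condition, or example.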
