How to vibe-code an SEO tool without losing control of your LLM

20 February 2026 at 19:00

We all use LLMs daily. Most of us use them at work. Many of us use them heavily.

People in tech — yes, you — use LLMs at twice the rate of the general population. Many of us spend more than a full day each week using them — yes, me.

LLM usage amount

Even those of us who rely on LLMs regularly get frustrated when they don’t respond the way we want.

Here’s how to communicate with LLMs when you’re vibe coding. The same lessons apply if you find yourself in drawn-out “conversations” with an LLM UI like ChatGPT while trying to get real work done.

Choose your vibe-coding environment

Vibe coding is building software with AI assistants. You describe what you want, the model generates the code, and you decide whether it matches your intent.

That’s the idea. In practice, it’s often messier.

The first thing you’ll need to decide is which code editor to work in. This is where you’ll communicate with the LLM, generate code, view it, and run it.

I’m a big fan of Cursor and highly recommend it. I started on the free Hobby plan, and that’s more than enough for what we’re doing here. 

Fair warning – it took me about two months to move up two tiers and start paying for the Pro+ account. As I mentioned above, I’m firmly in the “over a day a week of LLM use” camp, and I’d welcome the company.

A few options are:

  • Cursor: This is the one I use, as do most vibe coders. It has an awesome interface and is easily customized.
  • Windsurf: The main alternative to Cursor. It can run its own terminal commands and self-correct without hand-holding.
  • Google Antigravity: Unlike Cursor, it moves away from the file-tree view and focuses on letting you direct a fleet of agents to build and test features autonomously.

In my screenshots, I’ll be using Cursor, but the principles apply to any of them. They even apply when you’re simply communicating with LLMs in depth.


Why prompting alone isn’t enough

You might wonder why you need a tutorial at all. You tell the LLM what you want, and it builds it, right? That may work for a meta description or a superhero SEO image of yourself, but it won’t cut it for anything moderately complex — let alone a tool or agentic system spanning multiple files.

One key concept to understand is the context window. That’s the amount of content an LLM can hold in memory. It’s typically split across input and output tokens.

GPT-5.2 offers a 400,000-token context window, and Gemini 3 Pro comes in at 1 million. That’s roughly 50,000 lines of code or 1,500 pages of text.
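
If you want a rough sense of how much of that window a given prompt occupies, you can count tokens yourself. Here is a minimal sketch using the tiktoken library; the o200k_base encoding is an assumption, so swap in whichever encoding matches your model.

import tiktoken

# Rough token count for any text you plan to paste into a chat.
# o200k_base is an assumption; use the encoding that matches your model.
encoding = tiktoken.get_encoding("o200k_base")
text = "I'm an SEO, and I want to use the current AI Overviews displayed by Google..."
tokens = len(encoding.encode(text))
print(f"{tokens} tokens")
print(f"~{tokens / 400_000:.2%} of a 400,000-token window")

A few tokens either way don’t matter; the point is to notice when a single paste eats a meaningful chunk of the window.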

The challenge isn’t just hitting the limit, though large codebases make that easy to do. It’s that the more content you stuff into the window, the worse models get at retrieving what’s inside it.

Attention mechanisms tend to favor the beginning and end of the window, not the middle. In general, the less cluttered the window, the better the model can focus on what matters.

If you want a deeper dive into context windows, Matt Pocock has a great YouTube video that explains it clearly. For now, it’s enough to understand placement and the cost of being verbose.

A few other tips:

  • One team, one dream. Break your project into logical stages, as we’ll do below, and clear the LLM’s memory between them.
  • Do your own research. You don’t need to become an expert in every implementation detail, but you should understand the directional options for how your project could be built. You’ll see why shortly.
  • When troubleshooting, trust but verify. Have the model explain what’s happening, review it carefully, and double-check critical details in another browser window.

Dig deeper: How vibe coding is changing search marketing workflows

Tutorial: Let’s vibe-code an AI Overview question extraction system

How do you create content that appears prominently in an AI Overview? Answer the questions the overview answers.

In this tutorial, we’ll build a tool that extracts questions from AI Overviews and stores them for later use. While I hope you find this use case valuable, the real goal is to walk through the stages of properly vibe coding a system. This isn’t a shortcut to winning an AI Overview spot, though it may help.

Step 1: Planning

Before you open Cursor — or your tool of choice — get clear on what you want to accomplish and what resources you’ll need. Think through your approach and what it’ll take to execute.

While I noted not to launch Cursor yet, this is a fine time to use a traditional search engine or a generative AI.

I tend to start with a simple sentence or two in Gemini or ChatGPT describing what I’m trying to accomplish, along with a list of the steps I think the system might need to go through. It’s OK to be wrong here. We’re not building anything yet.

For example, in this case, I might write:

I’m an SEO, and I want to use the current AI Overviews displayed by Google to inspire the content our authors will write. The goal is to extract the implied questions answered in the AI Overview. Steps might include:

1 – Select a query you want to rank for.
2 – Conduct a search and extract the AI Overview.
3 – Use an LLM to extract the implied questions answered in the AI Overview.
4 – Write the questions to a saveable location.

With this in hand, you can head to your LLM of choice. I prefer Gemini for UI chats, but any modern model with solid reasoning capabilities should work.

Start a new chat. Let the system know you’ll be building a project in Cursor and want to brainstorm ideas. Then paste in the planning prompt.

The system will immediately provide feedback, but not all of it will be good or in scope. For example, one response suggested tracking the AI Overview over time and running it in its own UI. That’s beyond what we’re doing here, though it may be worth noting.

It’s also worth noting that models don’t always suggest the simplest path. In one case, it proposed a complex method for extracting AI Overviews that would likely trigger Google’s bot detection. This is where we go back to the list we created above.

Step 1 will be easy. We just need a field to enter keywords.

Step 2 could use some refinement. What’s the most straightforward and reliable way to capture the content in an AI Overview? Let’s ask Gemini.

Reverse-engineering Google AI Overviews

I’m already familiar with these services and frequently use SerpAPI, so I’ll choose that one for this project. The first time I did this, I reviewed options, compared pricing, and asked a few peers. Making the wrong choice early can be costly.
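
To make that choice concrete, here is a minimal sketch of what pulling an AI Overview through SerpAPI looks like in Python. It assumes the google-search-results package and an ai_overview field in the response, per SerpAPI’s documentation; the exact structure can differ, which is exactly the kind of thing we’ll end up troubleshooting later.

import os
from serpapi import GoogleSearch  # pip install google-search-results

# Hypothetical sketch: fetch a SERP and pull out the AI Overview, if present.
params = {
    "engine": "google",
    "q": "what is seo",
    "api_key": os.environ["SERPAPI_API_KEY"],
}
results = GoogleSearch(params).get_dict()
ai_overview = results.get("ai_overview")
if not ai_overview:
    print("No AI Overview returned for this query.")
else:
    # text_blocks and snippet are assumptions based on SerpAPI's AI Overview docs.
    for block in ai_overview.get("text_blocks", []):
        print(block.get("snippet", ""))

Note the guard for a missing overview; that edge case comes up again later in the tutorial.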

Step 3 also needs a closer look. Which LLMs are best for question extraction?

Which LLMs are best for question extraction

That said, I don’t trust an LLM blindly, and for good reason. In one response, Claude 4.6 Opus, which had recently been released, wasn’t even considered.

After a couple of back-and-forth prompts, I told Gemini:

  • “Now, be critical of your suggestions and the benchmarks you’ve selected.”
  • “The text will be short, so cost isn’t an issue.”

We then came around to:

AI Mode - comparisons

For this project, we’re going with GPT-5.2, since you likely have API access or, at the very least, an OpenAI account, which makes setup easy. Call it a hunch. I won’t add an LLM judge in this tutorial, but in the real world, I strongly recommend it.
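
For a sense of what that extraction step looks like in code, here is a minimal sketch using the OpenAI Python SDK. The model name mirrors the choice above, and the prompt wording is illustrative only; the real prompt gets refined during planning.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_questions(ai_overview_text: str) -> str:
    # Illustrative prompt; the tool's actual prompt is worked out in planning.
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[
            {"role": "system", "content": "You extract the implied questions an AI Overview answers."},
            {"role": "user", "content": f"List the implied questions, one per line:\n\n{ai_overview_text}"},
        ],
    )
    return response.choices[0].message.content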

Now that we’ve done the back-and-forth, we have more clarity on what we need. Let’s refine the outline:

I’m an SEO, and I want to use the current AI Overviews displayed by Google to inspire the content our authors will write. The idea is to extract the implied questions answered in the AI Overview. Steps might include:

1 – Select a query you want to rank for.
2 – Conduct a search and extract the AI Overview using SerpAPI.
3 – Use GPT-5.2 Thinking to extract the implied questions answered in the AI Overview.
4 – Write the query, AI Overview, and questions to W&B Weave.
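
Seen as code, that outline maps to a very small script. Here is a hedged sketch of the shape main.py might take; every function is a stub with a placeholder name, and Cursor will generate the real implementations during the build step.

import sys

# Stubs only: each function stands in for one step of the outline above.
def fetch_ai_overview(query: str) -> str:
    return ""  # Step 2: SerpAPI call goes here

def extract_questions(ai_overview: str) -> list[str]:
    return []  # Step 3: GPT-5.2 call goes here

def log_results(query: str, ai_overview: str, questions: list[str]) -> None:
    pass  # Step 4: W&B Weave logging goes here

if __name__ == "__main__":
    query = sys.argv[1] if len(sys.argv) > 1 else "what is seo"  # Step 1
    ai_overview = fetch_ai_overview(query)
    questions = extract_questions(ai_overview)
    log_results(query, ai_overview, questions)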

Before we move on, make sure you have access to the three services you’ll need for this:

  • SerpAPI: The free plan will work.
  • OpenAI API: You’ll need to pay for this one, but $5 will go a long way for this use case. Think months. 
  • Weights & Biases: The free plan will work. (Disclosure: I’m the head of SEO at Weights & Biases.)

Now let’s move on to Cursor. I’ll assume you have it installed and a project set up. It’s quick, easy, and free. 

The screenshots that follow reflect my preferred layout in Editor Mode.

Cursor - Editor Mode

Step 2: Set the groundwork

If you haven’t used Cursor before, you’re in for a treat. One of its strengths is access to a range of models. You can choose the one that fits your needs or pick the “best” option based on leaderboards.

I tend to gravitate toward Gemini 3 Pro and Claude 4.6 Opus.

Cursor - LLM options

If you don’t have access to all of them, you can select the non-thinking models for this project. We also want to start in Plan mode.

Cursor - Plan mode

Let’s begin with the project prompt we defined above.

Cursor - project prompt

Note: You may be asked whether you want to allow Cursor to run queries on your behalf. You’ll want to allow that.

Cursor - project integrations

Now it’s time to go back and forth to refine the plan that the model developed from our initial prompt. Because this is a fairly straightforward task, you might think we could jump straight into building it. That would be bad for the tutorial, and it’s a bad habit in practice. Humans like me don’t always communicate clearly or fully convey our intent, and this planning stage is where we clarify that.

When I enter the instructions into the Cursor chat in Plan mode, using Sonnet 4.5, it kicks off a discussion. One of the great things about this stage is that the model often surfaces angles I hadn’t considered at the outset. Below are my replies, where I answer each question with the applicable letter. You can add context after the letter if needed.

An example of the model suggesting angles I hadn’t considered appears in question 4 above. It may be helpful to pass along the context snippets. I opted for B in this case. There are obvious cases for C, but for speed and token efficiency, I retrieve as little as possible. Intent and related considerations are outside the scope of this article and would add complexity, as they’d require a judge.

The system will output a plan. Read it carefully, as you’ll almost certainly catch issues in how it interpreted your instructions. Here’s one example.

Cursor - model selection

I’m told there is no GPT-5.2 Thinking. There is, and it’s noted in the announcement. I have the system double-check a few details I want to confirm, but otherwise, the plan looks good. Claude also noted the format the system will output to the screen, which is a nice touch and something I hadn’t specified. That’s what partners are for.

Cursor - output format

Finally, I always ask the model to think through edge cases where the system might fail. I did, and it returned a list. From that list, I selected the cases I wanted addressed. Others, like what to do if an AI Overview exceeds the context window, are so unlikely that I didn’t bother.

A few final tweaks addressed those items, along with one I added myself: what happens if there is no AI Overview?

Cursor - what happens if there is no AI Overview?

I have to give credit to Tarun Jain for this next step. I used to copy the outline manually, but he suggested simply asking the model to generate a file with the plan. So let’s direct it to create a markdown file, plan.md, with the following instruction:

Build a plan.md including the reviewed plan and plan of action for the implementation. 

Remember the context window issue I discussed above? If you start building from your current state in Cursor, the initial directives may end up in the middle of the window, where they’re least accessible, since your project brainstorming occupies the beginning.

To get around this, once the file is complete, review it and make sure it accurately reflects what you’ve brainstormed. Then you can start a fresh chat and build from the file itself, which is exactly what we’ll do next.

Step 3: Building

Now we get to build. Start a new chat by clicking the + in the top right corner. This opens a new context window.

This time, we’ll work in Agent mode, and I’m going with Gemini 3 Pro.

Cursor - Agent mode

Arguably, Claude 4.6 Opus might be a technically better choice, but I find I get more accurate responses from Gemini based on how I communicate. I work with far smarter developers who prefer Claude and GPT. I’m not sure whether I naturally communicate in a way that works better with Gemini or if Google has trained me over the years.

First, tell the system to load the plan. It immediately begins building the system, and as you’ll see, you may need to approve certain steps, so don’t step away just yet.

Cursor - Load the plan

Once it’s done, there are only a couple of steps left, hopefully. Thankfully, it tells you what they are.

First, install the required libraries. These include the packages needed to run SerpAPI, GPT, Weights & Biases, and others. The system has created a requirements.txt file, so you can install everything in one line.

Note: It’s best to create a virtual environment. Think of this as a container for the project, so downloaded dependencies don’t mix with those from other projects. This only matters if you plan to run multiple projects, but it’s simple to set up, so it’s worth doing.

Open a terminal:

Cursor - terminal

Then enter the following lines, one at a time:

  • python3 -m venv .venv
  • source .venv/bin/activate
  • pip install -r requirements.txt

You’re creating the environment, activating it, and installing the dependencies inside it. Keep the second command handy, since you’ll need it any time you reopen Cursor and want to run this project.

You’ll know you’re in the correct environment when you see (.venv) at the beginning of the terminal prompt.

When you run the requirements.txt installation, you’ll see the packages load.

Cursor - packages

Next, rename the .env.example file to .env and fill in the variables.

The system can’t create a .env file, and it won’t be included in GitHub uploads if you go that route, which I did and linked above. It’s a hidden file used to store your API keys and related credentials, meaning information you don’t want publicly exposed. By default, mine looks like this.

API keys and related credentials
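
If you’re wondering how those values reach the code, generated Python projects typically load them with python-dotenv. Here is a minimal sketch; the variable names are assumptions, so match whatever your .env.example actually uses.

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env into the environment

# Variable names are assumptions; use the ones from your .env.example.
required = ["SERPAPI_API_KEY", "OPENAI_API_KEY", "WANDB_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing keys in .env: {', '.join(missing)}")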

I’ll fill in my API keys (sorry, can’t show that screen), and then all that’s left is to run the script.

To do that, enter this in the terminal:

python main.py "your search query"

If you forget the command, you can always ask Cursor.

Oh no … there’s a problem!

I’m building this as we go, so I can show you how to handle hiccups. When I ran it, I hit a critical one.

Cursor - no AI Overview found

It’s not finding an AI Overview, even though the phrase I entered clearly generates one.

Google - what is SEO

Thankfully, I have a wide-open context window, so I can paste:

  • An image showing that the output is clearly wrong.
  • The code output illustrating what the system is finding.
  • A link (or sometimes simply text) with additional information to direct the solution. 

Fortunately, it’s easy to add terminal output to the chat. Select everything from your command through the full error message, then click “Add to Chat.”

Cursor - Add to Chat.

It’s important not to rely solely on LLMs to find the information you need. A quick search took me to the AI Overview documentation from SerpAPI, which I included in my follow-up instructions to the model.

My troubleshooting comment looks like this.

Cursor - troubleshooting comment

Notice I tell Cursor not to make changes until I give the go-ahead. We don’t want to fill up the context window or train the model to assume its job is to make mistakes and try fixes in a loop. We reduce that risk by reviewing the approach before editing files.

Glad I did. I had a hunch it wasn’t retrieving the code blocks properly, so I added one to the chat for additional review. Keep in mind that LLMs and bots may not see everything you see in a browser. If something is important, paste it in as an example.

Now it’s time to try again.

Cursor - troubleshooting executed

Excellent, it’s working as we hoped.

Now we have a list of all the implied questions, along with the result chunks that answer them.

Dig deeper: Inspiring examples of responsible and realistic vibe coding for SEO

Logging and tracing your outputs

It’s a bit messy to rely solely on terminal output, and it isn’t saved once you close the session. That’s what I’m using Weave to address.

Weave is, among other things, a tool for logging prompt inputs and outputs. It gives us a permanent place to review our queries and extracted questions. At the bottom of the terminal output, you’ll find a link to Weave.
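
In code, that logging is lightweight. Here is a minimal sketch of how a function gets traced with Weave; the project name is a placeholder, and the function body is a stand-in for the real SerpAPI and GPT calls.

import weave

weave.init("aio-question-extractor")  # project name is a placeholder

@weave.op()
def analyze_query(query: str, model: str) -> dict:
    # Stand-in body: whatever goes in and comes out is logged as a trace.
    return {"query": query, "model": model, "questions": []}

analyze_query("what is seo", "gpt-5.2")

Anything decorated this way shows up in the Weave UI with its inputs and outputs, which is where the two traces below come from.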

There are two traces to watch. The first is what this was all about: the analyze_query trace.

W&B Weave

In the inputs, you can see the query and model used. In the outputs, you’ll find the full AI Overview, along with all the extracted questions and the content each question came from. You can view the full trace here, if you’re interested.

Now, when we’re writing an article and want to make sure we’re answering the questions implied by the AI Overview, we have something concrete to reference.

The second trace logs the prompt sent to GPT-5.2 and the response.

W&B Weave second trace

This is an important part of the ongoing process. Here you can easily review the exact prompt sent to GPT-5.2 without digging through the code. If you start noticing issues in the extracted questions, you can trace the problem back to the prompt and get back to vibing with your new friend, Cursor.


Structure beats vibes

I’ve been vibe coding for a couple of years, and my approach has evolved. It gets more involved when I’m building multi-agent systems, but the fundamentals above are always in place.

It may feel faster to drop a line or two into Cursor or ChatGPT. Try that a few times, and you’ll see the choice: give up on vibe coding — or learn to do it with structure.

Keep the vibes good, my friends.
