
Google explains how crawling works in 2026

31 March 2026 at 20:52

Gary Illyes from Google shared more details on Googlebot and Google’s crawling ecosystem: how it fetches pages and how it processes bytes.

The article is named Inside Googlebot: demystifying crawling, fetching, and the bytes we process.

Googlebot. Google operates far more than one crawler; it runs many crawlers for many purposes, so referring to Googlebot as a single crawler may no longer be entirely accurate. Google documented many of its crawlers and user agents over here.

Limits. Recently, Google spoke about its crawling limits. Now, Gary Illyes dug into it more. He said:

  • Googlebot currently fetches up to 2MB for any individual URL (excluding PDFs).
  • This means it crawls only the first 2MB of a resource, including the HTTP header.
  • For PDF files, the limit is 64MB.
  • Image and video crawlers typically have a wide range of threshold values, and it largely depends on the product that they’re fetching for.
  • For any other crawlers that don’t specify a limit, the default is 15MB regardless of content type.
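The limits above can be sketched as a simple lookup. This is purely illustrative: the numbers come from the article, but the table keys and function name are assumptions, not anything Google has published as code.

```python
# Per-fetch byte limits as described in the article.
# Keys and function names are hypothetical; the limits are from the article.

FETCH_LIMITS = {
    "html": 2 * 1024 * 1024,       # Googlebot: 2MB per URL (includes HTTP headers)
    "pdf": 64 * 1024 * 1024,       # PDF files: 64MB
    "default": 15 * 1024 * 1024,   # crawlers with no specific limit: 15MB
}

def truncate_fetch(raw_bytes: bytes, content_type: str = "default") -> bytes:
    """Keep only the bytes a crawler would actually process."""
    limit = FETCH_LIMITS.get(content_type, FETCH_LIMITS["default"])
    return raw_bytes[:limit]
```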

Then what happens when Google crawls?

  1. Partial fetching: If your HTML file is larger than 2MB, Googlebot doesn’t reject the page. Instead, it stops the fetch exactly at the 2MB cutoff. Note that the limit includes the HTTP response headers.
  2. Processing the cutoff: That downloaded portion (the first 2MB of bytes) is passed along to Google’s indexing systems and the Web Rendering Service (WRS) as if it were the complete file.
  3. The unseen bytes: Any bytes that exist after that 2MB threshold are entirely ignored. They aren’t fetched, they aren’t rendered, and they aren’t indexed.
  4. Bringing in resources: Every referenced resource in the HTML (excluding media, fonts, and a few exotic files) will be fetched by WRS with Googlebot like the parent HTML. They have their own, separate, per-URL byte counter and don’t count towards the size of the parent page.
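A minimal sketch of steps 1–3 above: the response headers count against the 2MB budget first, and any body bytes past the cutoff are simply never processed rather than causing a rejection. The function name and exact accounting are assumptions for illustration.

```python
LIMIT = 2 * 1024 * 1024  # 2MB per URL for HTML, per the article

def bytes_passed_to_indexing(headers: bytes, body: bytes) -> bytes:
    """Simulate the cutoff: headers eat into the budget, the rest is truncated."""
    budget = LIMIT - len(headers)     # headers count toward the 2MB limit
    return body[:max(budget, 0)]      # bytes past the cutoff are ignored, not rejected
```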

How Google renders these bytes. When the crawler accesses these bytes, it then passes it over to WRS, the web rendering service. “The WRS processes JavaScript and executes client-side code similar to a modern browser to understand the final visual and textual state of the page. Rendering pulls in and executes JavaScript and CSS files, and processes XHR requests to better understand the page’s textual content and structure (it doesn’t request images or videos). For each requested resource, the 2MB limit also applies,” Google explained.
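Step 4 and the rendering note share one key detail: every referenced resource gets its own independent byte counter, and none of it counts against the parent page’s 2MB. A tiny sketch, with made-up resource names and sizes:

```python
PER_RESOURCE_LIMIT = 2 * 1024 * 1024  # each fetched resource has its own 2MB cap

def render_budget(resources: dict) -> dict:
    """Map each resource URL to the bytes WRS would actually process.

    Resources are counted separately; the parent page's size is unaffected.
    """
    return {url: min(size, PER_RESOURCE_LIMIT) for url, size in resources.items()}
```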

Best practices. Google listed these best practices:

  • Keep your HTML lean: Move heavy CSS and JavaScript to external files. While the initial HTML document is capped at 2MB, external scripts, and stylesheets are fetched separately (subject to their own limits).
  • Order matters: Place your most critical elements — like meta tags, <title> elements, <link> elements, canonicals, and essential structured data — higher up in the HTML document. This makes it less likely they fall below the cutoff.
  • Monitor your server logs: Keep an eye on your server response times. If your server struggles to serve bytes, Google’s fetchers will automatically back off to avoid overloading your infrastructure, which will drop your crawl frequency.
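The “order matters” advice above is easy to audit: check whether your critical head elements sit within the first 2MB of the raw HTML. A hypothetical check, with the tag list and reporting format chosen for illustration:

```python
CUTOFF = 2 * 1024 * 1024  # the 2MB cutoff described in the article

def critical_tags_within_cutoff(html: bytes) -> dict:
    """Report each critical tag's byte offset and whether it survives the cutoff."""
    tags = [b"<title", b'rel="canonical"', b"application/ld+json"]
    report = {}
    for tag in tags:
        pos = html.find(tag)
        report[tag.decode()] = (pos, pos != -1 and pos < CUTOFF)
    return report
```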

Podcast. Google also published a podcast episode on the topic, in which it went through crawling, fetching, and the bytes it processes.

ChatGPT enables location sharing for more precise local responses

31 March 2026 at 17:43

OpenAI now lets ChatGPT users share their device location, so ChatGPT knows more precisely where they are and can serve better answers and results based on that location.

The feature is called location sharing. OpenAI wrote: “Sharing your device location is completely optional and off until you choose to enable it. You can update device location sharing in Settings > Data Controls at any time.”

What it does. If ChatGPT knows your location, it can return better local results. OpenAI wrote:

  • “Precise location means ChatGPT can use your device’s specific location, such as an exact address, to provide more tailored results.”
  • “For example, if you ask “what are the best coffee shops near me?”, ChatGPT can use your precise location to provide more relevant nearby results. On mobile devices, you can choose to toggle off precise location separately while keeping approximate device location sharing on for additional control.”

Privacy. OpenAI said “ChatGPT deletes precise location data after it’s used to provide a more relevant response.” Here is how ChatGPT uses that information:

  • “If ChatGPT’s response includes information related to your specific location, such as the names of nearby restaurants or maps, that information becomes part of your conversation like any other response and will remain in your chat history unless you delete the conversation.”

Does it work. Maybe not as well as you’d expect. Here is an example from Glenn Gabe:

I shared about the "Near Me ChatGPT Update" the other day and just let ChatGPT use my device location. This is supposed to enhance results for local queries. I just asked for the "best steakhouses near me" and several of the restaurants are ~45 minutes away. Both restaurants… pic.twitter.com/gRkMeuzMQt

— Glenn Gabe (@glenngabe) March 30, 2026

Why we care. Making ChatGPT’s local results better is a big deal for local search and local SEO. Knowing the user’s location — and better yet, their precise location — can produce better local results.

Hopefully this will result in ChatGPT responding with more useful local results for users.
