Myriam Jessier asked Google what makes for good attributes of a web crawler, and both Martin Splitt and Gary Illyes responded.
Myriam Jessier asked on Bluesky, "what are the good features one should look into when picking a crawler to check things on a website for SEO and gen AI search?"
Martin Splitt from Google replied with this list of attributes (a minimal sketch covering several of them follows the list):
- support http/2
- declare identity in the user agent
- respect robots.txt
- back off if the server slows down
- follow caching directives*
- reasonable retry mechanisms
- follow redirects
- handle errors gracefully*
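Taken together, those attributes describe a polite fetch loop. Here is a minimal sketch of what that could look like, assuming Python with the third-party requests library; the ExampleBot user agent, the polite_fetch helper, and the retry limits are illustrative, not anything Google or any crawler vendor prescribes.

```python
import time
import urllib.robotparser

import requests  # third-party HTTP library, assumed to be installed

# Hypothetical crawler identity, purely for illustration.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

def polite_fetch(url, max_retries=3):
    """Fetch a URL while following several of the attributes listed above."""
    # Respect robots.txt before touching the page itself.
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(requests.compat.urljoin(url, "/robots.txt"))
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None  # the site owner disallows this URL

    backoff = 1.0
    for _ in range(max_retries):  # reasonable retry mechanism
        try:
            # Declare identity in the user agent and follow redirects.
            resp = requests.get(
                url,
                headers={"User-Agent": USER_AGENT},
                allow_redirects=True,
                timeout=10,
            )
        except requests.RequestException:
            # Handle network errors gracefully: wait, then retry.
            time.sleep(backoff)
            backoff *= 2
            continue

        # Back off if the server signals overload or asks us to slow down.
        if resp.status_code in (429, 503):
            retry_after = resp.headers.get("Retry-After", "")
            time.sleep(int(retry_after) if retry_after.isdigit() else backoff)
            backoff *= 2
            continue

        return resp

    return None  # give up gracefully after the retry budget is spent
```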
Gary Illyes from Google pointed the conversation to a new IETF document that covers crawler best practices. Gary wrote that this document was posted a few weeks ago.
It covers the recommended best practices, including (a short caching example follows the list):
- Crawlers must support and respect the Robots Exclusion Protocol.
- Crawlers must be easily identifiable through their user agent string.
- Crawlers must not interfere with the regular operation of a site.
- Crawlers must support caching directives.
- Crawlers must expose the IP ranges they are crawling from in a standardized format.
- Crawlers must expose a page that explains how the crawled data is used and how it can be blocked.
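The caching-directives point from both lists can be honored with conditional requests, so unchanged pages are revalidated rather than re-downloaded. This is a rough sketch under the same assumptions as above (Python with requests, an illustrative ExampleBot user agent); the in-memory cache dict and the fetch_with_caching helper are hypothetical.

```python
import requests  # third-party HTTP library, assumed to be installed

# Illustrative in-memory cache: url -> (etag, last_modified, body).
cache = {}

def fetch_with_caching(url, user_agent="ExampleBot/1.0"):
    """Revalidate a cached copy instead of re-downloading unchanged pages."""
    headers = {"User-Agent": user_agent}
    etag, last_modified, body = cache.get(url, (None, None, None))

    # Send conditional headers so the server can reply 304 Not Modified.
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified

    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and body is not None:
        return body  # cached copy is still valid; nothing was re-downloaded

    # Remember the validators the server sent for the next visit.
    cache[url] = (
        resp.headers.get("ETag"),
        resp.headers.get("Last-Modified"),
        resp.text,
    )
    return resp.text
```

Checking Cache-Control max-age before sending a request at all would be the other half of supporting caching directives; the conditional-request part above is the piece that most directly spares the origin server.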
Check out the full document over here – you can see that Gary Illyes co-authored it, but not under Google's name.
Forum discussion at Bluesky.
Image credit to Lizzi