ArgandBot — Argand's web crawler

Who we are

ArgandBot is the web crawler for Argand, an independent search engine built by one person. Argand's design principles forbid content theft: no knowledge panels, no inline answer boxes scraped from your pages, no "zero-click" tricks. We index so that searchers can find and visit you — every result is a link out to its source.

Every request ArgandBot makes identifies itself with this User-Agent:

Mozilla/5.0 (compatible; ArgandBot/1.0; +https://argand.org/bot)

What ArgandBot fetches, and why

HTML pages only. It checks Content-Type before reading a body and never downloads images, video, audio, or archives it can't index. Bodies are capped at 5 MB.
robots.txt, fetched before anything else on your host and cached for at most 24 hours.
Sitemaps listed in your robots.txt, fetched with conditional requests so unchanged sitemaps cost you a 304 and no bytes.
Recrawls of pages already in the index, to keep results fresh — also conditional (If-None-Match / If-Modified-Since), so an unchanged page costs almost nothing.

It never logs in, never submits forms, never bypasses paywalls or members-only areas, and carries no cookies. If a page isn't visible to a logged-out visitor, ArgandBot doesn't see it either.
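The fetch rules above can be sketched in a few lines. This is a minimal illustration, not ArgandBot's actual code: the function names are hypothetical, treating application/xhtml+xml as HTML is an assumption, reading "5 MB" as 5 × 1024 × 1024 bytes is an assumption, and truncating an oversized body (rather than discarding it) is also an assumption.

```python
from typing import Dict, Iterable, Optional

# Assumption: "5 MB" is read here as 5 MiB; the exact byte count is not stated.
MAX_BODY_BYTES = 5 * 1024 * 1024


def is_indexable(content_type: Optional[str]) -> bool:
    """Return True when the Content-Type header names an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: XHTML is treated the same as HTML.
    return media_type in {"text/html", "application/xhtml+xml"}


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build recrawl validators from values saved on the previous fetch."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def read_capped(chunks: Iterable[bytes], limit: int = MAX_BODY_BYTES) -> bytes:
    """Accumulate body chunks, keeping at most `limit` bytes."""
    body = bytearray()
    for chunk in chunks:
        remaining = limit - len(body)
        if remaining <= 0:
            break
        body.extend(chunk[:remaining])
    return bytes(body)
```

A server that answers a request carrying these validators with 304 sends no body at all, which is why an unchanged page or sitemap costs so little.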

Politeness guarantees

robots.txt is law. Parsed per RFC 9309 and re-checked within 24 hours. If robots.txt is unreachable (server errors), ArgandBot assumes it is not allowed to crawl — it fails closed.
Crawl-delay honored, up to a cap of 30 seconds between requests.
At most ~1 request per second per site by default — usually far less — with one connection per host, and all subdomains of your site sharing a single budget.
Backs off on trouble. HTTP 429 and Retry-After are respected; errors and slow responses widen the delay automatically.
Conditional GET on recrawls — unchanged pages answer 304 and transfer no body.
Compressed transfer (gzip / brotli) on every request, keeping your bandwidth bill small.
Page-level directives respected: noindex (meta tag or X-Robots-Tag header) keeps a fetched page out of the index entirely; nofollow keeps its links out of the crawl frontier.
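As a rough illustration of how the rate rules above might combine, here is a sketch of a per-site delay calculation. Only the 1-second default and the 30-second Crawl-delay cap come from this page; the function name, the doubling backoff, the one-hour ceiling, and the choice never to go faster than the default are assumptions made for the example.

```python
from typing import Optional

DEFAULT_DELAY_S = 1.0     # from the "~1 request per second per site" default
MAX_CRAWL_DELAY_S = 30.0  # from the 30-second Crawl-delay cap
MAX_BACKOFF_S = 3600.0    # hypothetical ceiling, not stated on this page


def next_delay(crawl_delay: Optional[float] = None,
               retry_after: Optional[float] = None,
               consecutive_errors: int = 0) -> float:
    """Seconds to wait before the next request to the same site."""
    base = DEFAULT_DELAY_S
    if crawl_delay is not None:
        # Honor Crawl-delay up to the cap; assumption: never faster than default.
        base = max(base, min(crawl_delay, MAX_CRAWL_DELAY_S))
    # Assumption: each consecutive error doubles the delay, up to a ceiling.
    doublings = min(max(consecutive_errors, 0), 20)
    delay = min(base * (2 ** doublings), MAX_BACKOFF_S)
    if retry_after is not None:
        # A server-supplied Retry-After can only lengthen the wait.
        delay = max(delay, retry_after)
    return delay
```

For example, `next_delay(crawl_delay=120)` yields the 30-second cap, and a `Retry-After: 90` response pushes the wait to at least 90 seconds regardless of the other inputs.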

How to opt out

Site-wide: add this to your robots.txt. It takes effect once ArgandBot's cached copy of your robots.txt expires, which is within 24 hours and usually much sooner:

User-agent: ArgandBot
Disallow: /

(A User-agent: Argand section works too, and directives for * are honored when no ArgandBot-specific section exists.)
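If you want to sanity-check a robots.txt before deploying it, a quick test with Python's standard-library parser might look like the sketch below. Note the assumptions: `urllib.robotparser` is not ArgandBot's parser and can differ from RFC 9309 in edge cases, and the `allowed` helper and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example rules: block ArgandBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: ArgandBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check one URL against robots.txt text using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "ArgandBot", "https://example.com/page"))
    print(allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))
```

With the rules shown, the first check reports that ArgandBot is blocked and the second that other crawlers are still allowed.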

Per page: either of these keeps a page out of Argand's index:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

A noindex page is never published to our index — the directive is honored at fetch time, not filtered later. You can also scope it to us alone with <meta name="argandbot" …> or X-Robots-Tag: argandbot: noindex.
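To make the header forms concrete, here is a sketch of how X-Robots-Tag values could be interpreted for one crawler. It is an illustration under assumptions, not ArgandBot's parser: the function name is hypothetical, and the list of directive names that carry their own colon is an assumption added so they are not mistaken for a crawler name.

```python
from typing import Iterable

# Assumption: directives that contain a colon themselves, so the text before
# the colon must not be read as a crawler name.
DIRECTIVES_WITH_VALUES = {
    "unavailable_after", "max-snippet", "max-image-preview", "max-video-preview",
}


def header_says_noindex(values: Iterable[str], bot: str = "argandbot") -> bool:
    """True if any X-Robots-Tag value tells `bot` not to index the page."""
    for value in values:
        scope, sep, rest = value.partition(":")
        scope = scope.strip().lower()
        scoped = bool(sep) and "," not in scope and scope not in DIRECTIVES_WITH_VALUES
        if scoped:
            if scope != bot:
                continue  # addressed to a different crawler
            directives = rest
        else:
            directives = value  # unscoped: applies to every crawler
        tokens = {token.strip().lower() for token in directives.split(",")}
        if "noindex" in tokens:
            return True
    return False
```

Both `X-Robots-Tag: noindex` and `X-Robots-Tag: argandbot: noindex` return True here, while a value scoped to some other crawler is ignored.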

Manual removal: email crawler@argand.org with the URLs or domain and we'll remove them and stop crawling — no robots.txt changes required.

Verifying it's really us

The User-Agent is exactly the string shown above, linking to this page.
ArgandBot currently crawls from Argand's own infrastructure at 135.181.231.218 (Hetzner, Finland). A request claiming to be ArgandBot from any other address is an impostor — block it freely.
The real ArgandBot always fetches /robots.txt before your pages and obeys it. Impostors rarely bother.

If the crawl origin changes or grows, this page is updated first — it is the canonical record of ArgandBot's identity.
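For server operators who want to automate the checks above, a minimal sketch follows. The User-Agent string and IP address are the ones published on this page at the time of writing and may change, so treat this page as the source of truth; the function name and return labels are hypothetical, and `remote_addr` is assumed to be the real peer address rather than an untrusted forwarded header.

```python
import ipaddress

ARGANDBOT_UA = "Mozilla/5.0 (compatible; ArgandBot/1.0; +https://argand.org/bot)"
# Published crawl origin; check this page for updates before relying on it.
ARGANDBOT_IPS = {ipaddress.ip_address("135.181.231.218")}


def classify_request(user_agent: str, remote_addr: str) -> str:
    """Label a request as 'genuine', 'impostor', or 'other'."""
    if "argandbot" not in user_agent.lower():
        return "other"  # does not claim to be ArgandBot at all
    address = ipaddress.ip_address(remote_addr)
    if user_agent == ARGANDBOT_UA and address in ARGANDBOT_IPS:
        return "genuine"
    return "impostor"  # claims the name but fails the UA or address check
```

A request labeled "impostor" can be blocked without affecting the real crawler.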

Contact

Anything crawler-related — removal requests, rate concerns, odd behavior in your logs — goes to crawler@argand.org. It's read by the person who wrote the crawler; expect a reply, not a ticket number.