Who we are
ArgandBot is the web crawler for Argand, an independent search engine built by one person. Argand's design principles forbid content theft: no knowledge panels, no inline answer boxes scraped from your pages, no "zero-click" tricks. We index so that searchers can find and visit you — every result is a link out to its source.
Every request ArgandBot makes identifies itself with this User-Agent:
Mozilla/5.0 (compatible; ArgandBot/1.0; +https://argand.org/bot)
What ArgandBot fetches, and why
- HTML pages only. It checks Content-Type before reading a body and never downloads images, video, audio, or archives it can't index. Bodies are capped at 5 MB.
- robots.txt: fetched before anything else on your host, cached at most 24 hours.
- Sitemaps listed in your robots.txt, fetched with conditional requests so unchanged sitemaps cost you a 304 and no bytes.
- Recrawls of pages already in the index, to keep results fresh. These are also conditional (If-None-Match / If-Modified-Since), so an unchanged page costs almost nothing.
It never logs in, never submits forms, never bypasses paywalls or members-only areas, and carries no cookies. If a page isn't visible to a logged-out visitor, ArgandBot doesn't see it either.
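For site operators who generate pages dynamically, the conditional requests described above only save bandwidth if the server answers them with a 304. Most web servers and frameworks already do this for you. The following is a minimal sketch of the decision, assuming you already know the page's ETag and last-modified time. The function name `should_send_304` and the plain-dict header format are illustrative assumptions, not part of ArgandBot or of any particular framework.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag: str) -> str:
    """Drop a leading W/ so ETags are compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def should_send_304(headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when the request's validators match the current page.

    `headers` is assumed to be a dict keyed by canonical header name.
    `last_modified` is assumed to be a timezone-aware UTC datetime.
    """
    inm = headers.get("If-None-Match")
    if inm is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        if inm.strip() == "*":
            return True
        # Simplification: split on commas (ETags containing commas are not handled).
        candidates = {_strip_weak(t) for t in inm.split(",")}
        return _strip_weak(etag) in candidates

    ims = headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # unparseable date: serve the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

When the function returns True, the server can reply with status 304 and an empty body; otherwise it serves the page as usual.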
Politeness guarantees
- robots.txt is law. Parsed per RFC 9309 and re-checked within 24 hours. If robots.txt is unreachable (server errors), ArgandBot assumes it is not allowed to crawl: it fails closed.
- Crawl-delay honored (up to 30 seconds between requests, per directive).
- At most ~1 request per second per site by default, and usually far less, with one connection per host and all subdomains of your site sharing a single budget.
- Backs off on trouble. HTTP 429 and Retry-After are respected; errors and slow responses widen the delay automatically.
- Conditional GET on recrawls: unchanged pages answer 304 and transfer no body.
- Compressed transfer (gzip / brotli) on every request, keeping your bandwidth bill small.
- Page-level directives respected: noindex (meta tag or X-Robots-Tag header) keeps a fetched page out of the index entirely; nofollow keeps its links out of the crawl frontier.
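To make the interaction between the rate rules above concrete, here is a rough sketch of how a per-site delay could be derived from them. It is not ArgandBot's actual code. The name `next_delay`, the doubling-per-error backoff and its cap, and the reading of "up to 30 seconds" as a ceiling on Crawl-delay are all assumptions made for illustration.

```python
DEFAULT_DELAY_S = 1.0     # "at most ~1 request per second per site"
MAX_CRAWL_DELAY_S = 30.0  # assumed reading: Crawl-delay values above 30 s are capped
MAX_BACKOFF_DOUBLINGS = 6  # hypothetical cap on error backoff


def next_delay(crawl_delay=None, retry_after=None, consecutive_errors=0):
    """Seconds to wait before the next request to a site (illustrative only)."""
    base = DEFAULT_DELAY_S
    if crawl_delay is not None:
        # Never go faster than the default; never wait longer than the cap.
        base = max(base, min(crawl_delay, MAX_CRAWL_DELAY_S))

    # Assumption: each consecutive error or slow response doubles the delay.
    delay = base * (2 ** min(consecutive_errors, MAX_BACKOFF_DOUBLINGS))

    # A server-supplied Retry-After (in seconds) wins if it asks for a longer wait.
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

Under these assumptions, a site with no directives and no errors sees a 1-second minimum gap, `Crawl-delay: 10` yields 10 seconds, and a `Retry-After: 60` on a 429 pushes the next request out by at least a minute.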
How to opt out
Site-wide: add this to your robots.txt. It takes effect within the 24-hour robots cache, usually much sooner:
User-agent: ArgandBot
Disallow: /
(A User-agent: Argand section works too, and directives for * are honored when no ArgandBot-specific section exists.)
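If you want to sanity-check a robots.txt before publishing it, Python's standard-library `urllib.robotparser` gives a quick approximation. It is not ArgandBot's parser and does not implement every detail of RFC 9309, so treat the result as a rough check. The `allowed` helper and the example.com URLs below are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ArgandBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token ("ArgandBot"), not the full User-Agent string:
    # robotparser only looks at the text before the first "/".
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "ArgandBot", "https://example.com/"))
    print(allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```

With the sample file, ArgandBot is blocked everywhere while other crawlers fall through to the `*` section and are only kept out of `/private/`.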
Per page: either of these keeps a page out of Argand's index:
<meta name="robots" content="noindex">
X-Robots-Tag: noindex
A noindex page is never published to our index: the directive is honored at fetch time, not filtered later. You can also scope it to us alone with <meta name="argandbot" …> or X-Robots-Tag: argandbot: noindex.
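For pages where editing the HTML is inconvenient, the header form can be set in application code. The sketch below is a bare WSGI app that adds `X-Robots-Tag: noindex` to everything under a hypothetical `/drafts/` path; the path, the app, and the response body are made up for illustration, and most frameworks and web servers offer their own way to set response headers.

```python
NOINDEX_PREFIX = "/drafts/"  # hypothetical path you want kept out of the index


def app(environ, start_response):
    """Minimal WSGI app that marks some paths as noindex via a response header."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if environ.get("PATH_INFO", "").startswith(NOINDEX_PREFIX):
        # Use "argandbot: noindex" instead to scope the directive to ArgandBot only.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on localhost:8000 for a quick manual check with curl -I.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```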
Manual removal: email crawler@argand.org with the URLs or domain and we'll remove them and stop crawling — no robots.txt changes required.
Verifying it's really us
- The User-Agent is exactly the string shown above, linking to this page.
- ArgandBot currently crawls from Argand's own infrastructure at 135.181.231.218 (Hetzner, Finland). A request claiming to be ArgandBot from any other address is an impostor; block it freely.
- The real ArgandBot always fetches /robots.txt before your pages and obeys it. Impostors rarely bother.
If the crawl origin changes or grows, this page is updated first — it is the canonical record of ArgandBot's identity.
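The checks above are easy to automate in a log filter or request middleware. Here is a minimal sketch, assuming the single address published on this page at the time of writing; the `classify` function and its return labels are illustrative, and the address set should be re-checked against this page rather than treated as permanent.

```python
import ipaddress

# Address listed on this page; update if the page changes.
ARGANDBOT_IPS = {ipaddress.ip_address("135.181.231.218")}


def classify(user_agent: str, remote_addr: str) -> str:
    """Label a request as 'genuine', 'impostor', or 'other' (not claiming to be ArgandBot)."""
    if "argandbot" not in user_agent.lower():
        return "other"
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return "impostor"  # claims to be ArgandBot but has no valid address
    return "genuine" if addr in ARGANDBOT_IPS else "impostor"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; ArgandBot/1.0; +https://argand.org/bot)"
    print(classify(ua, "135.181.231.218"))
    print(classify(ua, "203.0.113.7"))
```

Pass the real peer address as `remote_addr`. A value taken from a header such as X-Forwarded-For is only trustworthy if it was set by a proxy you control.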
Contact
Anything crawler-related — removal requests, rate concerns, odd behavior in your logs — goes to crawler@argand.org. It's read by the person who wrote the crawler; expect a reply, not a ticket number.