Quality, in plain words
Argand has one job: put what you were looking for at the top, the first time. Here is how we hold ourselves to that — in plain English, with no marketing and no hand-waving.
The one promise
You should find exactly what you were looking for, on the first try, every time.
That is the bar. Most search has quietly stopped trying to meet it, burying the answer under ads, lookalike pages, and boxes that keep you from clicking through. Everything below is how we keep score against that one sentence.
How we check
Quality has to be measured, or it is just a claim. The three checks below run automatically on every change to the search engine.
We keep a set of real searches where the correct result is already known — like a test with the answers in the back. After every change, we re-run them and confirm Argand still lands the right page.
The measure that matters most isn't "somewhere in the top fifty" — it's whether the best result came up first. We track that number and push it up, search by search.
Every update has to do at least as well as the one before it. If quality slips — even a little — the update is blocked automatically, before anyone searches on it.
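For readers who want to see the shape of these checks, here is a minimal sketch in Python. It is an illustration, not Argand's actual code: the answer key, the page names, and the two stand-in search functions are all made up for the example. It shows the three ideas above in order: a set of searches with known answers, a score for how often the right page comes up first, and a gate that blocks any update scoring lower than the one before it.

```python
from typing import Callable, Dict, List

# Hypothetical answer key: each search maps to the one page known to be correct.
ANSWER_KEY: Dict[str, str] = {
    "reset router password": "docs/router/reset",
    "python list sort": "docs/python/list-sort",
    "tax form deadline": "docs/tax/deadlines",
}


def top1_accuracy(search: Callable[[str], List[str]],
                  answer_key: Dict[str, str]) -> float:
    """Return the fraction of searches whose correct page is ranked first."""
    if not answer_key:
        raise ValueError("answer key is empty")
    hits = 0
    for query, expected in answer_key.items():
        results = search(query)
        # Only the first result counts; "somewhere in the list" is a miss.
        if results and results[0] == expected:
            hits += 1
    return hits / len(answer_key)


def passes_gate(candidate_score: float, baseline_score: float) -> bool:
    """An update passes only if it does at least as well as the last one."""
    return candidate_score >= baseline_score


# Stand-in for the previous version: it ranks one correct page second.
def baseline_search(query: str) -> List[str]:
    ranked = {
        "reset router password": ["docs/router/reset", "docs/router/setup"],
        "python list sort": ["docs/python/sorted", "docs/python/list-sort"],
        "tax form deadline": ["docs/tax/deadlines"],
    }
    return ranked.get(query, [])


# Stand-in for the proposed update: every correct page comes up first.
def candidate_search(query: str) -> List[str]:
    ranked = {
        "reset router password": ["docs/router/reset", "docs/router/setup"],
        "python list sort": ["docs/python/list-sort", "docs/python/sorted"],
        "tax form deadline": ["docs/tax/deadlines"],
    }
    return ranked.get(query, [])


baseline_score = top1_accuracy(baseline_search, ANSWER_KEY)
candidate_score = top1_accuracy(candidate_search, ANSWER_KEY)
update_allowed = passes_gate(candidate_score, baseline_score)
```

In this made-up case the previous version gets two of three searches right on the first result and the update gets all three, so the update is allowed. Swap the two scores and the gate refuses it.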
The proof
Beyond our own answer keys, we test Argand on public test sets — collections of searches, with agreed-upon correct answers, that researchers use to compare search engines fairly. On medical, scientific, and financial test sets, Argand matches or beats a standard open-source search engine on the same data.
The point isn't the exact decimals — it's that they're checkable. The test data is public, the scoring is public, and the command that produces our numbers is published too, so anyone can run it and get the same result. Quality you can't reproduce is just a poster on the wall.
See the real scores and re-run them yourself
Where it's weak
First-try search is hard, and Argand is not finished. Long, conversational questions are still slower and weaker than short, direct ones — and we measure those misses on purpose, because the ones we can see are the ones we can fix. When a search comes back empty or thin, that gap is logged (never the wording of your search — see Privacy) so the next version can do better. We would rather show you the rough edges than pretend they aren't there.
What we won't do
Want the deeper dives? See the benchmarks, how little energy a search costs on Energy, and the why on About.