The receipts
Measured, not asserted.
Argand's quality claims are reproducible. Same public test sets, same answer keys, same scoring as the academic reference engine. Here is how it does, and how it compares.
Measured, not vibed
How do we know it's any good? We measured.
Search researchers have public test sets: a list of real queries plus a list of the right answers for each one. You run your engine on the queries, score the answers, get a number. The higher the number, the better the engine. We ran Argand's text-matching engine on three of these public tests, and put the scores next to the reference engine that researchers compare against. Argand matches it or beats it on all three. No fancy hardware required.
Step one: the plain keyword search
Scores are nDCG@10, the standard search-quality score. Higher is better, max 1.0. All runs on a regular CPU, no graphics card needed. The test sets are the public BEIR collections that academic search researchers use as the common yardstick.
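For readers who want to see what sits behind that number, here is a minimal sketch of nDCG@10 for a single query. It assumes the score in question is the nDCG@10 listed in the result file, and it uses linear gain, as trec_eval-style tooling does. The function names and the toy judgments are illustrative, not taken from Argand's code or from the benchmark data.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: doc ids in the order the engine returned them.
    relevance: dict doc_id -> graded relevance (missing means 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example with made-up ids and judgments (not benchmark data).
relevance = {"d1": 1, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], relevance)  # both right answers on top
worse = ndcg_at_k(["d3", "d1", "d2"], relevance)    # a wrong answer ranked first
```

The per-query values are then averaged over all queries in a test set to give the single figure reported for that set.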
The table above is step one: plain keyword matching, lined up against the standard engine researchers compare against. Argand does not stop there. Step two, below, is a careful re-read of the best matches, and that is where the scores climb. It is the same tests measured at two points, so a set like SciFact appears in both: 0.687 after the keyword search, 0.742 after the re-read. Not a contradiction, just the second step doing its job.
Step two: the re-read
Then it reads the top results more closely.
Matching keywords is fast but rough. A second, small model re-reads the top results and reorders them by how well they actually answer the question. It was never trained or tuned on these tests. The lift below is what that one extra pass buys.
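The two-step shape can be sketched in a few lines. This is not Argand's reranker: the scorer below is a toy word-overlap function standing in for the small model, and `top_n` is an assumed cut-off, since the page does not say how many results get re-read.

```python
def rerank(query, candidates, score_fn, top_n=10):
    """Reorder the first top_n candidates by score_fn; leave the tail in place.

    candidates: list of (doc_id, text) in first-stage (keyword) order.
    score_fn: callable (query, text) -> float, higher means more relevant.
    """
    head, tail = candidates[:top_n], candidates[top_n:]
    # sorted() is stable, so ties keep their first-stage order.
    head = sorted(head, key=lambda c: score_fn(query, c[1]), reverse=True)
    return head + tail

def toy_overlap_score(query, text):
    """Stand-in scorer: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Made-up candidates in the order a keyword search might return them.
candidates = [
    ("a", "vitamin supplements overview"),
    ("b", "does vitamin d reduce fracture risk"),
    ("c", "fracture treatment"),
]
reranked = rerank("vitamin d fracture risk", candidates, toy_overlap_score, top_n=3)
order = [doc_id for doc_id, _ in reranked]  # "b" moves to the first slot
```

Because only the head of the list is rescored, the cost of the second pass depends on `top_n` and not on the size of the index.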
The right answer, first try
How often a right answer lands in the very first slot. On finance questions it goes from about one in five to nearly one in two.
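One way to compute a "first slot" rate is sketched below. It is an assumption that the chart uses this exact definition; the result file lists nDCG@1, which gives the same per-query value when judgments are binary but can differ when they are graded. The data structures and ids are made up for illustration.

```python
def success_at_1(runs, qrels):
    """Fraction of judged queries whose first result is relevant.

    runs: dict query_id -> ranked list of doc ids.
    qrels: dict query_id -> dict doc_id -> relevance grade (> 0 is relevant).
    """
    judged = [qid for qid in runs if qid in qrels]
    if not judged:
        return 0.0
    hits = sum(
        1 for qid in judged
        if runs[qid] and qrels[qid].get(runs[qid][0], 0) > 0
    )
    return hits / len(judged)

# Toy data: q1 has a right answer in slot one, q2 has it in slot two.
runs = {"q1": ["d1", "d2"], "q2": ["d9", "d3"]}
qrels = {"q1": {"d1": 1}, "q2": {"d3": 1}}
rate = success_at_1(runs, qrels)
```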
Overall quality, top ten results. Higher is better, max 1.0.
Small enough for a potato. This reranker has 68 million parameters and still beats a model eight times its size on the science test, 0.742 to 0.732. Tiny and better.
The version that runs on a plain CPU reproduces the original model to the decimal. Nothing heavy sits in the path of your search.
Reranking does not help everywhere. On one set, scientific document retrieval, it came out a touch worse. Across these public tests it lifts quality by 0.084 on average. We measured all of it, including the part that did not move.
How the numbers are reproducible. What's checkable today vs. at launch.
A benchmark needs three things to be reproducible: the test data, the scoring method, and the engine. The first two are public today. The test data is the BEIR collection that academic researchers use as the common yardstick. The scoring method is the standard one they all share. The "Reference" column above uses a publicly-available retrieval engine on that same public data, so anyone can rerun it today and check that the reference number is honest.
The engine itself, the part that produces the "Argand" column, is still being finished and the source is private. It goes public at launch (planned mid-2026), and from that day on the command below will print the same scores from the same public test sets, on a regular laptop. We're showing the command in advance so the method is on record before the source is. If the table above ever changes without the command and the test data changing, that's a tell.
cd ~/argand
ORT_DYLIB_PATH=$(pwd)/lib/onnxruntime-linux-x64-gpu-1.26.0/lib/libonnxruntime.so \
./target/release/beir_pipeline_bench \
--corpus eval/datasets/nfcorpus/corpus.jsonl \
--queries eval/datasets/nfcorpus/queries.jsonl \
--qrels eval/datasets/nfcorpus/qrels/test.tsv \
--retrieve-k 100 --no-rerank \
--out eval/results/beir_nfcorpus_bm25only.json

The result file is a single JSON with nDCG@1, nDCG@5, nDCG@10, MRR@10, Recall@100, and MAP. The numbers in the table above are lifted verbatim from runs of this command.
BEIR canonical datasets are from the UKP Darmstadt mirror.
SciFact and FiQA swap in their own corpus / queries / qrels paths.
The SPLADE-v3 row adds --splade-model models/splade/splade-v3-8bit.onnx and drops --no-rerank.
Every metric we cite has a corresponding stored JSON in eval/results/.
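Once a result file exists, checking it against a published figure could look like the sketch below. The JSON schema is an assumption: the page names the metrics but not the key layout, so the flat `{"nDCG@10": ...}` shape and the `check_result` helper are hypothetical. The tolerance reflects that published scores are rounded to three decimals.

```python
import json
import math

def check_result(path, expected, tol=5e-4):
    """Compare metrics in a result JSON against published values.

    expected: dict metric_name -> published value (rounded to 3 decimals).
    Returns a list of (metric, published, measured) for every mismatch;
    a metric missing from the file is reported with measured=None.
    """
    with open(path, encoding="utf-8") as fh:
        result = json.load(fh)  # assumed: flat object keyed by metric name
    mismatches = []
    for metric, published in expected.items():
        measured = result.get(metric)
        if measured is None or not math.isclose(measured, published, abs_tol=tol):
            mismatches.append((metric, published, measured))
    return mismatches
```

An empty list means every checked metric agrees with the published value after rounding; anything else names the metric that drifted.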
How we compare
Every search engine makes different trade-offs.
Here is what each one actually does.
| Feature | | Bing | Kagi | DDG | Argand |
|---|---|---|---|---|---|
| AI-generated answers | | | | No | |
| Owns its own index | | | | | |
| Drives traffic to source sites | | No | Partial | Mostly | |
| Behavioural ad targeting | | | No | No | |
| Zero retention for searches | | No | Configurable | Yes | |
| Business model | | Behavioural ads | Subscription | Contextual ads | |
| Built in Rust | n/a | n/a | n/a | n/a | |
| Open source | | | | | |
* Argand is in active development. These values reflect the design intent and current implementation. Claims about competitors are based on public documentation and policy pages.