Retrieval Tester

Test retrieval quality with MRR, Hit Rate, NDCG, and Precision metrics.

How It Works

  1. Configure the test: Enter your query and the list of documents to test.
  2. Define ground truth: Indicate which documents are actually relevant for this query.
  3. Analyze metrics: Get MRR, Hit Rate, NDCG and Precision to evaluate your retrieval.
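The three steps above amount to comparing a ranked list of retrieved document IDs against a ground-truth relevance set. A minimal sketch of Hit Rate and Precision@K (the function names are illustrative, not this tool's API):

```python
def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant_ids for doc in ranked_ids[:k]) / k

ranked = ["d3", "d1", "d7", "d2", "d5"]   # retriever output, best first
relevant = {"d1", "d2"}                    # ground truth for this query
print(hit_rate_at_k(ranked, relevant, 3))   # 1.0 (d1 is in the top 3)
print(precision_at_k(ranked, relevant, 3))  # 0.333... (1 of 3 is relevant)
```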

Frequently Asked Questions

What is MRR (Mean Reciprocal Rank)?
MRR is the mean, across queries, of the reciprocal rank of the first relevant result. An MRR of 1.0 means the first relevant document is always ranked first; an MRR of 0.5 means the first relevant document sits, on average, at position 2.
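As a quick sketch, MRR can be computed like this (the `runs` structure is illustrative, not this tool's format):

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

runs = [
    (["a", "b", "c"], {"a"}),  # first relevant at rank 1 -> 1.0
    (["x", "y", "z"], {"y"}),  # first relevant at rank 2 -> 0.5
]
print(mean_reciprocal_rank(runs))  # 0.75
```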
What Top K value should I use?
For factual Q&A, Top K = 3-5 is usually sufficient. For synthesis tasks, increase to 10-20. Setting Top K too high dilutes relevance and increases LLM context costs.
How can I improve my Hit Rate?
A low Hit Rate indicates a retrieval problem. Solutions: 1) Improve chunking, 2) Test different embedding models, 3) Add reranking, 4) Enrich document metadata.
What is NDCG?
NDCG (Normalized Discounted Cumulative Gain) measures ranking quality while accounting for order: a relevant document at position 1 contributes more than one at position 5. Aim for a score above 0.8.
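With binary relevance labels (relevant / not relevant), NDCG@k can be sketched as follows; each gain is discounted by the log of its position, then normalized against the ideal ordering:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: position i is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_ids, relevant_ids, k):
    gains = [1.0 if doc in relevant_ids else 0.0 for doc in ranked_ids[:k]]
    # Ideal ranking: all relevant documents first (binary relevance).
    ideal = [1.0] * min(len(relevant_ids), k)
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0

print(ndcg_at_k(["a", "b"], {"a"}, 2))  # 1.0  (relevant doc at position 1)
print(ndcg_at_k(["b", "a"], {"a"}, 2))  # ~0.63 (same doc demoted to position 2)
```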
Should I use a reranker?
A reranker (like Cohere Rerank or cross-encoder) improves precision by re-ranking Top K results. Recommended if your MRR < 0.7 or for complex queries.
How do I test retrieval with real data?
Create a test set with 50-100 pairs (question, expected relevant documents). Use this tool to calculate metrics. Repeat after each pipeline modification.
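Such a test-set evaluation loop might look like the sketch below. The keyword-overlap `retrieve` function is a stand-in for your real pipeline, and the corpus and questions are invented for illustration:

```python
CORPUS = {
    "d1": "how to reset a password",
    "d2": "billing and invoices",
    "d3": "password strength rules",
}

def retrieve(question, k=3):
    """Toy keyword-overlap retriever; swap in your real pipeline here."""
    q = set(question.lower().split())
    scored = sorted(CORPUS, key=lambda d: -len(q & set(CORPUS[d].split())))
    return scored[:k]

def evaluate(test_set, k=3):
    """test_set: list of (question, set of relevant doc IDs) pairs."""
    hits, rr_sum = 0, 0.0
    for question, relevant in test_set:
        ranked = retrieve(question, k)
        rank = next((i for i, d in enumerate(ranked, 1) if d in relevant), None)
        if rank is not None:
            hits += 1
            rr_sum += 1.0 / rank
    n = len(test_set)
    return {"hit_rate": hits / n, "mrr": rr_sum / n}

test_set = [
    ("reset my password", {"d1", "d3"}),
    ("billing question", {"d2"}),
]
print(evaluate(test_set))
```

Re-run this loop after every pipeline change (chunking, embedding model, reranker) and compare the metrics, rather than eyeballing individual queries.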
