
Reranking

Information Retrieval

A second-stage ranking process that reorders initially retrieved results using a more computationally expensive but more accurate model, typically a cross-encoder.

Reranking is a technique where an initial set of retrieved candidates (often 50-100 documents) is rescored and reordered by a more sophisticated model to improve precision in the final top results. The initial retrieval stage optimizes for speed and recall using approximate methods, while the reranker optimizes for precision using slower but more accurate scoring.

The most common reranking approach uses cross-encoder models (such as the ms-marco-MiniLM family) that encode the query and each candidate document together in a single input, allowing token-level attention between them. This generally produces more accurate relevance scores than bi-encoder approaches that embed query and document independently, at the cost of one model forward pass per query-document pair, which is why cross-encoders are applied only to a small candidate set.
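The difference between the two scoring styles is mostly one of interface, which a small sketch can make concrete. The functions below are toy stand-ins, not real models: `embed` and `cosine` imitate the bi-encoder pattern (each text encoded alone, then compared), and `joint_score` imitates the cross-encoder pattern (one scorer that sees both texts at once). All names and the bag-of-words and bigram heuristics are assumptions chosen for illustration; real bi-encoders are far stronger than a bag of words.

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy bi-encoder stand-in: encode one text on its own as term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Compare two independently computed vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def joint_score(query: str, doc: str) -> float:
    """Toy cross-encoder stand-in: score the (query, doc) pair together.

    Because both texts are visible at once, the scorer can use interactions
    between them; here it rewards query word pairs that occur in the same
    order in the document.
    """
    q, d = query.lower().split(), doc.lower().split()
    doc_bigrams = set(zip(d, d[1:]))
    phrase_hits = sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams)
    term_hits = len(set(q) & set(d))
    return term_hits + 2.0 * phrase_hits


query = "dog bites man"
docs = ["man bites dog", "dog bites man in park"]

# Bi-encoder pattern: document vectors could be computed once, offline.
doc_vectors = {doc: embed(doc) for doc in docs}
query_vector = embed(query)
bi_ranking = sorted(docs, key=lambda doc: cosine(query_vector, doc_vectors[doc]), reverse=True)

# Cross-encoder pattern: one scoring call per (query, doc) pair at query time.
cross_ranking = sorted(docs, key=lambda doc: joint_score(query, doc), reverse=True)

print(bi_ranking)
print(cross_ranking)
```

In this toy setup the independent encoding cannot tell "man bites dog" from the query, because both have the same term counts, while the joint scorer prefers the document that preserves the query's word order. The same structure also shows the cost trade-off: document vectors can be cached ahead of time, whereas the joint scorer must run once per pair.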

Reranking is particularly valuable in RAG systems, where the quality of the final top-5 documents directly affects answer quality. A typical pipeline retrieves 100 candidates using fast vector search, reranks them with a cross-encoder, and passes only the top 5 to the LLM. Reranking is often reported to improve answer quality by 20% or more, although the size of the gain depends on the corpus, the first-stage retriever, and the reranker. For that reason, many production RAG systems treat it as a core stage rather than an optional optimization.
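The retrieve, rerank, truncate pipeline described above can be sketched as a single function with both stages injected as callables. This is a minimal sketch under stated assumptions: `retrieve_then_rerank`, `make_keyword_retriever`, and `overlap_scorer` are hypothetical names, and the keyword retriever and overlap scorer are toy stand-ins for a vector index and a cross-encoder. If the sentence-transformers library is installed, the `predict` method of a `CrossEncoder` loaded from a checkpoint such as `cross-encoder/ms-marco-MiniLM-L-6-v2` takes a list of (query, passage) pairs and could be passed as `score_pairs` instead.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for the two pluggable stages.
Retriever = Callable[[str, int], List[str]]
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def retrieve_then_rerank(
    query: str,
    retrieve: Retriever,
    score_pairs: PairScorer,
    retrieve_k: int = 100,
    final_k: int = 5,
) -> List[str]:
    """Fetch retrieve_k candidates cheaply, rescore them, keep the best final_k."""
    candidates = retrieve(query, retrieve_k)
    if not candidates:
        return []
    # Score all (query, candidate) pairs in one batch.
    scores = score_pairs([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:final_k]]


def make_keyword_retriever(corpus: List[str]) -> Retriever:
    """Toy first stage (assumption): return documents sharing any query term."""
    def retrieve(query: str, k: int) -> List[str]:
        terms = set(query.lower().split())
        hits = [doc for doc in corpus if terms & set(doc.lower().split())]
        return hits[:k]
    return retrieve


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy second stage (assumption): count terms shared by query and document."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


corpus = [
    "A cross-encoder scores the query and document together",
    "Vector search retrieves candidates quickly",
    "Bananas are rich in potassium",
    "Reranking reorders retrieved candidates with a cross-encoder",
]
query = "how does a cross-encoder reranking step work"

top = retrieve_then_rerank(
    query, make_keyword_retriever(corpus), overlap_scorer, retrieve_k=3, final_k=1
)
print(top)
```

Keeping the two stages behind plain callables means the first stage can be swapped for a real vector index and the second for a real cross-encoder without changing the pipeline function, and `retrieve_k` and `final_k` can be tuned independently to trade latency against recall.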

Last updated: February 22, 2026