>_TheQuery

Cross-Encoder

Information Retrieval

A model architecture that jointly encodes a query-document pair to compute a relevance score, offering higher accuracy than bi-encoders but at greater computational cost.

A cross-encoder is a transformer-based model that takes a query and a document as a single concatenated input and produces a relevance score. Unlike bi-encoders, which encode the query and document independently, a cross-encoder allows full cross-attention between query and document tokens, enabling word-level interaction that captures fine-grained relevance signals.

Cross-encoders achieve significantly higher accuracy than bi-encoders because they can directly compare query terms against document terms. For example, when processing "capital of France" against a document containing "Paris is the beautiful capital of France," the cross-encoder's attention mechanism explicitly connects "capital" and "France" in the query to "Paris" and "capital" in the document.
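The pairwise input described above can be sketched as follows. The `[CLS]`/`[SEP]` concatenation mirrors the usual BERT-style text-pair format, but the scoring function here is a deliberately crude token-overlap stand-in, not a real model; a production cross-encoder such as ms-marco-MiniLM would run a transformer forward pass over the same concatenated sequence.

```python
def cross_encoder_input(query: str, document: str) -> str:
    # A cross-encoder sees one concatenated sequence, so every query
    # token can attend to every document token (and vice versa).
    return f"[CLS] {query} [SEP] {document} [SEP]"

def toy_relevance_score(query: str, document: str) -> float:
    # Crude stand-in for a transformer forward pass: fraction of query
    # tokens that also appear in the document. A real cross-encoder
    # learns this interaction through cross-attention instead.
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / max(len(q), 1)

query = "capital of France"
doc = "Paris is the beautiful capital of France"
pair = cross_encoder_input(query, doc)   # the single sequence the model would encode
score = toy_relevance_score(query, doc)
```

The key point is that scoring requires both texts at once: there is no standalone document representation to cache, which is what makes the architecture accurate but expensive.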

The tradeoff is computational cost: a cross-encoder must run a full forward pass for every query-document pair, making scoring O(n) per query where n is the number of candidates, whereas a bi-encoder can precompute document embeddings offline and score new queries with a fast vector search. This makes cross-encoders impractical for initial retrieval over millions of documents but ideal for reranking a small set of candidates (typically 20-100). Common cross-encoder models include ms-marco-MiniLM and models from the sentence-transformers library.
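The retrieve-then-rerank pattern can be sketched as below. Both scoring functions are simplified stand-ins: `bi_encoder_score` mimics a cheap similarity over precomputed embeddings, and `cross_encoder_score` mimics the expensive joint pass that, in practice, would be a model such as ms-marco-MiniLM loaded through the sentence-transformers library.

```python
def bi_encoder_score(query: str, doc: str) -> float:
    # Cheap stand-in for the dot product of two precomputed embeddings.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) * len(d)) ** 0.5 if q and d else 0.0

def cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in for a joint forward pass: counts query-term occurrences,
    # loosely imitating the term-level interaction of cross-attention.
    q = query.lower().split()
    d = doc.lower().split()
    return sum(d.count(t) for t in q) / len(d)

def rerank(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Stage 1: score ALL documents cheaply and keep the top k candidates.
    candidates = sorted(corpus, key=lambda d: bi_encoder_score(query, d),
                        reverse=True)[:k]
    # Stage 2: one expensive "forward pass" per candidate -> O(k) calls.
    return sorted(candidates, key=lambda d: cross_encoder_score(query, d),
                  reverse=True)

corpus = [
    "Paris is the beautiful capital of France",
    "Berlin is the capital of Germany",
    "France is famous for its wine",
    "The Eiffel Tower is in Paris",
]
top = rerank("capital of France", corpus, k=3)
```

The expensive scorer only ever sees k candidates, never the full corpus, which is how the pipeline keeps cross-encoder accuracy while staying tractable at scale.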

Last updated: February 22, 2026