BM25
Information Retrieval
A ranking function used in information retrieval that estimates document relevance based on term frequency with diminishing returns and document length normalization.
BM25 (Best Matching 25) is a probabilistic ranking function that improves upon TF-IDF by incorporating two key innovations: term frequency saturation (mentioning a word 100 times does not make a document 100 times more relevant) and document length normalization (longer documents naturally have higher term frequencies and should not be unfairly favored).
The BM25 formula combines IDF weighting with a saturation function controlled by parameter k1 (typically 1.2-2.0) and a length normalization parameter b (typically 0.75, on a scale from 0 for no length normalization to 1 for full normalization). As a term's frequency increases, its contribution to the score approaches an asymptote rather than growing linearly, which models the intuition of diminishing returns.
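The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bm25_scores`, the default `k1=1.5`, the pre-tokenized list inputs, and the choice of the non-negative "log(1 + ...)" IDF variant are all assumptions made for the example, and production engines differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration. `docs` is a non-empty list of
    token lists; `query` is a list of tokens.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: 1.0 for an average-length document when b > 0,
        # larger for longer documents, smaller for shorter ones.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue
            # Assumed IDF variant: always non-negative, rarer terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturation: this factor approaches (k1 + 1) as f grows.
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores


# Three documents of equal length; "a" appears 1, 100, and 0 times.
docs = [["a"] + ["x"] * 99, ["a"] * 100, ["y"] * 100]
scores = bm25_scores(["a"], docs)
```

In this usage example, the second document mentions the query term 100 times as often as the first, yet its score is less than 2.5 times higher (the `k1 + 1` ceiling), which is the saturation effect in practice.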
Despite the rise of neural retrieval methods, BM25 remains widely used in production systems because it excels at exact term matching (product IDs, error codes, technical jargon), requires no GPU, is highly interpretable, and is extremely fast. In modern RAG systems, BM25 is commonly combined with dense retrieval in hybrid search configurations using techniques like reciprocal rank fusion.
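As a rough sketch of how reciprocal rank fusion can merge a BM25 ranking with a dense-retrieval ranking: each document receives 1 / (k + rank) from every list it appears in, and the sums determine the fused order. The function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are hypothetical; `k=60` is a commonly used constant, assumed here rather than prescribed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Hypothetical helper for illustration. Each ranking is ordered best
    first; documents missing from a list contribute nothing for that list.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Made-up results from the two retrievers for the same query.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused_ranking = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because the fusion uses only ranks, the raw BM25 scores and dense similarity scores never need to be put on a common scale, which is one reason this approach is convenient for hybrid search.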
Last updated: February 22, 2026