
Cosine Similarity

Fundamentals

A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical direction), widely used to compare text embeddings.

Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. The formula is cos(θ) = (A · B) / (‖A‖ ‖B‖), producing values between -1 and 1, where 1 means identical direction, 0 means orthogonal (unrelated), and -1 means opposite direction.
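The formula above can be sketched directly in plain Python; the function name `cosine_similarity` is illustrative, not from any particular library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # identical direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite -> -1.0
```

In production, a vectorized implementation (e.g. NumPy) would be used instead, but the arithmetic is the same.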

Cosine similarity is the preferred metric for comparing text embeddings because it is scale-invariant: it measures the direction of vectors rather than their magnitude. This is critical for text because a long document and a short summary may have very different vector magnitudes but similar semantic content. Cosine similarity correctly identifies them as similar by ignoring length differences.
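Scale invariance is easy to verify: scaling a vector by any positive constant leaves its cosine similarity unchanged. The embedding values below are made up purely for illustration:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

summary = [0.2, 0.5, 0.1]                 # hypothetical short-summary embedding
document = [x * 10 for x in summary]      # same direction, 10x the magnitude

# Despite the magnitude difference, cosine similarity is 1.0
# (up to floating-point rounding): the vectors point the same way.
print(cosine_similarity(summary, document))
```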

In practice, many vector databases pre-normalize embeddings to unit length, which allows cosine similarity to be computed as a simple dot product for maximum speed. When choosing similarity thresholds for RAG systems, a score above 0.9 indicates very similar content, 0.7-0.9 indicates the same topic with different phrasing, and a score below 0.5 typically indicates irrelevant content.
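The pre-normalization trick can be sketched as follows: once both vectors are scaled to unit length, their dot product equals the full cosine formula, since ‖A‖ = ‖B‖ = 1 makes the denominator 1. The helper names here are illustrative:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 4.0]
b = [4.0, 3.0]

# Full cosine similarity: dot product divided by both norms.
full = dot(a, b) / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# With pre-normalized vectors, a plain dot product gives the same value.
fast = dot(normalize(a), normalize(b))

print(full, fast)  # both 0.96, up to floating-point rounding
```

This is why indexes over normalized embeddings can use inner-product search and still return cosine-ranked results.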

Last updated: February 22, 2026