AI Glossary
Key terms and concepts in artificial intelligence and machine learning.
Activation Function
Deep Learning: A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.
Adam Optimizer
Optimization: An adaptive learning rate optimization algorithm that maintains per-parameter learning rates based on first and second moment estimates of gradients.
AI Agent
Agents: An LLM-based system that can autonomously plan multi-step tasks, use external tools, and take actions in the real world to achieve specified goals.
AIME
Fundamentals: The American Invitational Mathematics Examination - a prestigious high school math competition whose problems are used as a benchmark for evaluating AI mathematical reasoning capabilities.
Anthropic
Platforms & Tools: AI safety company and creator of the Claude family of large language models, founded by former OpenAI researchers.
API
Fundamentals: Application Programming Interface - a defined set of rules and protocols that allows different software systems to communicate with each other.
Approximate Nearest Neighbor
Information Retrieval: An algorithm that finds points approximately closest to a query in high-dimensional space, trading a small accuracy loss for dramatically faster search over large datasets.
Artificial Analysis
Platforms & Tools: An independent platform that benchmarks AI models and inference providers across intelligence, performance, price, speed, and latency with standardized methodology.
Attention Mechanism
Deep Learning: A technique that allows neural networks to focus on relevant parts of the input when producing each element of the output.
Backpropagation
Fundamentals: An algorithm for computing gradients of the loss function with respect to each weight in a neural network by applying the chain rule layer by layer.
Batch Normalization
Deep Learning: A technique that normalizes the inputs of each layer in a neural network across the current mini-batch, stabilizing and accelerating training.
Benchmark
Fundamentals: A standardized test or evaluation used to measure and compare the performance of AI models on specific tasks like reasoning, coding, math, or language understanding.
Bi-Encoder
Information Retrieval: A model architecture that independently encodes queries and documents into separate embeddings for fast similarity comparison, used for initial retrieval at scale.
Bias-Variance Tradeoff
Fundamentals: The fundamental tension in machine learning between a model being too simple to capture patterns (high bias) and too complex, fitting noise instead of signal (high variance).
Binary
Fundamentals: A compiled, executable file that a computer can run directly, as opposed to source code that must be interpreted or compiled first.
BM25
Information Retrieval: A ranking function used in information retrieval that estimates document relevance based on term frequency with diminishing returns and document length normalization.
Chatbot
Fundamentals: A software application that uses AI to simulate human-like conversation through text or voice, ranging from rule-based scripts to modern LLM-powered assistants.
ChatGPT
Platforms & Tools: OpenAI's conversational AI product that provides a chat interface to GPT models, widely credited with bringing large language models to mainstream public awareness.
Chunking
Information Retrieval: The process of dividing large documents into smaller, semantically coherent pieces suitable for embedding and retrieval in RAG systems.
Citation
NLP: The practice of attributing specific claims in an LLM-generated answer to their source documents, enabling verification and building trust.
Claude Code
Platforms & Tools: Anthropic's agentic coding tool that runs in the terminal, capable of reading, writing, and executing code across entire codebases with human oversight.
Claude Haiku 4.5
LLM Models: Anthropic's fastest model, released in October 2025, achieving 90% of Sonnet 4.5's performance on agentic coding at lower cost.
Claude Opus 4.5
LLM Models: Anthropic's November 2025 flagship model achieving 80.9% on SWE-bench with a 50-75% reduction in tool calling errors.
Claude Opus 4.6
LLM Models: Anthropic's February 2026 flagship with a 1M context window, 80.8% on SWE-bench, 68.8% on ARC-AGI-2, and the highest Terminal-Bench 2.0 score among all frontier models.
Claude Sonnet 4.5
LLM Models: Anthropic's September 2025 model, marketed as the best coding model and best for agents, achieving 77.2% on SWE-bench and 100% on AIME with Python.
Claude Sonnet 4.6
LLM Models: Anthropic's February 2026 mid-tier model achieving 79.6% on SWE-bench and 72.5% on OSWorld, matching near-flagship performance at $3/$15 per million tokens.
Codex
Platforms & Tools: OpenAI's asynchronous coding agent that runs tasks in cloud sandboxes, designed for parallel software engineering work like writing features, fixing bugs, and running tests.
Cold Start
MLOps: The initial delay when a system or service must initialize from scratch before it can handle requests, common in serverless and containerized deployments.
Context Fusion
Information Retrieval: The process of combining structured knowledge from a knowledge graph with unstructured text from RAG retrieval into a unified context for LLM generation.
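The two properties in the definition above - diminishing returns on term frequency and length normalization - can be seen in a minimal sketch of the Okapi BM25 formula. The toy corpus, the helper name `bm25_score`, and the default `k1`/`b` values are illustrative choices, not from any particular library.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with Okapi BM25 (sketch)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
        tf = doc.count(term)                              # term frequency
        # tf saturates (diminishing returns); longer docs are penalized via b
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "after", "the", "cat"],
    ["quantum", "computing", "basics"],
]
scores = [bm25_score(["cat"], d, corpus) for d in corpus]
```

The short document mentioning "cat" outranks the longer one with the same term count, and the document without the term scores zero.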
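As a rough illustration, here is a naive fixed-size character chunker with overlap; real chunkers typically split on sentence or paragraph boundaries to keep pieces semantically coherent, and the `chunk_size`/`overlap` defaults here are arbitrary.

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into overlapping fixed-size character chunks (naive sketch)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

chunks = chunk_text("a" * 250, chunk_size=100, overlap=20)
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which helps retrieval recall.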
Context Window
NLP: The maximum number of tokens a language model can process at once, which limits how much retrieved content can be included alongside a query.
Convolutional Neural Network
Computer Vision: A neural network architecture that uses convolutional layers to automatically learn spatial hierarchies of features, primarily used for image and video analysis.
Cosine Similarity
Fundamentals: A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical direction), widely used to compare text embeddings.
Cross-Encoder
Information Retrieval: A model architecture that jointly encodes a query-document pair to compute a relevance score, offering higher accuracy than bi-encoders but at greater computational cost.
Cross-Entropy
Optimization: A loss function that measures the difference between a model's predicted probability distribution and the true distribution, widely used for classification tasks.
CUDA
Platforms & Tools: NVIDIA's parallel computing platform and API that allows developers to use NVIDIA GPUs for general-purpose processing, forming the backbone of most AI training and inference workflows.
Cypher
Knowledge Graphs: A declarative graph query language created for Neo4j that uses ASCII-art syntax to represent and match graph patterns.
Data Augmentation
Fundamentals: A regularization technique that artificially expands the training dataset by applying label-preserving transformations to existing examples, forcing the model to learn invariances.
Deep Learning
Fundamentals: A subset of machine learning that uses neural networks with many layers to learn complex patterns and representations from large amounts of data.
DeepSeek R1
LLM Models: An open-weight reasoning model released in January 2025, achieving 97.3% on MATH-500 and proving frontier AI doesn't require massive budgets.
DeepSeek V3
LLM Models: An open-weight MoE model whose updated version, V3-0324, scores 81.2% on MMLU-Pro and ranks 5th on the LMArena leaderboard.
Dense Retrieval
Information Retrieval: A neural retrieval method that encodes queries and documents as dense vector embeddings and retrieves documents based on vector similarity.
Docker Image
MLOps: A lightweight, standalone, executable package that includes everything needed to run a piece of software - code, runtime, libraries, and system tools.
Doubao 1.5 Pro
LLM Models: ByteDance's reasoning model with a Deep Thinking mode, matching GPT-4o performance at 50x lower cost with a 256K context window.
Dropout
Deep Learning: A regularization technique that randomly sets a fraction of neuron activations to zero during each training step, preventing co-adaptation and reducing overfitting.
Embedding
NLP: A learned dense vector representation that maps discrete entities like words or items into a continuous vector space where similar items are closer together.
Entity Linking
Knowledge Graphs: The task of resolving different textual mentions of an entity to a single canonical representation, critical for knowledge graph quality.
Epoch
Fundamentals: One complete pass through the entire training dataset during model training.
ERNIE 4.5
LLM Models: Baidu's open-source multimodal AI model processing text, images, audio, and video, with benchmark wins over GPT-4o and GPT-5 on specific tasks.
Exploding Gradients
Deep Learning: A training problem where gradients grow exponentially large as they propagate backward through many layers, causing weight updates to be enormous and training to diverge.
FAISS
Information Retrieval: Facebook AI Similarity Search - an open-source library by Meta for efficient similarity search and clustering of dense vectors, optimized for billion-scale datasets.
Feature Engineering
Fundamentals: The process of transforming raw data into informative input features that make patterns more accessible to machine learning models.
Feed-Forward Network
Deep Learning: A simple neural network layer within each transformer block that independently transforms each token's representation through two linear transformations with a non-linear activation in between.
Fine-tuning
MLOps: The process of further training a pretrained model on a smaller, task-specific dataset to adapt it for a particular use case.
Foundation Model
Fundamentals: A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks, serving as the base layer for many AI applications.
Gemini
LLM Models: Google DeepMind's family of multimodal AI models that power Google's AI products across Search, Workspace, Android, and developer APIs.
Gemini 2.5 Flash
LLM Models: Google DeepMind's fast model, released in May 2025, with a 1M context window and 251 tokens/second output speed.
Gemini 2.5 Pro
LLM Models: Google DeepMind's 2025 flagship with a 1M token context window, leading Humanity's Last Exam with 18.8% accuracy.
Gemini 3.1 Pro
LLM Models: Google DeepMind's February 2026 model, topping 13 of 16 industry benchmarks with 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond.
Generative Adversarial Network
Deep Learning: A framework consisting of two neural networks - a generator and a discriminator - that compete against each other to produce increasingly realistic synthetic data.
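The definition above translates directly into a few lines of code: the dot product of the two vectors divided by the product of their lengths. A minimal sketch using only the standard library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

sim_same = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
sim_orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
```

Note that scaling a vector does not change the result, which is why cosine similarity compares direction (semantic content) rather than magnitude.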
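The idea can be made concrete with a small sketch: cross-entropy H(p, q) = -Σ p·log(q) is low when the predicted distribution puts high probability on the true class. The `eps` smoothing term is an illustrative guard against log(0), not part of the formula itself.

```python
import math

def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); lower means better predictions."""
    return -sum(p * math.log(q + eps) for p, q in zip(true_dist, pred_dist))

# One-hot true label (class 1) vs. two predicted distributions
confident = cross_entropy([0.0, 1.0, 0.0], [0.05, 0.90, 0.05])
uncertain = cross_entropy([0.0, 1.0, 0.0], [0.30, 0.40, 0.30])
```

The confident, correct prediction incurs a smaller loss than the hedged one, which is exactly the gradient signal classification training relies on.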
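A minimal sketch of the common "inverted dropout" variant: survivors are rescaled by 1/(1-p) during training so that expected activations match at inference time, when dropout is disabled. Function and parameter names here are illustrative.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Zero each activation with probability p during training; scale survivors
    by 1/(1-p) so expected values are unchanged (inverted dropout)."""
    if not training or p == 0.0:
        return list(activations)  # dropout is a no-op at inference time
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

rng = random.Random(0)  # seeded for reproducibility
train_out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
eval_out = dropout([1.0, 1.0], p=0.5, training=False)
```

Each training-time output is either zeroed or doubled, while evaluation passes activations through untouched.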
GLM-4.5
LLM Models: Zhipu AI's open-weight agentic model with 355B total parameters, ranking 3rd globally and excelling at tool use with 90.6% accuracy.
Google DeepMind
Platforms & Tools: Google's AI research lab, formed by merging DeepMind and Google Brain, responsible for AlphaGo, AlphaFold, and the Gemini model family.
GPQA Diamond
Fundamentals: A benchmark of 198 graduate-level multiple-choice questions in physics, biology, and chemistry that are designed to be unsolvable through internet search, requiring genuine PhD-level expertise.
GPT-4.1
LLM Models: OpenAI's April 2025 API-focused model with a massive 1M token context window and 38.3% on MultiChallenge, beating GPT-4o by 10.5%.
GPT-4o
LLM Models: OpenAI's fast, cost-effective multimodal flagship model released in May 2024, supporting text, image, and audio with a 128K context window.
GPT-5
LLM Models: OpenAI's major generational leap, released in August 2025, achieving 94.6% on AIME 2025 and 45% fewer factual errors than GPT-4o.
GPT-5.2
LLM Models: OpenAI's December 2025 model with a 256K context window, 100% AIME 2025 accuracy, and a hallucination rate reduced to 6.2%.
GPT-oss-120b
LLM Models: OpenAI's first major open-weight model, with 117B parameters and a MoE architecture, rivaling proprietary o4-mini performance.
GPU
Fundamentals: Graphics Processing Unit - a specialized processor designed for parallel computation, now essential for training and running AI models due to its ability to perform thousands of operations simultaneously.
Gradient Clipping
Optimization: A technique that caps gradient magnitudes during training to prevent exploding gradients from destabilizing the optimization process.
Gradient Descent
Fundamentals: An optimization algorithm that iteratively adjusts model parameters in the direction that minimizes the loss function.
Graph Embedding
Knowledge Graphs: A technique for representing graph nodes as dense vectors that preserve graph structure, enabling similarity search and machine learning over graph data.
Graph Traversal
Knowledge Graphs: The process of systematically visiting nodes in a graph by following edges, used in knowledge graphs to explore relationships and answer multi-hop queries.
GraphRAG
Information Retrieval: An architecture pattern that incorporates knowledge graph reasoning alongside vector-based retrieval in RAG systems, pioneered by Microsoft for enterprise search.
Grok 3
LLM Models: xAI's June 2025 model with a 1M context window, beating GPT-4o and Claude 3.5 Sonnet on AIME and GPQA with a 1402 Arena Elo.
Grok 4
LLM Models: xAI's July 2025 model achieving 100% on AIME 2025 and 61.9% on USAMO 2025, with 4-agent parallel collaboration in the latest beta.
Grounding
NLP: The technique of anchoring LLM responses in factual, retrieved information rather than the model's parametric knowledge, reducing hallucinations.
Hallucination
NLP: When an AI model generates plausible-sounding but factually incorrect or fabricated information with apparent confidence.
HNSW
Information Retrieval: Hierarchical Navigable Small World - an efficient graph-based algorithm for approximate nearest neighbor search that builds a multi-layer navigation structure over vectors.
Hugging Face
Platforms & Tools: The largest open-source AI platform and model hub, hosting over 2 million models, 500,000 datasets, and 1 million demo apps used by 10 million developers.
Hybrid Search
Information Retrieval: A retrieval approach that combines different search methods, typically keyword-based (BM25) and semantic (dense embedding) search, to leverage the strengths of both.
Hyperparameter
Fundamentals: A configuration value set before training begins that controls the learning process itself, as opposed to model parameters, which are learned from data.
Inference
MLOps: The process of using a trained model to make predictions on new, unseen data, as opposed to the training phase where the model learns from labeled examples.
Kimi K2
LLM Models: Moonshot AI's open-source 1 trillion parameter MoE model with 32B active parameters, outperforming GPT-5 and Claude Sonnet 4.5 on reasoning benchmarks.
Kimi K2.5
LLM Models: Moonshot AI's January 2026 open-weight multimodal model with vision and agent swarm capabilities, leading on agentic and coding benchmarks.
KL Divergence
Fundamentals: A measure of how one probability distribution differs from a reference distribution, quantifying the information lost when approximating one distribution with another.
Knowledge Graph
Knowledge Graphs: A structured representation of knowledge as entities (nodes) and relationships (edges), often with properties attached to both, enabling logical traversal and multi-hop reasoning over data.
KV Cache
Deep Learning: A memory optimization technique in LLM inference that stores previously computed key-value pairs from attention layers, avoiding redundant recalculation when generating each new token.
Large Language Model
NLP: A neural network trained on vast amounts of text data that can understand and generate human language with remarkable fluency and versatility.
Latent Space
Deep Learning: A lower-dimensional representation space learned by a model where similar inputs are mapped to nearby points, capturing the essential structure of the data.
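The update rule is simply x ← x - lr·∇f(x), repeated until convergence. A minimal one-dimensional sketch (function names and the example objective are illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a 1-D function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move opposite the slope, scaled by the learning rate
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor; too large a rate would overshoot and diverge, which is why the learning rate entry below calls it a critical hyperparameter.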
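A breadth-first traversal over a toy adjacency-list graph shows how multi-hop exploration works; the graph contents and function name are invented for illustration.

```python
from collections import deque

def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: return every node within max_hops edges of start,
    mapped to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen

# Toy knowledge graph: Alice -> Acme -> Bob
graph = {"Alice": ["Acme"], "Acme": ["Bob"], "Bob": []}
one_hop = reachable_within(graph, "Alice", 1)
two_hop = reachable_within(graph, "Alice", 2)
```

Answering "who works at the company Alice works at?" requires the second hop, which is exactly the multi-hop reasoning knowledge graphs enable.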
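In formula form, D_KL(P || Q) = Σ p·log(p/q): zero when the distributions match, positive otherwise, and asymmetric in its arguments. A minimal sketch (the `eps` smoothing is an illustrative guard against division by zero):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); zero iff P == Q."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

identical = kl_divergence([0.5, 0.5], [0.5, 0.5])
skewed = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

Because it is asymmetric, D_KL(P || Q) generally differs from D_KL(Q || P), which is why it is a divergence rather than a true distance.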
Learning Rate
Optimization: A hyperparameter that controls how much model weights are adjusted in response to the estimated error during each step of gradient descent optimization.
Llama 4 Maverick
LLM Models: Meta's April 2025 open-weight flagship with 402B total parameters, a 1M context window, and multimodal capabilities beating GPT-4o.
Llama 4 Scout
LLM Models: Meta's April 2025 open-weight model with 109B total parameters and an industry-leading 10M token context window.
LMArena
Platforms & Tools: A crowdsourced platform where users compare AI models head-to-head in blind conversations, producing Elo-based rankings that reflect real human preferences.
Loss Function
Fundamentals: A function that measures the difference between a model's predictions and the actual target values, guiding the optimization process during training.
Machine Learning
Fundamentals: A field of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.
MCP
Agents: Model Context Protocol - an open standard created by Anthropic that defines how AI assistants connect to external data sources, tools, and services through a unified interface.
Meta AI
Platforms & Tools: Meta's AI division, responsible for the open-source Llama model family, PyTorch, and the FAIR research lab.
MiniMax M2.5
LLM Models: MiniMax's February 2026 model scoring 80.2% on SWE-bench Verified, outperforming Claude Opus 4.6 and GPT-5.2 at 1/20th the cost.
Mistral Medium 3
LLM Models: A European AI model achieving 90% of Claude Sonnet 3.7's capabilities, demonstrating a cost-efficient alternative to premium models.
MMMU-Pro
Fundamentals: A rigorous multimodal AI benchmark with college-level questions across six disciplines that tests whether models truly understand visual and textual information together.
MoltBook
Agents: A social network exclusively for AI agents, where autonomous bots interact, post content, and form communities.
MoltBot
Agents: The intermediate name for the OpenClaw AI agent framework during its transition from ClawdBot.
Multi-Head Attention
Deep Learning: An extension of attention that runs multiple attention operations in parallel with different learned projections, allowing the model to capture different types of relationships simultaneously.
Multi-Hop Reasoning
Knowledge Graphs: Answering questions that require connecting multiple pieces of information across several reasoning steps, a key strength of knowledge graph-augmented systems.
Named Entity Recognition
NLP: An NLP task that identifies and classifies named entities such as people, organizations, locations, and dates in unstructured text.
Neo4j
Knowledge Graphs: The most widely used graph database in industry, designed for storing and querying property graphs using the Cypher query language.
Neural Network
Fundamentals: A computing system inspired by biological neural networks that learns to perform tasks by considering examples without being explicitly programmed.
Ollama
Platforms & Tools: An open-source tool for running large language models locally on personal computers with a simple command-line interface.
Ontology
Knowledge Graphs: A formal specification of concepts, categories, and relationships within a domain that defines what types of entities exist and how they can relate to each other.
Open Source
Fundamentals: Software whose source code is publicly available for anyone to view, modify, and distribute, enabling community-driven development and transparency.
Open Weight Model
Fundamentals: An AI model whose trained parameters (weights) are publicly released for download and use, but whose training data, code, or methodology may remain proprietary.
OpenAI
Platforms & Tools: American AI research company and creator of ChatGPT, the GPT-series models, DALL-E, and Whisper.
OpenAI o3
LLM Models: OpenAI's reasoning model released in April 2025 with a 200K context window, achieving 88.9% on AIME 2025 and 69.1% on SWE-bench Verified.
OpenAI o4-mini
LLM Models: OpenAI's smaller reasoning model released in April 2025, achieving 92.7% on AIME 2025 and 99.5% with Python interpreter access.
OpenClaw (ClawdBot)
Agents: An open-source autonomous AI agent framework, originally called ClawdBot, capable of executing real-world tasks via LLMs.
Overfitting
Fundamentals: A phenomenon where a model learns the training data too well, including its noise and outliers, resulting in poor performance on unseen data.
Pattern Matching
Fundamentals: A technique for checking data against a set of predefined patterns or rules, used in programming languages, text processing, and machine learning.
Perceptron
Fundamentals: The simplest neural network unit, which computes a weighted sum of inputs, adds a bias, and passes the result through an activation function to produce an output.
Positional Encoding
NLP: A technique that injects information about token position into transformer inputs, since the attention mechanism itself is permutation-invariant and has no inherent notion of sequence order.
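The three steps in the definition - weighted sum, bias, activation - fit in a few lines. A minimal sketch with a step activation, using hand-picked weights so the unit computes logical AND (the weights and gate example are illustrative, not learned):

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, passed through a step activation (fires if >= 0)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total >= 0 else 0

# With these weights the unit only fires when both inputs are 1
and_gate = [perceptron([a, b], weights=[1.0, 1.0], bias=-1.5)
            for a in (0, 1) for b in (0, 1)]
```

A single perceptron can only represent linearly separable functions like AND; famously, it cannot compute XOR, which motivated multi-layer networks.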
Prompt Engineering
NLP: The practice of carefully crafting input text to elicit desired behavior from large language models, including techniques like few-shot examples, chain-of-thought reasoning, and system instructions.
Prompt Injection
Fundamentals: An attack where malicious instructions are hidden inside input data to hijack an AI model's behavior, causing it to ignore its original instructions and follow the attacker's instead.
Quantization
Deep Learning: A technique that reduces the numerical precision of a model's weights and activations, shrinking memory usage and speeding up inference with minimal loss in accuracy.
Query Routing
Information Retrieval: The process of classifying a user query and directing it to the most appropriate retrieval strategy, such as knowledge graph lookup, RAG search, or hybrid retrieval.
Qwen 3
LLM Models: Alibaba's April 2025 open-source model family, trained on 36 trillion tokens in 119 languages and competitive with DeepSeek R1 and o3-mini.
RAM
Fundamentals: Random Access Memory - the fast, volatile working memory a computer uses to store data that is actively being used or processed.
RDF
Knowledge Graphs: Resource Description Framework - a W3C standard for representing information as subject-predicate-object triples, forming the foundation of the semantic web.
Reciprocal Rank Fusion
Information Retrieval: A method for combining ranked result lists from different retrieval systems by summing reciprocal rank scores, commonly used to merge BM25 and dense retrieval results.
Recurrent Neural Network
Deep Learning: A neural network architecture with loops that allow information to persist across time steps, designed for processing sequential data.
Recursive Language Model (RLM)
NLP: An inference approach that lets an LLM programmatically examine, decompose, and recursively call itself over snippets of extremely long input, handling contexts up to 100x beyond native window limits.
Regularization
Optimization: A set of techniques that constrain model complexity during training to prevent overfitting and improve generalization to unseen data.
Reinforcement Learning
Reinforcement Learning: A machine learning paradigm where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties.
Relation Extraction
NLP: The NLP task of identifying and classifying semantic relationships between entities mentioned in text, a key step in knowledge graph construction.
ReLU
Deep Learning: The Rectified Linear Unit activation function, defined as max(0, x), which has become the default non-linearity in modern deep networks due to its simple gradient and computational efficiency.
Reranking
Information Retrieval: A second-stage ranking process that reorders initially retrieved results using a more computationally expensive but accurate model, typically a cross-encoder.
Residual Connection
Deep Learning: A shortcut that adds a layer's input directly to its output (y = F(x) + x), enabling training of very deep networks by providing a gradient highway that prevents vanishing gradients.
Retrieval Pipeline
Information Retrieval: The end-to-end sequence of steps in a RAG system: query processing, document retrieval, reranking, context construction, and LLM generation.
Retrieval-Augmented Generation
NLP: An architecture pattern that reduces LLM hallucination by retrieving relevant documents from an external knowledge base and including them as context before generating a response.
SaaS
Fundamentals: Software as a Service - a delivery model where software is hosted in the cloud and accessed through a browser or API on a subscription basis rather than installed locally.
Self-Attention
Deep Learning: An attention mechanism where queries, keys, and values all come from the same input sequence, allowing each token to attend to every other token in the sequence, including itself.
Semantic Search
Information Retrieval: A search technique that finds results based on the meaning of a query rather than exact keyword matches, typically using vector embeddings and similarity metrics.
Sigmoid
Deep Learning: An S-shaped activation function that maps any real number to a value between 0 and 1, historically important but largely replaced by ReLU in hidden layers.
Small Language Model (SLM)
Fundamentals: A language model with roughly 1 billion to 10 billion parameters, designed to run efficiently on edge devices and in resource-constrained environments while retaining core NLP capabilities.
Softmax
Deep Learning: A function that converts a vector of real numbers into a probability distribution, where each output is between 0 and 1 and all outputs sum to 1.
SPARQL
Knowledge Graphs: A query language for RDF graph databases, similar to SQL but designed for querying data represented as subject-predicate-object triples.
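The standard formulation gives each document a score of Σ 1/(k + rank) across the lists it appears in, with k commonly set to 60. A minimal sketch (the document names and toy result lists are invented for illustration):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: each doc scores sum of 1/(k + rank), rank being 1-based."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_a", "doc_b", "doc_c"]   # keyword retrieval ranking
dense_results = ["doc_b", "doc_d", "doc_a"]  # semantic retrieval ranking
fused = reciprocal_rank_fusion([bm25_results, dense_results])
```

Documents ranked well by both systems (like doc_b here) rise to the top, and the method needs only ranks, not comparable raw scores, which is why it pairs so easily with hybrid search.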
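A minimal sketch: exponentiate each logit and normalize by the sum. Subtracting the maximum first is the standard numerical-stability trick; it leaves the result unchanged because it cancels in the ratio.

```python
import math

def softmax(logits):
    """Map real-valued logits to a probability distribution summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Larger logits get larger probabilities, and the ordering of inputs is preserved in the output.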
Sparse Retrieval
Information Retrieval: A retrieval method using high-dimensional sparse vectors based on term frequencies (like BM25 or TF-IDF), where most vector elements are zero.
Stochastic Gradient Descent
Optimization: An optimization algorithm that updates model parameters using the gradient computed on a small random subset (mini-batch) of the training data rather than the entire dataset.
Temperature
NLP: A parameter that controls the randomness of token sampling during LLM text generation by scaling the logits before applying softmax.
Text-to-Cypher
Knowledge Graphs: The technique of using LLMs to convert natural language questions into Cypher graph queries, enabling non-technical users to query knowledge graphs.
TF-IDF
Information Retrieval: A numerical statistic combining term frequency and inverse document frequency to measure how important a word is to a document within a collection.
Tokenization
NLP: The process of breaking text into smaller units called tokens, which serve as the fundamental input elements for language models.
Tool Calling
Agents: A capability that allows large language models to invoke external functions, APIs, or tools to perform actions beyond text generation.
Transfer Learning
Fundamentals: A technique where a model trained on one task is reused as the starting point for a model on a different but related task.
Transformer
NLP: A neural network architecture based on self-attention mechanisms that processes input data in parallel, forming the basis of modern large language models.
Triple
Knowledge Graphs: The fundamental unit of knowledge in a graph, expressed as a (subject, predicate, object) statement such as (Alice, WORKS_AT, Acme Corp).
TRM (Tiny Recursive Model)
LLM Models: Samsung's 7M-parameter recursive reasoning model that outperforms LLMs 10,000x its size on abstract reasoning benchmarks like ARC-AGI.
Turing Test
Fundamentals: A test of machine intelligence proposed by Alan Turing in 1950, in which a human evaluator tries to distinguish between a machine and a human based on natural-language conversation alone.
Underfitting
Fundamentals: When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test sets.
Universal Approximation Theorem
Deep Learning: A theorem proving that a neural network with a single hidden layer and non-linear activation can approximate any continuous function to arbitrary precision, given enough neurons.
Vanishing Gradients
Deep Learning: A training problem where gradients become exponentially smaller as they propagate backward through many layers, effectively preventing early layers from learning.
Vector Database
Information Retrieval: A specialized database optimized for storing and querying high-dimensional vector embeddings, supporting efficient similarity search operations.
Vibe Coding
Fundamentals: A software development approach where a programmer describes what they want in natural language and an AI model generates the code, with the programmer guiding the process through conversation rather than writing code directly.
Weight Initialization
Deep Learning: The strategy for setting initial values of neural network parameters before training begins, critical for ensuring stable signal and gradient propagation through deep networks.
xAI
Platforms & Tools: Elon Musk's AI company behind the Grok chatbot and the Colossus supercomputer, merged with SpaceX in 2026.
Yi-Lightning
LLM Models: 01.AI's speed-optimized MoE model ranking 6th on Chatbot Arena, trained for $3M and 70-80% cheaper than US frontier models.
Zero-day
Fundamentals: A software vulnerability that is unknown to the vendor and has no available patch, giving defenders zero days of warning before it can be exploited.
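The scaling is simply logits / T before the softmax: low temperature sharpens the distribution toward the top token, high temperature flattens it toward uniform. A minimal sketch with an illustrative helper name:

```python
import math

def sampling_distribution(logits, temperature=1.0):
    """Softmax over logits / T: low T sharpens the distribution, high T flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = sampling_distribution(logits, temperature=0.5)  # near-greedy
hot = sampling_distribution(logits, temperature=2.0)   # closer to uniform
```

As T approaches 0 the model becomes effectively greedy (always the top token); as T grows, sampling becomes increasingly random.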
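In its simplest form the statistic is tf·idf = (term count in the document) × log(N / number of documents containing the term). A minimal sketch over a toy tokenized corpus (corpus contents and function name are illustrative):

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf = raw term count in doc * log(N / docs containing term)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)  # document frequency
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "and", "the", "cat"],
]
rare = tf_idf("dog", corpus[1], corpus)    # appears in 1 of 3 documents
common = tf_idf("the", corpus[0], corpus)  # appears in every document
```

A word that appears everywhere gets an idf of log(1) = 0, so it carries no weight; rare, distinctive terms dominate the score.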