POS Tagging

NLP

The task of assigning a part-of-speech label (noun, verb, adjective, etc.) to each token in a sentence.

Imagine diagramming a sentence in grammar class: labeling each word as a noun, verb, adjective, and so on.

Part-of-speech (POS) tagging is a foundational NLP task that assigns a grammatical category to every word in a sentence. Common tags include noun (NN), verb (VB), adjective (JJ), adverb (RB), determiner (DT), and preposition (IN), among many others; these abbreviations come from the Penn Treebank tagset, which is widely used for English. Tagsets vary by framework: the Universal Dependencies project, for example, defines a coarser set of 17 universal tags intended to work across languages.
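
A tagged sentence is often represented as a list of (token, tag) pairs. The Python sketch below tags one sentence with Penn Treebank labels, then converts them with a small, hypothetical mapping (PTB_TO_COARSE) to coarse categories in the style of the Universal Dependencies tagset. The mapping is partial and simplified: real conversions depend on context (IN, for instance, can correspond to either a preposition or a subordinating conjunction), and any tag not listed simply falls back to "X" here.

```python
# A tagged sentence as (token, Penn Treebank tag) pairs.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# Hypothetical, partial mapping from Penn Treebank tags to coarse
# Universal Dependencies-style categories. Illustrative only: real
# conversions are context-dependent (e.g. IN may be ADP or SCONJ).
PTB_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "DT": "DET", "IN": "ADP",
}

def to_coarse(tagged_sentence):
    """Replace each fine-grained tag with its coarse category ("X" if unmapped)."""
    return [(tok, PTB_TO_COARSE.get(tag, "X")) for tok, tag in tagged_sentence]

for token, tag in to_coarse(tagged):
    print(f"{token}/{tag}")
```

Keeping the fine-grained tags and deriving coarse ones on demand preserves distinctions such as tense (VBD vs. VBZ) that a coarse tagset discards.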

POS information feeds many downstream NLP tasks. Lemmatization uses POS context to resolve ambiguous words: "saw" has the lemma "see" when tagged as a verb but stays "saw" when tagged as a noun. Named entity recognition and dependency parsing both benefit from knowing whether a word is a noun or a verb, and syntactic analysis, grammar correction, and machine translation systems can also use POS tags as input features.
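
To illustrate why the tag matters for lemmatization, here is a minimal Python sketch. The lookup table LEMMAS and the lemmatize helper are hypothetical and hand-built for illustration; real lemmatizers combine large dictionaries with morphological rules rather than a four-entry table.

```python
# Hypothetical, hand-built lemma table keyed by (lowercased word, coarse POS).
LEMMAS = {
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}

def lemmatize(word, pos):
    """Return the lemma for (word, pos), falling back to the lowercased word."""
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("leaves", "NOUN"))   # leaf
print(lemmatize("leaves", "VERB"))   # leave
print(lemmatize("meeting", "VERB"))  # meet
```

The same surface form maps to different lemmas, so a lemmatizer that ignored the POS tag would have to guess.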

Early taggers used hand-written rules or Hidden Markov Models (HMMs) trained on annotated corpora. Modern systems use neural sequence models, often bidirectional LSTMs or transformer encoders, and reach over 97% per-token accuracy on standard English benchmarks such as the Wall Street Journal portion of the Penn Treebank. Popular NLP libraries expose tagging directly: spaCy includes a tagger as a built-in pipeline component, and NLTK provides a pos_tag function.
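
The HMM approach can be sketched in a few dozen lines of Python: count tag-to-tag transitions and tag-to-word emissions in a tagged corpus, then use the Viterbi algorithm to find the most probable tag sequence for a new sentence. This is a minimal illustration under stated assumptions, not a production tagger: the three-sentence training corpus is invented, and add-one smoothing is just one simple choice. Practical HMM taggers train on far larger corpora and use better smoothing and dedicated handling of unknown words.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence

def train_hmm(corpus):
    """Count tag transitions and word emissions in a list of tagged sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def smoothed_log_prob(counter, key, num_outcomes):
    """Add-one (Laplace) smoothed log P(key) given the counts in counter."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + num_outcomes))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    tags = list(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_words = len(vocab) + 1  # +1 reserves probability mass for unseen words
    n_tags = len(tags)

    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    first = words[0].lower()
    best = [{t: (smoothed_log_prob(trans[START], t, n_tags)
                 + smoothed_log_prob(emit[t], first, n_words), None)
             for t in tags}]
    for word in words[1:]:
        w = word.lower()
        column = {}
        for t in tags:
            e = smoothed_log_prob(emit[t], w, n_words)
            # Best predecessor tag p for reaching tag t at this position.
            column[t] = max(
                (best[-1][p][0] + smoothed_log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)

    # Follow back-pointers from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

# Invented toy training data.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
trans, emit = train_hmm(corpus)
print(viterbi(["The", "cat", "barks"], trans, emit))  # ['DT', 'NN', 'VBZ']
```

The transition probabilities carry a lot of weight in such a small model. An unseen verb like "runs" in "a dog runs" is still tagged VBZ because the NN-to-VBZ transition dominates, but a lone word like "dog" comes out as DT, since every training sentence starts with a determiner. Working in log space avoids numeric underflow on long sentences.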

Last updated: March 6, 2026
