The Differences Between Lemmatization and Stemming

Human language technology (HLT) has become the trendy way of referring to the traditional concept of natural language processing (NLP). The main difference is that HLT tends to emphasize the technology side of the field. Also, processing a “natural language” could encompass communication between any living creatures, whether it’s birds chirping about the neighborhood cat, simian sign language, or dolphins’ telepathic plans to leave Earth. Since none of that is our concern here, I will use the term HLT rather than NLP throughout this document.

HLT is the field in which linguistics and computer science merge to solve problems in processing digital information. Think of it as a place where two normally disparate types of people, linguists and computer scientists, can come together and discuss a topic of interest to both groups. The only other intersection imaginable for two such factions might be The Lord of the Rings, although even there one group would contend that Tolkien’s use of gerunds in the Quenya language is flawed, while the other would counter that Jackson’s vision of Middle-earth is weakened by excluding the Orcs’ attack on Lothlórien. Can’t we all just get along?

