What is Natural Language processing (NLP)
Natural language processing (NLP) is the relationship between computers and human language. More specifically, natural language processing is the computer understanding, analysis, manipulation, and/or generation of natural language. Will a computer program ever be able to convert a piece of English text into a programmer-friendly data structure that describes the meaning of the natural language text? Unfortunately, no consensus has emerged about the form or the existence of such a data structure. Until such fundamental Artificial Intelligence problems are resolved, computer scientists must settle for the reduced objective of extracting simpler representations that describe limited aspects of the textual information.
Overview
Natural language processing (NLP) can be defined as the automatic (or semi-automatic) processing of human language. The term ‘NLP’ is sometimes used rather more narrowly than that, often excluding information retrieval and sometimes even excluding machine translation. NLP is sometimes contrasted with ‘computational linguistics’, with NLP being thought of as more applied. Nowadays, alternative terms are often preferred, like ‘Language Technology’ or ‘Language Engineering’. Language is often used in contrast with speech (e.g., Speech and Language Technology). But I’m going to simply refer to NLP and use the term broadly. NLP is essentially multidisciplinary: it is closely related to linguistics (although the extent to which NLP overtly draws on linguistic theory varies considerably).
What is it?
NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation. NLP is used to analyze text, allowing machines to understand how humans speak. This human-computer interaction enables real-world applications like automatic text summarization, sentiment analysis, topic extraction, named entity recognition, parts-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, and automated question-answering.
Figure1: NLP Techniques
Importance of NLP
Earlier approaches to NLP involved a more rules-based approach, where simpler machine learning algorithms were told what words and phrases to look for in text and given specific responses when those phrases appeared. But deep learning is a more flexible, intuitive approach in which algorithms learn to identify speakers' intent from many examples, almost like how a child would learn human language.
The advantage of natural language processing can be seen when considering the following two statements: "Cloud computing insurance should be part of every service level agreement" and "A good SLA ensures an easier night's sleep -- even in the cloud." If you use national language processing for search, the program will recognize that cloud computing is an entity, that cloud is an abbreviated form of cloud computing, and that SLA is an industry acronym for service level agreement.
Some Linguistic Terminology
The subareas loosely correspond to some of the standard subdivisions of linguistics:
- Morphology: the structure of words. For instance, unusually can be thought of as composed of a prefix un-, a stem usual, and an affix -ly. Composed is compose plus the inflectional affix -ed: a spelling rule means we end up with composed rather than composed.
- Syntax: The way words are used to form phrases. e.g., it is part of English syntax that a determiner such as the will come before a noun, and also that determiners are obligatory with certain singular nouns.
- Semantics: Compositional semantics is the construction of meaning (generally expressed as logic) based on syntax. This is contrasted to lexical semantics, i.e., the meaning of individual words.
Application
Here are a few common ways NLP is being used today:
- Spell check functionality in Microsoft Word is the most basic and well-known application.
- Text analysis, also known as sentiment analytics, is a key use of NLP. Businesses can use it to learn how their customers feel emotionally and use that data to improve their service.
- By using email filters to analyze the emails that flow through their servers, email providers can use Naive Bayes spam filtering to calculate the likelihood that an email is spam based its content.
- Call center representatives often hear the same, specific complaints, questions, and problems from customers. Mining this data for sentiment can produce incredibly actionable intelligence that can be applied to product placement, messaging, design, or a range of other uses.
- Google, Bing, and other search systems use NLP to extract terms from text to populate their indexes and parse search queries
- Google Translate applies machine translation technologies in not only translating words, but also in understanding the meaning of sentences to improve translations.
- Financial markets use NLP by taking plain-text announcements and extracting the relevant info in a format that can be factored into making algorithmic trading decisions. For example, news of a merger between companies can have a big impact on trading decisions, and the speed at which the particulars of the merger (e.g., players, prices, who acquires who) can be incorporated into a trading algorithm can have profit implications in the millions of dollars.
A Few NLP Examples
- Use Summarizer to automatically summarize a block of text, exacting topic sentences, and ignoring the rest.
- Generate keyword topic tags from a document using LDA (Latent Dirichlet Allocation), which determines the most relevant words from a document. This algorithm is at the heart of the Auto-Tag and Auto-Tag URL micro-services
- Sentiment Analysis, based on Stanford NLP, can be used to identify the feeling, opinion, or belief of a statement, from very negative, to neutral, to very positive.
References
[1] Ann Copestake, “Natural Language Processing”, 2004, 8 Lectures, available online at: https://www.cl.cam.ac.uk/teaching/2002/NatLangProc/revised.pdf
[2] Ronan Collobert and Jason Weston, “Natural Language Processing (Almost) from Scratch”, Journal of Machine Learning Research 12 (2011) pp. 2493-2537
[3] “Top 5 Semantic Technology Trends to look for in 2017”, available online at: https://ontotext.com/top-5-semantic-technology-trends-2017/
Read More