**What is Natural Language Processing?**

Image description: A black keyboard at the bottom of the picture has an open book on it, with red words in labels floating on top and a letter A balanced on top of them. The perspective makes the composition form a kind of triangle from the keyboard to the capital A.

Image credit: Teresa Berndtsson / Better Images of AI / Letter Word Text Taxonomy / CC-BY 4.0

By Claire Barale, PhD student at the Centre for Technomoral Futures at the University of Edinburgh. Edited by Iwona Soppa, Data for Children Collaborative, Advocacy and Relations Manager.

Natural Language Processing, or NLP, is a sub-field of artificial intelligence concerned with a computer's ability to process, search, and generate spoken or written language. It helps humans communicate with computers in a way that lets computers learn to understand us, and vice versa. NLP applications can be helpful in everyday life: think of the writing assistant Grammarly. It is software that automatically reviews spelling, grammar, and syntax and can suggest sentences depending on your audience or chosen tone. It is a complex NLP system that saves a great deal of time for everyone who must proofread their writing (and possibly some embarrassment, too!), and it can also support people with learning difficulties.

You may have heard about the Chinese room thought experiment. It illustrates both what we expect natural language processing to achieve and where its limitations lie. The experiment imagines someone alone in a closed room, communicating with a Chinese speaker on the other side of the door. The person in the room does not speak Chinese but has access to a computer and follows its instructions. The computer tells the person which ideograms to use in reply. As a result, the Chinese speaker on the other side of the door might think they are communicating with another Chinese speaker in the room.

Without going into the philosophical argument (you can learn more about it here: https://plato.stanford.edu/entries/chinese-room), this experiment raises two fundamental questions for modern NLP:

Does the computer really understand Chinese? Or does it just create an illusion of understanding by following and applying a set of rules?

We know that current state-of-the-art natural language processing does the latter. But could modern language models ever allow a computer to actually understand language?

This isn’t just an interesting question in its own right; it is also a necessary step in the quest toward creating artificial general intelligence (AGI) or “superintelligence”. AGI is imagined as systems with, broadly speaking, the abilities typically associated with the human brain, including the ability to reason, learn, and carry out intellectual tasks. Many of the tasks humans perform are connected to language, including the transfer and formation of knowledge. If mastering language is a prerequisite for replicating human intelligence, no one could claim to have created a human-like machine without mastering the language element. Is NLP, then, a reason why it might not be possible to achieve general AI, or can we bridge this gap using computational language models?

We interact with NLP every day, probably more than we know. When you talk to Alexa or Siri, you use NLP. The spell checker in your work email uses NLP. When you type on your phone and it suggests the next word, you use NLP. You may have noticed that the words it suggests change according to what you usually type: it is learning from your habits and estimating the probability that you will use a word again.
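To make the idea of next-word suggestion a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible approach: counting which word has followed which in a made-up typing history and turning those counts into probabilities. The function names and the sample sentences are hypothetical, and real keyboards use far more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count, for every word, which words have followed it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def suggest_next(counts, word, k=3):
    """Return up to k (word, probability) pairs for what may come next."""
    followers = counts.get(word.lower())
    if not followers:
        return []  # the word was never seen, so there is nothing to suggest
    total = sum(followers.values())
    return [(w, c / total) for w, c in followers.most_common(k)]


# Hypothetical typing history, invented for this illustration.
typing_history = [
    "see you at the office",
    "see you at the station",
    "meet me at the office",
]

model = train_bigram_counts(typing_history)
suggestions = suggest_next(model, "the")
print(suggestions)
```

In this toy history, "office" followed "the" twice and "station" once, so "office" is suggested first. The more you type, the more the counts shift, which is one simple way suggestions could adapt to your habits.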

Broadly speaking, NLP is an interdisciplinary field at the crossroads of computer science, AI, linguistics, and cognitive science. Any kind of machine learning (more on machine learning here) that uses text or speech as its source data belongs to the field of NLP.

Functionalities range from specific use cases and domain applications to high-level capabilities such as translation, reasoning, speech or text analysis and generation, and understanding. When it comes to natural language understanding, or NLU, one real-world application is categorizing texts: is this email spam or not? Or, more complex: is that insurance contract valid? It is a challenging task because of the context and ambiguity around a text (what are the indicators that an email is spam?) or around a word (does the word “apple” refer to the company or the fruit?).
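The spam question can be sketched with a very small text classifier. The example below assumes a naive Bayes approach with add-one smoothing, one classic technique among many; the four training emails, the labels "spam" and "ham", and the function names are all invented for illustration. A real spam filter would need far more data and would look at much more than individual words.

```python
import math
from collections import Counter


def train(examples):
    """Collect per-label word counts and label counts from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest naive Bayes log-probability."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from how common the label is overall.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set; "ham" is a common nickname for non-spam email.
training_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

word_counts, label_counts = train(training_emails)
print(classify("free prize now", word_counts, label_counts))
print(classify("agenda for the team meeting", word_counts, label_counts))
```

The classifier only looks at which words appear, not at what they mean, which is exactly why context and ambiguity make the real task hard.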

More specific applications include extracting information from a text, whether an invoice, a contract, a tweet, or a patient record, as well as classifying and summarizing documents. For example, in a legal context, it saves time to summarise the key facts of previous cases similar to the one you are working on. Likewise, a brief summary of a news report is helpful for a financial analyst.
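As a rough illustration of information extraction, the sketch below pulls a few fields out of a one-line invoice using regular expressions. The invoice text, the field names, and the patterns are assumptions made up for this example; real documents vary widely, and production systems typically rely on trained models rather than hand-written patterns.

```python
import re

# Hypothetical invoice text, invented for this illustration.
invoice_text = (
    "Invoice INV-2041 issued on 2022-03-15 to Acme Ltd. "
    "Total due: 1,250.00 GBP."
)


def extract_invoice_fields(text):
    """Extract an invoice number, a date, and a total from free text."""
    number = re.search(r"\bINV-\d+\b", text)
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    total = re.search(r"Total due:\s*([\d,]+\.\d{2})\s*([A-Z]{3})", text)
    return {
        "invoice_number": number.group(0) if number else None,
        "issue_date": date.group(0) if date else None,
        # Remove thousands separators before converting to a number.
        "total": float(total.group(1).replace(",", "")) if total else None,
        "currency": total.group(2) if total else None,
    }


fields = extract_invoice_fields(invoice_text)
print(fields)
```

Any field the patterns cannot find is returned as `None`, so the caller can tell "missing" apart from a wrong value.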

The field is very diverse and can appear overwhelming, so it helps to think of it as layers. For instance, imagine you want your computer to generate a short story on a specific theme, maybe about a robot who becomes friends with a little boy. First, you would give it some instructions, such as the names of the characters and the location: say, the robot’s name is Rusty and the boy is Jamie. Then, you would add everything else you want to include in your story. Generating this text draws on several layers: grammatical analysis, so that the story flows like the natural language we use; summaries of other stories the model can find; and information extracted from novels found using the keywords (robots, boy, friendship, and so on) you chose for your story. In other words, you would rely on various sub-fields (layers) of NLP. It is easy to see how, for one specific task, we often need to build a sort of pyramid of knowledge. At the top of the pyramid sits understanding and generating language.

Most recent NLP systems use deep learning and a specific kind of neural network architecture called the transformer. State-of-the-art systems rely on large language models (often called LLMs). Large language models are a recent and notable breakthrough for NLP that changed the quality of text generation and understanding as well as the landscape of NLP research. Why? Largely because of their size: they can have billions of parameters, learned from very large amounts of text. You may have heard of models such as Google's BERT, OpenAI's GPT-3, Meta’s OPT-175B, Nvidia and Microsoft’s MT-NLG, or BigScience's BLOOM. To give an idea of the size, GPT-3 has 175 billion (!) parameters and is reported to require around 800 GB of storage. Its ability to generate language is remarkable, and its output can be hard to distinguish from human-written text, so if you used it for your story about Rusty the Robot and Jamie, the reader might not know it wasn’t a human who wrote it. Such a model has, in a sense, read a very large share of the text on the Internet and can draw on what it learned from it. That is impressive, but it raises the same question we encounter in the Chinese room experiment:

Can large language models be intelligent at all, or are they just very good at imitating, and why does it matter?
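To put the size of these models in perspective, here is a back-of-the-envelope calculation of how much space 175 billion parameters would take up. It assumes each parameter is stored as a single number of either 4 bytes (32-bit) or 2 bytes (16-bit) and counts only the parameters themselves; these are simplifying assumptions, so the results are not expected to match any one published storage figure.

```python
def parameter_storage_gb(num_parameters, bytes_per_parameter):
    """Storage needed for the parameters alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


gpt3_parameters = 175_000_000_000  # parameter count quoted in the article

storage_32bit = parameter_storage_gb(gpt3_parameters, 4)  # 32-bit numbers (assumption)
storage_16bit = parameter_storage_gb(gpt3_parameters, 2)  # 16-bit numbers (assumption)

print(f"32-bit: {storage_32bit:.0f} GB, 16-bit: {storage_16bit:.0f} GB")
```

Under these assumptions the parameters alone come to roughly 700 GB at 32-bit precision and 350 GB at 16-bit precision, far more than an ordinary laptop can hold in memory.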

Going one step further, consider AskDelphi (https://delphi.allenai.org), a research project aimed at testing a model of commonsense and moral reasoning. You can ask Delphi a question, and it will answer with a moral judgment. For example, asked “Can I ignore a phone call from a friend during my working hours.”, Delphi answers: “It's okay”. On the surface, that’s not controversial, and the tool is certainly quite entertaining to use. However, moral judgments are complex and carry a lot of meaning. Knowing what we know about NLP and its limitations, it seems sensible to treat such recommendations from an NLP system with caution. After all, as E. M. Bender and A. Koller remind us (Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data), there is a fundamental difference between capturing the meaning of something, which rests on a deeper understanding of it, and capturing its mere form, which in this case is the text produced by Delphi. Delphi mimics a moral judgment, but to Delphi, this is just text. To us humans, that text carries meaning. This is why we should always examine very carefully how we use large language models and keep questioning and probing the limitations of NLP applications.