ChatGPT: The Technicalities behind the Rising Star of Conversational AI

How far back does Conversational AI go?

Conversational AI has been around for some time. One noteworthy early breakthrough was ELIZA, the first chatbot, built in 1966. It used pattern matching and substitution to explore communication between humans and machines, without the program actually understanding the context of the conversation.

The next milestone was A.L.I.C.E. in 1995, written in AIML (Artificial Intelligence Markup Language) and based on heuristic pattern matching. The open source community subsequently took an interest and has actively contributed to all sorts of research repositories, which has given us the vast collection of machine learning models available today.

Timeline by Antoine Louis on A Brief History of Natural Language Processing

Siri, Google Assistant, Cortana and Alexa are the successive technologies rolled out in the 21st century. They are readily accessible on our handheld devices and serve as intelligent personal assistants rather than simple question-answering tools based on internet information. Natural Language Processing (NLP) and deep neural networks are the core building blocks of the technology, which allows our machines, appliances and IoT devices to understand human language with ease. Command execution via voice recognition is the new norm, where a simple instruction like "Hey Google, play me some country music!" will easily fire up your Spotify app to your liking.

Rising Star: ChatGPT

A nonprofit American Artificial Intelligence company called OpenAI was created with the common goal of developing artificial intelligence "in the way that is most likely to benefit humanity as a whole," according to a statement on OpenAI's website from December 11, 2015.

In November 2022, the public was introduced to ChatGPT, a pre-trained language model that had been fine-tuned on conversational data, and its jaw-dropping capabilities quickly became the talk of the town. People have been drawn to ChatGPT, whether they are AI experts or not, because of its remarkable capacity to produce natural and compelling responses in a conversational setting. In just 5 days, the AI model amassed over one million users, prompting people to wonder how ChatGPT can provide such accurate and human-like answers.

Behind the scenes

(A) Large Language Model (LLM)

It all started with a Large Language Model (LLM), a type of pre-trained neural network designed to understand and generate natural language in a way that resembles human language. The GPT-3 model that ChatGPT builds on is one of the largest LLMs available today, with 175 billion parameters, which gives it the ability to generate text that is remarkably similar to human writing. These models are engineered to process a large corpus of text data in order to learn the patterns and structures of natural language. By feeding the model a large dataset of text from sources such as Wikipedia and Reddit, the model can analyze and learn from the patterns and relationships between the words and phrases in the text. As the model continues to learn and refine its understanding of natural language, it becomes increasingly adept at generating high-quality text outputs.

Training tasks that predict a word in a sentence, whether next-word prediction or masked language modelling, are crucial in shaping a high-accuracy LLM. Before transformers, both were commonly implemented with Long Short-Term Memory (LSTM) networks, which have feedback connections and can therefore process entire sequences of data rather than only single data points such as images. However, LSTMs have drawbacks that limit how much they can gain from large datasets.
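To make the two training tasks concrete, here is a minimal sketch of how training examples could be built from a single sentence. The helper names and the "[MASK]" placeholder are illustrative assumptions, not the pipeline OpenAI actually uses.

```python
def next_word_examples(tokens):
    """Pair every prefix of the sentence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, position, mask="[MASK]"):
    """Hide one word; the model must recover it using the words on both sides."""
    masked = tokens[:position] + [mask] + tokens[position + 1:]
    return masked, tokens[position]


sentence = "I love to eat pizza".split()

# Next-word prediction: the context only ever contains words to the left.
for context, target in next_word_examples(sentence):
    print(context, "->", target)

# Masked language modelling: the context contains words on both sides of the gap.
print(masked_example(sentence, 3))
```

Next-word prediction only looks backwards, which is what makes it a natural fit for generating text one word at a time.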

Illustration of neural network by DeepMind design and Novoto Studio

LSTMs have difficulty handling long-term dependencies and struggle to remember information that is many steps removed from the current input.

Let's say we want to train an LSTM to predict the next letter in the sentence "I love to eat pizza". The LSTM takes in a sequence of letters as input and outputs a probability distribution over the possible next letters. If we only use a small context window (e.g. 2 or 3 letters), the LSTM may struggle to remember important information from earlier in the sentence.
For example, if the LSTM only sees the letters "zz" as input, it may have difficulty predicting that the next letter is "a", because it has forgotten that the sentence began with "I love to eat".

LSTMs have restricted context window sizes. The context window is the set of inputs that the network uses to predict the next output.

Let's say we have a language model that uses an LSTM with a context window size of 3. The model takes in a sequence of three words as input and tries to predict the next word in the sequence.
For example, given the input sequence "The cat sat", the model might predict the next word as "on" if it has learned that the sentence often continues as "The cat sat on the mat". However, if the full sentence is "The cat sat on the mat in the corner of the room", a context window of 3 means that by the time the model has to predict the word after "mat", it sees only "on the mat" and has already lost "The cat sat". Everything outside the window is ignored, which can lead to an incorrect prediction.
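The effect of a fixed window can be sketched in a few lines. This is only an illustration of the windowing idea under the assumption of a 3-word context; `windowed_examples` is a hypothetical helper, not part of any library.

```python
def windowed_examples(tokens, window=3):
    """Return (context, target) pairs where the context is the last `window` words."""
    examples = []
    for i in range(window, len(tokens)):
        examples.append((tokens[i - window:i], tokens[i]))
    return examples


sentence = "The cat sat on the mat in the corner of the room".split()
examples = windowed_examples(sentence, window=3)

for context, target in examples:
    print(" ".join(context), "->", target)
```

By the final example the model has to predict "room" from "corner of the" alone; the cat and the mat are no longer visible to it.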

To address this, a team at Google Brain introduced transformers in 2017, which significantly improved the ability of LLMs to incorporate meaning, as well as their capacity to handle much larger datasets. Transformers differ from LSTMs in that they can process all input data at the same time. Thanks to a self-attention mechanism, the model can assign varying importance to different parts of the input data in relation to any position in the sequence.

(B) GPT

Prompted using Midjourney by The Decoder

In 2018, OpenAI released the paper "Improving Language Understanding by Generative Pre-Training", which introduced the concept of a Generative Pre-trained Transformer (GPT) and contributed to the significant advancement of transfer learning in natural language processing (NLP). Simply put, GPTs are machine learning models based on a neural network architecture that is loosely inspired by the human brain. These models are trained on vast amounts of human-generated text data and are capable of performing various tasks such as generating and answering questions.

The model later evolved into GPT-2, a more robust version with 1.5 billion parameters, trained on a corpus of 8 million web pages. However, due to concerns about malicious applications of the technology, OpenAI initially released only a much smaller model for researchers to experiment with, along with a technical paper. Other than next-word prediction, notable use cases include zero-shot learning. As opposed to typical large neural models that require an insane amount of task-specific data, a "zero-shot" setting measures a model's performance on a task it has never been explicitly trained on.

Following two years of parameter adjustments and fine-tuning, GPT-3 was unveiled in May 2020. It drew on a staggering 45 terabytes of text data and scaled the model up to 175 billion parameters. It was smarter, faster, and more terrifying than anything we had seen before.

Architecture diagram of Transformer

The key to the success of all GPT models lies in the transformer architecture. In the original design, both the encoder (which processes the input sequence) and the decoder (which generates the output sequence) contain a multi-head self-attention mechanism that enables the model to give different levels of importance to different parts of the sequence in order to understand its meaning and context. GPT models use only the decoder side of this design.

A simple yet comprehensive animation by Raimi Karim illustrating the self-attention mechanism

The encoder in a Transformer processes the input sequence and computes query, key, and value vectors for each token. Attention weights are computed from the query and key vectors: the dot product of a query with every key is taken, scaled, and normalised with a softmax. The output is the weighted sum of the value vectors, using the attention weights as the weights. Several such attention computations, called heads, run in parallel within a layer, which is where the term multi-head attention comes from, and layers are stacked to learn increasingly complex representations of the input. Combining the results of the heads allows the Transformer to encode multiple contextual relationships for each word in a sequence.
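The attention computation for a single head can be sketched in plain Python. This is a toy illustration with made-up 2-dimensional vectors. In a real Transformer the query, key, and value vectors come from learned projections of the token embeddings, which are left out here.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each argument is a list of vectors, one per token.
    Returns the output vectors and the attention weights.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy vectors for three tokens (made-up numbers, for illustration only).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = attention(q, k, v)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, so every output is a blend of the value vectors, with more weight on the tokens whose keys best match the query.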

In spite of all this, because GPTs are trained on large datasets, biases in the training data are reflected in the generated text. Because the models are generative in nature, they can produce inappropriate content due to a lack of understanding of the true meaning of the context. Limited long-term memory is another drawback: unlike humans, they struggle to maintain coherence and consistency in longer pieces of text or over multiple exchanges in a conversation.

(C) ChatGPT

To rectify these shortcomings, OpenAI introduced a twist: including human feedback in the training process so that the GPT-3 model's output better matches user intent. This technique is called Reinforcement Learning from Human Feedback (RLHF), and it is explained in detail in OpenAI's 2022 paper titled "Training language models to follow instructions with human feedback".

XIMNET taking on ChatGPT for conversational AI - by OpenAI

Source from OpenAI

The figure above summarizes the steps taken by researchers to enhance GPT-3's ability to follow instructions and accomplish tasks rather than simply predicting the most probable word. To start, GPT-3 is fine-tuned on labelled demonstrations of the desired behaviour, which produces a supervised fine-tuning (SFT) model, the first stage of what OpenAI calls InstructGPT. This approach uses patterns and structures learned from the labelled training data to generate responses. For instance, a chatbot trained on a dataset of medical conversations will tend to generate informative and appropriate responses to medical questions based on its supervised policy.

To incentivize a chatbot to produce more suitable and favourable responses, a reward model is necessary. This model takes in a prompt and the chatbot's responses and outputs a scalar reward based on the desirability of the response. Comparison data is collected by having labellers rank the output they prefer for a given input.
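The way ranked comparisons become a training signal can be sketched as follows. The pairwise loss has the form described in the InstructGPT paper, but the helper names and the reward scores are made-up assumptions for illustration; a real reward model is a neural network, not a lookup table.

```python
import math
from itertools import combinations


def pairs_from_ranking(ranked_responses):
    """Expand a labeller's ranking (best first) into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written in an equivalent form.

    The loss is small when the preferred response receives the higher reward.
    """
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))


# A labeller ranked three responses to the same prompt, best first.
ranking = ["response A", "response B", "response C"]
pairs = pairs_from_ranking(ranking)

# Hypothetical scalar rewards from a reward model that is still being trained.
rewards = {"response A": 1.5, "response B": 0.2, "response C": 0.9}

for preferred, rejected in pairs:
    loss = pairwise_loss(rewards[preferred], rewards[rejected])
    print(preferred, ">", rejected, "loss:", round(loss, 3))
```

The pair ("response B", "response C") gets the largest loss here because the toy rewards disagree with the labeller's ranking, which is exactly the kind of mistake training would push the reward model to correct.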

In the last stage, a random prompt is given to the policy to produce an output, which is then scored by the reward model. This reward is used to update the policy using Proximal Policy Optimization (PPO). In other words, the reward model assigns a reward or penalty to each response produced by the chatbot, and this signal steers the learning process towards relevant, informative, and engaging responses and away from inappropriate or offensive ones. These steps are repeated over multiple iterations on Azure AI supercomputing infrastructure to produce the ChatGPT model.
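The core of a PPO update is its clipped objective, which can be sketched for a single response as below. This is a simplified illustration, not OpenAI's training code: the function name is hypothetical, the clipping range of 0.2 is just a commonly used value, and the extra terms used in practice (such as a penalty for drifting too far from the supervised model) are omitted.

```python
def clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for one sample.

    ratio:     probability of the response under the new policy divided by
               its probability under the old policy.
    advantage: how much better than expected the response scored
               (derived from the reward model's output).
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the policy
    # further than the clipping range in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good response (positive advantage) that the new policy already favours a lot:
print(clipped_objective(ratio=1.5, advantage=1.0))

# A bad response (negative advantage) that the new policy has made less likely:
print(clipped_objective(ratio=0.5, advantage=-1.0))
```

In both cases the objective stops improving once the ratio leaves the range 0.8 to 1.2, which keeps each policy update small and stable.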

How can you use it?

After getting familiar with the architecture behind the tools, how exactly can you implement it in your own applications, for instance building a chatbot?

Let's assume that we are using OpenAI's GPT-3 model, which is listed in their API documentation as text-davinci-002. The basic steps include creating an OpenAI API account, setting up an environment to use the API, and programming the chatbot to interact with users.
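The programming step could look roughly like the sketch below, which uses only the Python standard library. It assumes the text completions endpoint and the text-davinci-002 model are available to your account; model availability changes over time, so check the current API documentation before relying on it. The prompt layout, the "User:" and "Bot:" labels, and the parameter values are illustrative choices.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"  # assumed: text completions endpoint
MODEL = "text-davinci-002"  # the model named in the text; may no longer be offered


def build_prompt(history, user_message):
    """Flatten the conversation so far into a single text prompt."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {user_message}")
    lines.append("Bot:")
    return "\n".join(lines)


def build_request(prompt, api_key):
    """Prepare (but do not send) the HTTP request for one completion."""
    body = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 150,    # illustrative limit on reply length
        "temperature": 0.7,   # illustrative sampling setting
        "stop": ["User:"],    # stop before the model invents the user's next turn
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(history, user_message, api_key):
    """Send the request and return the bot's reply text (needs network access)."""
    request = build_request(build_prompt(history, user_message), api_key)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"].strip()


# Example usage (requires a valid API key, so it is left commented out):
# history = [("User", "Hi"), ("Bot", "Hello! How can I help?")]
# print(ask(history, "What is AIML?", api_key="YOUR_API_KEY"))
```

Keeping the conversation history in the prompt is what lets the model respond in context, since each API call is otherwise independent of the previous one.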

How about asking ChatGPT to help us with this setup?

Screenshot 1: XYAN asking ChatGPT to help with code development

Output: XYAN asking ChatGPT to help with code development

Screenshot 2: XYAN asking ChatGPT to help with code development

ChatGPT not only generated the code as per my command, but it also provided some explanations and annotations to clarify some sections of the code. This indicates that anyone, whether they are a software developer or a business person, can use ChatGPT's guidance to build small applications.

However, it is important to note that ChatGPT's responses may not always be precise, especially in the context of code implementation. Although it has extensive knowledge of various programming languages, it can still make errors or generate incomplete or incorrect code. While ChatGPT can handle many tasks, it may not be an expert in any particular field. Therefore, it is crucial to validate the information provided by ChatGPT before relying on it. The best approach is to still ask follow-up questions or provide a detailed prompt to generate a response that better fits your use case.

What's next?

As ChatGPT continues to make headlines following its grand launch in November 2022, we have seen people asking whether it will dominate or even monopolize the entire chatbot industry, thanks to its impressive language processing capabilities and ability to generate human-like responses.

Integrating ChatGPT into a business is not straightforward. When businesses deploy a chatbot, they typically expect it to communicate with customers on their behalf. However, the training data used to build the language models is only up to date as of 2021, so any new developments or changes after that may not be reflected in the model's knowledge. This raises the question of how to ensure that a chatbot powered by GPT-3 has access to the most current information relevant to the organization. While fine-tuning the model with custom datasets is an option, it's important to keep in mind that OpenAI invested significant resources and computing power to create multiple versions of the model, and the responses it generates are still not 100% accurate. How tolerant can businesses be of the unpredictable nature of generative models like GPT?

While there is no definitive answer, it's clear that current chatbot solutions will need to elevate their offerings in order to meet evolving market demands. In the coming months, we can expect to see various chatbot providers formulate new strategies and introduce new features to ensure that their products remain competitive against ChatGPT. Instead of fretting over whether AI will take over our jobs, let's embrace the changes and focus on leveraging them to improve efficiency and create new opportunities for growth and innovation.
