Glossary
Terms
ChatGPT
ChatGPT is a conversational AI system developed by OpenAI and built on large language models (LLMs), originally models from the GPT-3.5 series. It generates human-like text responses to a wide variety of prompts and questions.
Large Language Model (LLM)
A large language model (LLM) is a type of artificial intelligence (AI) model that uses deep learning to process human language. It is trained on vast amounts of text data and generates natural language responses to queries.
Embedding
An embedding is a mathematical representation of a word or a sequence of words, used by LLMs to process natural language text. Embeddings are designed to capture the semantic and syntactic relationships between words, and are typically represented as high-dimensional vectors.
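As a rough illustration of how vectors can capture relationships between words, the sketch below compares toy embeddings using cosine similarity. The three-dimensional vectors and the `EMBEDDINGS` table are hand-picked assumptions for illustration; real embeddings are produced by a trained model and usually have hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional "embeddings" (hand-picked for illustration, not from a real model).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these toy values, "cat" scores closer to "dog" than to "car", which is the kind of semantic relationship an embedding is meant to encode.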
Vector Database
A vector database is a data storage system that stores large numbers of high-dimensional vectors and retrieves them by similarity. It is typically used to store the embeddings of words or documents generated by LLMs.
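The core idea can be sketched as a small in-memory store that keeps vectors and returns the ones most similar to a query. `TinyVectorStore`, its methods, and the document ids below are hypothetical names for illustration only; production vector databases add indexing, persistence, and approximate search rather than the brute-force scan used here.

```python
import math

class TinyVectorStore:
    """A toy in-memory store; real vector databases add indexing and persistence."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    def search(self, query, k=1):
        """Return the k stored ids most similar to the query (brute force)."""
        scored = [(self._cosine(query, vec), doc_id) for doc_id, vec in self._items]
        scored.sort(reverse=True)  # highest similarity first
        return [doc_id for _, doc_id in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        # Assumes neither vector is all zeros.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

store = TinyVectorStore()
store.add("doc-pets", [0.9, 0.8, 0.1])
store.add("doc-vehicles", [0.1, 0.2, 0.9])
store.add("doc-food", [0.5, 0.1, 0.4])

nearest = store.search([0.85, 0.75, 0.15], k=2)
print(nearest)
```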
Hallucination
Hallucination is the phenomenon where an LLM generates text that is factually incorrect, irrelevant, or inconsistent with the input prompt, often while sounding fluent and confident. It can occur when the model's training was insufficient or flawed, or when the model encounters input that is outside of its training data.
Finetuning
Finetuning is the process of further training a pretrained LLM on a specific task or domain by adjusting the weights of its neural network using a smaller, task-specific dataset. This helps the model improve its performance on that task or domain while retaining its general language capabilities.
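The weight-adjustment idea can be illustrated at a very small scale. The sketch below is an assumption-laden toy, not how LLMs are actually trained: it uses a one-weight model `y = w * x`, "pretrains" it on a general dataset, then continues training for a few steps on a small domain dataset. All data, names, and hyperparameters are made up for illustration.

```python
def train(weight, data, learning_rate, steps):
    """Adjust a single weight by gradient descent on mean squared error."""
    for _ in range(steps):
        # Gradient of mean((w * x - y)^2) with respect to w.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * grad
    return weight

def mse(weight, data):
    """Mean squared error of the model y = weight * x on the data."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

# "Pretraining": a larger, general dataset where y = 2 * x.
general_data = [(x, 2.0 * x) for x in range(1, 11)]
pretrained_w = train(0.0, general_data, learning_rate=0.01, steps=200)

# "Finetuning": start from the pretrained weight and take a few steps
# on a small domain dataset where y = 2.5 * x.
domain_data = [(1, 2.5), (2, 5.0), (3, 7.5)]
finetuned_w = train(pretrained_w, domain_data, learning_rate=0.01, steps=20)

print(f"pretrained weight: {pretrained_w:.3f}")
print(f"finetuned weight:  {finetuned_w:.3f}")
print(f"domain error before: {mse(pretrained_w, domain_data):.4f}")
print(f"domain error after:  {mse(finetuned_w, domain_data):.4f}")
```

Because finetuning starts from the pretrained weight and runs for only a few steps, the weight moves toward the domain data without being relearned from scratch.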
Data-centric AI
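Data-centric AI refers to an approach to building AI systems that focuses on systematically improving the data used to train them, such as its quality, coverage, and labeling, rather than only changing the model. LLMs depend heavily on large datasets, so data quality strongly affects their performance.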
Artificial Intelligence
Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines that can perform tasks that would normally require human intelligence.
Machine learning
Machine learning is a branch of artificial intelligence that uses statistical techniques to give computers the ability to learn from data without being explicitly programmed.
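A minimal sketch of "learning from data" is fitting a line to examples with ordinary least squares: the rule relating input to output is estimated from the data rather than written by hand. The `fit_line` helper and the example numbers are hypothetical and chosen only for illustration.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up examples: (hours studied, exam score).
examples = [(1, 52), (2, 54), (3, 56), (4, 58)]
slope, intercept = fit_line(examples)

# Use the learned rule on an input that was not in the examples.
predicted = slope * 5 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 5 hours: {predicted:.1f}")
```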
Natural language processing
Natural language processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand and process human language.
Deep learning
Deep learning is a branch of machine learning that uses artificial neural networks with many layers to learn from large amounts of data.
Training data
Training data is a set of examples used to train a machine learning model, teaching it how to perform a specific task or solve a specific problem.
Generative AI
Generative AI refers to AI systems that are designed to generate new content, such as text, images, or audio. LLMs are an example of generative AI.
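To make "generating new content" concrete, here is a deliberately tiny sketch: a bigram model that records which word follows which in a short corpus, then samples new word sequences. This is a classic toy technique and far simpler than an LLM; the corpus, function names, and random seed are assumptions for illustration.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(model, start, max_words, rng):
    """Sample a word sequence by repeatedly picking an observed follower."""
    output = [start]
    # Stop at the length limit or when the last word has no known follower.
    while len(output) < max_words and model.get(output[-1]):
        output.append(rng.choice(model[output[-1]]))
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
sample = generate(model, "the", max_words=8, rng=random.Random(0))
print(sample)
```

The output is a new sequence assembled from patterns in the training text, which is the same basic idea, at a much smaller scale, behind text generation in LLMs.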
Discriminative AI
Discriminative AI, also called predictive AI, refers to AI systems that are designed to make predictions, such as classifications or scores, based on existing data. For example, a discriminative AI system might be used to predict the likelihood of a customer defaulting on a loan.
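The loan example can be sketched with logistic regression, a common discriminative model that estimates the probability of an outcome from input features. Everything below is assumed for illustration: the single feature (debt-to-income ratio), the made-up history, and the hyperparameters. A real credit model would use many features and careful validation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, learning_rate=0.5, steps=2000):
    """Fit p(default | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Average gradients of the log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up history: (debt-to-income ratio, 1 if the customer defaulted else 0).
history = [
    (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1),
    (0.5, 0), (0.6, 1), (0.7, 1), (0.8, 1),
]
w, b = train_logistic(history)

def default_probability(ratio):
    """Estimated probability of default for a given debt-to-income ratio."""
    return sigmoid(w * ratio + b)

low_risk = default_probability(0.15)
high_risk = default_probability(0.75)
print(f"ratio 0.15 -> estimated default probability {low_risk:.2f}")
print(f"ratio 0.75 -> estimated default probability {high_risk:.2f}")
```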