Welcome to LlamaIndex 🦙 (GPT Index)!

LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.

โš ๏ธ NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out this transition gradually.

3/20/2023: Most instances of gpt_index should be renamed to llama_index. We will preserve the name “GPT Index” as a backup name for now.

2/19/2023: By default, our docs/notebooks/instructions now use the llama-index package. However, the gpt-index package still exists as a duplicate!

2/16/2023: We have a duplicate llama-index pip package. Simply replace all imports of gpt_index with llama_index if you choose to pip install llama-index.
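The import change described in these notes is a mechanical rename of the top-level package. Below is a minimal sketch of that rewrite, assuming a plain regex pass over your source text is acceptable. The helper `rename_imports` is hypothetical and is not part of either package. It only rewrites statements that begin with `from gpt_index` or `import gpt_index`, so forms like `import os, gpt_index` still need to be updated by hand.

```python
import re

# Matches `gpt_index` only as the whole top-level module name at the start of an
# `import` or `from` statement, so names like `my_gpt_index` or `gpt_index_extra`
# are left untouched.
_IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)gpt_index\b", re.MULTILINE)


def rename_imports(source: str) -> str:
    """Hypothetical helper: rewrite `gpt_index` imports to `llama_index`."""
    return _IMPORT_RE.sub(r"\1llama_index", source)


if __name__ == "__main__":
    old = "from gpt_index import SimpleDirectoryReader\nimport gpt_index.readers\n"
    print(rename_imports(old))
```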

🚀 Overview

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.

  • How do we best augment LLMs with our own private data?

  • One paradigm that has emerged is in-context learning (the other is fine-tuning), where we insert context into the input prompt. That way, we take advantage of the LLM's reasoning capabilities to generate a response.
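In-context learning boils down to placing your own text into the prompt ahead of the question. The sketch below shows only that step. The template wording and the `build_prompt` name are hypothetical illustrations, not LlamaIndex's actual prompts or API.

```python
from typing import List

# Hypothetical template: private context is placed before the question so the
# model can answer from that text rather than from pre-training alone.
PROMPT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information, answer the question: {question}\n"
)


def build_prompt(context_chunks: List[str], question: str) -> str:
    """Insert context chunks into the input prompt (in-context learning)."""
    context = "\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    chunks = ["Llamas are native to South America.", "Llamas are domesticated."]
    print(build_prompt(chunks, "Where are llamas native to?"))
```

The resulting string is what would be sent to the LLM. Augmenting the model with private data is then a matter of choosing which chunks to insert.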

To perform LLM data augmentation in a performant, efficient, and cheap manner, we need to address two components:

  • Data Ingestion

  • Data Indexing
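To make the two components concrete, here is a toy sketch in plain Python. It is not LlamaIndex code. Ingestion turns raw records into `Document` objects, and indexing builds a keyword lookup over them. `Document`, `ingest`, and `build_keyword_index` are hypothetical names, and a keyword map is only one of many possible index structures.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Document:
    doc_id: str
    text: str


def ingest(raw_sources: Dict[str, str]) -> List[Document]:
    """Data ingestion: normalize raw records (id -> text) into Documents."""
    return [Document(doc_id=key, text=value.strip()) for key, value in raw_sources.items()]


def build_keyword_index(docs: List[Document]) -> Dict[str, Set[str]]:
    """Data indexing: map each lowercase word to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for word in re.findall(r"[a-z0-9]+", doc.text.lower()):
            index[word].add(doc.doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = ingest({"a": "Llamas live in the Andes.", "b": "Alpacas are related to llamas."})
    index = build_keyword_index(docs)
    print(sorted(index["llamas"]))
```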

Proposed Solution

That's where LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:

  • Offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)

  • Provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning:

    • Storing context in an easy-to-access format for prompt insertion.

    • Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.

    • Dealing with text splitting.

  • Provides users an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented output.

  • Offers a comprehensive toolset for trading off cost and performance.
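The text-splitting and prompt-limit points above can be illustrated with a self-contained sketch. This is not LlamaIndex's actual implementation. Text is split into overlapping chunks. At query time, the chunks that best match the question are packed into the prompt until a token budget is reached, for example the 4096-token Davinci limit minus room for the answer. The function names are hypothetical. Whitespace-separated words stand in for real model tokens, and word overlap is a deliberately crude substitute for the retrieval a real index would perform.

```python
import re
from typing import List, Set


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens; real tokenizers differ.
    return len(text.split())


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` words sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    chunks: List[str] = []
    for start in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def select_context(chunks: List[str], question: str, max_tokens: int) -> List[str]:
    """Greedily pack the best-matching chunks so question + context fit `max_tokens`."""
    q_words = _words(question)
    scored = [(len(q_words & _words(c)), c) for c in chunks]
    # Stable sort: chunks with equal scores keep their original order.
    ranked = [c for score, c in sorted(scored, key=lambda p: p[0], reverse=True) if score > 0]
    selected, used = [], count_tokens(question)
    for chunk in ranked:
        cost = count_tokens(chunk)
        if used + cost <= max_tokens:
            selected.append(chunk)
            used += cost
    return selected


if __name__ == "__main__":
    text = "alpha beta gamma delta epsilon zeta eta theta iota kappa"
    chunks = split_text(text, chunk_size=4, overlap=1)
    print(chunks)
    print(select_context(chunks, "delta eta", max_tokens=10))
```

Overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks. The greedy budget check skips any chunk that would overflow the limit instead of truncating it.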