Data Connectors

NOTE: Our data connectors are now offered through LlamaHub 🦙. LlamaHub is an open-source repository containing data loaders that you can easily plug and play into any LlamaIndex application.

The following data connectors are still available in the core repo.

llama_index.readers Package

Data Connectors for LlamaIndex.

This module contains the data connectors for LlamaIndex. Each connector inherits from a BaseReader class, connects to a data source, and loads Document objects from that data source.

You may also construct Document objects manually, as shown for instance in our Insert How-To Guide. See below for the API definition of a Document - the bare minimum is a text property.
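The reader pattern described above, and the option of building a Document by hand, can be sketched without installing anything. This is a minimal illustration under stated assumptions, not the library's implementation: the `Document` and `BaseReader` classes below are simplified stand-ins written for this example, the `metadata` field and the `load_data` method name are assumptions made for the sketch, and `InMemoryStringReader` is a hypothetical connector whose "data source" is just an iterable of strings.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Document:
    """Illustrative stand-in, NOT the library class: text is the only required field."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseReader(ABC):
    """Illustrative stand-in for a reader base class (assumed interface)."""

    @abstractmethod
    def load_data(self, *args: Any, **kwargs: Any) -> List[Document]:
        """Connect to a data source and return its contents as Documents."""


class InMemoryStringReader(BaseReader):
    """Hypothetical connector: wraps each string of an iterable in a Document."""

    def load_data(self, texts: Iterable[str]) -> List[Document]:
        # Record each string's position so the origin of a Document stays traceable.
        return [
            Document(text=t, metadata={"position": i})
            for i, t in enumerate(texts)
        ]


# Load Documents through a reader ...
reader_docs = InMemoryStringReader().load_data(["first text", "second text"])

# ... or construct one manually; only the text is required.
manual_doc = Document(text="written by hand")

all_docs = reader_docs + [manual_doc]
print([d.text for d in all_docs])
```

Both routes yield the same kind of object, so documents loaded by a connector and documents built by hand can be mixed in one list before indexing.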

Functions

download_loader(loader_class[, ...])

Download a single loader from the Loader Hub.

Classes

WikipediaReader

Wikipedia reader.

YoutubeTranscriptReader

Youtube Transcript reader.

SimpleDirectoryReader([input_dir, ...])

Simple directory reader.

JSONReader([levels_back, collapse_length, ...])

JSON reader.

SimpleMongoReader([host, port, uri])

Simple mongo reader.

NotionPageReader

Notion Page reader.

GoogleDocsReader

Google Docs reader.

MetalReader(api_key, client_id, index_id)

Metal reader.

DiscordReader

Discord reader.

SlackReader

Slack reader.

WeaviateReader(host[, auth_client_secret])

Weaviate reader.

PathwayReader(host, port)

Pathway reader.

PineconeReader(api_key[, environment])

Pinecone reader.

PsychicReader([psychic_key])

Psychic reader.

QdrantReader([location, url, port, ...])

Qdrant reader.

MilvusReader([host, port, user, password, ...])

Milvus reader.

ChromaReader(collection_name[, ...])

Chroma reader.

DeepLakeReader([token])

DeepLake reader.

FaissReader(index)

Faiss reader.

TxtaiReader(index)

txtai reader.

MyScaleReader(myscale_host, username, password)

MyScale reader.

Document

Generic interface for a data document.

StringIterableReader

String Iterable Reader.

SimpleWebPageReader

Simple web page reader.

BeautifulSoupWebReader

BeautifulSoup web page reader.

TrafilaturaWebReader

Trafilatura web page reader.

RssReader

RSS reader.

MakeWrapper()

Make reader.

TwitterTweetReader

Twitter tweets reader.

ObsidianReader(input_dir)

Utilities for loading data from an Obsidian Vault.

GithubRepositoryReader(owner, repo[, ...])

Github repository reader.

MboxReader()

Mbox e-mail reader.

ElasticsearchReader

Read documents from an Elasticsearch/Opensearch index.

SteamshipFileReader([api_key])

Reads persistent Steamship Files and converts them to Documents.

ChatGPTRetrievalPluginReader(endpoint_url[, ...])

ChatGPT Retrieval Plugin reader.

BagelReader(collection_name)

Reader for Bagel files.

HTMLTagReader([tag, ignore_no_id])

Read HTML files and extract text from a specific tag with BeautifulSoup.

ReaderConfig

Represents a reader and its input arguments.

PDFReader([return_full_document])

PDF parser.

DashVectorReader(api_key, endpoint)

DashVector reader.