Data Connectors
NOTE: Our data connectors are now offered through LlamaHub 🦙. LlamaHub is an open-source repository containing data loaders that you can easily plug and play into any LlamaIndex application.
The following data connectors are still available in the core repo.
Data Connectors for LlamaIndex.
This module contains the data connectors for LlamaIndex. Each connector inherits from a BaseReader class, connects to a data source, and loads Document objects from that data source.
You may also choose to construct Document objects manually, for instance in our Insert How-To Guide. See below for the API definition of a Document - the bare minimum is a text property.
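Constructing a Document manually only requires the text field. As an illustration, here is a stdlib-only stand-in mirroring the fields documented below for llama_index.readers.Document (this is a sketch for exposition, not the real class):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Minimal stand-in mirroring the documented Document fields;
# the real class is llama_index.readers.Document.
@dataclass
class Document:
    text: Optional[str] = None
    doc_id: Optional[str] = None
    embedding: Optional[List[float]] = None
    doc_hash: Optional[str] = None
    extra_info: Optional[Dict[str, Any]] = None

    @property
    def is_text_none(self) -> bool:
        return self.text is None

# The bare minimum is a text property; extra_info is optional metadata.
doc = Document(text="Paul Graham grew up in England.", extra_info={"source": "essay"})
```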
- class llama_index.readers.BeautifulSoupWebReader(website_extractor: Optional[Dict[str, Callable]] = None)
BeautifulSoup web page reader.
Reads pages from the web. Requires the bs4 and urllib packages.
- Parameters
website_extractor (Optional[Dict[str, Callable]]) – A mapping of website hostname (e.g. google.com) to a function that specifies how to extract text from the BeautifulSoup obj. See DEFAULT_WEBSITE_EXTRACTOR.
- load_data(urls: List[str], custom_hostname: Optional[str] = None) → List[Document]
Load data from the urls.
- Parameters
urls (List[str]) – List of URLs to scrape.
custom_hostname (Optional[str]) – Force a certain hostname in the case a website is displayed under custom URLs (e.g. Substack blogs)
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
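A website_extractor entry is just a callable that turns parsed HTML into text. As a rough stdlib-only sketch of that idea (using html.parser instead of bs4; the HTML snippet here is illustrative):

```python
from html.parser import HTMLParser

# Sketch of what a website_extractor callable does: take a page's HTML
# and return its visible text, skipping script/style content.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

html = "<html><body><h1>Title</h1><script>var x=1;</script><p>Body text.</p></body></html>"
p = TextExtractor()
p.feed(html)
text = " ".join(p.parts)
```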
- class llama_index.readers.ChatGPTRetrievalPluginReader(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100)
ChatGPT Retrieval Plugin reader.
- load_data(query: str, top_k: int = 10, separate_documents: bool = True, **kwargs: Any) → List[Document]
Load data from ChatGPT Retrieval Plugin.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.ChromaReader(collection_name: str, persist_directory: Optional[str] = None, host: str = 'localhost', port: int = 8000)
Chroma reader.
Retrieve documents from existing persisted Chroma collections.
- Parameters
collection_name – Name of the persisted collection.
persist_directory – Directory where the collection is persisted.
- create_documents(results: Any) → List[Document]
Create documents from the results.
- Parameters
results – Results from the query.
- Returns
List of documents.
- load_data(query_embedding: Optional[List[float]] = None, limit: int = 10, where: Optional[dict] = None, where_document: Optional[dict] = None, query: Optional[Union[str, List[str]]] = None) → Any
Load data from the collection.
- Parameters
limit – Number of results to return.
where – Filter results by metadata, e.g. {"metadata_field": "is_equal_to_this"}
where_document – Filter results by document content, e.g. {"$contains": "search_string"}
- Returns
List of documents.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.DeepLakeReader(token: Optional[str] = None)
DeepLake reader.
Retrieve documents from existing DeepLake datasets.
- Parameters
dataset_name – Name of the deeplake dataset.
- load_data(query_vector: List[float], dataset_path: str, limit: int = 4, distance_metric: str = 'l2') → List[Document]
Load data from DeepLake.
- Parameters
dataset_path (str) – Path of the DeepLake dataset.
query_vector (List[float]) – Query vector.
limit (int) – Number of results to return.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.DiscordReader(discord_token: Optional[str] = None)
Discord reader.
Reads conversations from channels.
- Parameters
discord_token (Optional[str]) – Discord token. If not provided, we assume the environment variable DISCORD_TOKEN is set.
- load_data(channel_ids: List[int], limit: Optional[int] = None, oldest_first: bool = True) → List[Document]
Load data from the specified channels.
- Parameters
channel_ids (List[int]) – List of channel ids to read.
limit (Optional[int]) – Maximum number of messages to read.
oldest_first (bool) – Whether to read oldest messages first. Defaults to True.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.Document(text: Optional[str] = None, doc_id: Optional[str] = None, embedding: Optional[List[float]] = None, doc_hash: Optional[str] = None, extra_info: Optional[Dict[str, Any]] = None)
Generic interface for a data document.
This document connects to data sources.
- property extra_info_str: Optional[str]
Extra info string.
- classmethod from_langchain_format(doc: Document) → Document
Convert struct from LangChain document format.
- get_doc_hash() → str
Get doc_hash.
- get_doc_id() → str
Get doc_id.
- get_embedding() → List[float]
Get embedding.
Errors if embedding is None.
- get_text() → str
Get text.
- classmethod get_type() → str
Get Document type.
- classmethod get_types() → List[str]
Get Document types.
- property is_doc_id_none: bool
Check if doc_id is None.
- property is_text_none: bool
Check if text is None.
- to_langchain_format() → Document
Convert struct to LangChain document format.
- class llama_index.readers.ElasticsearchReader(endpoint: str, index: str, httpx_client_args: Optional[dict] = None)
Read documents from an Elasticsearch/Opensearch index.
These documents can then be used in a downstream Llama Index data structure.
- Parameters
endpoint (str) – URL (http/https) of cluster
index (str) – Name of the index (required)
httpx_client_args (dict) – Optional additional args to pass to the httpx.Client
- load_data(field: str, query: Optional[dict] = None, embedding_field: Optional[str] = None) → List[Document]
Read data from the Elasticsearch index.
- Parameters
field (str) – Field in the document to retrieve text from
query (Optional[dict]) – Elasticsearch JSON query DSL object. For example: {"query": {"match": {"message": {"query": "this is a test"}}}}
embedding_field (Optional[str]) – If there are embeddings stored in this index, this field can be used to set the embedding field on the returned Document list.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
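The query parameter accepts a raw Elasticsearch query DSL object, which is just a nested dict. For example, the match query from the parameter description above can be built like this (the index and field names are illustrative):

```python
# Build the Elasticsearch match query shown in the parameter docs as a
# plain dict; pass it to load_data(field=..., query=query) against a
# running cluster.
query = {"query": {"match": {"message": {"query": "this is a test"}}}}

# reader = ElasticsearchReader(endpoint="http://localhost:9200", index="my-index")
# docs = reader.load_data(field="message", query=query)  # needs a live cluster
```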
- class llama_index.readers.FaissReader(index: Any)
Faiss reader.
Retrieves documents through an existing in-memory Faiss index. These documents can then be used in a downstream LlamaIndex data structure. If you wish to use Faiss itself as an index to organize documents, insert documents, and perform queries on them, please use GPTVectorStoreIndex with FaissVectorStore.
- Parameters
faiss_index (faiss.Index) – A Faiss Index object (required)
- load_data(query: ndarray, id_to_text_map: Dict[str, str], k: int = 4, separate_documents: bool = True) → List[Document]
Load data from Faiss.
- Parameters
query (np.ndarray) – A 2D numpy array of query vectors.
id_to_text_map (Dict[str, str]) – A map from IDs to text.
k (int) – Number of nearest neighbors to retrieve. Defaults to 4.
separate_documents (Optional[bool]) – Whether to return separate documents. Defaults to True.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
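Conceptually, FaissReader runs a k-nearest-neighbor search over the index and then maps the returned ids through id_to_text_map. A pure-Python sketch of that flow (toy vectors, no faiss dependency; this is an approximation for exposition, not the reader's actual code):

```python
import math

# Toy vector store and id-to-text mapping standing in for a Faiss index.
vectors = {"0": [1.0, 0.0], "1": [0.0, 1.0], "2": [0.9, 0.1]}
id_to_text_map = {"0": "doc zero", "1": "doc one", "2": "doc two"}

def knn(query, k):
    """Return the ids of the k vectors closest to the query (L2 distance)."""
    ranked = sorted(vectors, key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k]

# Search, then resolve ids to document text, as load_data does.
ids = knn([1.0, 0.0], k=2)
docs = [id_to_text_map[i] for i in ids]
```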
- class llama_index.readers.GithubRepositoryReader(owner: str, repo: str, use_parser: bool = True, verbose: bool = False, github_token: Optional[str] = None, concurrent_requests: int = 5, ignore_file_extensions: Optional[List[str]] = None, ignore_directories: Optional[List[str]] = None)
Github repository reader.
Retrieves the contents of a Github repository and returns a list of documents. The documents are either the contents of the files in the repository or the text extracted from the files using the parser.
Examples
>>> reader = GithubRepositoryReader("owner", "repo")
>>> branch_documents = reader.load_data(branch="branch")
>>> commit_documents = reader.load_data(commit_sha="commit_sha")
- load_data(commit_sha: Optional[str] = None, branch: Optional[str] = None) → List[Document]
Load data from a commit or a branch.
Loads github repository data from a specific commit sha or a branch.
- Parameters
commit_sha – commit sha
branch – branch name
- Returns
list of documents
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.GoogleDocsReader
Google Docs reader.
Reads a page from Google Docs.
- load_data(document_ids: List[str]) → List[Document]
Load data from the given document ids.
- Parameters
document_ids (List[str]) – a list of document ids.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.JSONReader(levels_back: Optional[int] = None, collapse_length: Optional[int] = None)
JSON reader.
Reads JSON documents with options to help suss out relationships between nodes.
- Parameters
levels_back (int) – the number of levels to go back in the JSON tree; 0 if you want all levels. If levels_back is None, then we just format the JSON and make each line an embedding.
collapse_length (int) – the maximum number of characters a JSON fragment would be collapsed into in the output. For example, if collapse_length = 10 and the input is {a: [1, 2, 3], b: {"hello": "world", "foo": "bar"}}, then a would be collapsed into one line while b would not. Recommend starting around 100 and then adjusting from there.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
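The collapse_length behavior can be sketched in plain Python: a fragment whose serialized form fits within the limit stays on one line, while larger dict fragments expand across lines. This is an illustrative approximation of the idea, not the reader's actual implementation:

```python
import json

def render(value, collapse_length=10, indent=0):
    """Render JSON: fragments whose compact form fits collapse_length stay
    on one line; larger dicts are expanded (lists are kept compact here)."""
    compact = json.dumps(value)
    pad = "  " * indent
    if len(compact) <= collapse_length or not isinstance(value, dict):
        return [pad + compact]
    lines = [pad + "{"]
    for k, v in value.items():
        sub = render(v, collapse_length, indent + 1)
        # Prefix the key onto the first line of the child's rendering.
        sub[0] = "  " * (indent + 1) + json.dumps(k) + ": " + sub[0].lstrip()
        lines.extend(sub)
    lines.append(pad + "}")
    return lines

# Matches the docstring example: "a" collapses to one line, "b" does not.
data = {"a": [1, 2, 3], "b": {"hello": "world", "foo": "bar"}}
out = render(data)
```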
- class llama_index.readers.MakeWrapper
Make reader.
- load_data(*args: Any, **load_kwargs: Any) → List[Document]
Load data from the input directory.
NOTE: This is not implemented.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.MboxReader
Mbox e-mail reader.
Reads a set of e-mails saved in the mbox format.
- load_data(input_dir: str, **load_kwargs: Any) → List[Document]
Load data from the input directory.
- load_kwargs:
max_count (int): Maximum number of messages to read.
message_format (str): Message format overriding default.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
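Under the hood this amounts to mapping each mbox message to a text blob, up to max_count messages. A stdlib-only sketch with the mailbox module (the sample messages are made up for illustration):

```python
import mailbox
import os
import tempfile

# Write a tiny two-message mbox file to read back.
raw = (
    "From alice@example.com Thu Jan  1 00:00:00 2023\n"
    "From: alice@example.com\nSubject: hello\n\nfirst message\n\n"
    "From bob@example.com Thu Jan  2 00:00:00 2023\n"
    "From: bob@example.com\nSubject: hi\n\nsecond message\n\n"
)
path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
with open(path, "w") as f:
    f.write(raw)

# Read up to max_count messages, turning each into "subject: body" text.
max_count = 10
texts = []
for i, msg in enumerate(mailbox.mbox(path)):
    if i >= max_count:
        break
    texts.append(f"{msg['Subject']}: {msg.get_payload().strip()}")
```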
- class llama_index.readers.MetalReader(api_key: str, client_id: str, index_id: str)
Metal reader.
- Parameters
api_key (str) – Metal API key.
client_id (str) – Metal client ID.
index_id (str) – Metal index ID.
- load_data(limit: int, query_embedding: Optional[List[float]] = None, filters: Optional[Dict[str, Any]] = None, separate_documents: bool = True, **query_kwargs: Any) → List[Document]
Load data from Metal.
- Parameters
query_embedding (Optional[List[float]]) – Query embedding for search.
limit (int) – Number of results to return.
filters (Optional[Dict[str, Any]]) – Filters to apply to the search.
separate_documents (Optional[bool]) – Whether to return separate documents per retrieved entry. Defaults to True.
**query_kwargs – Keyword arguments to pass to the search.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.MilvusReader(host: str = 'localhost', port: int = 19530, user: str = '', password: str = '', use_secure: bool = False)
Milvus reader.
- load_data(query_vector: List[float], collection_name: str, expr: Optional[Any] = None, search_params: Optional[dict] = None, limit: int = 10) → List[Document]
Load data from Milvus.
- Parameters
collection_name (str) – Name of the Milvus collection.
query_vector (List[float]) – Query vector.
limit (int) – Number of results to return.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.MyScaleReader(myscale_host: str, username: str, password: str, myscale_port: Optional[int] = 8443, database: str = 'default', table: str = 'llama_index', index_type: str = 'IVFLAT', metric: str = 'cosine', batch_size: int = 32, index_params: Optional[dict] = None, search_params: Optional[dict] = None, **kwargs: Any)
MyScale reader.
- Parameters
myscale_host (str) – A URL to connect to the MyScale backend.
username (str) – Username to log in.
password (str) – Password to log in.
myscale_port (int) – URL port to connect with HTTP. Defaults to 8443.
database (str) – Database name to find the table. Defaults to "default".
table (str) – Table name to operate on. Defaults to "llama_index".
index_type (str) – Index type string. Defaults to "IVFLAT".
metric (str) – Metric to compute distance; supported values are "l2", "cosine", and "ip". Defaults to "cosine".
batch_size (int, optional) – the size of documents to insert. Defaults to 32.
index_params (dict, optional) – The index parameters for MyScale. Defaults to None.
search_params (dict, optional) – The search parameters for a MyScale query. Defaults to None.
- load_data(query_vector: List[float], where_str: Optional[str] = None, limit: int = 10) → List[Document]
Load data from MyScale.
- Parameters
query_vector (List[float]) – Query vector.
where_str (Optional[str], optional) – where condition string. Defaults to None.
limit (int) – Number of results to return.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.NotionPageReader(integration_token: Optional[str] = None)
Notion Page reader.
Reads a set of Notion pages.
- Parameters
integration_token (str) – Notion integration token.
- load_data(page_ids: List[str] = [], database_id: Optional[str] = None) → List[Document]
Load data from the given page ids and/or database id.
- Parameters
page_ids (List[str]) – List of page ids to load.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- query_database(database_id: str, query_dict: Dict[str, Any] = {}) → List[str]
Get all the pages from a Notion database.
- read_page(page_id: str) → str
Read a page.
- search(query: str) → List[str]
Search Notion page given a text query.
- class llama_index.readers.ObsidianReader(input_dir: str)
Utilities for loading data from an Obsidian Vault.
- Parameters
input_dir (str) – Path to the vault.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.PineconeReader(api_key: str, environment: str)
Pinecone reader.
- Parameters
api_key (str) – Pinecone API key.
environment (str) – Pinecone environment.
- load_data(index_name: str, id_to_text_map: Dict[str, str], vector: Optional[List[float]], top_k: int, separate_documents: bool = True, include_values: bool = True, **query_kwargs: Any) → List[Document]
Load data from Pinecone.
- Parameters
index_name (str) – Name of the index.
id_to_text_map (Dict[str, str]) – A map from IDs to text.
separate_documents (Optional[bool]) – Whether to return separate documents per retrieved entry. Defaults to True.
vector (List[float]) – Query vector.
top_k (int) – Number of results to return.
include_values (bool) – Whether to include the embedding in the response. Defaults to True.
**query_kwargs – Keyword arguments to pass to the query. Arguments are the exact same as those found in Pinecone's reference documentation for the query method.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.QdrantReader(location: Optional[str] = None, url: Optional[str] = None, port: Optional[int] = 6333, grpc_port: int = 6334, prefer_grpc: bool = False, https: Optional[bool] = None, api_key: Optional[str] = None, prefix: Optional[str] = None, timeout: Optional[float] = None, host: Optional[str] = None, path: Optional[str] = None)
Qdrant reader.
Retrieve documents from existing Qdrant collections.
- Parameters
location – If :memory: - use in-memory Qdrant instance. If str - use it as a url parameter. If None - use default values for host and port.
url – either host or str of "Optional[scheme], host, Optional[port], Optional[prefix]". Default: None
port – Port of the REST API interface. Default: 6333
grpc_port – Port of the gRPC interface. Default: 6334
prefer_grpc – If true - use gRPC interface whenever possible in custom methods.
https – If true - use HTTPS (SSL) protocol. Default: false
api_key – API key for authentication in Qdrant Cloud. Default: None
prefix – If not None - add prefix to the REST URL path. Example: service/v1 will result in http://localhost:6333/service/v1/{qdrant-endpoint} for REST API. Default: None
timeout – Timeout for REST and gRPC API requests. Default: 5.0 seconds for REST and unlimited for gRPC
host – Host name of Qdrant service. If url and host are None, set to "localhost". Default: None
- load_data(collection_name: str, query_vector: List[float], should_search_mapping: Optional[Dict[str, str]] = None, must_search_mapping: Optional[Dict[str, str]] = None, must_not_search_mapping: Optional[Dict[str, str]] = None, rang_search_mapping: Optional[Dict[str, Dict[str, float]]] = None, limit: int = 10) → List[Document]
Load data from Qdrant.
- Parameters
collection_name (str) – Name of the Qdrant collection.
query_vector (List[float]) – Query vector.
should_search_mapping (Optional[Dict[str, str]]) – Mapping from field name to query string.
must_search_mapping (Optional[Dict[str, str]]) – Mapping from field name to query string.
must_not_search_mapping (Optional[Dict[str, str]]) – Mapping from field name to query string.
rang_search_mapping (Optional[Dict[str, Dict[str, float]]]) – Mapping from field name to range query.
limit (int) – Number of results to return.
Example
reader = QdrantReader()
reader.load_data(
    collection_name="test_collection",
    query_vector=[0.1, 0.2, 0.3],
    should_search_mapping={"text_field": "text"},
    must_search_mapping={"text_field": "text"},
    must_not_search_mapping={"text_field": "text"},
    # gte, lte, gt, lt supported
    rang_search_mapping={"text_field": {"gte": 0.1, "lte": 0.2}},
    limit=10,
)
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.RssReader(html_to_text: bool = False)
RSS reader.
Reads content from an RSS feed.
- load_data(urls: List[str]) → List[Document]
Load data from RSS feeds.
- Parameters
urls (List[str]) – List of RSS URLs to load.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.SimpleDirectoryReader(input_dir: Optional[str] = None, input_files: Optional[List] = None, exclude: Optional[List] = None, exclude_hidden: bool = True, errors: str = 'ignore', recursive: bool = False, required_exts: Optional[List[str]] = None, file_extractor: Optional[Dict[str, BaseParser]] = None, num_files_limit: Optional[int] = None, file_metadata: Optional[Callable[[str], Dict]] = None)
Simple directory reader.
Can read files into separate documents, or concatenate files into a single document text.
- Parameters
input_dir (str) – Path to the directory.
input_files (List) – List of file paths to read (Optional; overrides input_dir, exclude)
exclude (List) – glob of python file paths to exclude (Optional)
exclude_hidden (bool) – Whether to exclude hidden files (dotfiles).
errors (str) – how encoding and decoding errors are to be handled, see https://docs.python.org/3/library/functions.html#open
recursive (bool) – Whether to recursively search in subdirectories. False by default.
required_exts (Optional[List[str]]) – List of required extensions. Default is None.
file_extractor (Optional[Dict[str, BaseParser]]) – A mapping of file extension to a BaseParser class that specifies how to convert that file to text. See DEFAULT_FILE_EXTRACTOR.
num_files_limit (Optional[int]) – Maximum number of files to read. Default is None.
file_metadata (Optional[Callable[[str], Dict]]) – A function that takes in a filename and returns a Dict of metadata for the Document. Default is None.
- load_data(concatenate: bool = False) → List[Document]
Load data from the input directory.
- Parameters
concatenate (bool) – whether to concatenate all text docs into a single doc. If set to True, file metadata is ignored. False by default. This setting does not apply to image docs (always one doc per image).
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
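The file-selection logic described above (recursive scan, hidden files excluded, restricted to required_exts) can be sketched with pathlib. This is an approximation of the reader's behavior for illustration, not its actual code:

```python
import pathlib
import tempfile

# Build a small directory tree to scan.
root = pathlib.Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.txt").write_text("alpha")
(root / "sub" / "b.md").write_text("beta")
(root / ".hidden.txt").write_text("secret")  # dotfile: should be excluded

# Recursive scan honoring exclude_hidden=True and required_exts.
required_exts = [".txt", ".md"]
files = sorted(
    p for p in root.rglob("*")
    if p.is_file()
    and not p.name.startswith(".")      # exclude_hidden
    and p.suffix in required_exts        # required_exts
)
names = [p.name for p in files]
```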
- class llama_index.readers.SimpleMongoReader(host: Optional[str] = None, port: Optional[int] = None, uri: Optional[str] = None, max_docs: int = 1000)
Simple mongo reader.
Concatenates each Mongo doc into a Document used by LlamaIndex.
- Parameters
host (str) – Mongo host.
port (int) – Mongo port.
max_docs (int) – Maximum number of documents to load.
- load_data(db_name: str, collection_name: str, field_names: List[str] = ['text'], query_dict: Optional[Dict] = None) → List[Document]
Load data from the given database and collection.
- Parameters
db_name (str) – name of the database.
collection_name (str) – name of the collection.
field_names (List[str]) – names of the fields to be concatenated. Defaults to ["text"]
query_dict (Optional[Dict]) – query to filter documents. Defaults to None
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.SimpleWebPageReader(html_to_text: bool = False)
Simple web page reader.
Reads pages from the web.
- Parameters
html_to_text (bool) – Whether to convert HTML to text. Requires the html2text package.
- load_data(urls: List[str]) → List[Document]
Load data from the urls.
- Parameters
urls (List[str]) – List of URLs to scrape.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.SlackReader(slack_token: Optional[str] = None, ssl: Optional[SSLContext] = None, earliest_date: Optional[datetime] = None, latest_date: Optional[datetime] = None)
Slack reader.
Reads conversations from channels. If an earliest_date is provided, an optional latest_date can also be provided. If no latest_date is provided, we assume the latest date is the current timestamp.
- Parameters
slack_token (Optional[str]) – Slack token. If not provided, we assume the environment variable SLACK_BOT_TOKEN is set.
ssl (Optional[SSLContext]) – Custom SSL context. If not provided, it is assumed there is already an SSL context available.
earliest_date (Optional[datetime]) – Earliest date from which to read conversations. If not provided, we read all messages.
latest_date (Optional[datetime]) – Latest date from which to read conversations. If not provided, defaults to current timestamp in combination with earliest_date.
- load_data(channel_ids: List[str], reverse_chronological: bool = True) → List[Document]
Load data from the specified channels.
- Parameters
channel_ids (List[str]) – List of channel ids to read.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.SteamshipFileReader(api_key: Optional[str] = None)
Reads persistent Steamship Files and converts them to Documents.
- Parameters
api_key – Steamship API key. Defaults to STEAMSHIP_API_KEY value if not provided.
Note
Requires install of steamship package and an active Steamship API Key. To get a Steamship API Key, visit: https://steamship.com/account/api. Once you have an API Key, expose it via an environment variable named STEAMSHIP_API_KEY or pass it as an init argument (api_key).
- load_data(workspace: str, query: Optional[str] = None, file_handles: Optional[List[str]] = None, collapse_blocks: bool = True, join_str: str = '\n\n') → List[Document]
Load data from persistent Steamship Files into Documents.
- Parameters
workspace – the handle for a Steamship workspace (see: https://docs.steamship.com/workspaces/index.html)
query – a Steamship tag query for retrieving files (ex: 'filetag and value("import-id")="import-001"')
file_handles – a list of Steamship File handles (ex: smooth-valley-9kbdr)
collapse_blocks – whether to merge individual File Blocks into a single Document, or separate them.
join_str – when collapse_blocks is True, this is how the block texts will be concatenated.
Note
The collection of Files from both query and file_handles will be combined. There is no (current) support for deconflicting the collections (meaning that if a file appears both in the result set of the query and as a handle in file_handles, it will be loaded twice).
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.StringIterableReader
String Iterable Reader.
Gets a list of documents, given an iterable (e.g. list) of strings.
Example
from llama_index import StringIterableReader, GPTTreeIndex
documents = StringIterableReader().load_data(
    texts=["I went to the store", "I bought an apple"])
index = GPTTreeIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_engine.query("what did I buy?")
# response should be something like "You bought an apple."
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.TrafilaturaWebReader(error_on_missing: bool = False)
Trafilatura web page reader.
Reads pages from the web. Requires the trafilatura package.
- load_data(urls: List[str]) → List[Document]
Load data from the urls.
- Parameters
urls (List[str]) – List of URLs to scrape.
- Returns
List of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.TwitterTweetReader(bearer_token: str, num_tweets: Optional[int] = 100)
Twitter tweets reader.
Reads the tweets of a user's twitter handle.
Check https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api on how to get access to the twitter API.
- Parameters
bearer_token (str) – bearer_token that you get from the twitter API.
num_tweets (Optional[int]) – Number of tweets for each user twitter handle. Default is 100 tweets.
- load_data(twitterhandles: List[str], **load_kwargs: Any) → List[Document]
Load tweets of twitter handles.
- Parameters
twitterhandles (List[str]) – List of user twitter handles to read tweets from.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.WeaviateReader(host: str, auth_client_secret: Optional[Any] = None)
Weaviate reader.
Retrieves documents from Weaviate through vector lookup. Allows option to concatenate retrieved documents into one Document, or to return separate Document objects per document.
- Parameters
host (str) – host.
auth_client_secret (Optional[weaviate.auth.AuthCredentials]) – auth_client_secret.
- load_data(class_name: Optional[str] = None, properties: Optional[List[str]] = None, graphql_query: Optional[str] = None, separate_documents: Optional[bool] = True) → List[Document]
Load data from Weaviate.
If graphql_query is not found in load_kwargs, we assume that class_name and properties are provided.
- Parameters
class_name (Optional[str]) – class_name to retrieve documents from.
properties (Optional[List[str]]) – properties to retrieve from documents.
graphql_query (Optional[str]) – Raw GraphQL Query. We assume that the query is a Get query.
separate_documents (Optional[bool]) – Whether to return separate documents. Defaults to True.
- Returns
A list of documents.
- Return type
List[Document]
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.WikipediaReader
Wikipedia reader.
Reads a page.
- load_data(pages: List[str], **load_kwargs: Any) → List[Document]
Load data for the given pages.
- Parameters
pages (List[str]) – List of pages to read.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
- class llama_index.readers.YoutubeTranscriptReader
Youtube Transcript reader.
- load_data(ytlinks: List[str], **load_kwargs: Any) → List[Document]
Load transcripts for the given Youtube links.
- Parameters
ytlinks (List[str]) – List of youtube links for which transcripts are to be read.
- load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.