Structured Index Configuration

Our structured indices are documented in Structured Store Index. Below, we provide a reference of the classes that are used to configure our structured indices.

SQL wrapper around SQLDatabase in langchain.

class llama_index.langchain_helpers.sql_wrapper.SQLDatabase(engine: Engine, schema: Optional[str] = None, metadata: Optional[MetaData] = None, ignore_tables: Optional[List[str]] = None, include_tables: Optional[List[str]] = None, sample_rows_in_table_info: int = 3, indexes_in_table_info: bool = False, custom_table_info: Optional[dict] = None, view_support: bool = False, max_string_length: int = 300)

SQL Database.

Wrapper around SQLDatabase object from langchain. Offers some helper utilities for insertion and querying. See the langchain documentation for more details.

Parameters
  • *args – Arguments to pass to langchain SQLDatabase.

  • **kwargs – Keyword arguments to pass to langchain SQLDatabase.

property dialect: str

Return string representation of dialect to use.

property engine: Engine

Return SQL Alchemy engine.

classmethod from_cnosdb(url: str = '127.0.0.1:8902', user: str = 'root', password: str = '', tenant: str = 'cnosdb', database: str = 'public') SQLDatabase

Class method to create an SQLDatabase instance from a CnosDB connection. This method requires the ‘cnos-connector’ package. If not installed, it can be added using pip install cnos-connector.

Parameters
  • url (str) – The HTTP connection host name and port number of the CnosDB service, excluding “http://” or “https://”, with a default value of “127.0.0.1:8902”.

  • user (str) – The username used to connect to the CnosDB service, with a default value of “root”.

  • password (str) – The password of the user connecting to the CnosDB service, with a default value of “”.

  • tenant (str) – The name of the tenant used to connect to the CnosDB service, with a default value of “cnosdb”.

  • database (str) – The name of the database in the CnosDB tenant, with a default value of “public”.

Returns

An instance of SQLDatabase configured with the provided CnosDB connection details.

Return type

SQLDatabase

classmethod from_databricks(catalog: str, schema: str, host: Optional[str] = None, api_token: Optional[str] = None, warehouse_id: Optional[str] = None, cluster_id: Optional[str] = None, engine_args: Optional[dict] = None, **kwargs: Any) SQLDatabase

Class method to create an SQLDatabase instance from a Databricks connection. This method requires the ‘databricks-sql-connector’ package. If not installed, it can be added using pip install databricks-sql-connector.

Parameters
  • catalog (str) – The catalog name in the Databricks database.

  • schema (str) – The schema name in the catalog.

  • host (Optional[str]) – The Databricks workspace hostname, excluding ‘https://’ part. If not provided, it attempts to fetch from the environment variable ‘DATABRICKS_HOST’. If still unavailable and if running in a Databricks notebook, it defaults to the current workspace hostname. Defaults to None.

  • api_token (Optional[str]) – The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. If not provided, it attempts to fetch from ‘DATABRICKS_TOKEN’. If still unavailable and running in a Databricks notebook, a temporary token for the current user is generated. Defaults to None.

  • warehouse_id (Optional[str]) – The warehouse ID in the Databricks SQL. If provided, the method configures the connection to use this warehouse. Cannot be used with ‘cluster_id’. Defaults to None.

  • cluster_id (Optional[str]) – The cluster ID in the Databricks Runtime. If provided, the method configures the connection to use this cluster. Cannot be used with ‘warehouse_id’. If running in a Databricks notebook and both ‘warehouse_id’ and ‘cluster_id’ are None, it uses the ID of the cluster the notebook is attached to. Defaults to None.

  • engine_args (Optional[dict]) – The arguments to be used when connecting Databricks. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments for the from_uri method.

Returns

An instance of SQLDatabase configured with the provided Databricks connection details.

Return type

SQLDatabase

Raises

ValueError – If ‘databricks-sql-connector’ is not found, or if both ‘warehouse_id’ and ‘cluster_id’ are provided, or if neither ‘warehouse_id’ nor ‘cluster_id’ is provided and the code is not executing inside a Databricks notebook.
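The rules for ‘warehouse_id’ and ‘cluster_id’ described above can be sketched as a small validation function. This is an illustration only, not the library's code: the helper name resolve_compute_target and the in_notebook flag are hypothetical, and the notebook fallback returns None for the cluster ID because this sketch has no way to look it up.

```python
from typing import Optional, Tuple


def resolve_compute_target(
    warehouse_id: Optional[str] = None,
    cluster_id: Optional[str] = None,
    in_notebook: bool = False,  # hypothetical flag standing in for notebook detection
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the documented validation rules."""
    if warehouse_id and cluster_id:
        # The two options are mutually exclusive.
        raise ValueError("Provide either 'warehouse_id' or 'cluster_id', not both.")
    if warehouse_id:
        return ("warehouse", warehouse_id)
    if cluster_id:
        return ("cluster", cluster_id)
    if in_notebook:
        # Documented fallback: the cluster the notebook is attached to.
        # The actual ID lookup is outside the scope of this sketch.
        return ("cluster", None)
    raise ValueError("Provide 'warehouse_id' or 'cluster_id' when outside a notebook.")


target = resolve_compute_target(warehouse_id="abc123")
```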

classmethod from_uri(database_uri: str, engine_args: Optional[dict] = None, **kwargs: Any) SQLDatabase

Construct a SQLAlchemy engine from URI.

get_single_table_info(table_name: str) str

Get table info for a single table.

get_table_columns(table_name: str) List[Any]

Get table columns.

get_table_info(table_names: Optional[List[str]] = None) str

Get information about specified tables.

Follows best practices as specified in: Rajkumar et al, 2022 (https://arxiv.org/abs/2204.00498)

If sample_rows_in_table_info is nonzero, that number of sample rows is appended to each table description. This can improve text-to-SQL performance, as demonstrated in the paper.
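To make the idea of a table description with appended sample rows concrete, here is a minimal sketch using only the standard-library sqlite3 module. It does not use SQLDatabase and is not how the wrapper is implemented; the function name describe_table and the exact output layout are assumptions chosen for illustration.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str, sample_rows: int = 3) -> str:
    """Return the CREATE TABLE statement plus up to `sample_rows` sample rows."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise ValueError(f"table {table!r} not found")
    info = row[0]  # the stored CREATE TABLE statement
    if sample_rows:
        # The table name was validated against sqlite_master above,
        # so interpolating it as a quoted identifier is acceptable here.
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_rows,))
        header = "\t".join(col[0] for col in cursor.description)
        body = "\n".join("\t".join(str(v) for v in r) for r in cursor.fetchall())
        info += f"\n\n/*\n{sample_rows} rows from {table} table:\n{header}\n{body}\n*/"
    return info


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Toronto", 2930000), ("Tokyo", 13960000)],
)
table_description = describe_table(conn, "city", sample_rows=1)
schema_only = describe_table(conn, "city", sample_rows=0)
```

Passing sample_rows=0 yields the schema alone, which matches the documented behavior that sample rows are only appended when the setting is nonzero.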

get_table_info_no_throw(table_names: Optional[List[str]] = None) str

Get information about specified tables.

Follows best practices as specified in: Rajkumar et al, 2022 (https://arxiv.org/abs/2204.00498)

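If sample_rows_in_table_info is nonzero, that number of sample rows is appended to each table description. This can improve text-to-SQL performance, as demonstrated in the paper.

As the method name indicates, an error is returned as a message string rather than raised.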

get_table_names() Iterable[str]

Get names of tables available.

get_usable_table_names() Iterable[str]

Get names of tables available.

insert_into_table(table_name: str, data: dict) None

Insert data into a table.
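As a rough illustration of inserting a dict of column values into a table, the sketch below uses the standard-library sqlite3 module rather than SQLDatabase. It is not the wrapper's implementation; it assumes the table and column names come from a trusted source, since identifiers cannot be bound as query parameters.

```python
import sqlite3
from typing import Any, Dict


def insert_into_table(conn: sqlite3.Connection, table_name: str, data: Dict[str, Any]) -> None:
    """Insert one row, taking column names from the dict keys."""
    # Identifiers are interpolated (assumed trusted); values are bound as parameters.
    columns = ", ".join(f'"{name}"' for name in data)
    placeholders = ", ".join("?" for _ in data)
    conn.execute(
        f'INSERT INTO "{table_name}" ({columns}) VALUES ({placeholders})',
        tuple(data.values()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
insert_into_table(conn, "city", {"name": "Berlin", "population": 3600000})
rows = conn.execute("SELECT name, population FROM city").fetchall()
```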

property metadata_obj: MetaData

Return SQL Alchemy metadata.

run(command: str, fetch: str = 'all') str

Execute a SQL command and return a string representing the results.

If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.

run_no_throw(command: str, fetch: str = 'all') str

Execute a SQL command and return a string representing the results.

If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.

If the statement throws an error, the error message is returned.
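The difference between run and run_no_throw can be sketched with the standard-library sqlite3 module. This is a simplified stand-in, not the wrapper's code: the functions take a connection argument instead of being methods, the "one" value for fetch is an assumption, and the "Error: ..." message format is illustrative.

```python
import sqlite3


def run(conn: sqlite3.Connection, command: str, fetch: str = "all") -> str:
    """Execute a statement and return its results as a string ('' if no rows)."""
    cursor = conn.execute(command)
    if cursor.description is None:
        # Statements such as CREATE or INSERT return no rows.
        return ""
    if fetch == "all":
        result = cursor.fetchall()
    elif fetch == "one":  # assumed alternative to the documented default
        result = cursor.fetchone()
    else:
        raise ValueError("fetch must be 'all' or 'one'")
    return str(result) if result else ""


def run_no_throw(conn: sqlite3.Connection, command: str, fetch: str = "all") -> str:
    """Same as run, but return the error message instead of raising."""
    try:
        return run(conn, command, fetch)
    except sqlite3.Error as exc:
        return f"Error: {exc}"


conn = sqlite3.connect(":memory:")
created = run(conn, "CREATE TABLE t (x INTEGER)")
inserted = run(conn, "INSERT INTO t VALUES (1)")
selected = run(conn, "SELECT x FROM t")
failed = run_no_throw(conn, "SELECT * FROM missing_table")
```

The no-throw variant is convenient when the result is fed back to a language model, since the model can read the error text and attempt a corrected query.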

run_sql(command: str) Tuple[str, Dict]

Execute a SQL statement and return a string representing the results.

If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.

property table_info: str

Information about all tables in the database.

SQL Container builder.

class llama_index.indices.struct_store.container_builder.SQLContextContainerBuilder(sql_database: SQLDatabase, context_dict: Optional[Dict[str, str]] = None, context_str: Optional[str] = None)

SQLContextContainerBuilder.

Build a SQLContextContainer that can be passed to the SQL index during index construction or during query-time.

NOTE: If context_str is specified, it is used as the context instead of context_dict.

Parameters
  • sql_database (SQLDatabase) – SQL database

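  • context_dict (Optional[Dict[str, str]]) – context dict

  • context_str (Optional[str]) – context string; takes precedence over context_dict when both are given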

build_context_container(ignore_db_schema: bool = False) SQLContextContainer

Build index structure.

derive_index_from_context(index_cls: Type[BaseIndex], ignore_db_schema: bool = False, **index_kwargs: Any) BaseIndex

Derive index from context.

classmethod from_documents(documents_dict: Dict[str, List[BaseNode]], sql_database: SQLDatabase, **context_builder_kwargs: Any) SQLContextContainerBuilder

Build context from documents.

query_index_for_context(index: BaseIndex, query_str: Union[str, QueryBundle], query_tmpl: Optional[str] = 'Please return the relevant tables (including the full schema) for the following query: {orig_query_str}', store_context_str: bool = True, **index_kwargs: Any) str

Query index for context.

A simple wrapper around the index.query call which injects a query template to specifically fetch table information, and can store a context_str.

Parameters
  • index (BaseIndex) – index data structure

  • query_str (QueryType) – query string

  • query_tmpl (Optional[str]) – query template

  • store_context_str (bool) – store context_str
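How the query template is applied can be sketched in a few lines. The default template string is taken from the signature above; the helper name build_context_query is hypothetical, and passing the query through unchanged when query_tmpl is None is an assumption about the behavior rather than something this reference states.

```python
from typing import Optional

DEFAULT_QUERY_TMPL = (
    "Please return the relevant tables (including the full schema) "
    "for the following query: {orig_query_str}"
)


def build_context_query(query_str: str, query_tmpl: Optional[str] = DEFAULT_QUERY_TMPL) -> str:
    """Hypothetical helper: wrap the user's query in the table-lookup template."""
    if query_tmpl is None:
        # Assumed behavior: no template means the query is used as-is.
        return query_str
    return query_tmpl.format(orig_query_str=query_str)


context_query = build_context_query("Which city has the highest population?")
```

A custom template only needs to contain the {orig_query_str} placeholder for the substitution to work.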

Common classes for structured operations.

class llama_index.indices.common.struct_store.base.BaseStructDatapointExtractor(llm_predictor: BaseLLMPredictor, schema_extract_prompt: BasePromptTemplate, output_parser: Callable[[str], Optional[Dict[str, Any]]])

Extracts datapoints from a structured document.

insert_datapoint_from_nodes(nodes: Sequence[BaseNode]) None

Extract datapoint from a document and insert it.

class llama_index.indices.common.struct_store.base.SQLDocumentContextBuilder(sql_database: SQLDatabase, service_context: Optional[ServiceContext] = None, text_splitter: Optional[TextSplitter] = None, table_context_prompt: Optional[BasePromptTemplate] = None, refine_table_context_prompt: Optional[BasePromptTemplate] = None, table_context_task: Optional[str] = None)

Builder that builds context for a given set of SQL tables.

Parameters
  • sql_database (SQLDatabase) – SQL database to use.

  • llm_predictor (Optional[BaseLLMPredictor]) – LLM Predictor to use.

  • prompt_helper (Optional[PromptHelper]) – Prompt Helper to use.

  • text_splitter (Optional[TextSplitter]) – Text Splitter to use.

  • table_context_prompt (Optional[BasePromptTemplate]) – A Table Context Prompt (see Prompt Templates).

  • refine_table_context_prompt (Optional[BasePromptTemplate]) – A Refine Table Context Prompt (see Prompt Templates).

  • table_context_task (Optional[str]) – The query to perform on the table context. A default query string is used if none is provided by the user.

build_all_context_from_documents(documents_dict: Dict[str, List[BaseNode]]) Dict[str, str]

Build context for all tables in the database.

build_table_context_from_documents(documents: Sequence[BaseNode], table_name: str) str

Build context from documents for a single table.