Simple Directory Reader
SimpleDirectoryReader is the most commonly used data connector, and it works out of the box.
Pass in an input directory or a list of files, and it selects the best file reader for each file based on its extension.
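To make the idea of extension-based selection concrete, here is a minimal standard-library sketch. It is not LlamaIndex's implementation: the names READERS, read_text, read_markdown, and pick_reader are hypothetical, and the fallback to plain text is an assumption made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict


def read_text(path: Path) -> str:
    """Fallback reader: return the file's contents as UTF-8 text."""
    return path.read_text(encoding="utf-8")


def read_markdown(path: Path) -> str:
    """Toy Markdown reader: strip leading '#' heading markers from each line."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(line.lstrip("#").strip() for line in lines)


# Hypothetical mapping from file extension to reader function.
READERS: Dict[str, Callable[[Path], str]] = {".md": read_markdown}


def pick_reader(path: Path) -> Callable[[Path], str]:
    """Choose a reader by lowercased extension, defaulting to plain text."""
    return READERS.get(path.suffix.lower(), read_text)
```

The dictionary lookup keeps the dispatch rule in one place, so supporting another file type only means adding one entry.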
Get Started
from llama_index import SimpleDirectoryReader
Load specific files
reader = SimpleDirectoryReader(
    input_files=["../data/paul_graham/paul_graham_essay.txt"]
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 1 docs
Load all (top-level) files from directory
reader = SimpleDirectoryReader(input_dir="../../end_to_end_tutorials/")
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 72 docs
Load all (recursive) files from directory
# only load markdown files
required_exts = [".md"]
reader = SimpleDirectoryReader(
    input_dir="../../end_to_end_tutorials",
    required_exts=required_exts,
    recursive=True,
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 174 docs
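The difference between top-level and recursive loading, combined with an extension filter, can be illustrated with pathlib alone. This is a rough sketch of the file-selection idea, not the reader's actual code: list_files is a hypothetical helper, and exact (case-sensitive) suffix matching is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def list_files(
    input_dir: str,
    recursive: bool = False,
    required_exts: Optional[List[str]] = None,
) -> List[Path]:
    """List files under input_dir, optionally descending into subdirectories
    and keeping only files whose suffix is in required_exts."""
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    files = [p for p in candidates if p.is_file()]
    if required_exts is not None:
        files = [p for p in files if p.suffix in required_exts]
    return sorted(files)


# Build a small throwaway tree to compare the two modes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.md").write_text("top-level markdown")
    (root / "b.txt").write_text("top-level text")
    (root / "sub" / "c.md").write_text("nested markdown")

    # Top-level only: the file inside sub/ is not visited.
    top_level = [p.name for p in list_files(tmp)]
    # Recursive with a filter: only .md files, including nested ones.
    recursive_md = [
        p.name for p in list_files(tmp, recursive=True, required_exts=[".md"])
    ]
```

In this sketch the top-level listing skips everything under sub/, while the recursive listing reaches it and then drops any file that is not Markdown.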
Full Configuration
This is the full list of arguments that can be passed to the SimpleDirectoryReader:
class SimpleDirectoryReader(BaseReader):
    """Simple directory reader.

    Load files from file directory.
    Automatically select the best file reader given file extensions.

    Args:
        input_dir (str): Path to the directory.
        input_files (List): List of file paths to read
            (Optional; overrides input_dir, exclude)
        exclude (List): glob of python file paths to exclude (Optional)
        exclude_hidden (bool): Whether to exclude hidden files (dotfiles).
        encoding (str): Encoding of the files.
            Default is utf-8.
        errors (str): how encoding and decoding errors are to be handled,
            see https://docs.python.org/3/library/functions.html#open
        recursive (bool): Whether to recursively search in subdirectories.
            False by default.
        filename_as_id (bool): Whether to use the filename as the document id.
            False by default.
        required_exts (Optional[List[str]]): List of required extensions.
            Default is None.
        file_extractor (Optional[Dict[str, BaseReader]]): A mapping of file
            extension to a BaseReader class that specifies how to convert that file
            to text. If not specified, use default from DEFAULT_FILE_READER_CLS.
        num_files_limit (Optional[int]): Maximum number of files to read.
            Default is None.
        file_metadata (Optional[Callable[str, Dict]]): A function that takes
            in a filename and returns a Dict of metadata for the Document.
            Default is None.
    """