Simple Directory Reader

SimpleDirectoryReader is the most commonly used data connector, and it works out of the box.
Pass in either an input directory or a list of files.
It selects a file reader for each file based on that file's extension.
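To illustrate what choosing a reader by file extension means, here is a minimal, standalone sketch. It is not LlamaIndex's implementation: the `READERS` table, `read_markdown`, and `load_file` are hypothetical names, and falling back to plain-text reading for unknown extensions is an assumption of this sketch.

```python
from pathlib import Path
from typing import Callable, Dict


def read_text(path: Path) -> str:
    # Fallback reader (assumption): treat any unknown file as UTF-8 text.
    return path.read_text(encoding="utf-8")


def read_markdown(path: Path) -> str:
    # Hypothetical Markdown reader: strip leading '#' heading markers.
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(line.lstrip("#").strip() for line in lines)


# Hypothetical mapping from lowercase extension to reader function.
READERS: Dict[str, Callable[[Path], str]] = {
    ".md": read_markdown,
}


def load_file(path: Path) -> str:
    # Look up a reader by extension; fall back to plain text if none matches.
    reader = READERS.get(path.suffix.lower(), read_text)
    return reader(path)
```

Supporting another format in this sketch means adding one entry to the mapping, which is roughly what the `file_extractor` argument (see Full Configuration) lets you do for the real reader.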

Get Started

from llama_index import SimpleDirectoryReader

Load specific files

reader = SimpleDirectoryReader(
    input_files=["../data/paul_graham/paul_graham_essay.txt"]
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 1 docs

Load all top-level files from a directory

reader = SimpleDirectoryReader(input_dir="../../end_to_end_tutorials/")
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 72 docs

Load files recursively from a directory, filtered by extension

# only load markdown files
required_exts = [".md"]

reader = SimpleDirectoryReader(
    input_dir="../../end_to_end_tutorials", required_exts=required_exts, recursive=True
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 174 docs
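The two directory examples differ only in which files get picked up. The sketch below approximates that selection step with `pathlib`: top-level files only by default, every subdirectory when `recursive=True`, and an optional exact-match extension filter like `required_exts`. It is an illustration under stated assumptions, not the library's code: `select_files` is a hypothetical helper, skipping dotfiles and anything inside dot-directories is this sketch's reading of the `exclude_hidden` option, and the sorted ordering is its own choice.

```python
from pathlib import Path
from typing import List, Optional


def select_files(
    input_dir: str,
    recursive: bool = False,
    required_exts: Optional[List[str]] = None,
    exclude_hidden: bool = True,
    num_files_limit: Optional[int] = None,
) -> List[Path]:
    """Hypothetical helper approximating SimpleDirectoryReader's file selection."""
    root = Path(input_dir)
    # "**/*" matches files at every depth (including the top level);
    # "*" only looks at the top level.
    candidates = root.glob("**/*") if recursive else root.glob("*")
    selected: List[Path] = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        rel_parts = path.relative_to(root).parts
        # Skip dotfiles and files that live inside dot-directories.
        if exclude_hidden and any(part.startswith(".") for part in rel_parts):
            continue
        # Extension filter is an exact, case-sensitive suffix match.
        if required_exts is not None and path.suffix not in required_exts:
            continue
        selected.append(path)
    # Optionally cap how many files are returned.
    if num_files_limit is not None:
        selected = selected[:num_files_limit]
    return selected
```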

Full Configuration

This is the full list of arguments that can be passed to the SimpleDirectoryReader:

class SimpleDirectoryReader(BaseReader):
    """Simple directory reader.

    Load files from file directory.
    Automatically select the best file reader given file extensions.

    Args:
        input_dir (str): Path to the directory.
        input_files (List): List of file paths to read
            (Optional; overrides input_dir, exclude)
        exclude (List): List of glob patterns for file paths to exclude (Optional)
        exclude_hidden (bool): Whether to exclude hidden files (dotfiles).
        encoding (str): Encoding of the files.
            Default is utf-8.
        errors (str): How encoding and decoding errors are to be handled,
            see https://docs.python.org/3/library/functions.html#open
        recursive (bool): Whether to recursively search in subdirectories.
            False by default.
        filename_as_id (bool): Whether to use the filename as the document id.
            False by default.
        required_exts (Optional[List[str]]): List of required extensions.
            Default is None.
        file_extractor (Optional[Dict[str, BaseReader]]): A mapping of file
            extension to a BaseReader class that specifies how to convert that file
            to text. If not specified, use default from DEFAULT_FILE_READER_CLS.
        num_files_limit (Optional[int]): Maximum number of files to read.
            Default is None.
        file_metadata (Optional[Callable[[str], Dict]]): A function that takes
            in a filename and returns a Dict of metadata for the Document.
            Default is None.
"""