TrafilaturaWebReader

pydantic model llama_index.readers.TrafilaturaWebReader

Trafilatura web page reader.

Reads pages from the web. Requires the trafilatura package.

JSON schema
{
   "title": "TrafilaturaWebReader",
   "description": "Trafilatura web page reader.\n\nReads pages from the web.\nRequires the `trafilatura` package.",
   "type": "object",
   "properties": {
      "is_remote": {
         "title": "Is Remote",
         "default": true,
         "type": "boolean"
      },
      "error_on_missing": {
         "title": "Error On Missing",
         "type": "boolean"
      },
      "class_name": {
         "title": "Class Name",
         "type": "string",
         "default": "TrafilaturaWebReader"
      }
   },
   "required": [
      "error_on_missing"
   ]
}

Config
  • arbitrary_types_allowed: bool = True

Fields
  • error_on_missing (bool)
  • is_remote (bool)

field error_on_missing: bool [Required]
field is_remote: bool = True
classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.
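The idea of a stable serialization key can be illustrated with a small, self-contained sketch. It is not llama_index's actual implementation: `RenamedWebReader`, `REGISTRY`, `to_dict`, and `from_dict` are hypothetical names used only to show why looking a class up by `class_name()` keeps working after the Python class itself has been renamed.

```python
from typing import Dict, Type


class RenamedWebReader:
    """Hypothetical reader whose Python class name changed after release."""

    def __init__(self, error_on_missing: bool = False) -> None:
        self.error_on_missing = error_on_missing

    @classmethod
    def class_name(cls) -> str:
        # Stable ID: unchanged even though the class itself was renamed.
        return "TrafilaturaWebReader"

    def to_dict(self) -> dict:
        # Store the stable ID alongside the field values.
        return {
            "class_name": self.class_name(),
            "error_on_missing": self.error_on_missing,
        }


# Hypothetical registry keyed by the stable ID rather than by cls.__name__.
REGISTRY: Dict[str, Type[RenamedWebReader]] = {
    RenamedWebReader.class_name(): RenamedWebReader,
}


def from_dict(data: dict) -> RenamedWebReader:
    """Rebuild an object from a payload produced by to_dict()."""
    cls = REGISTRY[data["class_name"]]
    return cls(error_on_missing=data["error_on_missing"])
```

Payloads written before the rename still resolve, because the lookup never depends on the class's current Python name.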

load_data(urls: List[str]) → List[Document]

Load data from the given URLs.

Parameters

urls (List[str]) – List of URLs to scrape.

Returns

List of documents.

Return type

List[Document]
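A minimal usage sketch follows, assuming the `llama_index` and `trafilatura` packages are installed. The helper `load_pages` and its `reader` parameter are hypothetical conveniences, not part of the library; only `TrafilaturaWebReader`, `error_on_missing`, and `load_data` come from the reference above. Because the schema lists `error_on_missing` as required, the sketch passes it explicitly.

```python
from typing import Any, List, Optional, Union


def load_pages(urls: Union[str, List[str]], reader: Optional[Any] = None) -> List[Any]:
    """Return the documents produced by a reader's load_data() for the given URLs."""
    if reader is None:
        # Imported lazily so the helper can be defined without the packages installed.
        from llama_index.readers import TrafilaturaWebReader

        # error_on_missing is a required field in the schema, so set it explicitly.
        reader = TrafilaturaWebReader(error_on_missing=False)

    # load_data is documented as taking a list of URLs, so wrap a lone string.
    if isinstance(urls, str):
        urls = [urls]

    return reader.load_data(urls)
```

Calling `load_pages(["https://example.com"])` would fetch that page over the network and return a `List[Document]`. The `reader` argument exists only so that a preconfigured reader, or a stub in tests, can be supplied instead.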