Category: Content | AI-powered | 1 credit

Extract Contents

Extracts text content from book files (EPUB or PDF) as structured sections, and automatically uses AI to identify chapter boundaries in large documents.

How It Works

For EPUB files, the task parses the document structure to extract individual sections with their titles and content. Each section is automatically classified with a section type (e.g., “titlepage”, “dedication”, “chapter”, “epilogue”, “glossary”) when the EPUB includes structural metadata. Sections are also marked as front matter, body, or back matter based on their type. The section type and front matter status are displayed in the contents viewer.
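As a rough illustration of the classification described above, the sketch below models a section and derives its front, body, or back matter status from its section type. The `Section` class, its field names, and the `REGION_BY_TYPE` table are hypothetical and not part of the task's actual output format. Only the section type names come from the description above; which region each type belongs to is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from section type to book region. The type names are
# taken from the documentation; the region assigned to each is an assumption.
REGION_BY_TYPE = {
    "titlepage": "front",
    "dedication": "front",
    "chapter": "body",
    "epilogue": "body",  # assumption: some conventions place this in back matter
    "glossary": "back",
}


@dataclass
class Section:
    title: str
    content: str
    # None models an EPUB that carries no structural metadata for this section.
    section_type: Optional[str] = None

    @property
    def region(self) -> Optional[str]:
        """Return "front", "body", or "back", or None if the type is unknown."""
        return REGION_BY_TYPE.get(self.section_type)


sections = [
    Section("Title Page", "...", "titlepage"),
    Section("Chapter 1", "...", "chapter"),
    Section("Glossary", "...", "glossary"),
    Section("Untitled", "..."),
]
regions = [s.region for s in sections]
```

Deriving the region from the type, rather than storing it separately, keeps the two values from drifting apart. Whether the task itself works this way is not stated.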

For PDF files, AI is used to detect chapter boundaries within the continuous text.
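To make the idea of splitting continuous text at chapter boundaries concrete, here is a minimal sketch that uses a regular expression as a stand-in for the AI detection step. This is not how the task detects boundaries. The heading pattern, the `split_chapters` function, and the dictionary keys are all assumptions chosen for illustration.

```python
import re

# Stand-in heuristic (assumption): a line such as "Chapter 3" or "CHAPTER 12: Title"
# starts a new chapter. The real task uses AI for this step.
HEADING = re.compile(r"^chapter[ \t]+\d+\b.*$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list[dict]:
    """Split continuous text into titled sections at detected chapter headings."""
    matches = list(HEADING.finditer(text))
    if not matches:
        # No boundaries found: return the whole text as one untitled section.
        return [{"title": "", "content": text.strip()}] if text.strip() else []

    sections = []
    # Text before the first heading is kept as an untitled leading section.
    leading = text[: matches[0].start()].strip()
    if leading:
        sections.append({"title": "", "content": leading})

    for i, match in enumerate(matches):
        # A chapter runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append(
            {"title": match.group(0).strip(), "content": text[match.end():end].strip()}
        )
    return sections


sample = "Preface text.\nChapter 1\nIt began.\nChapter 2: Later\nIt ended.\n"
chapters = split_chapters(sample)
```

A fixed pattern like this fails on books whose chapters carry only a title or a number, which is the kind of case where a model-based detector is more useful.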

AI Section Classification

Some EPUB files don’t include the structural metadata needed to automatically identify section types. In those cases, AI determines what each section is: a chapter, dedication, epilogue, acknowledgements, and so on. Sections that are already identified from the file’s metadata are left as-is. This step runs automatically and is included in the base cost.
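The fallback behaviour described above can be sketched as follows: only sections without a type are passed to a classifier, and sections that already have a type are left untouched. The function names, the dictionary keys, and the keyword-based classifier are hypothetical; the keyword classifier is a toy stand-in for the AI step, not the task's actual logic.

```python
from typing import Callable


def fill_missing_types(
    sections: list[dict], classify: Callable[[dict], str]
) -> list[dict]:
    """Classify only sections that lack a type; existing types are left as-is."""
    result = []
    for section in sections:
        if section.get("section_type") is None:
            # Copy rather than mutate, so the caller's input stays unchanged.
            section = {**section, "section_type": classify(section)}
        result.append(section)
    return result


def keyword_classifier(section: dict) -> str:
    """Toy stand-in for the AI step (assumption): guess the type from the title."""
    title = section["title"].lower()
    for kind in ("dedication", "epilogue", "acknowledgements"):
        if kind in title:
            return kind
    return "chapter"


sections = [
    {"title": "Dedication", "content": "For M.", "section_type": None},
    {"title": "The Last Epilogue", "content": "...", "section_type": "chapter"},
    {"title": "Epilogue", "content": "...", "section_type": None},
]
typed = fill_missing_types(sections, keyword_classifier)
```

The second section keeps its metadata-derived type even though its title contains "Epilogue", which mirrors the rule that already-identified sections are not reclassified.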

When to Use

Use this task early in a pipeline to convert book files into the StructuredText format required by most AI analysis tasks.

Reference

Task diagram: Book File (BookFile) -> Extract Contents (Processing, 1 credit) -> Structured Text (StructuredText)

Inputs

Book File (type: BookFile, required)

Book file (EPUB or PDF) to extract content from.

Outputs

Structured Text (type: StructuredText)

Sections with titles and text content.

Credits

Base cost: 1 credit