Semantic Types

Every input and output in a Zedoc pipeline is tagged with a semantic type — a label that describes what kind of data it carries. Types are the reason you cannot accidentally connect a task that produces cover images to a task that expects text. They catch mistakes at design time, before you spend credits running a pipeline that would fail.

The Problem Types Solve

A pipeline is a chain of tasks, and each task expects specific kinds of data. A metadata extraction task needs a book file, not an audio recording. A text summarization task needs structured text, not an image. Without types, nothing would prevent you from wiring these together incorrectly. You would discover the mistake only when the pipeline failed mid-run.

Semantic types make these constraints visible in the editor. When you drag a connection between two tasks, Zedoc checks whether the output type matches the input type. If they are not compatible, the connection is not allowed. This gives you immediate feedback while you are building, rather than cryptic errors later.
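The connection check described above can be pictured with a minimal Python sketch. It is illustrative only and is not Zedoc's implementation: the `Port` class, the `can_connect` function, the task names, and the type names such as `image_file` are all hypothetical, and compatibility is assumed here to mean an exact match of type names, which may be simpler than the real rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """An input or output port on a task, tagged with a semantic type name."""
    task: str
    name: str
    semantic_type: str  # hypothetical type name, e.g. "image_file"


def can_connect(output: Port, input_: Port) -> bool:
    """Return True when the output's type matches the input's type.

    Assumption: compatibility is modeled as equality of type names.
    """
    return output.semantic_type == input_.semantic_type


# Hypothetical tasks: one produces a cover image, two consume data.
cover_out = Port("generate_cover", "cover", "image_file")
summary_in = Port("summarize_text", "text", "structured_text")
resize_in = Port("resize_image", "image", "image_file")

print(can_connect(cover_out, summary_in))  # False: image into a text input
print(can_connect(cover_out, resize_in))   # True: image into an image input
```

In this sketch the editor would refuse the first connection and allow the second, before anything runs.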

How Types Appear in the Editor

In the pipeline editor, every input and output port displays its semantic type. When you hover over a port, you can see its type name and a brief description. As you drag a connection from an output, the editor highlights which inputs are compatible — making it easy to see where the data can go.

This means you do not need to memorize type names. The editor guides you toward valid connections naturally.

Content Types and Derived Types

Types in Zedoc fall into two broad categories.

Content types represent general data formats: the raw materials you feed into a pipeline. These include plain text, structured text with sections and headings, audio files, image files, and book files such as EPUBs and PDFs.

Derived types represent structured, domain-specific data that tasks produce. For example, after extracting metadata from an EPUB, you get book metadata (title, author, ISBN, publisher). After running a classification task, you get BISAC codes or Thema categories. After generating distribution data, you get an ONIX record. These derived types carry meaningful publishing data, not just raw files.

The distinction matters because derived types are more specific. A task that accepts book metadata will not accept raw text, even though both are “data.” Types enforce this precision so your pipelines produce reliable results.
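One way to picture the two categories is as a small registry that maps each type to its category. The Python sketch below is an illustration under stated assumptions: the identifiers (`plain_text`, `book_metadata`, and so on) are hypothetical names for the types mentioned in the text, and the `accepts` rule assumes an input takes only its own declared type.

```python
from enum import Enum


class Category(Enum):
    CONTENT = "content"  # general data formats fed into a pipeline
    DERIVED = "derived"  # structured, domain-specific data produced by tasks


# Hypothetical identifiers for the types named in the text.
SEMANTIC_TYPES = {
    "plain_text": Category.CONTENT,
    "structured_text": Category.CONTENT,
    "audio_file": Category.CONTENT,
    "image_file": Category.CONTENT,
    "book_file": Category.CONTENT,
    "book_metadata": Category.DERIVED,
    "bisac_codes": Category.DERIVED,
    "thema_categories": Category.DERIVED,
    "onix_record": Category.DERIVED,
}


def accepts(input_type: str, offered_type: str) -> bool:
    """Assumed rule: an input accepts only its own declared type."""
    for name in (input_type, offered_type):
        if name not in SEMANTIC_TYPES:
            raise KeyError(f"unknown semantic type: {name}")
    return input_type == offered_type


# A task input declared as book metadata rejects raw text.
print(accepts("book_metadata", "plain_text"))     # False
print(accepts("book_metadata", "book_metadata"))  # True
```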

Types and Pipeline Design

Types shape how you design pipelines. Because each task declares exactly what it accepts and what it produces, the set of possible connections is constrained in useful ways. You can think of types as a guide: they tell you which tasks can follow which.

For example, if you have a task that produces an ONIX record, you know the next task in the chain must accept ONIX records — you will not accidentally route that data to a task expecting an image file. This makes pipelines self-documenting. Anyone looking at a pipeline can trace the data flow and understand what each connection carries.

Types also enable Zedoc to validate your entire pipeline before it runs. If any connection is invalid, you will know immediately — not after waiting for tasks to execute.
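Whole-pipeline validation can be sketched as a pass over every connection that collects all type mismatches before anything runs. This Python example is a hypothetical model, not Zedoc's validator: the `Connection` tuple, the `validate_pipeline` function, and the task and type names are assumptions made for illustration, and compatibility is again assumed to be an exact match of type names.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    source: str       # hypothetical "task.output" label
    source_type: str  # semantic type produced by the source
    target: str       # hypothetical "task.input" label
    target_type: str  # semantic type expected by the target


def validate_pipeline(connections: list[Connection]) -> list[str]:
    """Return one message per mismatched connection; empty means valid."""
    errors = []
    for c in connections:
        if c.source_type != c.target_type:
            errors.append(
                f"{c.source} ({c.source_type}) cannot feed "
                f"{c.target} ({c.target_type})"
            )
    return errors


pipeline = [
    Connection("extract_metadata.metadata", "book_metadata",
               "build_onix.metadata", "book_metadata"),
    # Deliberate mistake: an ONIX record routed to an image input.
    Connection("build_onix.record", "onix_record",
               "resize_image.image", "image_file"),
]

errors = validate_pipeline(pipeline)
for message in errors:
    print(message)
```

Collecting every mismatch, rather than stopping at the first, mirrors the idea that you learn about all invalid connections up front.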

Relationship to Other Concepts

Pipelines use types to enforce valid connections between tasks. The pipeline editor relies on type compatibility to guide you as you build.
Tasks declare their input and output types as part of their definition. Browsing a task’s page shows you exactly what types it works with.
Credits are consumed only when a pipeline runs successfully. Because types prevent many common wiring mistakes, they help you avoid wasting credits on runs that would have failed.