Pipelines

A pipeline is a step-by-step workflow you build visually. You pick the tasks you need, connect them together, and Zedoc runs them in the right order — no code required.

For example, a simple metadata pipeline might look like this:

  1. Upload an EPUB file
  2. Extract the text content
  3. Classify the book using AI (e.g., assign BISAC codes)
  4. Generate an ONIX record for distribution

Each step’s output feeds into the next. You design the flow once, then run it whenever you have new content to process.
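The flow above can be sketched as a chain of steps where each step consumes the previous step's output. This is a minimal illustration only; the step functions and the Pipeline class are made up for the example and are not Zedoc's actual API:

```python
# Minimal sketch of a linear pipeline: each step's output feeds the
# next step's input. All names here are illustrative, not Zedoc's API.

def extract_text(epub_path):
    return f"text extracted from {epub_path}"

def classify(text):
    # Pretend AI classification that assigns a BISAC code.
    return {"text": text, "bisac": ["FIC000000"]}

def generate_onix(metadata):
    return f"<ONIXMessage subjects={metadata['bisac']}/>"

class Pipeline:
    def __init__(self, steps):
        self.steps = steps

    def run(self, value):
        for step in self.steps:   # each output becomes the next input
            value = step(value)
        return value

metadata_pipeline = Pipeline([extract_text, classify, generate_onix])
result = metadata_pipeline.run("book.epub")
```

Once the pipeline object is defined, the same `run` call can be repeated for every new file, which mirrors the "design once, run whenever" idea above.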

Building a Pipeline

The pipeline editor is a visual canvas where you assemble your workflow. You browse a palette of available tasks and place the ones you need onto the canvas. From there, you link each task’s output to the next task’s input to define the flow of data. Each task can be configured individually — for example, you might choose a classification model or set an output format. Before you run anything, Zedoc validates the entire pipeline to make sure all connections are compatible.

Connections are type-safe: you can only link outputs to inputs that accept the same kind of data. For instance, a task that produces StructuredText can connect to any task that accepts StructuredText, but not to one expecting an ImageFile. This prevents errors before they happen.
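A connection check of this kind can be sketched in a few lines. The function below is a hypothetical simplification (real semantic-type systems may also allow subtyping); the type names come from the example above:

```python
# Hypothetical sketch of type-safe connection validation: an output
# port may only connect to an input port with a matching semantic type.

def can_connect(output_type: str, input_type: str) -> bool:
    return output_type == input_type

# A StructuredText output connects to a StructuredText input...
assert can_connect("StructuredText", "StructuredText")
# ...but not to an input expecting an ImageFile.
assert not can_connect("StructuredText", "ImageFile")
```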

Pipeline Inputs and Outputs

Every pipeline defines its own inputs and outputs — the data that goes in when you start a run, and the deliverables that come out when it finishes.

A blank pipeline starts with no inputs or outputs; they are added as you place tasks on the canvas and connect them to the pipeline’s entry and exit points.

Pipeline inputs

Pipeline inputs define what users provide when launching a run. Each input becomes a form field in the run dialog. For example, a pipeline that processes EPUB files might have a single input named “EPUB File” of type EPUBFile.

Each input has:

  • A name shown as the form label (e.g., “Manuscript”, “Cover Image”)
  • A semantic type that determines what kind of data is accepted
  • A required/optional flag — required inputs must be filled in before the run can start
  • An optional multiple flag — when enabled, users can provide more than one value (e.g., upload several image files at once)
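The properties above can be pictured as a small record per input, plus a check that required fields are filled before a run starts. The field names and `validate_run` helper below are illustrative, not Zedoc's internal schema:

```python
from dataclasses import dataclass

# Hypothetical shape of a pipeline input declaration; fields mirror
# the properties listed above.
@dataclass
class PipelineInput:
    name: str            # form label, e.g. "Manuscript"
    semantic_type: str   # e.g. "EPUBFile"
    required: bool = True
    multiple: bool = False

def validate_run(inputs, provided):
    """Return the required inputs that are still missing a value."""
    return [i.name for i in inputs
            if i.required and not provided.get(i.name)]

inputs = [
    PipelineInput("EPUB File", "EPUBFile"),
    PipelineInput("Cover Image", "ImageFile", required=False, multiple=True),
]

# The optional, multiple-value input has two files; the required one is empty,
# so the run cannot start yet.
missing = validate_run(inputs, {"Cover Image": ["a.jpg", "b.jpg"]})
```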

Pipeline outputs

Pipeline outputs define the final deliverables. Task outputs are mapped to pipeline outputs so that when the run completes, you get the results you need. For instance, a metadata pipeline might produce an ONIX record and enriched book metadata as its outputs.

You configure inputs and outputs in the pipeline editor.

Task Inputs and Outputs

Each task in a pipeline defines the data it needs (inputs) and the data it produces (outputs). Every input and output is typed with a semantic type, which is how Zedoc enforces that connections between tasks are valid.

  • Required inputs must be connected for the pipeline to run. Optional inputs can be left unconnected — the task will run without them.
  • Multiple-value inputs accept a list of values rather than a single one. For example, a task that merges chapters might accept multiple StructuredText inputs.
  • Configuration options are separate from data inputs. These are settings like “maximum number of subjects” or “output format” that you configure at design time and that don’t change between runs.
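One way to picture the distinction is a task record that keeps typed data inputs separate from design-time configuration, with a validation pass that flags required inputs left unconnected. The structure and names below are assumptions made for the sketch:

```python
# Illustrative task description: typed data inputs (connected at design
# time, filled at run time) versus configuration set once at design time.
task = {
    "name": "Merge Chapters",
    "inputs": [
        {"name": "chapters", "type": "StructuredText",
         "required": True, "multiple": True},
        {"name": "style_guide", "type": "StructuredText",
         "required": False, "multiple": False},
    ],
    "config": {"output_format": "StructuredText"},  # does not change between runs
}

def unconnected_required(task, connections):
    """Return the names of required inputs with no incoming connection."""
    connected = {c["to_input"] for c in connections}
    return [i["name"] for i in task["inputs"]
            if i["required"] and i["name"] not in connected]
```

With no connections, `unconnected_required(task, [])` reports `["chapters"]`, so validation would block the run; the optional `style_guide` input is never flagged.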

Some tasks have dynamic inputs or outputs that change based on configuration. For example, a task might add or remove output ports depending on which options you enable.

How Data Flows Between Tasks

When a task completes, its output values are stored so that downstream tasks can read them.

All inputs and outputs for every task in a run are preserved. After a run completes, you can inspect the data at each step to understand what happened or to debug issues.
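A run store like this can be sketched as a mapping keyed by task and output port: downstream tasks read from it during the run, and the same records remain available for inspection afterwards. The class and method names are illustrative:

```python
# Sketch of a run store: every task output is kept, keyed by
# (task, output port). Names are illustrative, not Zedoc's API.

class RunStore:
    def __init__(self):
        self._values = {}

    def write(self, task, port, value):
        self._values[(task, port)] = value

    def read(self, task, port):
        # Downstream tasks read their connected inputs from here.
        return self._values[(task, port)]

    def inspect(self):
        """Everything that flowed through the run, for post-run debugging."""
        return dict(self._values)

store = RunStore()
store.write("Extract Text", "text", "Chapter 1 ...")
downstream_input = store.read("Extract Text", "text")
```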

Running a Pipeline

When you run a pipeline, Zedoc figures out the right execution order automatically:

  • Dependency-aware — Each task waits for the steps it depends on to finish before starting
  • Parallel when possible — Steps that don’t depend on each other run at the same time, so your workflow finishes faster
  • Real-time progress — You can watch each task’s status as the pipeline runs
  • Graceful error handling — If a task fails, the pipeline pauses and shows you exactly what went wrong so you can fix it and retry
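The scheduling rules above amount to ordering tasks by their dependencies and grouping independent tasks into "waves" that could run in parallel. The sketch below shows one way to compute such waves; the graph shape and task names are invented for illustration:

```python
# Sketch of dependency-aware scheduling. Tasks are grouped into waves:
# every task in a wave has all of its dependencies already finished,
# so tasks within one wave can run at the same time.

def execution_waves(deps):
    """deps maps each task name to the set of tasks it depends on."""
    remaining = {t: set(d) for t, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return waves

deps = {
    "Upload": set(),
    "Extract Text": {"Upload"},
    "Extract Cover": {"Upload"},
    "Classify": {"Extract Text"},
    "Generate ONIX": {"Classify", "Extract Cover"},
}
waves = execution_waves(deps)
```

Here the two extraction tasks share a wave because neither depends on the other, while "Generate ONIX" waits for both of its upstream branches to finish.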