# Pipelines - AI Agent Observability

> Observe AI agent work as pipelines - in real time

The Pipelines module records AI agent work for observability; it does not execute anything. The Pipeline -> PipelineStep -> PipelineJob hierarchy in PostgreSQL tracks every stage of agent work, from research through coding to wrap-up. Job logs are stored as wiki pages, and status propagates upward: a failed job fails its step and its pipeline. The dashboard refreshes every 15 seconds, and is_stale() flags pipelines that have been running with no updates for more than 6 hours.

## Features

- **Pipeline -> Step -> Job hierarchy**: Each pipeline has steps (research, coding, wrap-up), each step has jobs. Status propagates upward - failed job -> failed step -> failed pipeline.
- **Job logs as wiki pages**: Every job has its own wiki page with a full log (content in MinIO). Pages are grouped under the 'pipeline-logi' parent and excluded from RAG, so they don't pollute search results.
- **15s live polling**: The pipeline list and detail view refresh every 15 seconds via HTMX. Durations are computed on the fly: finished_at - started_at for finished jobs, now() - started_at for active ones.
- **Stuck pipeline detection**: is_stale() flags pipelines in running status with no updates for more than 6 hours. Computed at read time, no cron job needed.
- **Read-only UI - control via MCP**: The dashboard is read-only - pipelines are created by skills via MCP, not from the UI. AI agents have full control through 9 dedicated MCP tools.
- **wiki-log integration**: When LLM Wiki is enabled, finishing a pipeline appends an entry to wiki-log. Best-effort - a pipeline MCP error never blocks the agent's work.
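
The read-time calculations mentioned above (durations and stale detection) can be sketched as follows. This is a minimal illustration rather than the module's actual code: the `updated_at` argument and the function signatures are assumptions, and only the 6-hour threshold and the two duration formulas come from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=6)  # threshold stated in the feature list


def duration(started_at: Optional[datetime],
             finished_at: Optional[datetime],
             now: datetime) -> Optional[timedelta]:
    """finished_at - started_at for finished jobs, now - started_at for active ones."""
    if started_at is None:
        return None  # the job has not started, so there is nothing to measure
    return (finished_at or now) - started_at


def is_stale(status: str, updated_at: datetime, now: datetime) -> bool:
    """True when a running pipeline has had no update for more than 6 hours.

    `updated_at` is a hypothetical "last update" timestamp.
    """
    return status == "running" and now - updated_at > STALE_AFTER


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(duration(now - timedelta(minutes=5), None, now))
print(is_stale("running", now - timedelta(hours=7), now))
```

Because both values are derived from timestamps at read time, nothing has to be written back to the database and no scheduled job is needed.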

## How it works

1. **Agent creates the pipeline**: The skill calls create_pipeline via MCP - a ticket_work pipeline automatically gets 3 steps: research, coding, wrap-up.
2. **Agent reports job starts**: create_pipeline_job registers a new job in a step (e.g. 'Source code analysis' in the coding step). Status: pending -> running.
3. **Agent appends logs**: append_job_log appends markdown to the job's wiki page. Each call adds another section - the complete history of agent work.
4. **Agent reports the result**: update_pipeline_job(status='success'/'failure') closes the job, sets finished_at, and propagates status up to the step and pipeline.
5. **Dashboard shows progress**: Pipeline view with step columns and job cards. Statuses, durations, and links to wiki logs - everything in real time.
6. **finish_pipeline closes everything**: The agent calls finish_pipeline when done. If no final status is provided, it is aggregated from the steps. When LLM Wiki is enabled, an entry is also appended to wiki-log.
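
The sequence above can be sketched from the agent's side. The tool names are the ones listed in this document; everything else is an assumption made for illustration: the `call_tool` transport, the argument names (`pipeline_id`, `job_id`, `step`, `markdown`), and the shape of the returned dictionaries. A stand-in transport is used so the sketch runs without an MCP server.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes a tool name and arguments, returns a result dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_ticket_work(call_tool: ToolCall, ticket: str) -> None:
    """Report one coding job through the pipeline tools, in the order of the steps above."""
    pipeline = call_tool("create_pipeline",
                         {"pipeline_type": "ticket_work", "name": ticket})
    job = call_tool("create_pipeline_job", {
        "pipeline_id": pipeline["id"],   # assumed argument name
        "step": "coding",
        "name": "Source code analysis",
        "status": "running",
    })
    call_tool("append_job_log",
              {"job_id": job["id"], "markdown": "## Findings\nRead the entry points."})
    call_tool("update_pipeline_job", {"job_id": job["id"], "status": "success"})
    # No status passed: per the text, it is then aggregated from the steps.
    call_tool("finish_pipeline", {"pipeline_id": pipeline["id"]})


# Stand-in transport that only records which tools were called.
calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    return {"id": f"{name}-1"}


run_ticket_work(fake_call_tool, "TICKET-1")
print(calls)
```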

## AI and MCP

9 MCP tools for full observability of agent work - creating pipelines, reporting jobs, appending logs, and closing out. All operations are best-effort - a pipeline error never blocks the agent's work.

- `create_pipeline`: Create a pipeline (ticket_work: automatic research/coding/wrap-up steps).
- `create_pipeline_job`: Add a job to a step with name, description, and starting status.
- `update_pipeline_job`: Update job status and summary. Propagates upward: job -> step -> pipeline.
- `append_job_log`: Append markdown to the job's wiki page. Creates the page on first call.
- `finish_pipeline`: Close the pipeline with final status (aggregated from steps if not provided).
- `list_pipelines`: List project pipelines with status and type filters plus pagination.
- `get_pipeline`: Full pipeline tree: steps, jobs, timestamps, statuses, and log links.
- `get_pipeline_job_log`: Fetch the full job log as markdown from the linked wiki page.
- `clean_pipeline_logs`: Delete the wiki pages holding pipeline job logs for a sprint (cleanup after close).
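
One way an agent-side caller could honor the best-effort rule is to wrap every pipeline tool call so that a failure is logged and swallowed. This is a sketch under assumptions, not the module's implementation: `call_tool` is a hypothetical transport function, and returning `None` on failure is a design choice made here for illustration.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("pipelines")

# Hypothetical transport: takes a tool name and arguments, returns a result dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def best_effort(call_tool: ToolCall, name: str,
                arguments: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Call a pipeline tool; on any error, log it and return None instead of raising."""
    try:
        return call_tool(name, arguments)
    except Exception:
        # Observability must not interrupt the real work, so the error stops here.
        logger.warning("pipeline tool %s failed; continuing", name, exc_info=True)
        return None


def broken_transport(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    raise ConnectionError("MCP server unreachable")


result = best_effort(broken_transport, "append_job_log", {"job_id": "j1"})
print(result is None)
```

Callers then treat a `None` result as "not recorded" and carry on with the ticket.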

## Technical details

- **Data model**: Pipeline -> PipelineStep -> PipelineJob in PostgreSQL. UUID keys, timestamps (created_at, started_at, finished_at). pipeline_type: ticket_work | sprint_close.
- **Status propagation**: _update_step_status() and _update_pipeline_status() in services/pipelines.py. Job failed -> step failed -> pipeline failed. A parent's status is the min() of its children's statuses, so a single failed child is enough to fail the parent.
- **Wiki logs**: PipelineJob.wiki_page_id links to WikiPage. exclude_from_embeddings=True - logs are excluded from RAG. The 'pipeline-logi' parent is created idempotently.
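- **Live polling**: HTMX hx-trigger='load delay:15s' on listRow and treePartial, which fires again each time the refreshed fragment is swapped in (the HTMX load-polling pattern). Durations are computed on the fly, so there is no duration column in the database.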
- **is_stale()**: Computed at read time: true when the pipeline is running and its last update is more than 6 hours old. Evaluated in the service layer, no cron. Flags stuck agents.
- **GitLab CI/CD pattern**: Pipeline/step/job architecture modeled on GitLab CI/CD. Observability (not execution) - agents self-report progress via MCP.
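
The min()-based status propagation described above can be illustrated with a small ranking table. This is a sketch, not the code in services/pipelines.py: the rank order, the treatment of an empty step as pending, and the `Step` class are assumptions; the status names are the ones that appear in this document.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity order: the lowest rank wins under min().
STATUS_RANK = {"failure": 0, "running": 1, "pending": 2, "success": 3}


def aggregate(statuses: List[str]) -> str:
    """Return the lowest-ranked status; an empty list is treated as pending."""
    if not statuses:
        return "pending"
    return min(statuses, key=STATUS_RANK.__getitem__)


@dataclass
class Step:
    """Hypothetical in-memory stand-in for a PipelineStep and its job statuses."""
    name: str
    job_statuses: List[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        return aggregate(self.job_statuses)


def pipeline_status(steps: List[Step]) -> str:
    """Aggregate step statuses the same way steps aggregate job statuses."""
    return aggregate([step.status for step in steps])


steps = [
    Step("research", ["success"]),
    Step("coding", ["success", "failure"]),
    Step("wrap-up"),
]
print(pipeline_status(steps))
```

With this ordering a pipeline reports success only when every job in every step has succeeded.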
