Time-Travel Debugging

Flowcraft's time-travel debugging replays workflow executions from persistent event logs. Replay reconstructs the state of a recorded execution from its stored events, without re-running any node logic, so you can debug and analyze a run after the fact.

Overview

Time-travel debugging works by storing all workflow events (node starts, finishes, context changes, errors, etc.) in a persistent event store. You can then replay these events to reconstruct the workflow state at any point in time, making it easy to:

  • Debug complex workflow failures
  • Analyze performance bottlenecks
  • Understand execution flow
  • Build monitoring and observability tools
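
The idea of rebuilding state "at any point in time" can be sketched without Flowcraft itself. The example below is a minimal illustration, not the library's replay implementation: the `SketchEvent` shape, the `key`/`value` payload fields, and the `contextAt` helper are all assumptions made for this sketch.

```typescript
// Hypothetical event shape for illustration; the real FlowcraftEvent type may differ.
type SketchEvent =
  | { type: 'context:change'; timestamp: number; payload: { key: string; value: unknown } }
  | { type: 'node:start' | 'node:finish'; timestamp: number; payload: { nodeId: string } }

// Rebuild the context as it stood at time `until` by applying changes in order.
// Assumes `events` is already sorted by timestamp.
function contextAt(events: SketchEvent[], until: number): Record<string, unknown> {
  const context: Record<string, unknown> = {}
  for (const event of events) {
    if (event.timestamp > until) break
    if (event.type === 'context:change') {
      context[event.payload.key] = event.payload.value
    }
  }
  return context
}

const log: SketchEvent[] = [
  { type: 'node:start', timestamp: 1, payload: { nodeId: 'fetch' } },
  { type: 'context:change', timestamp: 2, payload: { key: 'user', value: 'ada' } },
  { type: 'node:finish', timestamp: 3, payload: { nodeId: 'fetch' } },
  { type: 'context:change', timestamp: 4, payload: { key: 'user', value: 'grace' } },
]

const midway = contextAt(log, 3) // state before the second change
const latest = contextAt(log, Infinity) // final state
console.log(midway, latest)
```

No node logic runs here: the state is derived purely from the recorded events, which is what makes replay cheap and repeatable.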

Setting Up Event Storage

Flowcraft provides several event store implementations:

In-Memory Event Store (Development)

typescript
import { FlowRuntime, InMemoryEventStore, PersistentEventBusAdapter } from 'flowcraft'

const eventStore = new InMemoryEventStore()
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })

SQLite Event Store

typescript
import { FlowRuntime, PersistentEventBusAdapter } from 'flowcraft'
import { SqliteHistoryAdapter } from '@flowcraft/sqlite-history'

const eventStore = new SqliteHistoryAdapter({
  databasePath: './workflow-events.db',
  walMode: true // Enable WAL for concurrent access
})
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })

PostgreSQL Event Store

typescript
import { FlowRuntime, PersistentEventBusAdapter } from 'flowcraft'
import { PostgresHistoryAdapter } from '@flowcraft/postgres-history'

const eventStore = new PostgresHistoryAdapter({
  host: 'localhost',
  port: 5432,
  database: 'flowcraft',
  user: 'flowcraft',
  password: 'password', // Placeholder; load real credentials from your own configuration
  tableName: 'workflow_events'
})
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })

Recording Workflow Executions

Once configured with a persistent event bus, all workflow executions are automatically recorded:

typescript
const runtime = new FlowRuntime({ eventBus })

// All executions are now recorded
const result = await runtime.run(blueprint, initialContext, {
  functionRegistry: registry
})

Events recorded include:

  • workflow:start - Workflow execution begins
  • workflow:resume - Workflow resumes from pause/stall
  • node:start - Node execution begins
  • node:finish - Node completes successfully
  • node:error - Node fails with error
  • node:retry - Node is retried
  • node:fallback - Node uses fallback execution
  • context:change - Context is modified
  • batch:start - Batch operation begins
  • batch:finish - Batch operation completes
  • workflow:stall - Workflow waits (sleep/timer)
  • workflow:pause - Workflow is paused
  • workflow:finish - Workflow completes
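
When debugging a failure, it often helps to narrow a recorded log to the failure-related event types first. The sketch below uses the event names from the list above; the `RecordedEvent` shape (including an optional `nodeId` on every payload) and the `failureTrail` helper are assumptions for illustration, not part of Flowcraft's API.

```typescript
// Event names come from the list above; the event shape is an assumption.
type EventType =
  | 'workflow:start' | 'workflow:resume' | 'workflow:stall' | 'workflow:pause' | 'workflow:finish'
  | 'node:start' | 'node:finish' | 'node:error' | 'node:retry' | 'node:fallback'
  | 'context:change' | 'batch:start' | 'batch:finish'

interface RecordedEvent {
  type: EventType
  timestamp: number
  payload: { nodeId?: string }
}

const FAILURE_TYPES: ReadonlySet<EventType> = new Set<EventType>([
  'node:error',
  'node:retry',
  'node:fallback',
])

// Produce a compact, ordered trail of everything that went wrong.
function failureTrail(events: RecordedEvent[]): string[] {
  return events
    .filter(e => FAILURE_TYPES.has(e.type))
    .map(e => `${e.timestamp} ${e.type} ${e.payload.nodeId ?? '(unknown node)'}`)
}

const recorded: RecordedEvent[] = [
  { type: 'node:start', timestamp: 1, payload: { nodeId: 'charge' } },
  { type: 'node:error', timestamp: 2, payload: { nodeId: 'charge' } },
  { type: 'node:retry', timestamp: 3, payload: { nodeId: 'charge' } },
  { type: 'node:finish', timestamp: 4, payload: { nodeId: 'charge' } },
]

const trail = failureTrail(recorded)
console.log(trail.join('\n'))
```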

Replaying Executions

Replay reconstructs the workflow state from stored events:

typescript
// Get the execution ID from a previous run
const executionId = result.context._executionId

// Retrieve events for this execution
const events = await eventStore.retrieve(executionId)

// Replay the execution
const replayResult = await runtime.replay(blueprint, events, executionId)

console.log('Replayed result:', replayResult)

Key Replay Behaviors

  • Deterministic: Replaying the same event log always produces the same final state
  • Fast: No node logic is re-executed, only state reconstruction
  • Complete: All context changes, outputs, and errors are reconstructed
  • Status: Replayed executions always show status: 'completed' (since they reconstruct the final state)

Advanced Usage

Replaying Multiple Executions

typescript
// Get events for multiple executions
const executionIds = ['exec-1', 'exec-2', 'exec-3']
const eventsMap = await eventStore.retrieveMultiple(executionIds)

// Replay each execution
for (const [execId, events] of eventsMap) {
  const replayResult = await runtime.replay(blueprint, events, execId)
  console.log(`Execution ${execId} final state:`, replayResult.context)
}

Analyzing Execution Patterns

typescript
// Get events and analyze execution patterns
const events = await eventStore.retrieve(executionId)

// Count different event types
const eventCounts = events.reduce((counts, event) => {
  counts[event.type] = (counts[event.type] || 0) + 1
  return counts
}, {} as Record<string, number>)

console.log('Event breakdown:', eventCounts)

Building Custom Analytics

typescript
// Extract timing information (`events` is the array returned by eventStore.retrieve)
type NodeTiming = { start: number; end?: number; duration?: number }

const nodeTimings = events
  .filter(e => e.type === 'node:start' || e.type === 'node:finish')
  .reduce((timings, event) => {
    const nodeId = event.payload.nodeId
    if (event.type === 'node:start') {
      // A repeated start for the same node overwrites the earlier entry
      timings[nodeId] = { start: event.timestamp }
    } else if (timings[nodeId]) {
      timings[nodeId].end = event.timestamp
      timings[nodeId].duration = event.timestamp - timings[nodeId].start
    }
    return timings
  }, {} as Record<string, NodeTiming>)

console.log('Node execution times:', nodeTimings)
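
If a node can run more than once (for example after a retry), keeping only the latest start hides the time spent on earlier attempts. The variant below accumulates time per node instead. It is a sketch under two assumptions that the documentation above does not confirm: each attempt emits its own node:start, and a node:error ends the attempt it belongs to. The `TimedEvent` shape and `totalNodeDurations` helper are hypothetical.

```typescript
// Assumed event shape: numeric timestamps and a nodeId on every payload.
interface TimedEvent {
  type: string
  timestamp: number
  payload: { nodeId: string }
}

// Sum the time spent in each node across all of its attempts.
function totalNodeDurations(events: TimedEvent[]): Map<string, number> {
  const openStarts = new Map<string, number>() // nodeId -> start of the current attempt
  const totals = new Map<string, number>() // nodeId -> accumulated duration
  for (const event of events) {
    const { nodeId } = event.payload
    if (event.type === 'node:start') {
      openStarts.set(nodeId, event.timestamp)
    } else if (event.type === 'node:finish' || event.type === 'node:error') {
      const startedAt = openStarts.get(nodeId)
      if (startedAt === undefined) continue // end without a matching start; skip it
      totals.set(nodeId, (totals.get(nodeId) ?? 0) + (event.timestamp - startedAt))
      openStarts.delete(nodeId)
    }
  }
  return totals
}

const attempts: TimedEvent[] = [
  { type: 'node:start', timestamp: 100, payload: { nodeId: 'charge' } },
  { type: 'node:error', timestamp: 130, payload: { nodeId: 'charge' } },
  { type: 'node:start', timestamp: 150, payload: { nodeId: 'charge' } },
  { type: 'node:finish', timestamp: 190, payload: { nodeId: 'charge' } },
]

const totals = totalNodeDurations(attempts)
console.log('charge spent', totals.get('charge'), 'time units across attempts')
```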

Integration with Existing Tools

Combining with Interactive Debugging

typescript
import { createStepper } from 'flowcraft/testing'

// Use stepper for detailed debugging
const stepper = await createStepper(runtime, blueprint, registry)

// Step through execution while events are recorded
while (!stepper.isDone()) {
  const result = await stepper.next()
  console.log('Current state:', await stepper.state.getContext().getAll())

  // Events are automatically stored for later replay
}

Visual Execution Analysis

typescript
import { generateMermaidForRun } from 'flowcraft'

// Generate visual execution trace
const events = await eventStore.retrieve(executionId)
const mermaidDiagram = generateMermaidForRun(blueprint, events)

// Render with Mermaid to see execution path

Event Store Management

Cleanup and Maintenance

typescript
// Clear all events (useful for testing)
await eventStore.clear()

// Get statistics
const stats = await eventStore.getStats()
console.log(`Total events: ${stats.totalEvents}, Executions: ${stats.executions}`)

Custom Event Stores

Implement the IEventStore interface for custom storage backends:

typescript
import type { FlowcraftEvent, IEventStore } from 'flowcraft'

class CustomEventStore implements IEventStore {
  async store(event: FlowcraftEvent, executionId: string): Promise<void> {
    // Implement storage logic
  }

  async retrieve(executionId: string): Promise<FlowcraftEvent[]> {
    // Implement retrieval logic
    return []
  }

  async retrieveMultiple(executionIds: string[]): Promise<Map<string, FlowcraftEvent[]>> {
    // Implement bulk retrieval
    return new Map()
  }
}
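
To make the skeleton concrete, here is one possible in-process implementation backed by a `Map`. It is a sketch, not the library's own store: the `FlowcraftEvent` and `IEventStore` declarations below are local stand-ins that mirror the three methods shown above so the example compiles on its own (in a real project you would import the types from 'flowcraft'), and the `executionCount` getter is an extra added for this illustration.

```typescript
// Local stand-ins mirroring the interface shown above; the real types may carry more fields.
interface FlowcraftEvent {
  type: string
  timestamp: number
  payload: unknown
}

interface IEventStore {
  store(event: FlowcraftEvent, executionId: string): Promise<void>
  retrieve(executionId: string): Promise<FlowcraftEvent[]>
  retrieveMultiple(executionIds: string[]): Promise<Map<string, FlowcraftEvent[]>>
}

class MapEventStore implements IEventStore {
  private readonly eventsByExecution = new Map<string, FlowcraftEvent[]>()

  async store(event: FlowcraftEvent, executionId: string): Promise<void> {
    const existing = this.eventsByExecution.get(executionId)
    if (existing) existing.push(event)
    else this.eventsByExecution.set(executionId, [event])
  }

  async retrieve(executionId: string): Promise<FlowcraftEvent[]> {
    // Return a copy so callers cannot mutate the stored history.
    return (this.eventsByExecution.get(executionId) ?? []).slice()
  }

  async retrieveMultiple(executionIds: string[]): Promise<Map<string, FlowcraftEvent[]>> {
    const result = new Map<string, FlowcraftEvent[]>()
    for (const id of executionIds) {
      result.set(id, await this.retrieve(id))
    }
    return result
  }

  // Hypothetical helper, not part of the interface above.
  get executionCount(): number {
    return this.eventsByExecution.size
  }
}

const store = new MapEventStore()
void store.store({ type: 'workflow:start', timestamp: 1, payload: {} }, 'exec-1')
void store.store({ type: 'workflow:finish', timestamp: 2, payload: {} }, 'exec-1')
store.retrieve('exec-1').then(events => console.log('stored events:', events.length))
```

Appending in arrival order keeps each execution's log ordered, which replay relies on. A database-backed store would need to guarantee the same ordering on retrieval.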

Performance Considerations

  • Storage Size: Events accumulate over time; implement retention policies
  • Query Performance: Index execution_id and timestamps for fast retrieval
  • Memory Usage: Large workflows generate many events; consider pagination
  • Concurrent Access: Use WAL mode (SQLite) or connection pooling (PostgreSQL)
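
A retention policy can be sketched independently of the storage backend. The example below drops whole executions whose most recent event is older than a cutoff, rather than trimming individual events, on the reasoning that a partially deleted log could no longer be replayed faithfully. The `StoredEvent` shape and the `applyRetention` helper are assumptions for illustration; a real store would express the same rule as a delete query.

```typescript
// Assumed row shape for a stored event.
interface StoredEvent {
  executionId: string
  type: string
  timestamp: number // milliseconds
}

// Keep every event of an execution if its latest event is within `maxAgeMs` of `now`.
function applyRetention(events: StoredEvent[], now: number, maxAgeMs: number): StoredEvent[] {
  const cutoff = now - maxAgeMs
  const lastSeen = new Map<string, number>()
  for (const e of events) {
    const previous = lastSeen.get(e.executionId)
    if (previous === undefined || e.timestamp > previous) {
      lastSeen.set(e.executionId, e.timestamp)
    }
  }
  return events.filter(e => (lastSeen.get(e.executionId) ?? 0) >= cutoff)
}

const rows: StoredEvent[] = [
  { executionId: 'old', type: 'workflow:start', timestamp: 1000 },
  { executionId: 'old', type: 'workflow:finish', timestamp: 2000 },
  { executionId: 'live', type: 'workflow:start', timestamp: 4000 },
  { executionId: 'live', type: 'workflow:stall', timestamp: 9000 },
]

// With now = 10000 and a 5000 ms window, the cutoff is 5000.
const kept = applyRetention(rows, 10000, 5000)
console.log(kept.map(e => `${e.executionId}:${e.type}`))
```

Note that the 'live' execution keeps its event at timestamp 4000 even though that event alone is older than the cutoff, because the execution as a whole is still recent.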

Best Practices

  1. Use Appropriate Storage: In-memory for development, persistent stores for production
  2. Monitor Storage Growth: Implement event retention and cleanup policies
  3. Index Strategically: Index on execution_id and event_type for fast queries
  4. Handle Large Workflows: Consider event pagination for very large executions
  5. Combine with Logging: Use alongside traditional logging for comprehensive observability

Troubleshooting

Common Issues

  • Events not being stored: Ensure PersistentEventBusAdapter is properly configured
  • Replay state mismatch: Verify blueprint hasn't changed between recording and replay
  • Performance issues: Check database indexes and consider event archiving
  • Memory issues: Implement event streaming for very large workflows
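
One way to catch a replay state mismatch early is to record a fingerprint of the blueprint alongside the execution and compare it before replaying. The sketch below is not a Flowcraft feature: the blueprint shape, the `stableStringify` and `fingerprint` helpers, and the idea of storing the fingerprint next to the execution ID are all assumptions. It uses a 32-bit FNV-1a hash, which is enough to detect accidental edits but is not a cryptographic guarantee.

```typescript
// Serialize with sorted object keys so key order does not affect the result.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj).sort()
    return `{${keys.map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`
  }
  return JSON.stringify(value) ?? 'null'
}

// 32-bit FNV-1a over the stable serialization, returned as 8 hex digits.
function fingerprint(value: unknown): string {
  const text = stableStringify(value)
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return ('0000000' + (hash >>> 0).toString(16)).slice(-8)
}

// Hypothetical blueprints; a real blueprint may have a different structure.
const recordedBlueprint = { id: 'checkout', nodes: [{ id: 'charge', uses: 'stepA' }] }
const reorderedBlueprint = { nodes: [{ uses: 'stepA', id: 'charge' }], id: 'checkout' }
const editedBlueprint = { id: 'checkout', nodes: [{ id: 'charge', uses: 'stepB' }] }

const recordedPrint = fingerprint(recordedBlueprint)
const safeToReplay = fingerprint(reorderedBlueprint) === recordedPrint
const changedSinceRecording = fingerprint(editedBlueprint) !== recordedPrint
console.log({ safeToReplay, changedSinceRecording })
```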

Time-travel debugging gives you a complete, replayable record of each workflow execution, which makes failures easier to diagnose and workflow applications easier to keep reliable and maintainable.

Released under the MIT License