Time-Travel Debugging
Flowcraft's time-travel debugging allows you to replay workflow executions from persistent event logs, enabling powerful debugging and analysis capabilities. This feature reconstructs the exact state of any workflow execution without re-running the logic.
Overview
Time-travel debugging works by storing all workflow events (node starts, finishes, context changes, errors, etc.) in a persistent event store. You can then replay these events to reconstruct the workflow state at any point in time, making it easy to:
- Debug complex workflow failures
- Analyze performance bottlenecks
- Understand execution flow
- Build monitoring and observability tools
Setting Up Event Storage
Flowcraft provides several event store implementations:
In-Memory Event Store (Development)
import { FlowRuntime, InMemoryEventStore, PersistentEventBusAdapter } from 'flowcraft'
const eventStore = new InMemoryEventStore()
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })
SQLite Event Store
import { SqliteHistoryAdapter } from '@flowcraft/sqlite-history'
import { FlowRuntime, PersistentEventBusAdapter } from 'flowcraft'
const eventStore = new SqliteHistoryAdapter({
databasePath: './workflow-events.db',
walMode: true // Enable WAL for concurrent access
})
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })
PostgreSQL Event Store
import { PostgresHistoryAdapter } from '@flowcraft/postgres-history'
import { FlowRuntime, PersistentEventBusAdapter } from 'flowcraft'
const eventStore = new PostgresHistoryAdapter({
host: 'localhost',
port: 5432,
database: 'flowcraft',
user: 'flowcraft',
password: 'password',
tableName: 'workflow_events'
})
const eventBus = new PersistentEventBusAdapter(eventStore)
const runtime = new FlowRuntime({ eventBus })
Recording Workflow Executions
Once configured with a persistent event bus, all workflow executions are automatically recorded:
const runtime = new FlowRuntime({ eventBus })
// All executions are now recorded
const result = await runtime.run(blueprint, initialContext, {
functionRegistry: registry
})
Events recorded include the following (a sketch of the stored event shape follows the list):
- workflow:start - Workflow execution begins
- workflow:resume - Workflow resumes from a pause/stall
- node:start - Node execution begins
- node:finish - Node completes successfully
- node:error - Node fails with an error
- node:retry - Node is retried
- node:fallback - Node uses fallback execution
- context:change - Context is modified
- batch:start - Batch operation begins
- batch:finish - Batch operation completes
- workflow:stall - Workflow waits (sleep/timer)
- workflow:pause - Workflow is paused
- workflow:finish - Workflow completes
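The exact event type is defined by flowcraft, but the analytics snippets later in this guide read event.type, event.timestamp, and event.payload.nodeId, so a stored event can be pictured roughly like this (an illustrative sketch, not the library's actual type definition):

```typescript
// Illustrative sketch only -- flowcraft ships its own FlowcraftEvent type.
// These fields match what the analytics examples below rely on; any further
// payload fields are assumptions.
interface IllustrativeFlowcraftEvent {
  type: string              // e.g. 'node:start', 'context:change'
  timestamp: number         // epoch milliseconds, used for duration math later
  payload: {
    nodeId?: string         // present on node:* events
    [key: string]: unknown  // other event-specific data
  }
}
```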
Replaying Executions
Replay reconstructs the workflow state from stored events:
// Get the execution ID from a previous run
const executionId = result.context._executionId
// Retrieve events for this execution
const events = await eventStore.retrieve(executionId)
// Replay the execution
const replayResult = await runtime.replay(blueprint, events, executionId)
console.log('Replayed result:', replayResult)
Key Replay Behaviors
- Deterministic: Replay always produces the same final state (see the sketch after this list)
- Fast: No node logic is re-executed, only state reconstruction
- Complete: All context changes, outputs, and errors are reconstructed
- Status: Replayed executions always show status: 'completed' (since they reconstruct the final state)
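Because replay is deterministic and re-executes no node logic, replaying the same event log twice must land on identical state. A quick sanity check, using the variables from the snippet above:

```typescript
import { deepStrictEqual } from 'node:assert'

// Replay the same event log twice; deterministic replay means both
// reconstructions must produce an identical final context.
const first = await runtime.replay(blueprint, events, executionId)
const second = await runtime.replay(blueprint, events, executionId)
deepStrictEqual(first.context, second.context) // throws if replay diverged
```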
Advanced Usage
Replaying Multiple Executions
// Get events for multiple executions
const executionIds = ['exec-1', 'exec-2', 'exec-3']
const eventsMap = await eventStore.retrieveMultiple(executionIds)
// Replay each execution
for (const [execId, events] of eventsMap) {
const replayResult = await runtime.replay(blueprint, events, execId)
console.log(`Execution ${execId} final state:`, replayResult.context)
}
Analyzing Execution Patterns
// Get events and analyze execution patterns
const events = await eventStore.retrieve(executionId)
// Count different event types
const eventCounts = events.reduce((counts, event) => {
counts[event.type] = (counts[event.type] || 0) + 1
return counts
}, {} as Record<string, number>)
console.log('Event breakdown:', eventCounts)
Building Custom Analytics
// Extract timing information
const nodeTimings = events
.filter(e => e.type === 'node:start' || e.type === 'node:finish')
.reduce((timings, event) => {
const nodeId = event.payload.nodeId
if (event.type === 'node:start') {
timings[nodeId] = { start: event.timestamp }
} else if (timings[nodeId]) {
timings[nodeId].end = event.timestamp
timings[nodeId].duration = event.timestamp - timings[nodeId].start
}
return timings
}, {} as Record<string, any>)
console.log('Node execution times:', nodeTimings)
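To surface bottlenecks, the timing map above can be sorted by measured duration (a small follow-on sketch; nodeTimings comes from the snippet above):

```typescript
// Rank nodes by measured duration, slowest first.
const slowest = Object.entries(nodeTimings)
  .filter(([, t]) => typeof t.duration === 'number')
  .sort(([, a], [, b]) => b.duration - a.duration)
  .slice(0, 5)
console.log('Slowest nodes:', slowest)
```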
Integration with Existing Tools
Combining with Interactive Debugging
import { createStepper } from 'flowcraft/testing'
// Use stepper for detailed debugging
const stepper = await createStepper(runtime, blueprint, registry)
// Step through execution while events are recorded
while (!stepper.isDone()) {
const result = await stepper.next()
console.log('Current state:', await stepper.state.getContext().getAll())
// Events are automatically stored for later replay
}
Visual Execution Analysis
import { generateMermaidForRun } from 'flowcraft'
// Generate visual execution trace
const events = await eventStore.retrieve(executionId)
const mermaidDiagram = generateMermaidForRun(blueprint, events)
// Render with Mermaid to see the execution path
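For example, the diagram can be written into a Markdown file that any Mermaid-aware viewer (GitHub, VS Code, the Mermaid live editor) will render; the file name here is arbitrary:

```typescript
import { writeFileSync } from 'node:fs'

// Wrap the generated definition in a mermaid fence so Markdown viewers render it.
writeFileSync('execution-trace.md', '```mermaid\n' + mermaidDiagram + '\n```\n')
```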
Event Store Management
Cleanup and Maintenance
// Clear all events (useful for testing)
await eventStore.clear()
// Get statistics
const stats = await eventStore.getStats()
console.log(`Total events: ${stats.totalEvents}, Executions: ${stats.executions}`)
Custom Event Stores
Implement the IEventStore interface for custom storage backends:
import type { FlowcraftEvent, IEventStore } from 'flowcraft'
// A minimal in-memory implementation; swap the Map for a real storage backend
class CustomEventStore implements IEventStore {
  private events = new Map<string, FlowcraftEvent[]>()
  async store(event: FlowcraftEvent, executionId: string): Promise<void> {
    // Append the event to this execution's log, creating the log on first write
    const log = this.events.get(executionId) ?? []
    log.push(event)
    this.events.set(executionId, log)
  }
  async retrieve(executionId: string): Promise<FlowcraftEvent[]> {
    // Return a copy so callers cannot mutate the stored log
    return [...(this.events.get(executionId) ?? [])]
  }
  async retrieveMultiple(executionIds: string[]): Promise<Map<string, FlowcraftEvent[]>> {
    // Bulk retrieval built on the per-execution lookup
    const result = new Map<string, FlowcraftEvent[]>()
    for (const id of executionIds) result.set(id, await this.retrieve(id))
    return result
  }
}
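Wiring a custom store in mirrors the built-in adapters shown earlier:

```typescript
const eventBus = new PersistentEventBusAdapter(new CustomEventStore())
const runtime = new FlowRuntime({ eventBus })
```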
Performance Considerations
- Storage Size: Events accumulate over time; implement retention policies
- Query Performance: Index execution_id and timestamps for fast retrieval (see the sketch after this list)
- Memory Usage: Large workflows generate many events; consider pagination
- Concurrent Access: Use WAL mode (SQLite) or connection pooling (PostgreSQL)
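As a concrete starting point, here is what indexing and retention might look like for a Postgres-backed store. This is a sketch only: the workflow_events table name mirrors the tableName option shown earlier, but the execution_id, event_type, and timestamp column names are assumptions about the adapter's schema, not documented facts.

```typescript
import { Client } from 'pg'

const client = new Client({ host: 'localhost', database: 'flowcraft', user: 'flowcraft', password: 'password' })
await client.connect()

// Index the columns that replay and analytics queries filter on
// (column names are assumptions about the adapter's schema)
await client.query(
  'CREATE INDEX IF NOT EXISTS idx_events_execution ON workflow_events (execution_id)'
)
await client.query(
  'CREATE INDEX IF NOT EXISTS idx_events_type_time ON workflow_events (event_type, timestamp)'
)

// Simple retention policy: drop events older than 30 days
// (assumes the timestamp column is a SQL timestamp type)
await client.query(
  "DELETE FROM workflow_events WHERE timestamp < now() - interval '30 days'"
)
await client.end()
```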
Best Practices
- Use Appropriate Storage: In-memory for development, persistent stores for production
- Monitor Storage Growth: Implement event retention and cleanup policies
- Index Strategically: Index on execution_id and event_type for fast queries
- Handle Large Workflows: Consider event pagination for very large executions (see the note after this list)
- Combine with Logging: Use alongside traditional logging for comprehensive observability
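On the pagination point: the retrieve API shown in this guide returns the full event array, so client-side chunking (below) only bounds downstream processing; paginated retrieval itself would need support in the event store.

```typescript
// Client-side chunking: bounds per-step memory for aggregation or archiving,
// but the full log is still loaded by retrieve().
const events = await eventStore.retrieve(executionId)
const CHUNK_SIZE = 1_000 // arbitrary
for (let i = 0; i < events.length; i += CHUNK_SIZE) {
  const chunk = events.slice(i, i + CHUNK_SIZE)
  // aggregate, archive, or index this chunk before moving on
}
```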
Troubleshooting
Common Issues
- Events not being stored: Ensure PersistentEventBusAdapter is properly configured
- Replay state mismatch: Verify the blueprint hasn't changed between recording and replay (see the sketch below)
- Performance issues: Check database indexes and consider event archiving
- Memory issues: Implement event streaming for very large workflows
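One way to catch blueprint drift early is to fingerprint the blueprint at recording time and compare before replaying. A sketch: persisting the hash alongside the event log is left to your event store, recordedHash is a hypothetical value you stored at recording time, and hashing JSON.stringify output assumes the blueprint serializes deterministically.

```typescript
import { createHash } from 'node:crypto'

// Fingerprint the blueprint; compare against the hash captured at recording time
const blueprintHash = createHash('sha256')
  .update(JSON.stringify(blueprint))
  .digest('hex')

if (blueprintHash !== recordedHash) { // recordedHash: hypothetical persisted value
  throw new Error('Blueprint changed since recording; replay may not match')
}
```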
Time-travel debugging provides unprecedented visibility into workflow execution, making it easier to build reliable and maintainable workflow applications.