Data Flow Architecture

The Noodles.gl system uses reactive programming principles to manage data flow through the node graph, ensuring efficient updates and consistent state.

Reactive Programming Model

RxJS Foundation

The system is built on RxJS observables for reactive data flow:

// Set a value; equivalent to field.next(field.schema.parse(value))
field.setValue(value)

// Read the current value synchronously
const value = field.value

// Listen to changes and re-render UI
field.subscribe(value => {
  // Update logic
})

Data Flow Principles

Unidirectional: Data flows from upstream outputs to downstream inputs, never the reverse
Reactive: Changes propagate automatically to every connected field
Lazy: Nodes execute only when an upstream value changes
Memoized: Results are cached so unchanged inputs do not trigger recomputation

Path Resolution System

Operator Identification in project serialization

Operators are identified by fully qualified paths that reflect their container hierarchy:

// Path examples
'/data-loader'                   // Root level operator
'/analysis/filter'               // Nested in analysis container
'/analysis/viz/scatter-plot'     // Deeply nested operator

Handle Identification in project serialization

Handles (connection points) use the operator's full path plus field information:

// Handle ID format: operatorPath.namespace.fieldName
'/analysis/processor.par.threshold'   // Parameter input
'/data-loader.out.data'               // Data output
'/viz/map.par.layers'                 // Nested operator parameter
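
To make the handle format concrete, the sketch below splits a handle ID back into its three parts. `parseHandleId` and `HandleRef` are hypothetical names, not part of the Noodles.gl API, and the sketch assumes that namespaces and field names never contain a dot.

```typescript
// Hypothetical shape for a parsed handle ID (not a Noodles.gl type).
interface HandleRef {
  operatorPath: string
  namespace: string
  fieldName: string
}

// Hypothetical helper: splits "operatorPath.namespace.fieldName".
function parseHandleId(handleId: string): HandleRef {
  // Split from the right so the operator path keeps its slashes intact.
  const fieldDot = handleId.lastIndexOf('.')
  const namespaceDot = handleId.lastIndexOf('.', fieldDot - 1)
  if (fieldDot <= 0 || namespaceDot <= 0) {
    throw new Error(`Malformed handle ID: ${handleId}`)
  }
  return {
    operatorPath: handleId.slice(0, namespaceDot),
    namespace: handleId.slice(namespaceDot + 1, fieldDot),
    fieldName: handleId.slice(fieldDot + 1),
  }
}

const thresholdHandle = parseHandleId('/analysis/processor.par.threshold')
// thresholdHandle.operatorPath === '/analysis/processor'
// thresholdHandle.namespace === 'par'
// thresholdHandle.fieldName === 'threshold'
```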

Path Resolution Rules in a `CodeField`

The system supports Unix-style path resolution for operator references:

// From operator at '/analysis/processor'
op('/data-loader')       // Absolute: root level data-loader
op('./filter')           // Relative: /analysis/filter
op('../threshold')       // Parent: /threshold
op('normalizer')         // Same container: /analysis/normalizer
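
The resolution rules above can be sketched as a small function. This is an illustration only: `resolveOperatorPath` is a hypothetical helper, and it assumes that relative references are resolved against the container of the calling operator, which is what the examples above imply.

```typescript
// Hypothetical helper: resolves a reference the way the examples above describe.
function resolveOperatorPath(fromOperator: string, ref: string): string {
  // Absolute references start at the root; relative ones start at the caller's container.
  const base: string[] = ref.startsWith('/')
    ? []
    : fromOperator.split('/').filter(Boolean).slice(0, -1)
  const segments = [...base]
  for (const part of ref.split('/')) {
    if (part === '' || part === '.') continue
    if (part === '..') {
      segments.pop() // popping at the root is a no-op, as in Unix paths
    } else {
      segments.push(part)
    }
  }
  return '/' + segments.join('/')
}

const caller = '/analysis/processor'
resolveOperatorPath(caller, '/data-loader')  // '/data-loader'
resolveOperatorPath(caller, './filter')      // '/analysis/filter'
resolveOperatorPath(caller, '../threshold')  // '/threshold'
resolveOperatorPath(caller, 'normalizer')    // '/analysis/normalizer'
```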

Connection System

Creating Connections

// Connect nodes programmatically using fully qualified paths
sourceNode.fields.output.addConnection(
  '/analysis/processor',  // target operator path
  targetNode.fields.input
)

Connection Lifecycle

Validation: Check type compatibility
Subscription: Set up reactive subscription
Data Flow: Values flow from source to target
Cleanup: Remove subscriptions when disconnected

Connection Rules

Type Safety: Zod schemas ensure type compatibility
Single Input: Each input accepts one connection
Multiple Outputs: Outputs can connect to many inputs
Cycle Detection: Prevents circular dependencies
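
One way to enforce the cycle detection rule is to check reachability before accepting a connection. The sketch below is not taken from the Noodles.gl source; `wouldCreateCycle` and the edge map are hypothetical, with operators identified by their fully qualified paths.

```typescript
// Hypothetical graph shape: each operator path maps to the operators it feeds.
type Edges = Map<string, string[]>

// Returns true if connecting source -> target would close a loop,
// that is, if source is already reachable from target.
function wouldCreateCycle(edges: Edges, source: string, target: string): boolean {
  const stack: string[] = [target]
  const visited = new Set<string>()
  while (stack.length > 0) {
    const current = stack.pop() as string
    if (current === source) return true
    if (visited.has(current)) continue
    visited.add(current)
    for (const next of edges.get(current) ?? []) stack.push(next)
  }
  return false
}
```

A connection from `source` to `target` would be rejected whenever this returns `true`, which also covers connecting an operator to itself.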

Execution Model

Operator Execution

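// InputType, OutputType, and processInputs stand in for operator-specific definitions
class Operator {
  execute(inputs: InputType): OutputType {
    // Pure transformation: the output depends only on the inputs
    return processInputs(inputs)
  }
}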

Execution Triggers

Input Changes: When connected field values update
Parameter Changes: When operator parameters change
Manual Trigger: Explicit re-execution requests

Execution Order

Topological Sort: Determine execution order
Dependency Resolution: Execute upstream nodes first
Parallel Execution: Independent branches run concurrently
Result Propagation: Outputs trigger downstream execution
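
The ordering steps above can be illustrated with Kahn's algorithm, a standard way to compute a topological sort. This is a sketch under assumptions: `executionOrder` and the graph shape are hypothetical, and it runs sequentially rather than executing independent branches concurrently.

```typescript
// Hypothetical graph shape: each operator path maps to the operators it feeds.
function executionOrder(downstream: Map<string, string[]>): string[] {
  // Count incoming connections for every operator.
  const inDegree = new Map<string, number>()
  for (const [node, targets] of downstream) {
    if (!inDegree.has(node)) inDegree.set(node, 0)
    for (const target of targets) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1)
    }
  }
  // Operators with no upstream dependencies are ready to run first.
  const ready = [...inDegree.keys()].filter(node => inDegree.get(node) === 0)
  const order: string[] = []
  while (ready.length > 0) {
    const node = ready.shift() as string
    order.push(node)
    for (const target of downstream.get(node) ?? []) {
      const remaining = (inDegree.get(target) as number) - 1
      inDegree.set(target, remaining)
      if (remaining === 0) ready.push(target) // all upstream operators have run
    }
  }
  if (order.length !== inDegree.size) throw new Error('Graph contains a cycle')
  return order
}
```

Operators that sit in the `ready` queue at the same time have no dependency on each other, so they are the candidates for concurrent execution.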

Memoization Strategy

Automatic Caching

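// Results are cached, keyed by a hash of the inputs.
// memoize is assumed to wrap a function and return a cached version of it.
const memoizedExecute = memoize(inputs => operator.execute(inputs))
const cachedResult = memoizedExecute(inputs)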

Cache Invalidation

Input Changes: Clear cache when inputs change
Parameter Updates: Invalidate on configuration changes
Manual Clearing: Explicit cache clearing for debugging
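
A minimal sketch of caching with manual clearing follows. It is not the Noodles.gl implementation: `memoizeExecute` is a hypothetical helper that uses `JSON.stringify` as a stand-in for an input hash, so it assumes inputs are JSON-serializable and that parameters are passed as part of the inputs.

```typescript
// Hypothetical type: a cached execute function with an explicit clear() hook.
interface MemoizedExecute<I, O> {
  (inputs: I): O
  clear(): void
}

function memoizeExecute<I, O>(execute: (inputs: I) => O): MemoizedExecute<I, O> {
  const cache = new Map<string, O>()
  const run = (inputs: I): O => {
    // Changed inputs produce a different key, so a stale result is never returned.
    const key = JSON.stringify(inputs)
    if (cache.has(key)) return cache.get(key) as O
    const result = execute(inputs)
    cache.set(key, result)
    return result
  }
  // clear() corresponds to manual clearing, for example while debugging.
  return Object.assign(run, { clear: () => cache.clear() })
}
```

Because old entries are only dropped by `clear()`, a cache like this grows with the number of distinct inputs, which is why the size limits described under Memory Management matter.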

Memory Management

LRU Eviction: Remove least recently used results
Size Limits: Prevent unbounded cache growth
Weak References: Allow garbage collection
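
LRU eviction with a size limit can be sketched on top of a `Map`, which iterates keys in insertion order. `LruCache` is a hypothetical class for illustration; the key is assumed to be a caller-supplied string such as an input hash, and weak references are not covered here.

```typescript
// Hypothetical bounded cache: evicts the least recently used entry when full.
class LruCache<V> {
  private entries = new Map<string, V>()
  private maxSize: number

  constructor(maxSize: number) {
    this.maxSize = maxSize
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: string, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string
      this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```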

Performance Optimization

Batching Updates

// Batch multiple changes to avoid cascading updates
batch(() => {
  node1.fields.param1.setValue(value1)
  node2.fields.param2.setValue(value2)
})
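
How `batch` works internally is not shown here. One plausible approach, sketched below under that caveat, is to defer downstream work while a batch is open and flush it once when the outermost batch closes. `schedule` is a hypothetical hook that a field would call instead of running its downstream effect directly.

```typescript
let batchDepth = 0
const pendingEffects = new Set<() => void>()

// Hypothetical hook: runs the effect now, or defers it while a batch is open.
function schedule(effect: () => void): void {
  if (batchDepth > 0) {
    pendingEffects.add(effect) // the Set drops repeated requests for the same effect
  } else {
    effect()
  }
}

function batch(fn: () => void): void {
  batchDepth++
  try {
    fn()
  } finally {
    batchDepth--
    // Only the outermost batch flushes, so nested batches stay deferred.
    if (batchDepth === 0) {
      const effects = [...pendingEffects]
      pendingEffects.clear()
      for (const effect of effects) effect()
    }
  }
}
```

With this shape, two parameter changes inside one `batch` call that both schedule the same re-execution cause it to run once rather than twice.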

Debouncing

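import { debounceTime, distinctUntilChanged } from 'rxjs/operators'

// Debounce rapid changes to reduce computation
field.pipe(
  debounceTime(100),      // wait for 100 ms without new values before emitting
  distinctUntilChanged()  // skip a value identical to the previous emission
).subscribe(value => {
  // Process debounced value
})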

Selective Updates

Change Detection: Only update when values actually change
Shallow Comparison: Compare arrays and objects by reference rather than by deep equality
Dirty Tracking: Mark nodes that need re-execution
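
The three ideas above combine naturally: compare by reference, skip the update when nothing changed, and otherwise mark everything downstream as dirty. The sketch below assumes a hypothetical value store and graph shape; `collectDirty` and `updateValue` are illustrative names only.

```typescript
// Hypothetical helper: collects the changed operator and everything downstream of it.
function collectDirty(downstream: Map<string, string[]>, changed: string): Set<string> {
  const dirty = new Set<string>()
  const stack: string[] = [changed]
  while (stack.length > 0) {
    const node = stack.pop() as string
    if (dirty.has(node)) continue
    dirty.add(node)
    for (const target of downstream.get(node) ?? []) stack.push(target)
  }
  return dirty
}

// Hypothetical helper: returns the operators that need re-execution.
function updateValue(
  values: Map<string, unknown>,
  downstream: Map<string, string[]>,
  operatorPath: string,
  next: unknown,
): Set<string> {
  // Reference comparison: the same array or object instance counts as unchanged.
  if (Object.is(values.get(operatorPath), next)) return new Set<string>()
  values.set(operatorPath, next)
  return collectDirty(downstream, operatorPath)
}
```

A consequence of reference comparison is that a new array with identical contents still counts as a change, so operators should return the same instance when their result has not changed.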

Error Handling

Error Propagation

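try {
  const result = operator.execute(inputs)
  field.next(result)
} catch (error) {
  // Propagate the error downstream. If field is an RxJS subject, error() also
  // terminates it, so later next() calls are ignored.
  field.error(error)
}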

Error Recovery

Graceful Degradation: Continue execution with partial data
Default Values: Fall back to safe defaults
Error Boundaries: Isolate errors to prevent cascade failures
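
A default-value fallback with an error boundary around a single operator might look like the sketch below. `executeWithFallback` and `ExecResult` are hypothetical names, and the fallback value is assumed to be supplied by the caller.

```typescript
// Hypothetical result type: always carries a usable value, plus the error if one occurred.
type ExecResult<T> =
  | { ok: true; value: T }
  | { ok: false; value: T; error: unknown }

function executeWithFallback<I, O>(
  execute: (inputs: I) => O,
  inputs: I,
  fallback: O,
): ExecResult<O> {
  try {
    return { ok: true, value: execute(inputs) }
  } catch (error) {
    // Isolate the failure: downstream operators receive the fallback
    // instead of an exception, and the error stays available for logging.
    return { ok: false, value: fallback, error }
  }
}
```

Returning the error alongside the fallback keeps the graph running while still letting the UI or a logger surface what went wrong.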

Debugging Support

Execution Tracing: Track data flow through graph
Performance Profiling: Measure execution times
State Inspection: Examine intermediate values

Integration Points

Theatre.js Timeline

// Keyframe field values over time using fully qualified paths
const animatedValue = useSheetValue(
  sheet.object('/node', '/analysis/processor').props.fieldName,
  defaultValue
)

Deck.gl Rendering

// Connect node outputs to Deck.gl layers
const layers = nodeGraph.getLayerNodes().map(node =>
  node.execute(inputs)
)

External Data Sources

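import { of } from 'rxjs'
import { fromFetch } from 'rxjs/fetch'
import { catchError, switchMap } from 'rxjs/operators'

// Reactive data loading
const dataStream = fromFetch('/api/data').pipe(
  // response.json() returns a Promise, so flatten it with switchMap rather than map
  switchMap(response => response.json()),
  catchError(() => of(fallbackData))
)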

Best Practices

Graph Design

Minimize Connections: Reduce complexity where possible
Logical Grouping: Group related operations
Clear Naming: Use descriptive node and field names
Documentation: Comment complex data transformations

Performance

Avoid Deep Graphs: Limit nesting depth
Batch Operations: Group related changes
Profile Bottlenecks: Identify slow operations
Optimize Hot Paths: Focus on frequently executed nodes

Debugging

Incremental Building: Test small graph sections
Data Inspection: Examine intermediate results
Error Logging: Capture and log execution errors
Visual Debugging: Use graph visualization tools

Maintenance

Version Control: Track graph changes
Migration Scripts: Handle schema updates
Testing: Unit test individual operators
Documentation: Maintain up-to-date docs
