Kubernetes-Inspired Reconciler Pattern for Embedded Systems
A reusable reconciliation pattern adapted from Kubernetes for embedded firmware, managing state convergence across multiple subsystems with built-in safety interlocks.
Context
In a disinfection robot, multiple hardware subsystems—UVC lamps, fans, plasma generators, and LEDs—must maintain desired states despite transient failures, communication errors, and physical constraints. Each subsystem has its own timing requirements and safety dependencies. Traditional state machine approaches quickly become tangled when cross-subsystem dependencies emerge.
Core Problem
How do you manage state convergence for multiple independent subsystems, each with different retry schedules and safety constraints, without creating a monolithic controller that couples everything together?
Key Insight
Kubernetes solves a similar problem at the orchestration layer: observe the current state, compute the diff from desired state, take action to reconcile. This “reconciliation loop” pattern translates well to embedded systems—each subsystem gets its own reconciler that independently drives toward its target state.
Approach
The implementation centers on a generic reconciler template using std::function callbacks:
```cpp
ReconcileResult Reconcile() {
  if (!check_diff_()) {
    retry_counter_ = 0;  // Converged: the next diff acts on the very next tick
    return kNoDiff;
  }
  if (--retry_counter_ <= 0) {
    action_();                       // Attempt to reconcile
    retry_counter_ = reload_value_;  // Wait a full interval before retrying
    return kTriedAction;
  }
  return kWaitingForNextAction;  // Diff persists; retry interval not yet elapsed
}
```
- check_diff: Predicate that returns true when the current state differs from the desired state
- action: Function to execute when reconciliation is needed
The reconciler runs in a 100 Hz main loop and throttles retries with a tick counter. The reload value is (retry_interval_ms * loop_frequency) / 1000, so a 1000 ms retry interval at 100 Hz gives 100 ticks (multiplying first keeps sub-second intervals from truncating to zero in integer arithmetic). When check_diff clears (state converged), the counter resets to zero, so the next detected diff triggers an action immediately; only repeated attempts against a persisting diff wait the full interval.
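The original fragment is a single member function. A complete, self-contained class around it might look like the minimal sketch below. The constructor signature, the member initializers, and the `ReloadTicks` helper are assumptions for illustration, not the original header; the convergence branch resets the counter to zero so that a fresh diff is acted on immediately.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// The three results named in the text.
enum ReconcileResult { kNoDiff, kTriedAction, kWaitingForNextAction };

// Hypothetical helper: retry interval in ms -> main-loop ticks.
// Multiplying before dividing keeps sub-second intervals from truncating to 0.
constexpr int ReloadTicks(int retry_interval_ms, int loop_frequency_hz) {
  return retry_interval_ms * loop_frequency_hz / 1000;
}

class Reconciler {
 public:
  // Assumed constructor shape; the original header is not shown.
  Reconciler(std::function<bool()> check_diff, std::function<void()> action,
             int retry_interval_ms, int loop_frequency_hz)
      : check_diff_(std::move(check_diff)),
        action_(std::move(action)),
        reload_value_(ReloadTicks(retry_interval_ms, loop_frequency_hz)) {
    assert(reload_value_ > 0 && "retry interval shorter than one loop tick");
  }

  // Call exactly once per main-loop tick.
  ReconcileResult Reconcile() {
    if (!check_diff_()) {
      retry_counter_ = 0;  // converged: the next diff acts immediately
      return kNoDiff;
    }
    if (--retry_counter_ <= 0) {
      action_();
      retry_counter_ = reload_value_;  // wait a full interval before retrying
      return kTriedAction;
    }
    return kWaitingForNextAction;
  }

 private:
  std::function<bool()> check_diff_;
  std::function<void()> action_;
  int reload_value_;
  int retry_counter_ = 0;  // start armed so the very first diff acts at once
};
```

With a 1000 ms interval at 100 Hz, a persisting diff is retried every 100 ticks, while a diff that appears after convergence is acted on in the same tick it is detected.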
Four independent reconcilers manage the subsystems:
- UVC Reconciler - Safety interlock: requires robot not tilted
- Fan Reconciler - Independent operation
- Plasma Reconciler - Safety interlock: requires fan ON
- LED Reconciler - Forces update every 60 seconds to handle a wireless charging firmware edge case
Each reconciler returns one of three states: kNoDiff, kTriedAction, or kWaitingForNextAction.
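One way the LED reconciler's forced 60-second update could fit the same template is to fold the refresh deadline into check_diff, so that a stale write looks like a diff. This is a hypothetical sketch: `LedState`, `LedNeedsUpdate`, `WriteLed`, `TickLed`, and the pattern field are illustrative names, and the hardware write is simulated as always succeeding.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum ReconcileResult { kNoDiff, kTriedAction, kWaitingForNextAction };

// Compact reconciler, assuming the convergence branch resets the counter to
// zero; repeated here so the sketch compiles on its own.
class Reconciler {
 public:
  Reconciler(std::function<bool()> check_diff, std::function<void()> action,
             int reload_ticks)
      : check_diff_(std::move(check_diff)),
        action_(std::move(action)),
        reload_value_(reload_ticks) {
    assert(reload_ticks > 0);
  }

  ReconcileResult Reconcile() {
    if (!check_diff_()) {
      retry_counter_ = 0;  // converged: next diff acts immediately
      return kNoDiff;
    }
    if (--retry_counter_ <= 0) {
      action_();
      retry_counter_ = reload_value_;
      return kTriedAction;
    }
    return kWaitingForNextAction;
  }

 private:
  std::function<bool()> check_diff_;
  std::function<void()> action_;
  int reload_value_;
  int retry_counter_ = 0;
};

constexpr int kLoopHz = 100;
constexpr int kForcedRefreshTicks = 60 * kLoopHz;  // 60 s at 100 Hz

// Hypothetical LED subsystem state; names are illustrative only.
struct LedState {
  int target_pattern = 0;     // desired LED pattern
  int reported_pattern = 0;   // pattern the LED controller last reported
  int ticks_since_write = 0;  // main-loop ticks since the last write
};

// check_diff: a real mismatch OR an expired refresh window counts as a diff.
bool LedNeedsUpdate(const LedState& s) {
  return s.reported_pattern != s.target_pattern ||
         s.ticks_since_write >= kForcedRefreshTicks;
}

// action: simulated write that always succeeds and restarts the window.
void WriteLed(LedState& s) {
  s.reported_pattern = s.target_pattern;
  s.ticks_since_write = 0;
}

// One 100 Hz main-loop iteration for the LED subsystem.
ReconcileResult TickLed(LedState& s, Reconciler& r) {
  ++s.ticks_since_write;
  return r.Reconcile();
}
```

In steady state this issues one redundant write per minute, which is the "unnecessary commands" cost listed in the tradeoffs table.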
Safety interlocks are embedded directly in the check_diff predicates. For example, the UVC reconciler’s diff check includes the tilt sensor state: if the robot is tilted, the predicate returns false regardless of the UVC target state, preventing dangerous UV exposure.
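The interlock predicates can be written as plain functions over a per-tick snapshot, which also makes them checkable at compile time. This is a minimal sketch: `RobotState` and its fields are hypothetical, and it covers both the UVC tilt interlock and the Plasma-requires-fan interlock from the subsystem list.

```cpp
// Hypothetical per-tick snapshot of targets and sensor readings.
struct RobotState {
  bool tilted = false;
  bool uvc_target = false;
  bool uvc_on = false;
  bool fan_on = false;
  bool plasma_target = false;
  bool plasma_on = false;
};

// UVC check_diff: the tilt interlock is evaluated first, so a tilted robot
// never reports a diff and the UVC action callback never runs.
constexpr bool UvcCheckDiff(const RobotState& s) {
  return !s.tilted && s.uvc_on != s.uvc_target;
}

// Plasma check_diff: with the fan off (for example, because the Fan
// reconciler is stuck), report no diff so Plasma returns kNoDiff.
constexpr bool PlasmaCheckDiff(const RobotState& s) {
  return s.fan_on && s.plasma_on != s.plasma_target;
}
```

Each predicate would be passed to its reconciler as the check_diff callback, for example via a lambda that reads the current snapshot.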
The entire pattern fits in 123 lines of header and 59 lines of implementation.
Failure Handling
- Cascade prevention: The Plasma reconciler checks fan state in its check_diff. If the Fan reconciler is stuck and the fan is not running, Plasma returns kNoDiff rather than waiting indefinitely
- Stuck detection: If a reconciler stays in kWaitingForNextAction for extended periods, the three-state return enables external monitoring to detect and log anomalies
- Memory footprint: std::function closures are typically 8-16 bytes (callback pointer + captured state pointer), plus the fixed, implementation-dependent size of each std::function object; suitable for embedded systems with constrained RAM
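A possible external stuck detector, built only on the three-state return value, is sketched below. Under retry throttling, a reconciler that never converges alternates between kTriedAction and runs of kWaitingForNextAction, so this sketch treats every result other than kNoDiff as "not converged". `StuckMonitor` and its tick limit are hypothetical.

```cpp
#include <cassert>

enum ReconcileResult { kNoDiff, kTriedAction, kWaitingForNextAction };

// Hypothetical monitor: observes each Reconcile() result without touching
// the reconciler, and flags a subsystem that stays unconverged too long.
class StuckMonitor {
 public:
  explicit StuckMonitor(int limit_ticks) : limit_ticks_(limit_ticks) {
    assert(limit_ticks > 0);
  }

  // Returns true only on the tick the limit is first exceeded, so the
  // caller logs one anomaly per stuck episode instead of one per tick.
  bool Observe(ReconcileResult result) {
    if (result == kNoDiff) {
      unconverged_ticks_ = 0;  // converged: episode over
      return false;
    }
    if (unconverged_ticks_ > limit_ticks_) return false;  // already reported
    ++unconverged_ticks_;  // saturates at limit + 1, so it cannot overflow
    return unconverged_ticks_ > limit_ticks_;
  }

  bool stuck() const { return unconverged_ticks_ > limit_ticks_; }

 private:
  int limit_ticks_;
  int unconverged_ticks_ = 0;
};
```

Calling `Observe()` on each reconciler's result in the main loop keeps monitoring outside the reconciler, matching the point that the three-state return needs no changes to reconciler internals.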
Tradeoffs
| Decision | Rationale | Tradeoff |
|---|---|---|
| std::function callbacks over virtual methods | Flexibility, lambda support, easier testing | Slight runtime overhead vs vtable |
| Counter-based throttling over timers | No timer dependencies, deterministic, integrates with main loop | Coupled to loop frequency |
| Safety interlocks in check_diff | Centralized safety, prevents dangerous states before action | check_diff logic can become complex |
| Independent reconciler per subsystem | Decoupled, independent retry schedules, easier maintenance | More instances to manage |
| Immediate action on diff transition | Rapid recovery from transient failures | Could cause action burst if state oscillates |
| LED forced periodic update | Mitigate wireless charging firmware edge case | Unnecessary commands in normal operation |
| Cascade prevention via check_diff | Dependent subsystems fail gracefully | Adds complexity to predicate logic |
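The "action burst" cost of immediate action on diff transitions can be demonstrated directly. This sketch assumes the convergence branch resets the counter to zero and uses a 100-tick reload. A steady diff is throttled to one attempt per interval, while a diff that flips every tick triggers an action on every diff tick. `CountActions` is a hypothetical test helper.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum ReconcileResult { kNoDiff, kTriedAction, kWaitingForNextAction };

// Compact reconciler, assuming the convergence branch resets the counter to
// zero; repeated here so the sketch compiles on its own.
class Reconciler {
 public:
  Reconciler(std::function<bool()> check_diff, std::function<void()> action,
             int reload_ticks)
      : check_diff_(std::move(check_diff)),
        action_(std::move(action)),
        reload_value_(reload_ticks) {
    assert(reload_ticks > 0);
  }

  ReconcileResult Reconcile() {
    if (!check_diff_()) {
      retry_counter_ = 0;  // converged: next diff acts immediately
      return kNoDiff;
    }
    if (--retry_counter_ <= 0) {
      action_();
      retry_counter_ = reload_value_;
      return kTriedAction;
    }
    return kWaitingForNextAction;
  }

 private:
  std::function<bool()> check_diff_;
  std::function<void()> action_;
  int reload_value_;
  int retry_counter_ = 0;
};

// Runs `ticks` loop iterations; diff_at(tick) says whether the subsystem is
// out of sync on that tick. Returns how many actions fired.
int CountActions(const std::function<bool(int)>& diff_at, int ticks) {
  int tick = 0;
  int actions = 0;
  Reconciler r([&] { return diff_at(tick); }, [&] { ++actions; }, 100);
  for (; tick < ticks; ++tick) r.Reconcile();
  return actions;
}
```

Over 200 ticks a steady diff yields 2 actions (ticks 0 and 100), while a diff that is present on every other tick yields 100.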
Results
Production deployment:
- Deployed in the disinfection robot fleet at Bear Robotics
- Four subsystems (UVC, Fan, Plasma, LED) using the pattern
Safety validation:
- UVC safety interlock verified: check_diff returns false when tilt sensor active, preventing UV exposure during transport
- Plasma-Fan dependency enforced: Plasma cannot activate without stable fan operation
Developer experience:
- Adding a new subsystem reconciler: define a check_diff predicate and an action callback, then instantiate a reconciler
- Adopting the pattern eliminated retry logic scattered across the codebase
- Three-state return (kNoDiff/kTriedAction/kWaitingForNextAction) enables external monitoring without modifying reconciler internals
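Because both callbacks are plain std::function objects, a reconciler can be exercised on a host machine against a fake subsystem. In this sketch a hypothetical `FakeFan` drops its first two commands to mimic transient communication errors; with a 100-tick reload, the reconciler attempts the command at ticks 0, 100, and 200 and then reports kNoDiff. The compact reconciler assumes the convergence branch resets the counter to zero.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum ReconcileResult { kNoDiff, kTriedAction, kWaitingForNextAction };

// Compact reconciler, repeated so the sketch compiles on its own.
class Reconciler {
 public:
  Reconciler(std::function<bool()> check_diff, std::function<void()> action,
             int reload_ticks)
      : check_diff_(std::move(check_diff)),
        action_(std::move(action)),
        reload_value_(reload_ticks) {
    assert(reload_ticks > 0);
  }

  ReconcileResult Reconcile() {
    if (!check_diff_()) {
      retry_counter_ = 0;  // converged: next diff acts immediately
      return kNoDiff;
    }
    if (--retry_counter_ <= 0) {
      action_();
      retry_counter_ = reload_value_;
      return kTriedAction;
    }
    return kWaitingForNextAction;
  }

 private:
  std::function<bool()> check_diff_;
  std::function<void()> action_;
  int reload_value_;
  int retry_counter_ = 0;
};

// Hypothetical fake fan for host-side tests: ignores the first
// `drop_count` commands to emulate transient communication errors.
struct FakeFan {
  bool on = false;
  int drop_count = 0;
  int commands_seen = 0;

  void Command(bool want_on) {
    ++commands_seen;
    if (drop_count > 0) {
      --drop_count;  // command lost in transit
      return;
    }
    on = want_on;
  }
};
```

A test wires the fake into a reconciler with lambdas and steps the loop until it reports kNoDiff, with no hardware or timers involved.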
Key Takeaways
- Kubernetes patterns translate to embedded: Reconciliation loops work at 100 Hz just as they work at container scale
- Safety in predicates: Embedding interlocks in check_diff centralizes safety logic and prevents dangerous states before action execution
- Decoupling via independence: Each subsystem reconciles independently with its own retry schedule—no monolithic controller
- Reusable template: New subsystems adopt the pattern by providing two callbacks; retry throttling and state tracking come free