CAN Bus OTA Bootloader
Custom CAN bootloader for STM32F446 with block-level Stop-and-Wait, 3-layer verification, and 48x ACK traffic reduction vs frame-level protocol
Context
- System: Wally v2 architecture (RPi5 + STM32F446 baseboard)
- Requirement: Field-upgradable STM32 firmware via CAN from RPi5
- Constraint: No external flash for A/B partition, CAN limited to 8 bytes per frame, 32KB bootloader limit
Core Problem
Updating firmware over CAN requires solving several challenges:
- CAN’s 8-byte frame limit means 480KB firmware = 60,000+ frames
- No A/B partition: a failed write leaves no working application to fall back on
- CAN bus noise in home appliance environment
- Bootloader must fit in 32KB while handling all error cases
Why this was hard: Frame-by-frame acknowledgment would require 60,000 ACKs (480KB of control traffic). Single-slot update means every byte must be verified before committing.
Key Insight
Block-level Stop-and-Wait (64 frames = 384 bytes per ACK) reduces control traffic by 48x compared to frame-level protocol, while maintaining reliability through multi-layer verification.
Why 384 bytes? The block size must be a multiple of both 6 (data bytes per frame) and 4 (flash word size), so it must be a multiple of LCM(6, 4) = 12. 384 = 12 × 32 (64 frames) satisfies both constraints while keeping the RAM buffer small.
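The sizing rule can be checked mechanically. The following is a minimal sketch, not code from the bootloader; the helper names `gcd_u32`, `lcm_u32`, and `block_size_is_valid` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_BYTES_PER_FRAME 6u   /* 8-byte CAN frame minus 2-byte sequence number */
#define FLASH_WORD_BYTES     4u   /* flash is programmed in 32-bit words */

/* Greatest common divisor (Euclid's algorithm). */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0u) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; both inputs must be non-zero. */
static uint32_t lcm_u32(uint32_t a, uint32_t b)
{
    return (a / gcd_u32(a, b)) * b;
}

/* A block size is usable when it holds a whole number of frames
 * and a whole number of flash words. */
static bool block_size_is_valid(uint32_t block_bytes)
{
    uint32_t step = lcm_u32(DATA_BYTES_PER_FRAME, FLASH_WORD_BYTES);
    return block_bytes != 0u && (block_bytes % step) == 0u;
}
```

With these definitions, 384 passes the check and corresponds to 384 / 6 = 64 frames per block, while a size such as 380 is rejected.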
Approach
1) Memory Layout with Image Header
Flash: 512KB
+------------------+ 0x08000000
| Bootloader | 32KB (Sectors 0-1)
+------------------+ 0x08008000
| Image Header | 256 bytes
| Application | ~480KB (Sectors 2-7)
+------------------+ 0x08080000
Image Header (256 bytes):
typedef struct {
    uint32_t magic;        // 0x52565441 ("RVTA")
    uint32_t header_ver;   // Compatibility check
    uint32_t image_size;   // Excluding header
    uint32_t image_crc32;  // CRC of app code only
    uint32_t vector_addr;  // Entry point validation
    uint32_t build_time;   // Timestamp
    // ... reserved for future use
} image_header_t;
The header lets the bootloader validate the image before jumping to the application, catching corrupted vector tables that would otherwise cause a hard fault.
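One way to keep the header at exactly 256 bytes is to pad it explicitly and let the compiler check the size. This is a sketch under assumptions: the type name `image_header_sketch_t` and the `reserved[58]` padding field are hypothetical, chosen only so that six 32-bit fields plus padding add up to 256 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_HEADER_SIZE 256u

typedef struct {
    uint32_t magic;         /* 0x52565441 ("RVTA") */
    uint32_t header_ver;    /* compatibility check */
    uint32_t image_size;    /* excluding header */
    uint32_t image_crc32;   /* CRC of app code only */
    uint32_t vector_addr;   /* entry point validation */
    uint32_t build_time;    /* timestamp */
    uint32_t reserved[58];  /* assumption: 24 + 58 * 4 = 256 bytes */
} image_header_sketch_t;

/* Fail the build if a later field change silently alters the layout. */
_Static_assert(sizeof(image_header_sketch_t) == IMAGE_HEADER_SIZE,
               "image header must be exactly 256 bytes");
_Static_assert(offsetof(image_header_sketch_t, image_crc32) == 12,
               "image_crc32 must stay at offset 12");
```

A compile-time check like this keeps the bootloader and the host-side packaging tool in agreement about where each field sits.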
2) Block-Level Stop-and-Wait Protocol
| CAN ID | Direction | Payload |
|---|---|---|
| 0x2F0 | RPi → STM | OTA_START [size(4), crc(4)] |
| 0x2F1 | RPi → STM | OTA_DATA [seq(2), data(6)] |
| 0x2F2 | RPi → STM | OTA_END [] |
| 0x1F0 | STM → RPi | OTA_ACK [status, seq, progress, error] |
| 0x1F1 | STM → RPi | OTA_HELLO [state, progress, error, ver] |
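To make the OTA_DATA layout concrete, the sketch below packs and unpacks one frame. The byte order of the sequence number is not stated above; little-endian is assumed here, and the function names `ota_data_pack` and `ota_data_unpack` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define OTA_DATA_ID        0x2F1u  /* RPi -> STM, from the table above */
#define OTA_DATA_PAYLOAD   6u      /* data bytes per frame */

/* Build an 8-byte OTA_DATA payload: [seq(2), data(6)].
 * Assumption: seq is sent low byte first. */
static void ota_data_pack(uint8_t frame[8], uint16_t seq, const uint8_t data[6])
{
    frame[0] = (uint8_t)(seq & 0xFFu);
    frame[1] = (uint8_t)(seq >> 8);
    memcpy(&frame[2], data, OTA_DATA_PAYLOAD);
}

/* Extract the sequence number and copy the 6 data bytes out. */
static uint16_t ota_data_unpack(const uint8_t frame[8], uint8_t data_out[6])
{
    memcpy(data_out, &frame[2], OTA_DATA_PAYLOAD);
    return (uint16_t)(frame[0] | ((uint16_t)frame[1] << 8));
}
```

Both sides must agree on the byte order, so whichever convention the real protocol uses should be fixed in one shared definition.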
Traffic comparison:
| Protocol | Frames | ACKs | ACK Traffic |
|---|---|---|---|
| Frame-level (8 data bytes/frame) | 60,000 | 60,000 | 480 KB |
| Block-level (384B, 6 data bytes/frame) | 80,000 | 1,250 | 10 KB |
48x reduction in control traffic.
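The ACK counts behind the 48x figure can be reproduced with a few lines of arithmetic. This is an illustrative sketch that assumes a 480,000-byte image and 8-byte ACK frames; the function names are hypothetical.

```c
#include <stdint.h>

#define CAN_FRAME_BYTES 8u    /* maximum classic CAN payload */
#define BLOCK_BYTES     384u  /* one ACK per 384-byte block */

/* Integer division rounding up. */
static uint32_t ceil_div(uint32_t n, uint32_t d)
{
    return (n + d - 1u) / d;
}

/* Frame-level scheme: one ACK for every 8-byte frame. */
static uint32_t acks_frame_level(uint32_t image_bytes)
{
    return ceil_div(image_bytes, CAN_FRAME_BYTES);
}

/* Block-level scheme: one ACK for every 384-byte block. */
static uint32_t acks_block_level(uint32_t image_bytes)
{
    return ceil_div(image_bytes, BLOCK_BYTES);
}

/* Control traffic in bytes, assuming each ACK is a full 8-byte frame. */
static uint32_t ack_traffic_bytes(uint32_t acks)
{
    return acks * CAN_FRAME_BYTES;
}
```

For a 480,000-byte image this gives 60,000 ACKs against 1,250, which is the 48x ratio, and 480,000 bytes of ACK traffic against 10,000.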
3) Three-Layer Verification
Layer 1: Per-block read-back (immediate)
↓
Layer 2: Header + vector validation (after transfer)
↓
Layer 3: Full image CRC (final)
Layer 1 - Per-block verification:
// Write to flash
flash_write_buffer(addr, buffer, 384);
// Immediately read back and compare
const uint8_t *flash = (const uint8_t *)addr;
for (int i = 0; i < 384; i++) {
    if (flash[i] != buffer[i]) return false;
}
Layer 2 - Header validation:
- Magic value check (0x52565441)
- Header version compatibility
- Vector table address within valid range
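The three header checks could be combined into a single predicate along these lines. This is a sketch, not the bootloader's code: `header_fields_valid` and `SUPPORTED_HEADER_VER` are hypothetical, the version value 1 is an assumption, and the image-size bound is an extra check added here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define IMAGE_MAGIC          0x52565441u  /* "RVTA" */
#define APP_REGION_START     0x08008000u  /* from the memory layout */
#define APP_REGION_END       0x08080000u  /* end of flash, exclusive */
#define IMAGE_HEADER_SIZE    256u
#define SUPPORTED_HEADER_VER 1u           /* assumption: not stated in the text */

/* Return true only if every header field passes its check. */
static bool header_fields_valid(uint32_t magic, uint32_t header_ver,
                                uint32_t image_size, uint32_t vector_addr)
{
    const uint32_t code_start = APP_REGION_START + IMAGE_HEADER_SIZE;
    const uint32_t max_size   = APP_REGION_END - code_start;

    if (magic != IMAGE_MAGIC) {
        return false;                     /* not an image for this bootloader */
    }
    if (header_ver != SUPPORTED_HEADER_VER) {
        return false;                     /* incompatible header layout */
    }
    if (image_size == 0u || image_size > max_size) {
        return false;                     /* extra check: image must fit */
    }
    if (vector_addr < code_start || vector_addr >= APP_REGION_END) {
        return false;                     /* vector table outside app region */
    }
    return true;
}
```

Passing the fields individually keeps the sketch independent of the exact struct layout; a real implementation would more likely take a pointer to the header.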
Layer 3 - CRC verification:
- Calculate CRC over entire application
- Compare with header’s expected CRC
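The text does not say which CRC-32 variant is used. As an illustration only, the sketch below uses the common reflected CRC-32 (polynomial 0xEDB88320, as in zlib and Ethernet) computed in software; the function names are hypothetical, and a hardware CRC unit may use different conventions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320).
 * Pass 0 as crc for the first chunk; pass the previous
 * return value to continue over further chunks. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Compare the CRC of the application bytes with the header's value. */
static bool image_crc_matches(const uint8_t *app, size_t len, uint32_t expected)
{
    return crc32_update(0u, app, len) == expected;
}
```

Because `crc32_update` can be chained, the image can be checked in flash-sized chunks rather than in one pass if that suits the bootloader's main loop better.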
4) Interrupt Safety with Timing Analysis
The CAN RX interrupt is masked while a block is written to flash, so an incoming frame cannot modify the buffer mid-write:
HAL_NVIC_DisableIRQ(CAN1_RX0_IRQn); // ~2ms window
flash_write_buffer(addr, buffer, len);
// Read-back verification
HAL_NVIC_EnableIRQ(CAN1_RX0_IRQn);
Timing breakdown:
- Flash write (384 bytes): ~1.5ms
- Read-back verify: ~0.3ms
- Total interrupt disable: ~2ms per block
- 480KB / 384B = 1,250 blocks; 1,250 × 2ms = 2.5 seconds total (spread across the 10s transfer)
During flash writes the host receives BUSY status messages, rate-limited to one every 50ms, which prevents a host-side timeout while signaling “alive but working”.
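The 50 ms rate limit on BUSY messages might be implemented with a small timestamp check like the one below. This is a sketch with hypothetical names (`busy_limiter_t`, `busy_should_send`); it assumes a free-running millisecond tick such as the value returned by `HAL_GetTick()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUSY_MIN_INTERVAL_MS 50u

typedef struct {
    uint32_t last_sent_ms;  /* tick of the last BUSY that was sent */
    bool     sent_once;     /* false until the first BUSY goes out */
} busy_limiter_t;

/* Return true if a BUSY message may be sent now, and record the time.
 * Unsigned subtraction keeps the comparison correct across tick wraparound. */
static bool busy_should_send(busy_limiter_t *lim, uint32_t now_ms)
{
    if (lim->sent_once &&
        (uint32_t)(now_ms - lim->last_sent_ms) < BUSY_MIN_INTERVAL_MS) {
        return false;
    }
    lim->last_sent_ms = now_ms;
    lim->sent_once = true;
    return true;
}
```

The caller would invoke this before queuing a BUSY frame and skip the send when it returns false.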
5) Sequence Recovery Mechanism
Three-tier handling for lost/corrupted frames:
if (seq == expected_seq) {
    // Normal case: accept frame
} else if (seq < block_start_seq) {
    // Retransmit of already-written block
    // Host didn't receive our ACK, so echo it again
    send_ack(OK);
} else {
    // Out of sequence: reset to block boundary
    received -= buffer_len;  // Undo partial count
    buffer_len = 0;
    expected_seq = block_start_seq;
    send_ack(ERROR_SEQ);
}
Key insight: Reset to block boundary (not frame) ensures clean recovery. Partial blocks are discarded, not patched.
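The recovery rules can be exercised off-target as a small state machine. The sketch below follows the three cases above but is not the bootloader's code: the `rx_state_t` and `rx_result_t` types and the `rx_on_frame` function are hypothetical, flash writes and ACK sending are left to the caller, and a final partial block is not handled.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_DATA_BYTES 6u
#define FRAMES_PER_BLOCK 64u
#define BLOCK_BYTES      (FRAME_DATA_BYTES * FRAMES_PER_BLOCK)  /* 384 */

typedef enum {
    RX_ACCEPTED,         /* frame stored, block not yet full */
    RX_BLOCK_READY,      /* buffer holds a full block: write, verify, ACK */
    RX_DUPLICATE_BLOCK,  /* frame from an already-written block: re-send OK */
    RX_RESYNC            /* out of sequence: partial block dropped, send ERROR_SEQ */
} rx_result_t;

typedef struct {
    uint16_t expected_seq;     /* next sequence number we will accept */
    uint16_t block_start_seq;  /* first sequence number of the current block */
    uint32_t buffer_len;       /* bytes buffered for the current block */
    uint32_t received;         /* total bytes accepted so far */
    uint8_t  buffer[BLOCK_BYTES];
} rx_state_t;

static rx_result_t rx_on_frame(rx_state_t *s, uint16_t seq, const uint8_t data[6])
{
    if (seq == s->expected_seq) {
        memcpy(&s->buffer[s->buffer_len], data, FRAME_DATA_BYTES);
        s->buffer_len += FRAME_DATA_BYTES;
        s->received   += FRAME_DATA_BYTES;
        s->expected_seq++;
        if (s->buffer_len == BLOCK_BYTES) {
            /* Block complete: buffer[] holds BLOCK_BYTES for the caller. */
            s->buffer_len = 0;
            s->block_start_seq = s->expected_seq;
            return RX_BLOCK_READY;
        }
        return RX_ACCEPTED;
    }
    if (seq < s->block_start_seq) {
        return RX_DUPLICATE_BLOCK;
    }
    /* Any other mismatch: discard the partial block and restart it. */
    s->received -= s->buffer_len;
    s->buffer_len = 0;
    s->expected_seq = s->block_start_seq;
    return RX_RESYNC;
}
```

Returning a result code instead of sending the ACK directly keeps the logic testable on a host machine, where the CAN driver is not available.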
6) Dynamic Timeout
The timeout restarts on every received frame, not only when a block is acknowledged:
void handle_data(uint8_t *data) {
    last_data_tick = HAL_GetTick();  // Reset on ANY frame
    // ... process data
}
This handles slow links gracefully—1 frame every 2 seconds is fine, as long as frames keep arriving. Timeout only triggers on silence.
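The matching check in the main loop could look like the sketch below. The timeout value is not given in the text, so `DATA_TIMEOUT_MS` is an assumed placeholder, and `data_timed_out` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_TIMEOUT_MS 5000u  /* assumption: the real value is not stated */

/* True when no frame has arrived for longer than the timeout.
 * Both arguments are millisecond ticks; unsigned subtraction
 * stays correct when the tick counter wraps around. */
static bool data_timed_out(uint32_t now_ms, uint32_t last_data_ms)
{
    return (uint32_t)(now_ms - last_data_ms) > DATA_TIMEOUT_MS;
}
```

With a 5-second value, a link delivering one frame every 2 seconds never trips the check, while a silent link does.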
Tradeoffs
| Decision | Rationale | Tradeoff |
|---|---|---|
| 64-frame blocks (384B) | LCM(6,4)=12, balances RAM vs flash writes | 384 bytes static RAM |
| Single-slot (no A/B) | 32KB bootloader limit, maximize app space | Must verify before ACK |
| Block-level Stop-and-Wait | 48x less ACK traffic than frame-level | Slightly slower than pipelined |
| Dual ACK sending | CAN gives no application-level delivery guarantee; a second copy covers a lost ACK | 0.004% overhead |
| 2ms interrupt disable | Flash integrity > latency | Host must handle BUSY |
| 256-byte image header | Validate before jump, catch bad vectors | Reduces app space by 256B |
| Dynamic timeout | Handles variable link speeds | Slightly more complex logic |
Results
Performance:
- Throughput: 46.9 KB/s effective (500 kbps × 75% efficiency)
- Update time: ~10 seconds for 33KB image
  - Data transfer: 0.7s
  - Flash operations: ~2.5s
  - Overhead (erase, verify): ~6.8s
- Bootloader size: 8KB (within 32KB limit)
Reliability:
- Bricked devices: 0 in testing
- Verification coverage: 100% (per-block + header + CRC)
- Recovery: Automatic retry on sequence mismatch
ACK Status Codes:
| Code | Meaning |
|---|---|
| 0x00 | OK - continue |
| 0x01 | READY - flash erased |
| 0x02 | BUSY - writing flash |
| 0x03 | COMPLETE - all verified |
| 0x10 | ERROR_CRC |
| 0x11 | ERROR_SIZE |
| 0x12 | ERROR_FLASH |
| 0x13 | ERROR_TIMEOUT |
| 0x14 | ERROR_SEQ |
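The status codes map naturally onto an enum shared by the bootloader and any C-based host tooling. The sketch below uses the values from the table; the identifier names and the two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    OTA_STATUS_OK            = 0x00,  /* continue */
    OTA_STATUS_READY         = 0x01,  /* flash erased */
    OTA_STATUS_BUSY          = 0x02,  /* writing flash */
    OTA_STATUS_COMPLETE      = 0x03,  /* all verified */
    OTA_STATUS_ERROR_CRC     = 0x10,
    OTA_STATUS_ERROR_SIZE    = 0x11,
    OTA_STATUS_ERROR_FLASH   = 0x12,
    OTA_STATUS_ERROR_TIMEOUT = 0x13,
    OTA_STATUS_ERROR_SEQ     = 0x14
} ota_status_t;

/* True for the error codes listed in the table (0x10 to 0x14). */
static bool ota_status_is_error(uint8_t code)
{
    return code >= OTA_STATUS_ERROR_CRC && code <= OTA_STATUS_ERROR_SEQ;
}

/* Human-readable name for logging; unknown codes map to "UNKNOWN". */
static const char *ota_status_name(uint8_t code)
{
    switch (code) {
    case OTA_STATUS_OK:            return "OK";
    case OTA_STATUS_READY:         return "READY";
    case OTA_STATUS_BUSY:          return "BUSY";
    case OTA_STATUS_COMPLETE:      return "COMPLETE";
    case OTA_STATUS_ERROR_CRC:     return "ERROR_CRC";
    case OTA_STATUS_ERROR_SIZE:    return "ERROR_SIZE";
    case OTA_STATUS_ERROR_FLASH:   return "ERROR_FLASH";
    case OTA_STATUS_ERROR_TIMEOUT: return "ERROR_TIMEOUT";
    case OTA_STATUS_ERROR_SEQ:     return "ERROR_SEQ";
    default:                       return "UNKNOWN";
    }
}
```

Grouping the error codes in a contiguous range lets the host decide between "keep sending" and "abort or retry" with a single comparison.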
Key Takeaway
Block-level Stop-and-Wait with 3-layer verification provides reliable CAN OTA without complex windowing protocols. The 48x reduction in ACK traffic (vs frame-level) comes from careful block sizing using LCM constraints. Per-block verification catches errors immediately; header validation prevents jumping to corrupted vectors; final CRC confirms end-to-end integrity.