Add option to persist boot/update failure info to flash#809
Conversation
When boot/update partition verification fails during boot or update, with this option the event is logged to flash in an ad-hoc partition. Information about logged failures is made available to the application through an API.
There was a problem hiding this comment.
Pull request overview
This PR adds an optional “persistent failure diagnostics” feature to wolfBoot: when enabled, boot/update/rollback verification failures are recorded to a dedicated flash region and made available to the application via a small read/clear API.
Changes:
- Add a flash-backed, circular log for failure records and expose read/clear APIs in
libwolfboot. - Record verification failures during update/boot and record rollback-not-confirmed events when rollback occurs.
- Add build options and document the new feature and API.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/update_flash.c | Records boot/update verification failures and rollback-not-confirmed events when diagnostics are enabled. |
| src/libwolfboot.c | Implements the on-flash diagnostics log format, scanning/ordering logic, and the public read/clear APIs. |
| options.mk | Adds build-time options/macros to enable diagnostics and configure the reserved flash region. |
| include/wolfboot/wolfboot.h | Defines failure phases/causes, the persisted record structure, and the new public API prototypes. |
| docs/API.md | Documents the new failure diagnostics feature and how applications consume the records. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
danielinux
left a comment
There was a problem hiding this comment.
Check word size comment.
Secondary notes (non-blocking):
- No overlap/alignment validation. Address-must-be-sector-aligned and must-not-overlap-partitions are documented but not enforced at compile time. A BUILD_BUG-style check against partition bounds would prevent a footgun.
- App-side wolfBoot_clear_failures() needs the flash driver. get_failure/clear_failures compile outside the __WOLFBOOT guard, so an application calling clear pulls in hal_flash_unlock/erase/lock. That's presumably intended, but worth a doc note that clearing requires the HAL flash driver linked into the app.
- diag_read non-ext path ignores unmapped/secure memory faults — fine for memory-mapped internal flash, just noting the XMEMCPY assumes the region is always readable.
|
Currently the empty slot for storage is picked as the first one with wrong CRC (and it works fine as long as we got happy path and the memory is indeed empty). In case of an interrupted record write (e.g. power failure) this slot will be chosen for the next record infinitely and the whole feature gets stuck. On write-once/ECC flash reprogramming such partially written word is also not allowed, so the write itself can fail. |
|
diag_write_header returns the ext_flash_write/hal_flash_write result but it is not consistent across HALs, some return 0 on success and some return len. So this value can't be checked with != 0 - on those HALs a successful header write would be treated as an error and with WOLFBOOT_DIAGNOSTICS_EXT the whole feature stops working. Possibly checking < 0 should be safe for both conventions. |
|
Two verification failure paths don't get recorded at all:
|
Also updates docs to specify the requirement of the HAL flash driver by wolfBoot_clear_failures().
- Failed boot after emergency update (WOLFBOOT_FAILURE_PHASE_RECOVERY) - Failed verification of self-update image
|
Thanks @KotMorderca! |
|
Thanks @mattia-moffa for fixes! Awesome work! Few additional smaller findings:
|
That would be tricky to enforce, because at the moment we don't have a generic wolfBoot macro that specifies the flash write granularity. |
- Check diag_erase() return value in wolfBoot_record_failure() - Dedicated WOLFBOOT_FAILURE_PHASE_SELF_UPDATE - Mark the diagnostics flash write buffers XALIGNED_STACK(4) for HALs that access the source word by word - Ensure diagnostics, update, or swap partitions don't overlap the bootloader
Maybe those two static asserts would catch the obvious misconfigurations:
Apart from that the feature looks good to me now. |
Some configs may exploit that possibility
This one is reasonable. Thanks, I added it. |
|
Looks good to me, thanks for all fixes. @mattia-moffa grat work! |
When boot/update partition verification fails during boot or update, with this option the event is logged to flash in an ad-hoc partition. Information about logged failures is made available to the application through an API.
For a more detailed description see the changes to
docs/API.md.