esp32/ota: Implement ESP-IDF OTA functionality. #7048
Anybody?
@ekondayan Thanks for keeping this up to date, I'm using this successfully in my project. Hopefully it can get mainlined soon.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Coverage diff (master → #7048):
- Coverage: 98.41% → 98.41%
- Files: 171 → 171
- Lines: 22324 → 22324
- Hits: 21971 → 21971
- Misses: 353 → 353
This is now enabled in the v1.21.0 release. See #12475 (sorry - I missed this PR when I was preparing that PR). Note - if you are interested in OTA, I also have a micropython OTA tool at https://github.com/glenn20/micropython-esp32-ota which uses the existing support for OTA in micropython. There is also a separate tool at https://github.com/glenn20/mp-image-tool-esp32 which can (among other things) add OTA partition tables to micropython firmware images and flash storage on ESP32 devices.
This is an automated heads-up that we've just merged a Pull Request (see #13763). A search suggests this PR might apply the STATIC macro to some C code. If it … Although this is an automated message, feel free to @-reply to me directly if …
Code size report:
Thanks for keeping this PR up to date for so long, @ekondayan.
The state of OTA support in MicroPython has probably changed since you first opened the PR, but if I follow correctly, MicroPython already implements OTA functionality for ESP32, and this PR adds three related areas of functionality:
- New APIs for reading app information from the partition.
- Additional rollback support for marking an update invalid or testing if rollback is possible.
- A wrapper around the ESP-IDF OTA write API rather than writing to the partition directly.
Part 1 makes sense to me as useful functionality for managing OTA updates in a deployed application. You could probably put these into a separate PR and we could merge them pretty quickly.
Part 2 I have some inline questions about, but the check_rollback_is_possible() function definitely seems useful to avoid reboot loops.
For part 3, I'm not certain what the exact benefits of an ota_* API are here, compared to manually writing the partition and then calling part.set_boot() to set it as the next boot partition.
Is the main difference that the OTA API will verify the image? If so, could we add this as an optional boolean verify argument on set_boot() and get the same functionality? If there are any other benefits, can you please explain them?
Finally, the code size report suggests this PR adds 1.6KB of static RAM usage (+1632 bytes data, +8 bytes bss). I'm not sure how or why; there's no static buffer that I can see, but that's a big impact for ESP32s without PSRAM, so it'd be good if there was a way to avoid it.
.. method:: Partition.app_state()
Returns the app state of a valid ota partition. It can be one of the following strings:
The more common pattern in MicroPython would be to return an integer from this function, and define these as constants on the Partition class - i.e. Partition.APP_STATE_NEW, Partition.APP_STATE_VERIFY, etc.
You're right, that aligns better with MicroPython conventions. I'll refactor app_state() to return the integer value directly and add the corresponding constants to the Partition class:
Partition.APP_STATE_NEW
Partition.APP_STATE_PENDING_VERIFY
Partition.APP_STATE_VALID
Partition.APP_STATE_INVALID
Partition.APP_STATE_ABORTED
Partition.APP_STATE_UNDEFINED
If the "CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE" option is set, and a reset occurs without
calling either
``mark_app_valid_cancel_rollback()`` or ``mark_app_invalid_rollback_and_reboot()``,
then the application is rolled back.
Is there a difference between calling this API, versus simply restarting without calling mark_app_valid_cancel_rollback()? If we can get the same behaviour without adding a new function here then it might be better to leave it out?
Yes, there is a meaningful difference. When you simply restart without calling mark_app_valid_cancel_rollback(), the rollback only happens on the next boot (the bootloader detects the "pending_verify" state persisted across two boots and then changes it to "aborted") - so it requires two boot cycles.
In contrast, mark_app_invalid_rollback_and_reboot() performs an immediate rollback in one boot cycle - it actively marks the current partition as invalid in otadata and reboots to the previous working partition in one atomic operation.
- Post-validation failures: If the app already called mark_app_valid_cancel_rollback() early in boot and later detects a critical issue (can't reach the server, hardware fails under load, config corruption), passive rollback is no longer possible, because the partition is already marked valid.
- Atomicity: A manual approach risks power loss between operations, leaving an undefined state. The ESP-IDF function writes otadata and reboots as a single coordinated operation.
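The difference between passive and active rollback can be made concrete with a boot-time check. This is a minimal sketch, assuming the PR's proposed app_state(), APP_STATE_PENDING_VERIFY, check_rollback_is_possible() and mark_app_invalid_rollback_and_reboot(); of the rollback calls, only mark_app_valid_cancel_rollback() exists in mainline MicroPython. confirm_or_rollback() and FakePartition are hypothetical names, and the partition class is passed in so the decision logic can also run on a desktop.

```python
def confirm_or_rollback(partition_cls, self_test):
    """Boot-time check for a freshly installed OTA image.

    partition_cls is expected to be esp32.Partition built with this PR.
    self_test is an application-supplied callable returning True/False.
    """
    running = partition_cls(partition_cls.RUNNING)
    if running.app_state() != partition_cls.APP_STATE_PENDING_VERIFY:
        # Already confirmed on an earlier boot (or rollback is disabled).
        return "unchanged"
    if self_test():
        # Cancel the pending rollback: this image becomes the known-good one.
        partition_cls.mark_app_valid_cancel_rollback()
        return "confirmed"
    if partition_cls.check_rollback_is_possible():
        # Active rollback: mark invalid and reboot in one step.
        # On a real device this call does not return.
        partition_cls.mark_app_invalid_rollback_and_reboot()
        return "rolled-back"
    # No valid image to fall back to; keep running rather than reboot-loop.
    return "no-rollback-target"


class FakePartition:
    """Desktop stand-in that records calls instead of touching flash."""

    RUNNING = 0
    APP_STATE_PENDING_VERIFY = 1
    state = 1  # pretend the new image is pending verification
    log = []

    def __init__(self, which):
        self.which = which

    def app_state(self):
        return FakePartition.state

    @classmethod
    def mark_app_valid_cancel_rollback(cls):
        cls.log.append("valid")

    @classmethod
    def check_rollback_is_possible(cls):
        return True

    @classmethod
    def mark_app_invalid_rollback_and_reboot(cls):
        cls.log.append("invalid+reboot")
```

On hardware, one would call confirm_or_rollback(esp32.Partition, check) early in main.py, where check is the application's own self-test.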
Benefits of an TL;DR
Add OTA update support to the Partition class, including:
- ESP-IDF OTA API wrappers
- app metadata inspection
- rollback management

Implemented new functions:
* mark_app_invalid_rollback_and_reboot()
* check_rollback_is_possible()
* app_description()
* app_state() returns an integer; use the APP_STATE_* constants
* ota_begin()
* ota_write()
* ota_write_with_offset() for ESP-IDF version >= 4.2
* ota_end()
* ota_abort() for ESP-IDF version >= 4.3

Added APP_STATE_* constants to the Partition class:
* APP_STATE_NEW
* APP_STATE_PENDING_VERIFY
* APP_STATE_VALID
* APP_STATE_INVALID
* APP_STATE_ABORTED
* APP_STATE_UNDEFINED

Also:
* create tests
* update documentation
Hi @ekondayan, Thanks for getting back to me.
There's a function esp_image_verify() which we can call directly (it's what
The ESP-IDF OTA API is calling the same …

Before I keep answering these points, I'm sorry but I have to ask an important question: did you use generative AI (like ChatGPT) to respond to my review? I don't mean to be rude, but many of the answers have the "confident and mostly but not totally correct" style of something written by an LLM. I need to know that you're putting in time and effort into creating these comments, before I put more of my own time and effort into responding to them.
Of course I use an LLM. It would be weird if I didn't. You are not rude at all to question this. I myself avoid investing effort and energy unless I see a reasonable and thinking AI or NI (natural intelligence) on the other end. I don't use an LLM to generate my responses, just to review and polish them. The technical reasoning is mine.
You're right, I see where this leads: you want to save a few bytes and rely on developers to implement the correct sequence. My answer to that is: for OTA-enabled devices, this is one of the most important features, because if it is not implemented correctly, you can render the device useless and end its life prematurely. Imagine selling or deploying thousands of devices that can be bricked easily by a tiny human mistake. It depends on who you expect to use MicroPython. If you are targeting DIY hobbyists who primarily use it for watering their plants, you can take the risk and rely on developers to implement the correct workflow manually, but if you want to ship commercial devices that sit in factories where downtime is unacceptable, you cannot rely solely on developers to implement it correctly. I wouldn't even trust myself with that. It is too risky to deviate from the official, battle-tested workflow recommended by Espressif. I can make this feature optional via a compile-time flag:

```c
#ifndef MICROPY_PY_ESP32_PARTITION_OTA_EXTENDED
#define MICROPY_PY_ESP32_PARTITION_OTA_EXTENDED (1)
#endif
```

What do you think?
We agree about this point, and certainly I'd be happy to see robustness improvements to the esp32 OTA system in MicroPython (not least because I know for a fact it is used in some commercial deployments now). The question is about how to achieve them. As MicroPython maintainers we have to balance a number of factors in order to find the best outcome. It's not simple from our perspective.
This is true even if we wrap the OTA API from Espressif, and is why we have
This isn't a great solution unless we can enable it on most boards, because disabled-by-default features are hard to test and tend to bitrot. It'd also leave us in the situation of having an existing, documented, less robust OTA system (the current one) and a better one gated behind a compile flag. This itself is error-prone for developers. BTW, if the 1.6KiB static memory usage I noted is unavoidable then we would have to leave it disabled on most boards. Do you have any idea where that comes from?
Not sure I agree. If we changed the set_boot API call to be …

As far as I can see, the difference of "reading the IDF documentation carefully" comes down to manually erasing the region and then calling the partition write function, which isn't particularly complex (and we could wrap this in a micropython-lib module to appear like a file object as well, if we had to). Am I missing something else?
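The manual path described here can be sketched with Partition methods that already exist in MicroPython (writeblocks() and set_boot()). This is a minimal sketch, assuming the default 4096-byte block size and that the two-argument writeblocks(block_num, buf) form erases each block before writing it, as the simple block-device protocol expects; write_image_manually() and FakeTarget are hypothetical names.

```python
BLOCK = 4096  # assumed Partition block size (one flash erase sector)


def write_image_manually(target, chunks):
    """Write an app image to `target` block by block, then select it for boot.

    `target` is expected to behave like esp32.Partition (writeblocks(),
    set_boot()); `chunks` is any iterable of bytes, e.g. pieces of a download.
    Returns the number of blocks written.
    """
    block_num = 0
    pending = b""
    for chunk in chunks:
        pending += chunk
        while len(pending) >= BLOCK:
            # Two-argument writeblocks(): erase-then-write of one whole block.
            target.writeblocks(block_num, pending[:BLOCK])
            pending = pending[BLOCK:]
            block_num += 1
    if pending:
        # Pad the tail with 0xFF (the erased-flash value) to a whole block.
        target.writeblocks(block_num, pending + b"\xff" * (BLOCK - len(pending)))
        block_num += 1
    target.set_boot()
    return block_num


class FakeTarget:
    """Desktop stand-in that records blocks instead of writing flash."""

    def __init__(self):
        self.blocks = {}
        self.boot = False

    def writeblocks(self, block_num, buf):
        assert len(buf) == BLOCK
        self.blocks[block_num] = bytes(buf)

    def set_boot(self):
        self.boot = True


manual_demo = FakeTarget()
blocks_written = write_image_manually(manual_demo, [b"\xe9" + b"x" * 4999, b"y" * 100])
```

On hardware, target would be Partition(Partition.RUNNING).get_next_update(). Whether this path is "safe enough" compared with the ESP-IDF OTA wrappers is exactly the open question in this thread; the sketch only shows how little code the manual route needs.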
UPDATE: The tests completed successfully on a NodeMCU ESP32 by Ai-Thinker with ESP-IDF v4.0, v4.1, v4.2 and v4.3.
Implemented new functions in esp32.Partition:
- mark_app_invalid_rollback_and_reboot()
- check_rollback_is_possible()
- app_description()
- app_state()
- ota_begin()
- ota_write()
- ota_write_with_offset() (ESP-IDF >= 4.2 only)
- ota_end()
- ota_abort() (ESP-IDF >= 4.3 only)

Also:
- create tests
- update documentation
For many commercial products, over-the-air (OTA) updates are a critical feature: they must be reliable and must not brick the device. Writing a good OTA implementation from scratch is a daunting task, so well-tested libraries with a proven track record are far preferable to ones developed in-house.
USE CASE
I'm developing an industrial device where OTA is an essential part. Since only a few of the ESP-IDF OTA functions are implemented (enough for a hobby project but not for a commercial one), I ended up duplicating the ESP-IDF OTA functionality in Python.
The result was a module of questionable quality. I tried to predict all the places where it could crash, but my gut told me I could be missing something. So I decided to implement more of the OTA functions from ESP-IDF and use them in my OTA module.
I've rewritten my OTA module and replaced the redundant code with the newly implemented ESP-IDF functions. This allowed me to reduce the size of the module significantly and increase the robustness of the code; on top of that, the code is now much simpler and easier to maintain.
The total increase in size of the compiled app image is 2432 bytes.
IMPLEMENTATION
Extend the esp32.Partition class: all OTA-related functions are prefixed with "ota_" and app-related functions are prefixed with "app_".
Example:
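```python
from esp32 import Partition

app_part = Partition(Partition.RUNNING)
print(app_part.app_description())
print(app_part.app_state())

# ESP-IDF cannot OTA-write the running partition, so target the next update slot.
update_part = app_part.get_next_update()
handle = update_part.ota_begin()
# ... stream the new image with update_part.ota_write(handle, data) ...
update_part.ota_end(handle)
```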
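A slightly fuller sketch of the same flow, with error handling. It assumes the ota_* signatures listed in this PR, and that ota_end() only finalises and validates the image (as esp_ota_end() does in ESP-IDF), so set_boot() is still called afterwards. ota_abort() is proposed only for ESP-IDF >= 4.3, hence the hasattr() check; ota_update() and FakeOtaTarget are hypothetical names.

```python
def ota_update(target, chunks, image_size=0):
    """Stream an app image into `target` using the ota_* methods proposed in this PR.

    `target` should be the next update partition, e.g.
    Partition(Partition.RUNNING).get_next_update(); ESP-IDF refuses to
    OTA-write the partition that is currently running.
    """
    handle = target.ota_begin(image_size)
    try:
        for chunk in chunks:
            target.ota_write(handle, chunk)
    except Exception:
        # Release the OTA handle if the firmware supports it (ESP-IDF >= 4.3).
        if hasattr(target, "ota_abort"):
            target.ota_abort(handle)
        raise
    # ESP-IDF validates the written image when the OTA session is ended.
    target.ota_end(handle)
    target.set_boot()


class FakeOtaTarget:
    """Desktop stand-in that records calls instead of writing flash."""

    def __init__(self):
        self.data = b""
        self.booted = False
        self.aborted = False

    def ota_begin(self, image_size):
        return 1  # dummy handle

    def ota_write(self, handle, chunk):
        self.data += chunk

    def ota_end(self, handle):
        pass

    def ota_abort(self, handle):
        self.aborted = True

    def set_boot(self):
        self.booted = True


ota_demo = FakeOtaTarget()
ota_update(ota_demo, [b"ab", b"cd"])
```

Compared with the manual writeblocks() route, the design choice here is to let ESP-IDF own erasing and image validation, which is the main robustness argument made in this PR.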
New functions:
- Partition.mark_app_invalid_rollback_and_reboot(cls)
- Partition.check_rollback_is_possible(cls)
- Partition.app_description(self)
- Partition.app_state(self)
- Partition.ota_begin(self, image_size=0)
- Partition.ota_write(self, handle_in, data_in)
- Partition.ota_write_with_offset(self, handle_in, data_in, offset) (ESP-IDF >= 4.2 only)
- Partition.ota_end(self, handle_in)
- Partition.ota_abort(self, handle_in) (ESP-IDF >= 4.3 only)
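ota_write_with_offset() is the one proposed call that allows non-sequential writes, for example when a download is resumed or pieces arrive out of order. A minimal sketch, assuming the (handle_in, data_in, offset) signature listed above; ota_write_ranges() and FakeRangeTarget are hypothetical, and the sketch does not check alignment, so consult ESP-IDF's esp_ota_write_with_offset() documentation for its constraints (for example when flash encryption is enabled).

```python
def ota_write_ranges(target, handle, ranges):
    """Write (offset, data) pieces that may arrive out of order.

    Uses ota_write_with_offset(), proposed for ESP-IDF >= 4.2 only.
    Returns the total number of bytes written.
    """
    if not hasattr(target, "ota_write_with_offset"):
        raise NotImplementedError("firmware lacks ota_write_with_offset (ESP-IDF < 4.2?)")
    total = 0
    for offset, data in ranges:
        target.ota_write_with_offset(handle, data, offset)
        total += len(data)
    return total


class FakeRangeTarget:
    """Desktop stand-in: an 8-byte buffer instead of a flash partition."""

    def __init__(self):
        self.buf = bytearray(8)

    def ota_write_with_offset(self, handle, data, offset):
        self.buf[offset:offset + len(data)] = data


range_demo = FakeRangeTarget()
written = ota_write_ranges(range_demo, 1, [(4, b"EFGH"), (0, b"ABCD")])
```

The session would still be opened with ota_begin() and closed with ota_end() as in the earlier example; only the write step changes.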
BENEFITS
CONS