Reconstructing lost data from parity blocks
The FUOTA (firmware update over the air) mechanism was built to allow our client Kelvin to remotely and reliably update thousands of products already in use.
Kelvin’s smart radiators
Kelvin's mission is "decarbonizing the world's legacy buildings." The flagship product, the Cozy smart radiator cover, cuts heating-related energy use and emissions in legacy steam-heated buildings by 25% or more. The Cozy wraps an insulated enclosure, with sensors and a heat-distributing fan, around every steam radiator in a building. (These are multi-story apartment or commercial buildings with hundreds of rooms.) Data collected at each radiator is aggregated over a LoRaWAN network and used to optimize the operation of the building's boiler.
A given building or complex may have hundreds or thousands of individual Cozy smart radiators, each with a microcontroller programmed entirely in Rust and utilizing the LoRa-rs library. Kelvin has sponsored extensions to this library supporting its ability to collect data from and remotely update these devices. This includes adding Class C and Multicast support to the LoRa-rs library, and implementing the standalone Fuota-Fragmentation library.
Updating over the LoRaWAN network
LoRa is a very low data rate network. Typical packets are between eleven and fifty-three bytes of usable data, and are transmitted every twenty to thirty seconds during a firmware update. With a firmware image of up to 220 kilobytes, a firmware update is a lengthy process.
Like many wireless networks, the LoRaWAN network is lossy.. If a packet arrives, it is guaranteed to be correct, but not all packets will arrive. The most obvious way to deal with this is to have the client request the packets it has not yet received a second time, thus requiring singlecast operation. Obviously, this is not very efficient when you have a lot of clients to update.
Hence, the LoRaWAN over-the-air update mechanism uses a different tactic: In addition to the firmware itself, we also send additional parity data in the hopes that the device can receive enough of it to reconstruct the firmware. This resilience to lost packets enables multicast FUOTA updates, allowing the update to succeed without individual device-level interaction. Once the update group is established and those devices prepared for the update, the update can be performed "open loop", without checking device status throughout the download. Considering the number of microcontrollers ‘in the field’, this is a much more efficient way to ensure updates are distributed.
For this process, the firmware is split up into blocks. Each block of the firmware is sent separately to the devices, followed by a number of parity blocks each generated by taking the xorsum of some of the data blocks. If each data block is summed into about half of the parity blocks, it can be shown that only a few extra blocks of data need to be received to have enough data to reconstruct the firmware.
Reconstructing the firmware
This leaves the problem of reconstructing the data again from the parity blocks. Before diving into that problem however, it is really useful to realize that we can actually do calculations with a single bit in a very similar way to normal numbers, having addition, subtraction, multiplication and division.
Multiplication and division here are very straightforward, as multiplying combinations of 0 and 1 will always only give 0 or 1. Furthermore, division by 1 is trivial, and is the only division we need to consider as division by 0 is still not allowed.
Moving to addition and subtraction, let us look at the results of XOR and contrast it with regular binary addition:
A | B | XOR | + |
---|---|---|---|
0 | 0 | 0 | 0 |
0 | 1 | 1 | 1 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 10 |
Looking at this table, we see that xor is the same as addition modulo 2. It is a well known result from math that this form of addition behaves well, and combines in the usual ways with multiplication.
Finally, looking at the above table, we also immediately construct subtraction of 2 bits by switching the roles of the XOR and A columns, and it turns out that this gives exactly the same operation.
Combined, this idea of doing math on individual bits is also known as doing math in what mathematicians call the field GF(2). We will use it here both on single bits at a time, and with blocks of firmware. In the latter case, if we do operations on an entire block of firmware, we just do them on the individual bits.
We can use this to write equations relating the firmware to the data we received. Suppose we are receiving a firmware consisting of 4 blocks F1 through F4, and due to data loss, we have only received parity blocks P1= F1 xor F2 xor F3, P2 = F1 xor F2 xor F4, P3 = F1 xor F3 xor F4 and P4 = F2 xor F3 xor F4. Then using the above observation we can rewrite this more suggestively as:
P1 = 1 * F1 + 1 * F2 + 1 * F3 + 0 * F4,
P2 = 1 * F1 + 1 * F2 + 0 * F3 + 1 * F4,
P3 = 1 * F1 + 0 * F2 + 1 * F3 + 1 * F4,
P4 = 0 * F1 + 1 * F2 + 1 * F3 + 1 * F4.
This is a system of equations, and it can be solved by using linear algebra, using a process called gaussian elimination.
We start this process by rewriting the system in matrix form:
/P1\ /1 1 1 0\/F1\
|P2| |1 1 0 1||F2|
|P3| = |1 0 1 1||F3|
\P4/ \0 1 1 1/\F4/
We are going to manipulate the left-hand side and the matrix such that this equation will stay true in all steps, but progressively becomes simpler. First, we note that the first row of the matrix has a 1 in the first position. We can use this to eliminate all 1s in the first position in all other rows by adding the first row to all with a 1 there, and doing the same in the vector on the left hand side. This gives:
/P1 \ /1 1 1 0\/F1\
|P1 + P2| |0 0 1 1||F2|
|P1 + P3| = |0 1 0 1||F3|
\P4 / \0 1 1 1/\F4/
Next, we can reshuffle the rows such that the second row has a 1 in the second position:
/P1 \ /1 1 1 0\/F1\
|P1 + P3| |0 1 0 1||F2|
|P1 + P2| = |0 0 1 1||F3|
\P4 / \0 1 1 1/\F4/
And again add that row to the ones below it as needed to eliminate all 1s in the second position below the second row.
/P1 \ /1 1 1 0\/F1\
|P1 + P3 | |0 1 0 1||F2|
|P1 + P2 | = |0 0 1 1||F3|
\P1 + P3 + P4/ \0 0 1 0/\F4/
Repeating this process we can get a matrix where the bottom left part is all zeroes:
/P1 \ /1 1 1 0\/F1\
|P1 + P3 | |0 1 0 1||F2|
|P1 + P2 | = |0 0 1 1||F3|
\P2 + P3 + P4/ \0 0 0 1/\F4/
At this point, we can work from the bottom to the top to one-by-one extract the firmware data blocks. This results in a solution:
F4 = P2 + P3 + P4
F3 = P1 + P2 + F4
F2 = P2 + P3 + F4
F1 = P1 + F2 + F3
Optimization for practical situation
The process above is straightforward enough to implement in a computer as written down. In fact, if we do receive some of the actual firmware blocks, they can be included as "trivial" parity blocks of the form Pi = Fi in the above systems of equations. However, during the process we do need to keep track of the matrix as well as all the blocks that we have received. As a firmware image can consist of 1000s of blocks, this matrix may take non-trivial amounts of space to store.
Furthermore, if we are using flash for the storage, then received parity blocks need to be stored in a separate location from where we will eventually place the firmware, because we can only erase the parity blocks after we have used them to reconstruct the firmware.
To save on the storage space needed, there are several optimizations we can use. First of all, any blocks of firmware data we receive directly can immediately be stored in their final location in the firmware image. This saves us some of the scratch space needed to store received parity blocks.
There is another clever trick that can gain us some space in the matrix storage. If we first send all the non-parity blocks and only then start sending parity data, the firmware knows which firmware blocks it already has when it receives the first parity block. These no longer need to be solved for in the system of equations, so we can in effect entirely ignore them if we adjust the received parity data to subtract out any included firmware blocks we already have. This reduces the size of the matrix from NxN with N the number of blocks in the firmware, to MxM, where M is the number of firmware blocks we missed during initial transmission. If we miss say only 10 percent of those blocks, the resulting matrix will be a factor 100 smaller this way.
Finally, we can actually do the processing to eliminate all the 1s in the lower left triangle on reception. For this, we start with a matrix of all 0s, using the diagonal to indicate which parity blocks are already filled. When we receive a parity block, suppose it has its leftmost 1 in position x. We then look at the xth row, and if it also has a 1 at position x, use it to eliminate the leftmost 1 from the new matrix row, and we can repeat this process. If the xth row does not have a 1, this parity block can then be used to fill in that row.
Working out the example above again, we would build up the matrix as follows. We start with the trivial (but useless) equation:
/0\ /0 0 0 0\/F1\
|0| |0 0 0 0||F2|
|0| = |0 0 0 0||F3|
\0/ \0 0 0 0/\F4/
The first parity block P1 has matrix row (1 1 1 0) to start with. Its first 1 is in column 1, and since column 1 of row 1 is zero, we just immediately insert this into the equation there:
/P1\ /1 1 1 0\/F1\
|0 | |0 0 0 0||F2|
|0 | = |0 0 0 0||F3|
\0 /
\0 0 0 0/\F4/
The next parity block P2 has matrix row (1 1 0 1), which also has its first 1 in column 1. The first row is now in place, so we use it to eliminate the 1 giving a new parity block P1+P2 and new matrix row (0 0 1 1). The leftmost 1 is now in position 3, and row 3 still has a zero in position 3, so we insert:
/P1 \ /1 1 1 0\/F1\
|0 | |0 0 0 0||F2|
|P1 + P2| = |0 0 1 1||F3|
\0 / \0 0 0 0/\F4/
Now we receive parity block P3, which has matrix row (1 0 1 1). Again, we can eliminate the 1 in the first position, giving new block P1 + P3 with matrix row (0 1 0 1). The second row is still empty, so we insert this:
/P1 \ /1 1 1 0\/F1\
|P1 + P3| |0 1 0 1||F2|
|P1 + P2| = |0 0 1 1||F3|
\0 / \0 0 0 0/\F4/
Finally, we receive parity block P4, which has matrix row (0 1 1 1). We eliminate the 1 in position 2 giving P1 + P3 + P4, with matrix row (0 0 1 0). We can also eliminate the 1 in position 3, which gives P2 + P3 + P4 as the parity block, with matrix row (0 0 0 1):
/P1 \ /1 1 1 0\/F1\
|P1 + P3 | |0 1 0 1||F2|
|P1 + P2 | = |0 0 1 1||F3|
\P2 + P3 + P4/ \0 0 0 1/\F4/
The matrix is now fully filled in, and in the triangular form required, so solving can be done the same as earlier.
This process ensures the bottom-left part of the matrix is always 0. Since these values don't change, we don't actually have to store them, eliminating about half the storage space required for the matrix. Furthermore, it has the advantage that if we ever receive a parity block that provides us with no additional information, we know this before we try to store it as the matrix row will become all 0s in the elimination process, so there is also no need to store extra parity blocks to ensure we have enough information to reconstruct the firmware once we start doing that.
Practical implementation
We implemented this as part of firmware update logic for Kelvin, at https://github.com/Radiator-Labs/fuota-fragmentation-rs. Using the above optimizations, this resulted in a system that can receive a 256kB firmware transferred in chunks of 48 bytes and reconstruct it in 512kB of storage space on the flash, even when up to 30 percent of the packets are lost.
The implementation was tested in a real-world firmware update over the air with segment_size = 50, 3354 data segments, and 100% parity (3354 parity packets). In this testing, we successfully downloaded with r=45% (i.e. 45% of packets were intentionally dropped in this FUOTA test). At r=50%, we missed successfully recovering by only one packet.
It thus provides a reliable, relatively performant and space-efficient method for receiving firmware updates over unreliable channels.
Facing an embedded challenge?
We can help you deal with:
- low-power applications
- hardware restrictions
- Rust Embedded engineering