Putting seat calculations in Dutch election software to the (fuzz) test

Abacus
Abacus is currently still in development; it is not in use yet, and won't be until it's ready. Once it is, it is to be used by municipalities, specifically by municipal electoral committees. Volunteers at the municipalities will enter votes in Abacus' React front-end, and these votes will then be collected in the Rust back-end. Once all votes from all polling stations have been gathered at the central polling station, the seat apportionment (Dutch: zetelverdeling) can be computed to determine how many seats every political party receives.
Seat apportionment is complicated
To get an idea of how seat apportionment works in the Netherlands, let's take a look at a hypothetical election result in the fictitious Dutch city of Stuipendrecht. Let's say that their municipal council has 6 seats to allocate over 4 political parties. In total there were 938 votes cast, so the electoral quota (Dutch: kiesdeler) is 938/6 ≈ 156.333. By dividing the number of votes of each party by this quota, we can determine how many seats every party is supposed to get, which we round down to get the number of full seats:
Political party | Votes | Seats | Full seats |
---|---|---|---|
Altijd vd Partij | 490 | ~3.134 | 3 |
Lokaal Belang | 123 | ~0.787 | |
GSGR | 231 | ~1.478 | 1 |
Lijst Bommel | 94 | ~0.601 |
This is a good start, but now 2 seats remain unallocated. These seats are considered residual seats (restzetels), which are allocated using one of two algorithms depending on the total number of seats in the council:
- Smaller councils of less than 19 seats allocate residual seats according to the largest remainders
- Larger councils of 19 or more seats allocate residual seats according to the largest averages
Since the council of Stuipendrecht has 6 seats, we'll use the largest remainders to allocate the 2 remaining residual seats. This method comes down to looking at the remaining fractional part of the number of seats every party was supposed to get to decide who deserves the residual seats. In our case, Lokaal Belang and Lijst Bommel have the largest remainders (~0.787 and ~0.601 respectively). However, when using the largest remainder method, only parties which meet the electoral threshold (kiesdrempel) of 75% of a seat are eligible for a residual seat. Since the number of seats Lijst Bommel deserves does not meet this threshold, the last residual seat goes to GSGR (with a remainder of ~0.478) instead:
Political party | Votes | Seats | Full seats | Residual seats |
---|---|---|---|---|
Altijd vd Partij | 490 | ~3.134 | 3 | |
Lokaal Belang | 123 | ~0.787 | 1 | |
GSGR | 231 | ~1.478 | 1 | 1 |
Lijst Bommel | 94 | ~0.601 |
However, there is one more caveat: when a party has received an absolute majority of the votes, they should also receive a majority of the seats. So, the last residual seat actually needs to go to Altijd vd Partij instead of GSGR to satisfy the absolute majority requirement. This brings the final seat allocation to:
Political party | Votes | Final seats |
---|---|---|
Altijd vd Partij | 490 | 4 |
Lokaal Belang | 123 | 1 |
GSGR | 231 | 1 |
Lijst Bommel | 94 |
The full details of the seat apportionment algorithm used in the Netherlands is specified in the Election Law (Dutch: Kieswet). Implementing this legal text is not trivial, as various edge cases could lead to subtle ambiguities. Since people generally don't trust computers, especially for important systems on which our entire democracy relies, the manual process on paper always remains leading and Abacus is mainly used for verification. Still, it would be good to verify that Abacus does this computation correctly.
So, how do we verify that the seat allocation algorithm in Abacus is correct?
Unit testing
A first logical step for verifying any algorithm is to write unit tests. Naturally, Abacus already contains unit tests for the seat allocation algorithm. Here is one of their unit tests:
#[test]
fn test_seat_allocation_less_than_19_seats_without_residual_seats() {
let totals = get_election_summary(vec![480, 160, 160, 160, 80, 80, 80]);
let result = seat_allocation(15, &totals).unwrap();
assert_eq!(result.steps.len(), 0);
let total_seats = result.get_total_seats();
assert_eq!(total_seats, vec![6, 2, 2, 2, 1, 1, 1]);
}
This test verifies that a particular set of votes leads to a particular seat allocation. In this case, the apportionment doesn't need to allocate any residual seats. Various other test cases exist for apportionments that do use residual seats, situations that need absolute majority correction, and other edge cases, both for councils with less than 19 seats and for councils with 19 or more seats.
These unit tests provide a good basis for ensuring the algorithm works correctly. Using code coverage analyzers we can even ensure that all branches of the algorithm are tested at least once. However, it doesn't necessarily guarantee that it works correctly for all possible inputs, as we only test a limited number of specific scenarios.
Model-based verification
Since there are many possible combinations of votes per party and council sizes it is completely infeasible to test all possible cases with unit tests alone. Instead of writing specific test cases, it would be ideal if we could somehow prove that the algorithm as a whole is correct.
As we don't have a reference implementation for the seat apportionment algorithm, our initial plan was to use model checking frameworks for Rust, like Kani or Creusot, to verify properties of the seat apportionment algorithm. However, Kani seemed to have trouble verifying complex properties, and Creusot seems to require lots of complex annotations, which would get in the way of Abacus' ongoing development.
Instead, we can use external model-based tools to verify the specific parts of Abacus that we are interested in. For example, the seat apportionment algorithm uses a custom fraction data structure to do the computations. We can model this fraction data structure and its associated functions in the verification framework Why3 to verify that the fractions work as expected.
For these fractions, we have an obvious reference to compare their behavior to, namely the Why3's model of real numbers. This allows us to verify that the implementations of operations like addition, multiplication, and equality correspond to the mathematical "gold standard".
For example, here is our Why3 model of how Abacus implements equality for their fractions:
predicate (===) (x: fraction) (y: fraction)
= x.numerator * y.denominator = x.denominator * y.numerator
Assuming that integer overflows are not a problem (Abacus is configured to always catch integer overflows), this should make sense. We can verify that this equality is correct by checking if it produces the same result as the built-in equality on real numbers in Why3:
goal eq_eqs:
forall x y: fraction. x === y <-> eval x = eval y
Here, the eval
function takes a fraction and turns it into a real number by dividing the numerator by the denominator. However, this goal can only be proven under the condition that the fraction's denominators are not zero. Using the previously shown method of computing equality, a fraction of 0/0 would be equal to any other fraction, which is clearly not something we want. In our full proof, we have annotated the fraction type with an invariant to make sure the denominator is larger than 0 to allow the goals to be proven successfully.
In Abacus, you would have the same problem if you could create fractions with the denominator being 0. While Abacus does not produce such mischievous fractions, it's better to prohibit this possibility altogether. We therefore submitted a pull request to Abacus that ensures such fractions can never be created, which was promptly merged.
A drawback of using external tools is that we cannot guarantee that our model is equivalent to the Rust code we are trying to verify. For smaller pieces of code, like the fraction implementation, this can be verified manually. However, to verify the seat allocation algorithm as a whole, we are better off trying a different approach.
Property-based fuzzing
Fuzzers are a very powerful way to test software. They generate random inputs to try to make a program crash. In Rust, we can use cargo fuzz
, which makes it easy to define and run fuzz targets for Rust projects. It uses libFuzzer
internally to do coverage-guided fuzzing, which means the fuzzer uses code coverage metrics to look for new interesting inputs by mutating other previously found interesting inputs.
We can use fuzzers to verify properties of a function by intentionally letting the program crash if the desired property does not hold (using e.g. assert!
statements). This means that the fuzzer will try to find inputs with unsatisfactory outputs, as these will crash the fuzz targets. Unlike model-based verification, this cannot 100% guarantee that a property always holds for all possible inputs. However, if the fuzzer hasn't found any problems after letting it run for while, we can at least be fairly sure the property holds.
Which properties to test?
Over the year many mathematicians have discussed what fair apportionment should look like. A first basic property that should hold is that a party which received x% of the votes should receive at least x% of the seats in the council, rounded down to the nearest integer. This lower bound should hold as it is equivalent to the number of full seats every party deserves.
Another property we can test is balancedness. Balancedness says that if two parties received the same number of votes, the number of seats they receive should differ by at most 1. The number of seats they receive is not necessarily equal because a last residual seat may need to be randomly allocated to one of these two parties in certain scenarios. We can make this balancedness property slightly more general by saying that a party that received more votes than another party should have at least the same number of seats as that other party. This more general property is known as concordance.
Other properties we can verify include anonymity, which says the order of the parties on the ballot paper should not affect the outcome, and monotonicity, which says the outcome should scale predictably when increasing the number of available seats or the number of votes. You can find these among the fuzz targets in our fork of Abacus.
Note that under certain edge cases, these properties might not necessarily hold. If a party is allocated more seats than it has candidates, then those extra seats are reallocated to another party instead. This is known as list exhaustion (Dutch: lijstuitputting), and this could for example cause concordance not to hold. When we did our first fuzz tests, Abacus did not yet take list exhaustion into account during the allocations, so this wasn't a problem. When such an edge case does become a problem, the fuzzer should simply skip the property tests when an edge case occurs for which the property is known not to necessarily hold.
The fuzzer found a problem
Now that we know what properties to test, let's run the fuzzer!
When we ran the fuzz target that checks for the lower bound, balancedness, and concordance properties, it quickly found an issue:
thread '<unnamed>' panicked at fuzz_targets/apportionment.rs:70:25:
[Concordance] more votes -> >= seats (511 vs 0 votes, 0 vs 1 seats),
Votes: [19858427, 0, 50397184, 511, 16777727, 0, 0, 0, 0, 65791, 65787, 0, 0, 0,...
Seats: [34, 0, 87, 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 30, 0, 0, 1, 0, 0, 0,...
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==67327== ERROR: libFuzzer: deadly signal
#0 0x62bba3c5fdc1 (/mnt/Projects/abacus/backend/fuzz/target/x86_64-unknown-...
#1 0x62bba3cd859d (/mnt/Projects/abacus/backend/fuzz/target/x86_64-unknown-...
#2 0x62bba3cc6b59 (/mnt/Projects/abacus/backend/fuzz/target/x86_64-unknown-...
It has found a case for which concordance doesn't seem to hold. A party with 511 votes got 0 seats, while a party with 0 votes somehow got 1 seat. This certainly doesn't seem right!
The first thing to do once the fuzzer finds something is to verify whether the behavior is actually a bug in the system you are testing and not a bug in your own fuzzing logic. So, to start our investigation into what is going on we wrote a unit test similar to the one we saw before to see if the correct number of seats are allocated for the case found by the fuzzer:
#[test]
fn my_test() {
let totals = get_election_summary(vec![19858427, 0, 50397184, 511, 16777727, ...
let result = seat_allocation(18, &totals).unwrap();
let total_seats_allocated = result.final_standing.iter()
.map(|p| p.total_seats).sum::<u64>();
assert_eq!(18, total_seats_allocated);
}
Surprisingly, this test fails! Somehow Abacus has allocated 24 seats, despite there only being 18 seats to be distributed.
How did this happen? Well, after looking through the Abacus source code, we traced the problem back to some code in the get_election_summary
function:
let mut political_group_votes: Vec<PoliticalGroupVotes> = vec![];
for (index, votes) in pg_votes.iter().enumerate() {
political_group_votes.push(PoliticalGroupVotes::from_test_data_auto(
(index + 1) as u8,
*votes,
&[],
))
}
Here, the party numbers (which are used to identify parties) are cast to u8
using as
. The test case generated by the fuzzer happens to have more than 255 parties, which means the party numbers overflow. This means that the 256th party will get party number 1, which is also used by the first party. The residual seats happen to be awarded based on the party numbers, so when party #256 deserves a residual seat, party #1 will also get a residual seat because it has the "same" party number.
Fortunately, this bug can only be triggered in the test code, as the get_election_summary
function is only used for unit tests (although we also reused it for our fuzzer). If you actually try to run an election with more than 255 parties, the database in the back-end will give an error that the party numbers don't fit in the database.
While it is unlikely that we will have more than 255 political parties in the Netherlands, it feels like an unnecessary limitation to limit the party number to u8
. We created a pull request to change the party number to u32
instead to make sure this will never be an issue. Additionally, we would like to disallow using unchecked integer truncation to prevent these kinds of bugs from occurring. We've created a separate pull request to enable the Clippy lint cast_possible_truncation
, which means the CI will now check to make sure integers are not truncated using as
.
Differential fuzzing
While property-based fuzzing should catch most bugs that cause clearly unfair apportionment, it won't be able to catch all possible bugs. For example, the properties we test do not describe how exactly the residual seats are supposed to be allocated. For this reason, it would still be nice to compare Abacus to a reference implementation. So, we made our own implementation of a subset of the Election Law (Kieswet) in Rust.
We tried to keep our implementation as clean and simple as possible. Since we do not need to do all the additional bookkeeping that Abacus needs to do, our implementation is much smaller compared to Abacus. Still, we cannot guarantee that our implementation is correct.
To give us some confidence that our implementation is correct, we verified it against historical election data. We verified all elections from 2017 onwards (except for the Eerste Kamer elections), as well as Tweede Kamer elections between 1918 and 1972. (Elections between 1972 and 2017 used 'list combinations' (Dutch: lijstcombinaties), which allowed parties to combine their candidate lists with other parties, which our implementation doesn't support, so we did not use this data.)
Note that while being able to verify the algorithm with historical data is great, it still only tests a limited number of scenarios. Edge cases such as list exhaustion and absolute majority correction are quite rare. For example, list exhaustion and absolute majority correction each only occurred once during the elections in our data set.
In addition to the Rust implementation, we also made a Python implementation of the apportionment algorithm. This Python implementation is integrated in our fuzzer using PyO3 (which we previously discussed in our interop guide, although here we call Python from Rust instead of Rust from Python). Python has the benefit that it has built-in support for fractions, and integers in Python do not overflow. Additionally, because more people are familiar with Python, this implementation might be more approachable for those unfamiliar with Rust. This makes it easier for anyone with some programming experience to verify for themselves that this implementation makes sense.
Other than the list exhaustion and absolute majority correction, which Abacus did not yet support when we started this project, our implementations seem to match Abacus according to our fuzzer, as it did not find any discrepancies.
What's next?
Because Abacus is still in active development, the code base is constantly changing as new features are being developed. Since we started writing this blog post, Abacus has now implemented absolute majority correction and list exhaustion. As more special edge cases of the Election Law are being implemented, there is a good chance of more bugs getting introduced due to the increased complexity.
For instance, when list exhaustion was added, this led to the discovery of several more bugs. One bug was found because it violated the anonymity property due to a programming error. Another bug was related to details in the rest seat apportionment in situations where there is an absolute majority, which was found using differential fuzzing. Eventually, we started to find some edge cases that were so esoteric that the Electoral Council's legal experts had to be consulted on what the intended behavior is. These edge cases will almost certainly never happen in practice (for example, what happens if no one votes in an election?), but the fact that our fuzzer finds these edge cases does show the effectiveness of using fuzzers to discover unexpected behavior.
We plan to keep updating and running the fuzzer every once in while to see if any new bugs have popped up.
Conclusion
In conclusion, property-based and differential fuzzing can be great methods to verify the correctness of software. Unlike formal verification frameworks, fuzzing can't fully guarantee that the software does not contain any bugs. However, fuzzing is in our opinion much easier and quicker to set up and use compared to the alternative, more formal methods, while still being able to discover any potential problems in the software.
During this project, we did a number of contributions to Abacus. We prevented fractions from having 0 as a denominator, we turned on warnings for integer truncation, and we changed party numbers to be 32-bit integers instead of 8-bit integers. Additionally, we found and reported several more bugs using our fuzzer after our initial fuzz tests. Our fuzzer for the seat apportionment algorithm is available in our fork of Abacus along with our Python implementation.
We would like to thank the Electoral Council for their enthusiastic response to our findings. We think it's awesome that software which is so important for our democracy is open source and allows anyone to contribute to it. Hopefully we'll get to see Abacus in action soon!