A couple of weeks ago I decided to learn Python, mostly because I no longer have access to a Matlab license and the price of Matlab is rather off-putting. Lately I have also gravitated toward free software, both in the sense that no money is involved and in the sense that the source code can be used as one wishes. For example Google’s cloud services and LibreOffice. I have no clear answer to why Python in particular. One reason is that someone said it’s fairly easy to learn if you already know Matlab.
In the vaalit.fi service it is possible to download the results of the 2012 municipal elections from this page. Descriptions of the CSV files can be found at the top of the page under instructions. Since the elections are still fresh in my memory, I thought I’d dig into that data as an exercise and use Python to make any tools I would need.
The file containing the results for the whole country is quite large, about 400 MB. When loaded into the memory of my laptop it took about 3 GB, which slowed things down dramatically. There are many ways to solve this; I decided to install a MySQL server, move the data to a database and then query whichever data I need. Although the data still would not fit in memory, it is much faster to find things when it is not necessary to go through the whole file. MySQL is also free if you don’t need to use a consultant.
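The idea of moving the file into a database row by row, instead of reading it all into memory, can be sketched roughly as follows. This is not the code from the zip file. To keep the example runnable without a server it uses sqlite3 from the Python standard library as a stand-in for MySQL, and the table layout, the column names, the semicolon delimiter and the sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows: municipality;area;eligible;votes.
# This is NOT the real layout of the vaalit.fi files.
SAMPLE = """\
Town X;Area A;1200;700
Town X;Area B;300;150
Town Y;Area C;90;60
"""

def load_results(conn, lines, batch_size=1000):
    """Insert rows in batches so the whole file never sits in memory."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "municipality TEXT, area TEXT, eligible INTEGER, votes INTEGER)"
    )
    insert = "INSERT INTO results VALUES (?, ?, ?, ?)"
    batch, total = [], 0
    for row in csv.reader(lines, delimiter=";"):
        batch.append((row[0], row[1], int(row[2]), int(row[3])))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the last partial batch
        conn.executemany(insert, batch)
        total += len(batch)
    # An index lets lookups by municipality avoid scanning every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_muni ON results (municipality)")
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")
loaded = load_results(conn, io.StringIO(SAMPLE), batch_size=2)
rows = conn.execute(
    "SELECT area, votes FROM results WHERE municipality = ? ORDER BY area",
    ("Town X",),
).fetchall()
print(loaded, rows)
```

With a real file one would pass an open file object instead of the StringIO, using whatever encoding and delimiter the file descriptions specify. The same batching pattern should carry over to MySQL, although the placeholder syntax in the SQL differs between database drivers.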
It took about a week to set up everything and learn enough Python to be able to query the database I created, although it was probably only about three days of actual work. The experience was not bad. There were some problems finding the right modules for Python, those that would enable all the calculations and figure drawing I might want.
In fact, finding the modules was not that difficult, but finding the correct ones for my combination of operating system and processor took some time. In the end I think I installed a version meant for AMD processors although this laptop has an Intel one. Seems to work in any case. It was also a bit of a conundrum to choose between Python 2.x and 3.x. They are not completely compatible and I couldn’t tell whether the community will move to the new version or not. The modules I’d likely need were however available for 3.x, so I selected that one. For me the risk should be small, as I intend to write scripts and not software that needs to be maintained.
I used MySQL Workbench to create the database. It was not that much work to learn the parts I needed. A small problem appeared when the MySQL connector wouldn’t work with Python version 3.3, so I had to move to 3.2. After this the biggest stumbling block was getting the connection to the database working, even though it is on the same computer. Workbench seems to use something called “named pipes”, but this wouldn’t work with the MySQL Python connector. After some trial and error I opened a hole in the firewall for the port used by MySQL and managed to find the correct initialization file, where I could tell the server to listen to localhost only. I’m not quite sure what in the end made it work, but the searches started to return results. Hopefully my laptop isn’t wide open now.
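For reference, connecting over TCP to a server on the same machine looks roughly like this with MySQL Connector/Python. This is a sketch under assumptions, not the exact code I ended up with: the helper names, user, password, database and query are placeholders, 3306 is MySQL’s default port and may differ on another setup, and the connector has to be installed separately. Giving the host as 127.0.0.1 asks for a TCP connection rather than a named pipe.

```python
def connection_settings(user, password, database,
                        host="127.0.0.1", port=3306):
    """Collect the keyword arguments for a local TCP connection.

    All values passed in are placeholders chosen by the caller;
    3306 is MySQL's default port.
    """
    return {
        "user": user,
        "password": password,
        "database": database,
        "host": host,
        "port": port,
    }

def fetch_rows(settings, query, params=()):
    """Run one query and return all rows, closing the connection afterwards."""
    # Imported here so the rest of the script loads without the connector.
    import mysql.connector  # third-party package: MySQL Connector/Python

    conn = mysql.connector.connect(**settings)
    try:
        cursor = conn.cursor()
        cursor.execute(query, params)
        return cursor.fetchall()
    finally:
        conn.close()

# Hypothetical credentials and database name, for illustration only.
settings = connection_settings("election_user", "secret", "elections")
# Example call, needs a running server and an existing table:
# rows = fetch_rows(settings, "SELECT COUNT(*) FROM results")
```

Keeping the settings in one dictionary makes it easy to change the host or port in a single place if the server configuration changes.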
18 Dec 2012 is a zip file with the source code and some SQL for creating and querying the database. Everything in it can be freely used, modified and shared.
Below are a couple of figures extracted from the data and a little bit of speculation on what can be seen. I’m too lazy to remake the figures with English texts.
The title of the largest voting area goes, with 15971 voters, to the aptly named area “Ä-alue 090A”. Based on the municipality number it is somewhere in Helsinki.
The smallest areas, with under a hundred voters, are Markby (88), Norrby (93) and Korsbäck (95), which are in the municipalities of Uusikaarlepyy, Kruunupyy and Korsnäs. At first I thought that these areas would be on islands, but they are not. There are other small areas in these same municipalities. In Sipoo there is also an area called “saaret” (islands) which doesn’t have any eligible voters; perhaps it is not in use.
From the figure it is clear that the distribution has two peaks, but I’m not quite sure what the reason behind this is. I first thought that it was due to geography: if the distance to a polling station is too long, it is going to have an effect on turnout. In the case of the smallest voting places this doesn’t seem to be the explanation.
Figure 2. Size of voting areas and turnout vary. For example, a little over 3000 votes were cast in one area. The abscissa is the number of votes cast in an area, the ordinate is the count of areas.
The distribution of votes in figure 2 shows the same double peak that was seen in figure 1. In ten areas the number of votes was less than 50. The lowest activity (22 votes) was seen in the Aska area of Puolanka municipality; the Paloniemi area in Kuhmo was clearly more active with 36 votes. Is voting secrecy adequately preserved when the number of voters is so small? At least they should shake the box before opening it.
Figure 3. Histogram of the ratio of voters to eligible voters for each voting area. Multiplying the abscissa by 100 gives percentages. In more detail: for each voting area the number of voters on election day was divided by the number of those eligible to vote in Finland. Those who voted before the actual election day are not included.
Figure 3 shows a histogram of election day turnouts for the different voting areas. In some cases the turnout was dismal. For example, in the Aleksanterin koulu area in the city of Tampere 78 voters showed up, and the ratio of voters to eligible voters was 0.02. Perhaps this is some sort of special area?
Small areas show up on the list of most active areas, including two areas named Korsbäck. The relatively most active area, however, is the Ala-Ähtävä area in the Pedersöre municipality. There, out of 1032 eligible voters, 797 turned out on election day.
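As a worked example of the ratio plotted in figure 3, the Ala-Ähtävä numbers give 797 / 1032, which is about 0.77, or 77 percent. The small sketch below shows the calculation. The function name is my own for this illustration, and returning None for an area with no eligible voters (such as “saaret” in Sipoo) is just one possible way to avoid dividing by zero.

```python
def turnout_ratio(election_day_voters, eligible):
    """Ratio of election day voters to eligible voters for one area.

    Returns None when the area has no eligible voters, so that
    such areas can be skipped instead of causing a division by zero.
    """
    if eligible <= 0:
        return None
    return election_day_voters / eligible

# Ala-Ähtävä, Pedersöre: 797 voters out of 1032 eligible.
ratio = turnout_ratio(797, 1032)
print("%.3f" % ratio)  # prints 0.772

# An area without eligible voters yields None rather than an error.
empty_area = turnout_ratio(0, 0)
```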