Transparency - Jakke Mäkelän kotisivu

Analysis of the 2012 municipal elections in Finland I, start up

A couple of weeks ago I decided to learn Python, mostly because I no longer have access to a Matlab license and the price of Matlab is rather off-putting. Additionally, I have lately gravitated toward free software, both in the sense that no money is involved and in the sense that the source code can be used as one wishes. Examples include Google’s cloud services and LibreOffice. I have no clear answer to why Python. One reason is that someone said it’s fairly easy to learn if you already know Matlab.

In the vaalit.fi service it is possible to download the results of the 2012 municipal elections from this page. Descriptions of the CSV files can be found at the top of the page under instructions. Since the elections are still fresh in my memory, I thought I’d dig into that data as an exercise and use Python to make any tools I would need.

The file containing the results for the whole country is quite large, about 400 Mbytes. When loaded into the memory of my laptop it took about 3 Gbytes, which slowed things down dramatically. There are many ways to solve this; I decided to install a MySQL server, move the data to a database, and then query whichever data I need. Although the data still would not fit in memory, it is much faster to find things when it is not necessary to go through the whole file. MySQL is also free if you don’t need to use a consultant.
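The load-once, query-later idea can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, and the sample rows, column names, and semicolon delimiter are made up rather than taken from the real vaalit.fi file.

```python
import csv
import io
import sqlite3

# Hypothetical miniature of a results file. The real column layout is
# described in the vaalit.fi instructions and is not reproduced here.
SAMPLE = io.StringIO(
    "municipality;area;eligible;votes\n"
    "001;Area A;1200;640\n"
    "001;Area B;800;410\n"
    "002;Area C;950;500\n"
)

def load_rows(conn, handle, batch_size=1000):
    """Stream rows from a semicolon-separated file into a table in batches."""
    conn.execute(
        "CREATE TABLE results "
        "(municipality TEXT, area TEXT, eligible INTEGER, votes INTEGER)"
    )
    reader = csv.reader(handle, delimiter=";")
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append((row[0], row[1], int(row[2]), int(row[3])))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", batch)
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", batch)
    # The index lets lookups by municipality avoid scanning every row.
    conn.execute("CREATE INDEX idx_municipality ON results (municipality)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load_rows(conn, SAMPLE)
total_votes = conn.execute(
    "SELECT SUM(votes) FROM results WHERE municipality = ?", ("001",)
).fetchone()[0]
print(total_votes)
```

Because the file is read row by row, only one batch is held in memory at a time, and later queries touch only the rows they need.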

It took about a week to set up everything and learn enough Python to be able to query the database I created, although it was probably only about three days of actual work. The experience was not bad, but there were some problems finding the right Python modules, those that would enable all the calculations and figure drawing I might want.

In fact, finding the modules was not that difficult, but finding the correct ones for my combination of operating system and processor took some time. In the end I think I installed a version meant for AMD processors although this laptop has an Intel one. It seems to work in any case. It was also a bit of a conundrum to choose between Python 2.x and 3.x; they are not completely compatible, and I couldn’t tell whether the community will move to the new version or not. The modules I’d likely need were, however, available for 3.x, so I selected that one. For me the risk should be small, as I intend to write scripts and not software that needs to be maintained.

I used MySQL Workbench to create the database. It was not that much work to learn the parts I needed. A small problem appeared when the MySQL connector wouldn’t work with Python version 3.3, so I had to move to 3.2. After this, the biggest stumbling block was getting the connection to the database working, even though it is on the same computer. Workbench seems to use something called “named pipes”, but this wouldn’t work with the MySQL Python connector. After some trial and error I opened a hole in the firewall for the port used by MySQL and managed to find the correct initialization file, where I could tell the server to listen to localhost and only it. I am not quite sure what in the end made it work, but the searches started to return results. Hopefully my laptop isn’t wide open now.
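One quick way to see whether anything is answering on the loopback address is to attempt a plain TCP connection. The sketch below is a generic check, not part of the original scripts; it assumes MySQL's default port 3306, which a particular installation may have changed, and the helper name accepts_tcp is made up for this example.

```python
import socket

def accepts_tcp(host, port, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 3306 is MySQL's default port; the server on a given machine may use another.
print(accepts_tcp("127.0.0.1", 3306))
```

Running the same check from another machine against the laptop's network address would give a rough idea of whether the port is reachable from outside, although a firewall can make that test inconclusive.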

The zip file dated 18 Dec 2012 contains source code and some SQL for creating and querying the database. Everything in it can be freely used, modified and shared.

Below are a couple of figures extracted from the data and a little bit of speculation on what can be seen. I’m too lazy to remake the figures with English texts.

Figure 1. Size of voting areas. Size differences are quite large: there are about 20 areas with around 100 to 150 voters (eligible to vote in Finland) and a couple that are a lot larger.

The title of largest voting area goes, with 15971 voters, to the aptly named area “Ä-alue 090A”. Based on the municipality number, it is somewhere in Helsinki.

The smallest, with under a hundred voters, are Markby (88), Norrby (93) and Korsbäck (95), which are in the municipalities of Uusikaarlepyy, Kruunupyy and Korsnäs. At first I thought that these areas would be on islands, but they are not. There are other small areas in these same municipalities. In Sipoo there is also an area called “saaret” (islands) which doesn’t have any eligible voters; perhaps it is not in use.

From the figure it is clear that the distribution has two peaks, but I’m not quite sure what the reason behind this is. I first thought that it was due to geography: if the distance to a polling station is too long, it is going to have an effect on turnout. In the case of the smallest voting places, however, this doesn’t seem to be the explanation.

Figure 2. Size of voting areas and turnout vary. For example, a little over 3000 votes were cast in one area. The abscissa is the number of votes cast in an area; the ordinate is the count of areas.

The distribution of votes in figure 2 shows the same double peak that was seen in figure 1. In ten areas the number of votes was less than 50. The smallest activity (22) was seen in the Aska area of Puolanka municipality; the Paloniemi area in Kuhmo was clearly more active, with 36 votes. Is voting secrecy adequately preserved when the number of voters is so small? At least they should shake the box before opening it.

Figure 3. Histogram of the ratio of voters to eligible voters for each voting area. Multiplying the abscissa by 100 gives percentages. In more detail: for each voting area, the number of voters on election day was divided by the number of those eligible to vote in Finland; those who voted before the actual election day are not included.

Figure 3 shows a histogram of election day turnouts for different voting areas. In some cases the turnout was dismal. For example, in the Aleksanterin koulu area in the city of Tampere, 78 voters showed up; the ratio of voters to eligible voters was 0.02. Perhaps this is some sort of special area?

Small areas show up on the list of most active areas, including two areas named Korsbäck. The relatively most active area, however, is the Ala-Ähtävä area in the Pedersöre municipality. There, out of 1032 eligible voters, 797 turned out on election day.
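The ratio used in figure 3 is simple arithmetic, and a rough sketch of it looks like this. The function names and the bin width of 0.05 are assumptions made for the example, not the settings used for the actual figure; the only real numbers are the Ala-Ähtävä counts quoted above.

```python
from collections import Counter

def turnout_ratio(voters, eligible):
    """Election-day voters divided by eligible voters; None if nobody is eligible."""
    if eligible == 0:
        return None  # e.g. an area with no eligible voters at all
    return voters / eligible

def histogram(ratios, bin_width=0.05):
    """Count ratios per bin; each key is the lower edge of a bin."""
    last_bin = int(round(1 / bin_width)) - 1
    counts = Counter()
    for r in ratios:
        if r is None:
            continue
        index = min(int(r / bin_width), last_bin)  # a ratio of 1.0 goes in the top bin
        counts[round(index * bin_width, 10)] += 1
    return dict(counts)

# Ala-Ähtävä: 797 election-day voters out of 1032 eligible.
ala_ahtava = turnout_ratio(797, 1032)
print(round(ala_ahtava, 3))
```

For Ala-Ähtävä the ratio comes out at roughly 0.77, which is 77 percent when multiplied by 100.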

Pollution week 5: Summary

“So we will continue to plod on. In fact, we will be expanding this activity to a new website soon. After all, what’s the alternative? Maybe if we all close our eyes and ears, all the bad things will go away?”

Written by: Jakke Mäkelä, Timo Tokkonen, and Niko Porjo.

The postings this week have, we think, given an overview of what a project like Troglodyte could hope to achieve against entities like Intellectual Ventures. Not much, but even a tiny bit helps. Especially posting four might give ideas on countermeasures against the worst of the trolls.

The possibilities are quite limited; on the other hand, being prepared is infinitely better than being unprepared.


Net voyeurs: a national resource?

If only we could utilize Internet voyeurs properly, how much could we get done?

[Finnish version: click here].

Recent tragedies in Finland have shown a gap between official communications and what is available on the Internet. Official communications are terse and protect the privacy of the people involved. The mainstream media, for the most part, does not publish the names of victims. However, any and all information can be found on the Internet. There are forums for everything, in good taste and bad. We haven’t yet gotten to the stage where crime scene photos are circulated, but that day may come. Petteri Järvinen summarizes this well in his blog (my translation).

“The Finnish police communicate very little about accidents and their victims. Names are withheld, based on privacy arguments or “tactical reasons”. The principle is good, but is it valid anymore in the Internet age? Net detectives can sometimes know more even than the police … Voyeurism is improper and insensitive, but it is an unavoidable consequence of the information society”.

That is true. Where transparency, there voyeurism. And especially now, with the recession, Finland is filled with thousands of people who have nothing better to do than sit at a computer. Net detectives have competence and time, and everything that can be found will be found.

The old-fashioned high-quality media does seem lost. Names are only published when everyone knows them already. Details are omitted, even when everyone knows them. Old-fashioned.

So? Why are professionalism and ethics a bad thing? Let it be old-fashioned. It is no one’s loss if the real media reports with professionalism and respect, and lets others dig in the dirt. After all, if all information is available elsewhere, then everyone can find a source that suits his mental level.

In fact there is no need for the media to lower itself, because the “bottom” is already raising itself. I admit (with shame) that I have followed (with interest) on the Internet as people have filled in the puzzles of these tragedies. The motives of these detectives are fuzzy at best, but there is one uncomfortable fact about them: they are good.

Not good journalists, but good intelligence operatives, as it were. Which raises the question: since the net detectives have the time and resources to find out things that the police cannot find, what could they achieve if they turned their energies to something socially useful?

Here is a concrete example whose details I have fuzzified (the exact information can be found on the Internet).

A Finnish city wants to expand its municipal waste landfill. The operator has tried to use a “light” approval process rather than the “heavy” one needed for all projects with a major environmental impact. The decision-makers did wake up, and are requiring the heavier process. Much of the information is secret, but public documents and information on the Internet can be used to piece together a rough picture.

The amount of waste is planned to increase by a factor of five. The heavy process is needed if certain thresholds are exceeded; perhaps by coincidence, all projected values are exactly below these thresholds. The application cites a change in the municipal waste strategy. This strategy, however, is not yet public (which only becomes apparent by searching through multiple sources).

Nothing illegal has happened, and this may not even be in the grey zone, but it should still raise some alarm bells. In fact it has, and the situation is now being monitored (on the Internet). With near-zero resources, but monitored nonetheless.

There are hundreds or thousands of such cases. If even a fraction of the best Internet voyeurs put their energy into these issues, what would happen? A lot, I would claim. I am not an optimist, and I do not expect to see anything happen, but there is a lot of potential.

More on open monitoring: here.

Data transparency in shipping safety: good or bad idea?

Radical transparency is an intriguing school of thought, with the philosophy that the best society is a transparent society. In other words, all data that can be opened should be opened. I find such transparency an interesting concept, and in many cases probably worth aiming for. The key question is: what is a realistic environment in which to begin experimenting with it? I focus here on one tightly restricted area: data transparency in shipping safety. [Finnish version: Click here]

For a slightly different perspective on this issue by Niko Porjo, see here.

At the moment, international standards require large ships to transmit AIS information. At minimum, this information contains, in standardized format, the ship’s identity, location, speed, and bearing. The AIS information is transmitted in the clear and its purpose is to help ships maintain positional awareness of other traffic. Internet distribution of the data originally raised some controversy, but in practice the controversy is over: the AIS information is public.

It is quite sensible to ask a further question: should even more information from the ships be openly available? There are good reasons to ask this question; above all, in an emergency it would make the passengers active participants rather than passive subjects. It would also help to show up poor safety practices that would remain invisible in a closed environment. The technical problem can be stated quite simply: should the information currently collected by the black box be available and public (although not necessarily in real time)? More radically, it is technically feasible to make all the information that is available on the bridge available to the public. Should it be made available?

Unfortunately, I tend to arrive at a pessimistic outcome for this specific case. Openness would benefit the overall system. Unfortunately, it would not benefit any of the individual players, at least in the beginning stages. The problem with transparency in this particular area is that the first adopter ends up taking most of the risk. Although radical transparency is a good concept to aim for, shipping security does not seem like a reasonable platform in which to start experimenting with it.

The authorities cannot be bossed around

In practice, security is defined and enforced by national or international authorities. In a democratic system, it is in principle possible to force the authorities to make good decisions. Unfortunately, in a democratic system this is also painfully difficult in practice. Authorities are dependent on what legislators decide. Legislation in turn is a slow process, undergoing massive lobbying from established interests, and requiring a significant push from citizens. Based on the lukewarm reception these issues are getting, it does not seem that there is any real political push in this direction.

Laws and directives change most rapidly through major accidents, which lead to security recommendations. Even then, the new directives may or may not be followed adequately, especially if they require significant amounts of money. Waiting for the authorities to act requires patience and (unfortunately) often new accidents. This path does work, but is not likely to lead to rapid or radical solutions.

Anonymization does not work

In order to balance between data transparency and personal privacy, security-related information should be anonymized. Unfortunately, this does not work in the Internet age, where all information (whether correct or not) will be on Twitter within minutes of an accident. The most tragic failure of anonymization is the Überlingen air accident of 2002, in which two aircraft collided. The investigation report concluded that it was a system-wide problem, and no single individual was to blame. Nevertheless, a man who lost his family in the accident blamed the air traffic controller, found out his identity and home address, and murdered him.

The Überlingen case is extreme, but in an open system there is no automatic mechanism to protect those initially blamed for the accident. A serious scenario is that in any accident, the people potentially responsible will be identified immediately, they will be blamed by the media, their personal information will be found immediately, and Internet mobbing could start immediately. The risk may look small now, but cyber-bullying in South Korea already shows that the risk exists. How many people would be willing to work under such circumstances?

Data without metadata is nothing

The technical problems are considerable. The AIS parameters are standardized tightly and are easily understandable. If more generic information is to be transmitted, then its interpretation becomes problematic. Raw data is just rows of numbers; processing, interpretation, and displaying are what make it into information. Someone must do this, must be paid to do it, and must be responsible for quality control.

Some parameters will be considered trade secrets by the shipping companies (or at least fall in a gray area). Realistically speaking, any shipping company will either not want to do such an analysis, or will want to keep the results secret. It is certainly possible to force a company to make the raw data available. Without extra incentives, however, it is barely realistic to expect the company to make the data available in a form which could be easily utilized by competitors.

Transparency benefits the unscrupulous

Transparency is an equalizing safety factor when all parties have the same information on all parties. If one party stops sharing information, it creates a business advantage for itself (even more so if it begins to distort the information). No idealism can change this fact; surveillance and enforcement are needed. The enforcement needs to be global. It can be argued that for technologies such as nuclear energy such a global enforcement system already exists; that is true, but nuclear energy was born in completely different historical circumstances than shipping, and was in fact able to start from a clean slate.

Open real-time information also makes piracy easier. More information means more opportunities to plan attacks. Merchant ships near the coast of Somalia will certainly not be willing to participate in experiments in radical transparency.

Terrorism is invoked too easily, but it cannot be ignored. Any transparency model must accept the brutal truth that there are destructive entities. The sinking of a large passenger ship might not even be the worst-case scenario; societies can recover from large losses of life very rapidly, even though the scars are horrible. A more worrisome scenario might be an Exxon Valdez-type massive oil leak event next to a nuclear power plant.

What can we do?

Many people reflexively oppose this type of radical transparency, whether with good reason or by knee-jerk reflex. How could they be motivated to at least try? Even if calculations clearly show that transparency is useful for the whole system in the long run, people are irrational and think in the short run. Given that the early adopters take a risk, how would this risk be compensated to them? Shipping has a long history and legacy practices which are difficult to overcome. Radical transparency is something that absolutely should be tested in a suitable environment. However, I am forced to conclude that shipping safety is simply not a sensible environment in which to start.

“Transparency”: Weathercaching

One of the first projects done partly with the Zygomatica team is a prime example of what we are calling “transparency” projects. The idea combines geocaching and open weather data to create what we called “weathercaching”.

The concept is simple indeed: Let’s enhance geocaching so that you get more points for finding a cache in really horrible weather.

There are two catches that made this a very difficult project indeed:

1) How do you actually define “horrible weather”? We wanted something that would be global, unambiguous, and based on valid physiology and meteorology. We further wanted to define this “horribleness” as a single weather factor (W) between 0.5 and 5, analogous to the difficulty/terrain (D/T) points in normal geocaching.

2) How can you measure the “horribleness” of the weather automatically? Having the user measure the weather at the cache location would sound like a “trivial” solution, but this was felt to be a cop-out; the technological question only becomes interesting if open weather data are used.

Full analysis: A full analysis is on the project page. (Note that the text is rather academic and dry.)

Summary: There are “weather corridors” along Finnish highways which have enough weather stations to allow sufficiently accurate monitoring of the weather (see map below, adapted from www.geocaching.com). Even more importantly, the majority of caches are within these corridors. Problem 2 is therefore technically solvable. However, we could not find a reasonable solution for problem 1. Meteorology has good parameters to define hazardous weather; it does not have any tools to define miserable weather. Also, there is no unambiguous way to determine W from the weather data; misery is culturally defined.
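The corridor check behind problem 2 can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the station coordinates are invented, the 10 km radius is an arbitrary placeholder rather than a value from the analysis, and the function names are hypothetical. The distance itself is the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_corridor(cache, stations, max_km=10.0):
    """True if the cache lies within max_km of at least one weather station."""
    return any(
        haversine_km(cache[0], cache[1], s[0], s[1]) <= max_km for s in stations
    )

# Invented (latitude, longitude) pairs, for illustration only.
stations = [(60.45, 22.27), (60.80, 23.50)]
print(in_corridor((60.46, 22.27), stations))  # about 1 km from the first station
print(in_corridor((65.00, 25.00), stations))  # hundreds of km from both
```

Counting the caches for which such a check succeeds would give the share of caches inside the corridors, for whatever radius is judged meteorologically acceptable.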

Conclusion: The technology and data sources exist. The specific application itself is, however, not worth implementing, and no demonstrator was made. Other uses for the weather data might well be possible.

Team: Jakke Mäkelä, Pertti Sundquist, Gavin Treadgold, Kalle Pietilä, Niko Porjo.

Map adapted from www.geocaching.com

Tag: Transparency
