Accident information comparisons

 

Safety- and accident-related information is available on airline websites, but you will find more of it if you use Google. There were also a couple of interesting peculiarities in the data, where it seems possible that the airline was hiding information. This post is part of our “Is aviation safety a shameful thing?” project; see also Part 1.

In this second part I compare the number of safety- and accident-related links found by each airline’s own site search with the number found by Google when it was restricted to the same website. Both link counts are also analyzed against information about the airlines and their home countries. The intention is to find out how open airlines are with this information. Absolute numbers show how much information is available; relative numbers show how easily it can be found with the airline’s own search and might hint at how much the airline wants to show that information. Comparison with other data might reveal factors that are common to airlines with a high or low number of links.

I searched through 46 airlines. Figure 1 shows the raw link counts: the x-axis shows how many links were found and the y-axis shows how many airlines had that count. The large blue and orange bars at x=0 show that for many airlines the homepage search was a poor way to find safety or accident information.

Google, on the other hand, finds information (yellow and green bars) on both subjects, in some cases quite a lot of it. Note that I only counted up to 10; if there were more links, I ignored them. This document of the raw data shows in more detail which links were accepted into the data set.
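Figure 1 is essentially a histogram of capped link counts. The sketch below shows one way to build it, assuming one count per airline for a given search and word (for example Google and “Safety”), with anything above 10 clipped to 10 as described above. The function name `histogram` and the sample counts are hypothetical, not taken from the raw data.

```python
from collections import Counter

CAP = 10  # links beyond the tenth were not counted

def histogram(link_counts: list[int]) -> dict[int, int]:
    """Return {link count: number of airlines} for x = 0..CAP.

    Each airline contributes one count for a given search and word;
    counts above CAP are clipped to CAP, mirroring the counting rule.
    """
    clipped = Counter(min(n, CAP) for n in link_counts)
    # Include empty bins so every x-axis value from 0 to CAP is present.
    return {x: clipped.get(x, 0) for x in range(CAP + 1)}

# Hypothetical counts for seven airlines (not the study's data).
print(histogram([0, 0, 0, 3, 12, 10, 1]))
```

Airlines without a site search simply contribute a 0 to the homepage histogram, which is why they inflate the zero column.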

Links to anything that passengers would find out during the trip, such as pre-flight safety announcements, were rejected. Links to insurance terms and conditions were not accepted either. The reason is that I am interested in what “extra” information is available on the website.

Figure 1. Number of links found for both searches and words.

I’d like to be a little cautious about drawing conclusions from this data, mainly because of the small number of airlines, but also because of the data-gathering process: it was done by me alone, without much help. In my experience this leads to less rigorous results than a group effort. But one thing seems fairly certain: Google is better at finding this information than the search functions on the airline websites.

This holds even if the 12 airlines that had no site search are removed from the zero column. For the whole set, summing the links found for each airline by each search, Google finds more links in 39 cases, while the homepage search returns more results in only two (Qantas, 5 vs. 4, and Czech Airlines, 7 vs. 5).
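The comparison above boils down to summing each search’s safety and accident counts per airline and tallying which search found more. A minimal sketch, assuming airlines without a site search count as zero links (as in the zero column of Figure 1); the names `total_links` and `tally` are hypothetical, and only the three totals quoted in the text are used as sample data.

```python
from typing import Optional

CAP = 10  # only the first 10 links per search and word were counted

def total_links(safety: int, accident: int) -> int:
    """Sum one search's counts for the two words, each capped at CAP."""
    return min(safety, CAP) + min(accident, CAP)

def tally(totals: dict[str, tuple[Optional[int], int]]) -> dict[str, int]:
    """Count airlines where Google or the site search found more links.

    totals maps airline -> (homepage_total, google_total). A homepage_total
    of None means the site has no search; it is treated as zero links.
    """
    result = {"google": 0, "homepage": 0, "tie": 0}
    for homepage, google in totals.values():
        homepage = 0 if homepage is None else homepage
        if google > homepage:
            result["google"] += 1
        elif homepage > google:
            result["homepage"] += 1
        else:
            result["tie"] += 1
    return result

# The three totals below are the ones quoted in the text.
example = {
    "US Airways": (total_links(3, 1), total_links(1, 7)),  # 4 vs 8
    "Qantas": (5, 4),
    "Czech Airlines": (7, 5),
}
print(tally(example))  # {'google': 1, 'homepage': 2, 'tie': 0}
```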

At least one of the airlines (US Airways) uses Google to power its site search. This offers an interesting comparison: the US Airways homepage search found 3 safety-related and 1 accident-related link, while the general Google search found 1 safety-related and 7 accident-related links.

Although I was not logged in to my Google account, it is possible that Google had picked up on the fact that the same computer had been searching intensively for accident information for several days, and used this knowledge to show me what it judged most interesting to me.

A more sinister explanation is that the results of the homepage search have been filtered to exclude what I was looking for. Searching the US Airways site with its own search for “1549” (the flight number of the 2009 Hudson River ditching) gives (on 18 March 2012) one result, a general chronology of the airline, and states that some results have been omitted. Including those adds four more links to the same chronology. It is still possible that this results from a more general decision to exclude parts of the website from the site search, but I’d say there is a good chance that it is intentional.

In the case of Kenya Airways, Google returned two links about the accident of KQ 507, but when I followed them they gave a 404 (i.e. page not found). This could have several causes and need not be intentional. The accident was mentioned in an annual report.

Table 1. Mean and median number of links found by Google for different subpopulations

 

Table 1 shows the mean and median number of links found by Google for different subpopulations. “Whole set” includes all the airlines, “Google and Homepage” includes only the cases where both searches were available, and “Google only” includes only the cases where there was no homepage search.
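The subpopulation statistics in Table 1 amount to filtering the airlines and taking a mean and median per search word. A minimal sketch, assuming each airline is recorded with a flag for whether its site has its own search plus its capped Google link counts for the two words; the record layout and every number in it are invented for illustration and are not the study’s data.

```python
from statistics import mean, median

# Invented records for illustration only; NOT the study's data.
# homepage_search: whether the airline's site offers its own search.
# safety / accident: Google link counts (capped at 10) for each word.
airlines = [
    {"homepage_search": True, "safety": 6, "accident": 4},
    {"homepage_search": True, "safety": 10, "accident": 7},
    {"homepage_search": False, "safety": 3, "accident": 0},
    {"homepage_search": False, "safety": 5, "accident": 1},
    {"homepage_search": True, "safety": 2, "accident": 2},
]

subpopulations = {
    "Whole set": airlines,
    "Google and Homepage": [a for a in airlines if a["homepage_search"]],
    "Google only": [a for a in airlines if not a["homepage_search"]],
}

def summarize(rows: list[dict], word: str) -> tuple[float, float]:
    """Mean and median of the Google link counts for one search word."""
    values = [row[word] for row in rows]
    return mean(values), median(values)

for name, rows in subpopulations.items():
    for word in ("safety", "accident"):
        m, med = summarize(rows, word)
        print(f"{name:<20} {word:<9} mean={m:.1f} median={med}")
```

Reporting the median alongside the mean is useful here because the cap at 10 and the many zeros make the distributions skewed.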

In all cases there are more links related to safety than to accidents, but the difference is not large. Results for the word “Safety” show no clear differences between the populations. For “Accident”, airlines with their own search show more information. This difference could be explained if the airlines without a homepage search had had fewer accidents, but for only 3 of those 12 airlines was I unable to find a fatal accident in their history. Four of the 12 airlines without a homepage search function are low-cost airlines, which might have less extensive websites and therefore less information. This result is similar to what Jakke found in his analysis of airline homepages.

I compared the link counts against a data set (or here) with information on:

  • number of employees
  • number of yearly passengers
  • revenue
  • year the airline was founded
  • GDP (PPP) per capita of the airline’s home country
  • Global Integrity Report overall score of the home country
  • Corruption Perceptions Index
  • IATA membership
  • date of the latest accident

It was difficult to find all the data for all the airlines, so there are some gaps. The data is also unreferenced and comes from various sources. Some plots with very short descriptions are available here. There is a modest correlation between the date of the latest non-fatal accident and the total number of links found, which just might be significant. There is also a modest correlation between the Global Integrity Report overall score and the total number of links. But the plots show that, in addition to the set being quite small, there may be other data-related difficulties that make this type of analysis less trustworthy.
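The text does not say which correlation measure was used; the sketch below assumes Pearson’s r. It drops airlines with a missing value pairwise (one way of handling the gaps) and computes the usual test statistic t = r·sqrt((n−2)/(1−r²)), to be compared against a t distribution with n−2 degrees of freedom. With a small n, a “modest” r is often not significant. The function names and all sample values are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence

def pearson(xs: Sequence[Optional[float]],
            ys: Sequence[Optional[float]]) -> tuple[float, int]:
    """Pearson's r over pairs where both values are present; returns (r, n)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy), n

def t_statistic(r: float, n: int) -> float:
    """t for H0: no correlation; compare with a t distribution, n - 2 df."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Invented values with gaps (None), standing in for the real data set.
latest_accident_year = [2009, 2001, None, 1995, 2011, 2005]
total_links = [12, 8, 5, None, 15, 9]

r, n = pearson(latest_accident_year, total_links)
print(f"r = {r:.2f} over n = {n} airlines, t = {t_statistic(r, n):.2f}")
```

Note how the gaps shrink the sample: six airlines go in, but only four complete pairs are used.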

Overall, the small numbers in Table 1 suggest that openness is not the approach chosen for these subjects. Furthermore, many airline websites do contain accident-related information, but you might not find all of it with the search provided by the airline.

In the third part of this series I will attempt to rate the links and see whether anything more comes out of that.

“Opacity”: Search engines

 

How to use small languages to study search engines


Search engines are very opaque: it is difficult to know what is happening and why, how to study it, and how to interpret the results. Even in a field filled with more professional researchers, we feel there are some niches left to explore. We are currently focusing on Finnish, as it provides an interesting “laboratory”: a small language with a unique grammar, spoken in a high-tech and highly networked country. The amount of raw material is huge.


#1/2012: “Onko Google ainoa käyttökelpoinen hakukone suomen kielellä?” [Is Google the only usable search engine in Finnish?]

Full report (in Finnish): Mäkelä et al- Suomalainen Bing_Google 2012- raportti

We wanted to find out whether it is true that “Google is the only usable search engine in Finnish”. In light of the statistics, it is: Google’s share in Finland is about 98%. This is effectively a monopoly, and it is worth looking for the reasons behind it. The statement could not be studied analytically as such, so the research question was narrowed down to: “Google is a significantly better search engine than Bing for searches in Finnish”. The study compared the hit counts obtained when selected search terms were entered into the Finnish versions of Google and Bing. We found that Bing returns significantly fewer results than Google, on average less than 10% of Google’s hits. In addition, Google appears to react faster to emerging news topics. At least two peculiarities of the Finnish language affect searches. Google systematically replaces the Scandinavian letters (ä, ö) with their commonly used substitutes (a, o). Bing does not behave as systematically, and in this respect its search at least does not work the way users are accustomed to. Compound words, which are common in Finnish, cause minor problems for both search engines.

The results do not directly say anything about the quality of the search engines. However, the number of hits is the subjective measure that we believe most people use to judge how “good” a search engine is, and by this measure Bing falls dramatically behind Google. Google also handles the Scandinavian characters more consistently. A more detailed content analysis would be needed to assess whether Google is “really” the better search engine; based on these results, however, it is easy to understand why the public thinks so.

Google’s share worldwide is about 90%. With a few exceptions, it is over 90% in every European country, often over 96%. (For comparison, its share is 80% in the USA, about 60% in Russia and about 30% in China.) It would therefore be useful to carry out similar studies for other small languages.
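The study observed that Google treats ä and ö as a and o. Google’s actual normalization is not public, so the sketch below only illustrates that kind of folding: two queries are considered equal once case and ä/ö are folded. The names `fold_scandinavian` and `same_query` are hypothetical, and the NFC step is an added assumption so that a decomposed ä (an a followed by a combining diaeresis) is folded too.

```python
import unicodedata

# Map the Finnish letters ä/ö (and capitals) to the substitutes a/o that are
# commonly typed when they are not available.
FOLD = str.maketrans({"ä": "a", "ö": "o", "Ä": "A", "Ö": "O"})

def fold_scandinavian(text: str) -> str:
    """Replace ä/ö with a/o after composing any decomposed characters."""
    return unicodedata.normalize("NFC", text).translate(FOLD)

def same_query(a: str, b: str) -> bool:
    """True if two queries match once case and ä/ö are folded."""
    return fold_scandinavian(a).lower() == fold_scandinavian(b).lower()

print(fold_scandinavian("Jyväskylä"))    # Jyvaskyla
print(same_query("PÖYTÄ", "poyta"))      # True
```

An engine that applies such folding consistently returns the same hits for “Jyväskylä” and “Jyvaskyla”; one that applies it only sometimes produces the unpredictable behaviour reported for Bing.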

Summary: We studied the statement “Google is the only feasible search engine for searches in Finnish”. The claim is supported by Google’s 98% market share in Finland. To analyze the question, we studied the results of searches made in Finnish with Google and with Bing (which, together with Yahoo, is the only credible alternative). We found that in terms of number of hits Google is crushingly dominant, with Bing typically finding less than 10% of the results. Bing seems especially “slow” at finding trending news, which is a serious drawback for a search engine. Google is evidently reasonably well optimized for some quirks of the Finnish language, while Bing is not. The clearest difference is in the processing of the Scandinavian characters (ä, ö), where Bing’s behaviour is unpredictable. Both search engines have some problems with another Finnish quirk, compound words, but neither is clearly superior. Other potential differences were found relating to the agglutinative character of Finnish grammar, but these could not yet be studied systematically. This study did not analyze the “true” quality of Bing vs. Google searches at the content level. However, the statistical results alone are sufficient to explain why Bing is not generally considered a viable option in Finnish. Such dominance of a small language by a single search engine should be considered a national concern. The situation is very similar for other small languages, in Europe and elsewhere, and we recommend that similar studies be performed in other countries.
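The “less than 10%” finding is a ratio of reported hit counts. The report’s exact averaging method is not described here, so the sketch below assumes Bing’s share is computed per query and then averaged, which keeps one very popular query from dominating the result. The function name `bing_share` and all hit counts are hypothetical; note also that hit counts are the engines’ own estimates, not exact totals.

```python
from statistics import mean, median

def bing_share(google_hits: int, bing_hits: int) -> float:
    """Bing's hit count as a fraction of Google's for the same query."""
    if google_hits <= 0:
        raise ValueError("Google hit count must be positive")
    return bing_hits / google_hits

# Invented hit counts (google, bing) for three queries; NOT the study's data.
hits = {
    "query A": (200_000, 10_000),
    "query B": (50_000, 6_000),
    "query C": (1_000_000, 40_000),
}

shares = [bing_share(g, b) for g, b in hits.values()]
print(f"mean share {mean(shares):.0%}, median share {median(shares):.0%}")
```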

 
