An old project from about 2012.
How to use small languages to study search engines / Millä tavoin pieniä kieliä voisi hyödyntää hakukonetutkimuksessa?
Search engines is are very opaque; it is difficult to know what to is happening, how to study it, and how to interpret the results. Even in a field field filled with more professional researchers, we feel there are some niches to be explored. We have currently focused on Finnish, as it provides an interesting “laboratory”: a small language with a unique grammar in a high-tech and highly networked country. The amount of raw material is huge.
Hakukoneet ovat käytännössä läpinäkymättömiä: on vaikea tietää mitä tapahtuu, miksi, miten sitä pitäisi tutkia, ja miten tulkita. Alue on täynnä tutkimusta, mutta uskomme löytävämme pieniä erikoisalueita itsellemme. Keskitymme tässä vaiheessa suomen kieleen, koska Suomi on loistava “laboratorio”: pieni ja kummallinen kieli kehittyneessä ja verkostoituneessa maassa. Raakamateriaalia on valtavasti.
Aliprojekti: “Onko Google ainoa käyttökelpoinen hakukone suomen kielellä?”[Is Google the only usable search engine in Finnish?]
Täysi raportti / full report (Finnish): Download the report (pdf)
English summary: We studied the statement “Google is the only feasible search engine for searches in Finnish”. The claim is supported by the 98% market share Google has in Finland. To analyze the question, we studied results from searches made in Finnish by Google and Bing (which with Yahoo the only credible alternative). We found that in terms of number of hits, Google is crushingly dominant, with Bing finding typically less than 10% of the results. Bing seems especially “slow” in finding trending news, which is a serious drawback for a search engine. It is apparent that Google is reasonably well optimized for some quirks of the Finnish language, while Bing is not. The clearest difference is in the processing of Scandinavian characters (ä,ö), where Bing’s performance is unpredictable. Both search engines have some problems with another Finnish quirk, compound words, but neither is clearly superior. Other potential differences were found relating to the agglutinative character of Finnish grammar, but this could not be studied systematically so far. This study did not analyze the “true” quality of Bing vs Google searches at the content level. However, the statistical results alone are sufficient to explain why Bing is not generally considered a viable option in Finnish. Such dominance of a small language by a single search engine should be considered a national concern. The situation is very similar for other small European and other languages, and it is recommended that similar studies be performed in other countries.