All posts by Jakke Mäkelä

Physicist, but not ideologically -- it's the methods that matter. Background: PhD in physics, four years in basic research, over a decade in industrial R&D. Interests: anything that can be twisted into numbers; hazards and warnings; invisible risks. Worries: Almost everything, but especially freedom of speech, Internet neutrality, humanitarian problems, IPR, environmental issues. Happiness: family, dry humor, and thinking about things.

Being rigorous at being what?

 

Can the Zygomatica blog serve any useful purpose, when we have made a conscious decision to not focus on anything for any significant periods of time?

I do not speak for the other members of Zygomatica, although they have peer-reviewed this posting (see below). This is my own question, and my own answer.

Most bloggers have no need for such a question. If writing a blog is something that comes naturally and causes real joy, then no question: just do it.  For me, writing does not come naturally and is not a real joy. It is difficult. The Zygomatica team has a policy of internally peer-reviewing every posting, meaning we are lucky to end any given day on speaking terms. What, then, is the point?

The point, as I see it, is to maintain and develop my skills of rigorous thinking. It would be exhilarating to create something that people actually enjoy reading, of course. But above all this is an exercise in self-discipline. But self-discipline to what purpose?

A famous paper  (Ericsson et al 1993) notes that it takes 10,000 hours of practice to achieve exceptional proficiency in any task, meaning 10 years of more or less obsessive practice started at childhood.  That is completely excessive though. I have no desire to operate at the “grandmaster level” of thinking. Collegiate level perhaps.

At slightly less obsessive levels, Cal Newport has argued that one should focus tightly on excelling at one skill at a time, rather than diluting one’s focus. An interesting example is provided by an analysis of the skills of comedian Steve Martin:

“But when you study people like Martin, who really do live remarkable lives, you almost always encounter stretches of years and years dedicated to honing craft.”

Somewhat depressing. I do not have a passion or dedication for any particular craft. On the other hand: neither do I desire to lead a “remarkable life”. Feeding my family, being a passable father and husband, and dying more or less content is enough. But for me a content death requires some intellectual stimulation, which I think requires skill. And even a minimal level of skill does not come easily.

In addition, my idea of fun intellectual stimulation is not exactly shared by normal society. Apparently, I would bore a Vulcan. But a commentary by Cory Doctorow on the concept of “too much time on his hands” warms my heart:

“‘That guy has too much spare time’ is one of the most odious, intellectually dishonest, dismissive things a person can say. It disguises a vicious ad-hominem attack as a lighthearted verbal shrug…..  [T]he slur brooks no possibility that the speaker has failed to appreciate some valuable, fulfilling element of the subject’s hobby.”

I love the attitude. An Asperger-like stubbornness to do what you do and ignore the ridicule is something that I admire and respect very much. Genuinely. But I do not have such a clear-cut hobby. (Nor do I exactly have very much time on my hands.) Coming closer to home, John D. Cook writes optimistically about the concept of the “jack of all trades, master of none”.

“Calling someone a jack of all trades could be a way of saying that you don’t have a mental category to hold what they do.”

The negative connotations might come from the fact that some seemingly unfocused people have skills that are simply not recognized. This is potentially reassuring. In a similar vein, Venkatesh Rao writes on the  “calculus of grit”. A crucial passage:

“So what does the inside view of grit look like?…. It simply feels like mindful learning across a series of increasingly demanding episodes that build on the same strengths.”

I would like to take some comfort from this. My career interests seem to have no coherent pattern at all (space plasmas; memory optimization; temperature sensors; lightning detection; other stuff). But in fact there is a unifying theme. Any problem, literally any problem, can be attacked by the basic tools given by a scientific training — but with a catch.  A perfect metaphor for half my thinking is the study on the fastest lane in the supermarket by Dan Mayer. The following line resonates:

“This problem has obsessed me for years. It’s my DaVinci code. It’s my love for math, for mathematical reasoning, for the relentless deconstruction of something that seems simply intuitive into data, models, and computation.”

I love the quote, but it is only half the truth.  Meaningful real-world phenomena cannot be reduced to simple mathematical/scientific models. You can always prune down the problem until it can be modeled. But at some point the model is too simple to describe the original problem statement. To me, only half the work is done when the math is done. The other half is to evaluate whether the solution actually has any relevance to the problem. As often as not, it does not, and then it is back to the drawing board.

Perhaps that is what makes Zygomatica a meaningful exercise. Using rational methods to skeletonize a seemingly intractable problem into a scientifically solvable one; trying to solve the problem; and then relentlessly and ruthlessly deconstructing whether the solution has any real-life meaning whatsoever. That is actually a skill set that is not taught at university. This is a mindset that I have applied to problem after problem, project after project, year after year. And that plodding is just perhaps what separates Zygomatica from being a mere exercise in dilettantism.

I may be overoptimistic and covering for an inner insecurity. On the other hand, this is where an Asperger-like attitude comes in handy: I really do not care if this sounds ridiculous. It is my thing. And the Zygomatica thing.

Zygomatica blog and web site launched

 

The Zygomatica.com web site has now been launched. In brief: Zygomatica.com is a demonstrator for thinking differently.

Please see Who we are to see what the team is about. This blog is bilingual: the material is in either Finnish or English, depending on the subject matter.

The group has four general interest areas. Blog postings will be made quasi-randomly, whenever any member has anything interesting to say (or anything in general). When a question becomes particularly interesting, we may launch a collaborative project around it, inviting others to join. One blog posting has already been written for each area.

  • Creativity. The light side of Zygomatica. These are the “problems in search of a solution” of the title. Any projects in this area will be quick and short. First blog posting: Catapult camera. We present an example of a solution which did not find a problem.
  • Opacity. Addresses our value “anything can be a source of information, if properly interpreted”. These will mostly be case studies on how to mine information from sources which look opaque. Blog posting: “Onko Google ainoa järkevä suomenkielinen hakukone / Is Google the only useful Finnish search engine?” We study whether Google’s 98% share in Finnish search is justified by a uniquely high search quality. Our quick study develops some hypotheses, but mostly raises issues for further study. Above all we ask whether a 98% monopoly by a single company should be cause for national concern.
  • Security. The serious side of Zygomatica. Security is a complex issue that benefits from unconventional thinking. First blog posting: “Laivaonnettomuuden analyysi, osa 1” [Analysis of a ship accident, part 1; in Finnish]. A system analysis of the Costa Concordia accident and, above all, of the system-level questions it raises. Future postings will answer questions raised in this essay in more detail, in particular the possible value of information transparency as a safety factor.

“Transparency”: Weathercaching

 

One of the first projects done partly with the Zygomatica team is a prime example of what we are calling “transparency” projects. The idea combines geocaching and open weather data to create what we called “weathercaching”.

The concept is simple indeed:  Let’s enhance geocaching so that you get more points for finding a cache in really horrible weather.

There are two catches that made this a very difficult project indeed:

1) How do you actually define “horrible weather”? We wanted something that would be global, unambiguous, and based on valid physiology and meteorology. We further wanted to define this “horribleness” as a single weather factor (W) between 0.5 and 5, analogous to the difficulty/terrain (D/T) points in normal geocaching.

2) How can you measure the “horribleness” of the weather automatically? Having the user measure the weather at the cache location would sound like a “trivial” solution, but this was felt to be a cop-out; the technological question only becomes interesting if open weather data are used.

Full analysis: A full analysis is on the project page. (Note that the text is rather academic and dry.)

Summary: There are “weather corridors” along Finnish highways which have enough weather stations to allow sufficiently accurate monitoring of the weather (see map below, adapted from www.geocaching.com). Even more importantly, the majority of caches are within these corridors.  Problem 2 is therefore technically solvable. However, we could not find a reasonable solution for problem 1. Meteorology has good parameters to define hazardous weather; it does not have any tools to define miserable weather. Also, there is no unambiguous way to determine W from the weather data; misery is culturally defined.
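The “weather corridor” idea can be made concrete with a small sketch: count how many caches lie within some distance of at least one weather station. This is only an illustration, not the method of the full analysis. The function names, the coordinates, and the 20 km radius in the usage note are all assumptions invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_covered(caches, stations, max_km):
    """Fraction of caches with at least one station within max_km.

    caches and stations are lists of (lat, lon) tuples in degrees.
    max_km is a hypothetical corridor half-width, not a value from the study.
    """
    if not caches:
        return 0.0
    covered = sum(
        1 for c in caches
        if any(haversine_km(*c, *s) <= max_km for s in stations)
    )
    return covered / len(caches)
```

For instance, with one made-up station at (60.0, 25.0) and two made-up caches, one at the station and one five degrees further north, `fraction_covered(caches, stations, 20.0)` gives 0.5. A real analysis would also have to decide how close is “close enough” for the weather at the station to represent the weather at the cache, which is a meteorological question rather than a geometric one.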

Conclusion: The technology and data sources exist. The specific application itself is, however, not worth implementing, and no demonstrator was made. Other uses for the weather data might well be possible.

Team:  Jakke Mäkelä, Pertti Sundquist, Gavin Treadgold, Kalle Pietilä, Niko Porjo.

Map adapted from www.geocaching.com

“Opacity”: Search engines / Hakukoneet

 

How to use small languages to study search engines

Search engines are very opaque; it is difficult to know what is happening, why, how to study it, and how to interpret the results. Even in a field filled with more professional researchers, we feel there are some niches to be explored. We have currently focused on Finnish, as it provides an interesting “laboratory”: a small language with a unique grammar in a high-tech and highly networked country. The amount of raw material is huge.

#1/2012: “Onko Google ainoa käyttökelpoinen hakukone suomen kielellä?” [Is Google the only usable search engine in Finnish?]

Full report (in Finnish): Mäkelä et al- Suomalainen Bing_Google 2012- raportti

Abstract: We want to study whether it is true that “Google is the only usable search engine in Finnish”. In light of the statistics this is indeed so; Google’s share in Finland is about 98%. This is in practice a monopoly, and it is worth looking for the reasons behind it. The statement could not be studied analytically, so the question was narrowed as follows: “Google is a significantly better search engine than Bing when searching in Finnish”. The study compared the hit counts obtained when certain search terms were entered into the Finnish versions of Google and Bing. We found that Bing returns significantly fewer results than Google, on average less than 10% of Google’s hits. In addition, Google appears to react faster to emerging news topics. Among the special features of Finnish, at least two phenomena affect searches. Google systematically replaces the Scandinavian letters (ä, ö) with their commonly used equivalents (a, o). Bing, by contrast, does not behave as systematically, and in this respect it can be said that Bing’s search at least does not work the way users have come to expect. Compound words, which are common in Finnish, cause mild problems for both search engines.

The results do not directly say anything about the quality of the search engines. However, the number of hits is the subjective measure that we believe most people use to judge how “good” a search engine is. By this measure Bing falls dramatically behind Google. In addition, Google handles the Scandinavian letters more consistently. A more detailed content analysis would be needed to assess whether Google is “really” the better search engine; based on these results, however, it is easy to understand why the public thinks so.

Google’s share worldwide is about 90%. With a few exceptions it is above 90% in all European countries, often above 96%. (For comparison, the share is 80% in the USA, about 60% in Russia, and about 30% in China.) A corresponding study would therefore be useful for other small languages as well.

English summary: We studied the statement “Google is the only feasible search engine for searches in Finnish”. The claim is supported by the 98% market share Google has in Finland. To analyze the question, we studied results from searches made in Finnish by Google and Bing (which, with Yahoo, is the only credible alternative). We found that in terms of number of hits, Google is crushingly dominant, with Bing typically finding less than 10% of the results. Bing seems especially “slow” in finding trending news, which is a serious drawback for a search engine. It is apparent that Google is reasonably well optimized for some quirks of the Finnish language, while Bing is not. The clearest difference is in the processing of Scandinavian characters (ä, ö), where Bing’s performance is unpredictable. Both search engines have some problems with another Finnish quirk, compound words, but neither is clearly superior. Other potential differences were found relating to the agglutinative character of Finnish grammar, but these could not be studied systematically so far. This study did not analyze the “true” quality of Bing vs Google searches at the content level. However, the statistical results alone are sufficient to explain why Bing is not generally considered a viable option in Finnish. Such dominance of a small language by a single search engine should be considered a national concern. The situation is very similar for other small languages, in Europe and elsewhere, and it is recommended that similar studies be performed in other countries.
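To illustrate what “replacing ä and ö with a and o” means in practice, here is a minimal sketch of diacritic folding using Python’s standard unicodedata module. It is not a description of how Google or Bing actually process queries, which we do not know; the function names are our own, chosen for the example.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks, so that 'ä' becomes 'a' and 'ö' becomes 'o'."""
    # NFD splits a letter like 'ä' into the base letter 'a' plus a combining
    # diaeresis; dropping the combining characters leaves the base letter.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_after_folding(a, b):
    """True if two strings match once diacritics and letter case are ignored."""
    return fold_diacritics(a).casefold() == fold_diacritics(b).casefold()
```

With this, `fold_diacritics("pöytä")` gives `"poyta"`. The catch for Finnish is that folding also merges genuinely different words: “sää” (weather) and “saa” (a form of the verb “to get”) become identical, so a search engine that folds systematically trades precision for predictability.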

 

“Creativity”: catapult camera

 

Catapult camera

The “catapult camera” is an example of a project that produced no useful outcome whatsoever. It is included here as an example of a solution that did not find a problem.

The idea was inspired by mast (telescope) camera systems that can be used to map and monitor, for example, disaster zones. Such systems typically appear to weigh a few tens of kg and to be capable of supporting cameras of 4 kg or more. Typical costs for commercial systems appear to be some thousands of EUR. The typical height that can be reached is 10 meters. There is a technology which is capable of reaching altitudes well above 10 meters: small remote-controlled aircraft (helicopters or gliders). These are, however, not cheap technologies, and are not necessarily very robust in extreme circumstances.

We proposed building a catapult capable of launching a camera up to an altitude of about 40 meters, taking images while it is in the air, and stitching the pictures into a panorama image.
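As a rough sanity check of what a 40 meter apex implies, the sketch below uses textbook drag-free ballistics for a purely vertical launch that lands at the launch height. It is an illustration only and is not taken from the report; the function names and the 30 meter “imaging floor” in the usage note are assumptions made up for the example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def launch_speed(apex_m):
    """Vertical launch speed (m/s) needed to reach apex_m, ignoring air drag."""
    return math.sqrt(2 * G * apex_m)


def time_aloft(apex_m):
    """Total flight time (s) up and back down to launch height, ignoring drag."""
    return 2 * math.sqrt(2 * apex_m / G)


def time_above(apex_m, floor_m):
    """Time (s) spent above floor_m on a flight that peaks at apex_m."""
    if floor_m >= apex_m:
        return 0.0
    return 2 * math.sqrt(2 * (apex_m - floor_m) / G)
```

Under these idealized assumptions, `launch_speed(40)` comes to about 28 m/s and `time_aloft(40)` to a little under 6 seconds, of which `time_above(40, 30)` gives only about 3 seconds in the top 10 meters. Air drag and a tumbling projectile would make the real numbers worse, which hints at why image quality was a problem.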

Full report: Download: CatapultCamera-Final.pdf

Outcome: The solution has far too many issues to be useful in real life. Projectiles are likely to get lost or broken; the image quality is far too poor to be useful. Most problematically, the cost of radio-controlled drones is plummeting, and these will be more competitive in every imaginable way. The problem is valid and important; the solution is not.

Team: Jakke Mäkelä, Niko Porjo, Kalle Pietilä.