Comment on "How Should We Analyse Our Lives?"

I just heard about a new book, called "Social Physics", being released at the end of the month by Alex "Sandy" Pentland (the well-known professor at the MIT Media Lab). According to the synopsis, the book is about "idea flow" and "the way human social networks spread ideas and transform those ideas into behaviours".

More thoughts on this once it's actually released. But at first glance, the question of how ideas spread is typical popular-science fare, so I hope Prof. Pentland's unique perspective will make it truly different from what's already out there. Also, I'm always dubious of attempts to invoke "physics" in the context of behaviour analysis. The intended analogy is clearly between finding universal laws of human behaviour and finding universal laws in physics. But scientists in every field are trying to find universal laws! Should we now rebrand economics as "currency physics", computer science as "algorithmic physics", biology as "organic physics", and so on?

With those (rather superficial) caveats aside, the book is clearly relevant to my research topic: analysing and predicting location behaviour using the digital breadcrumbs people leave in daily life. I'm looking forward to reading the book, as I have often found inspiration in Prof. Pentland's research.

The book was given an interesting review by Gillian Tett of the Financial Times last weekend. Her main point of agreement with the book is that the difference between new and old research on human behaviour analysis comes down to the size of the data, plus the extent to which that data is interpreted subjectively vs. objectively (e.g., anthropologists analysing small groups of people vs. the 10GB that Prof. Pentland collected on 80 call-centre workers at an American bank).

So far so good. But she goes on to criticise computer scientists for analysing people like atoms "using the tools of physics" when in reality they can be "culturally influenced too". As someone who uses the tools of physics to analyse behaviour (i.e., simulation techniques and mean-field approximations that originated in statistical physics), I think this is a false dichotomy.
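
To make that parenthetical concrete, here is a minimal sketch of the kind of mean-field approximation I have in mind. It is an illustrative toy of my own, not anything from the book or from a published model: a naive mean-field fixed point for an Ising-style model of binary behaviour adoption on a social network, where the influence matrix J and the biases h are made up for the example.

```python
import numpy as np

# Toy sketch (illustrative only): a naive mean-field approximation for
# an Ising-style model of binary behaviour on a social network. Each
# person i adopts a behaviour (s_i = +1) or not (s_i = -1); J[i, j]
# encodes social influence between i and j, and h[i] is an individual
# bias. Exact inference over 2^n joint states is intractable, so we
# approximate each person's behaviour by its mean m_i and iterate the
# self-consistency equations m_i = tanh(h_i + sum_j J_ij * m_j).

rng = np.random.default_rng(0)
n = 50                              # number of people (hypothetical)
J = rng.normal(0, 0.1, (n, n))      # pairwise influence strengths
J = (J + J.T) / 2                   # make influence symmetric
np.fill_diagonal(J, 0.0)            # no self-influence
h = rng.normal(0, 0.5, n)           # individual biases

m = np.zeros(n)                     # mean adoption levels in [-1, 1]
for _ in range(200):                # fixed-point iteration
    m_new = np.tanh(h + J @ m)
    if np.max(np.abs(m_new - m)) < 1e-8:
        break
    m = m_new

adoption_prob = (m + 1) / 2         # map means to adoption probabilities
print(adoption_prob[:5])
```

The point is not that people are atoms; the "physics" here is just a convenient approximation scheme for a coupled probabilistic model, and nothing in it precludes cultural effects entering through h, J, or richer variants.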

There are two ways you can read this criticism. The first is that statistical models of human behaviour are unable to infer (or learn) cultural influences from the raw data. The second is that the models themselves are constructed under the bias of the researcher's own culture.

In the first case, there's nothing to stop researchers from finding new ways of allowing the models to pick up on cultural influences in behaviour (in an unsupervised learning manner), as long as the tools are rich and flexible enough (which they almost certainly are in the case of statistical machine learning).
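
As a toy illustration of what picking up cultural influences in an unsupervised manner might look like, here is a sketch (entirely synthetic, and mine rather than Prof. Pentland's): a mixture model over simple behavioural features recovers latent groups without any cultural labels, provided the groups differ systematically in behaviour.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative sketch with made-up data: an unsupervised mixture model
# over simple behavioural features. If two populations differ
# systematically in, say, typical lunch hour and weekend mobility, a
# flexible latent-variable model can recover the groups from raw data
# with no cultural labels supplied.

rng = np.random.default_rng(1)
# Two hypothetical populations; columns are
# [typical lunch hour, weekend trips per day].
group_a = rng.normal([12.0, 1.5], [0.5, 0.4], size=(300, 2))
group_b = rng.normal([14.0, 3.0], [0.5, 0.4], size=(300, 2))
X = np.vstack([group_a, group_b])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)        # recovered centres of the two latent groups
labels = gmm.predict(X)  # inferred (unlabelled) group membership
```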

The second case is more subtle. In my experience, the assumptions expressed in my and my colleagues' models are highly flexible and don't appear to impose cultural constraints on the data. How do I know? I could be wrong, but my confidence in the flexibility of our models is due to the fact that model performance can be quantified with hard data (e.g., using measures of precision/recall or held-out data likelihood). This means that any undue bias (cultural or otherwise) that one researcher imposes will be almost immediately penalised by another researcher coming along and claiming he or she can do "X%" better in predictive accuracy. This is the kind of honesty that hard data brings to the table. I agree that it is not perfect and that many other things can go wrong with the academic process (e.g., proprietary data making publications impervious to criticism, and the fact that researchers will spend more effort on optimising their own approach for the dataset in question vs. the benchmarks).
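
To show what that quantification looks like in practice, here is a minimal sketch (synthetic data, made-up scenario): two competing models of a user's location habits are scored by held-out log-likelihood, so the better model wins on the numbers regardless of whose assumptions went into it.

```python
import numpy as np

# Minimal sketch of the "hard data" evaluation described above: two
# competing models of a user's location visits are scored by held-out
# log-likelihood. The data here are synthetic.

rng = np.random.default_rng(2)
n_places = 5
true_probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])  # user's real habits
visits = rng.choice(n_places, size=1000, p=true_probs)
train, test = visits[:800], visits[800:]

def held_out_log_likelihood(probs, data):
    """Average log-probability the model assigns to unseen visits."""
    return np.mean(np.log(probs[data]))

# Model A: visit frequencies estimated from training data (smoothed).
counts = np.bincount(train, minlength=n_places) + 1.0
model_a = counts / counts.sum()

# Model B: a "biased" model assuming everyone visits places uniformly.
model_b = np.full(n_places, 1.0 / n_places)

print("Model A:", held_out_log_likelihood(model_a, test))
print("Model B:", held_out_log_likelihood(model_b, test))
# Model A scores higher: the data, not the researcher, decides.
```

A researcher whose assumptions fit the data badly loses this comparison immediately, which is exactly the disciplining effect I'm describing.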

My line of argument of course assumes that the data itself is culturally diverse. This wouldn't be the case if the only data we collected and used came from, for example, university students in developed countries. But the trend is definitely moving away from that situation. Firstly, the amount of data available is growing by a factor of ten every few years (driven by data from social networking sites like Twitter and Foursquare). At a certain point, the sheer volume of data makes user homogeneity impossible, simply because there just aren't that many university students out there! Secondly, forward-thinking network operators like Orange are making cell-tower data from developing countries available to academic researchers (under very restricted circumstances, I should add). So, in conclusion, since the data is becoming more diverse, this should force out any cultural bias in the models (if there was any there to start with).
