During my work, I've often noticed similarities between language and individual daily life location behaviour (as detected by GPS, cell towers, tweets, check-ins etc.). To summarise these thoughts, I've compiled a list of the similarities and differences between language and location below. I then mention a few papers that exploit these similarities to create more powerful or interesting approaches to analysing location data.
Similarities between Location and Language Data
- Both exhibit power laws. A lot of words are used very rarely while a few words are very frequently used. The same happens with the frequency of visits to locations (e.g., how often you visit home v.s. your favourite theme park). This is not a truism. The most frequently visited locations or words used are *much* more likely to be visited/used than most other places/words.
- Both exhibit sequential structure. Words are highly correlated with words near to them on the page. The same for locations on a particular day.
- Both exhibit topics or themes. In the case of language, groups of words tend to co-occur in the same document (e.g., two webpages that talk about cars are both likely to mention words from a similar group of words representing the "car" topic). In the case of location data, a similar thing happens. I mention two interpretations from specific papers later in this post.
- The availability of both language data and location data has exploded in the last decade (the former from the web, the latter from mobile devices).
- There are cultural differences in using language just as there are cultural differences in location behaviour (e.g., Spanish people like to eat out later than people of other cultures).
- Both are hierarchical. Languages have letters, words, sentences, and paragraphs. A person can be moving around at the level of the street, city, or country (during an hour, day, or week).
- Both exhibit social interactions. Language is exchanged in emails, texts, verbally, or in scholarly debate. Friends, co-workers, and family may have interesting patterns of co-location.
Differences between Location and Language Data
- Many words are shared between texts (of same language) but locations are usually highly personal to individuals (except for the special cases of friends, co-workers, and family).
- There are no periodicities in text but strong periodicities in location (i.e., hourly, weekly, and monthly).
- Language data is not noisy (except for spelling and grammar mistakes) while location data is usually noisy.
- Language analysts do not usually need to worry about privacy issues whilst location analysts usually do.
Work that Exploits These Similarities
Here are a few papers that apply or adapt approaches that were primarily used for language models to location data:
They use topic modelling to capture the tendency of visiting certain locations on the same day. This is similar to using the presence of words like "windshield" and "wheel" to place higher predictive density on words like "road" and "bumper" (i.e., topic modelling bags of words). I have talked previously about why I think this is a good paper.
This paper uses a similar application of topic modelling as the one by Farrahi and Gatica-Perez.
This paper uses a model that was previously used to capture sequential structure in words and applies it to Foursquare checkins.
In my own work, I've used the concept of topics to refer to location habits that represent the tendency of an individual to be at a given location at a certain time of day or week. This way of thinking about locations is useful in generalising temporal structure in location behaviour across people, while still allowing for topics/habits to be present to greater or varying degrees in different people's location histories (just as topics are more or less prevalent in different documents).
Both language and location data are results of human behaviour, so it is unsurprising to find similarities, even if I think some of the similarities are coincidental (e.g., power laws crop up in many places and often for different reasons, and the increasing availability of data is part of the general trend of moving the things we care about into the digital domain). The benefits of analysis approaches seem to be flowing in the language -> location direction only at the moment, though I hope one day that will change.