Below are some of the most useful papers for location prediction from mobile data (GPS, cell towers, Foursquare checkins):
- Ashbrook, Daniel, and Thad Starner. "Using GPS to learn significant locations and predict movement across multiple users." Personal and Ubiquitous Computing 7.5 (2003): 275-286.
- A classic in the field, introducing a vision for finding significant places (with respect to individuals) and learning structure from past behaviour to make predictions. Although there are some shortcomings (e.g., their method of finding significant locations involved manual calibration, and they only considered a first-order Markov model to make predictions), I think they provide a compelling vision that stands the test of time 10 years later.
- Eagle, Nathan, and Alex Sandy Pentland. "Eigenbehaviors: Identifying structure in routine." Behavioral Ecology and Sociobiology 63.7 (2009): 1057-1066.
- In this paper, Eagle and Pentland represent each day of an individual's mobility as a point in high dimensional space, then use dimensionality reduction (principle component analysis) to find a set of "eigenbehaviors" that best characterises the location behaviour of that individual. Prediction can then be done by finding the mix of eigenbehaviors that best recreates a partially-seen day. The general form of the eigenbehaviors also allows comparison of habits between people, and they found some nice results showing how students from different faculties have similar location habits. I prefer the exploratory applications of this approach more than the predictive aspect, because I think the high accuracy results they report are more a function of people staying a long time at the same location in their dataset (since they consider a person to always be in 1 of 4 locations: home, work, other, or no signal). Still, I was always inspired by this paper. An extension that considered richer data sets was done by Sadilek and Krumm (2012), which found good results when also incorporating temporal information (day of week, whether the day was a national holiday).
- Gao, Huiji, Jiliang Tang, and Huan Liu. "Exploring Social-Historical Ties on Location-Based Social Networks." ICWSM. 2012.
- I think this is a strong paper that deserves more attention. It is an extension of the Ashbrook and Starner paper in the sense that the authors provide a more sophisticated way of doing sequential location prediction (i.e., "given your recent locations, where are you likely to be next?") using hierarchical Pitman-Yor processes (HPY). HPY assumes that the pattern of individual daily life locations follows a power law, with the rich getting richer (i.e., there are a few locations that are highly visited, and lots of locations that are hardly ever visited), which has been empirically observed in Gonzalez et al. (2008). Furthermore, HPY is a Bayesian way of varying the number of recent locations considered in the prediction (similar to variable-order Markov models, but with a more principled motivation that doesn't require a max-depth parameter). I suggest reading the paper for a more detailed description than I can give here.
- Horvitz, Eric, and John Krumm. "Some help on the way: Opportunistic routing under uncertainty." Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012.
- The authors propose an anticipatory mobile computing system that recommends interesting places to drivers on their way to a predicted location. The prediction part of the system is assumed (using a previous approach that assumes a rational driver) so the focus is on how to calculate the places to recommend using expected utility. They also consider how to balance the expected benefit of asking a driver to confirm their destination (making the suggested places more relevant) against the cost of interruption. Clearly, there will be times when knowing the user's destination for certain would not change the recommendation very much (v.s. only having a probabilistic prediction), so this approach is useful in avoiding the "talking paperclip" syndrome where anticipatory applications interrupt users too much.
- Farrahi, Katayoun, and Daniel Gatica-Perez. "A probabilistic approach to mining mobile phone data sequences." Personal and Ubiquitous Computing (2013): 1-16.
- This work provides a Bayesian probabilistic model of individual location behaviour that is a hybrid of latent Dirichlet allocation (LDA) and the eigenbehaviors approach of Eagle and Pentland. The similarity to eigenbehaviors comes from the assumption that there exists a set of characteristic motifs that repeat themselves in the data (e.g., leaving home to go to the bus station, then to work). The authors' comparison to n-gram models appears to be a bit of a red herring to me, as the next location in their model is not dependent, in general, on the most recent previous locations (that is not to say a mobility model requires the n-gram assumption to be useful). The benefit of having LDA underneath it all (as opposed to, say, a mixture model) is to express the assumption that motifs are locally correlated within a single day. Intuitively, if I follow the aforementioned "going to work" motif in the morning, then I am probably more likely to follow other workday-related motifs (e.g., going to the gym, then home) than other types of motif later that day. With hierarchical Bayesian modelling, this type of structure can be learnt in an unsupervised way from the data, and then be used to make predictions about future behaviour.
I had to leave a lot of very good papers out, so I hope to make a longer list in future.