The Data of a Thousand Tales
Who owns your mobile phone data?
In the spring of 2016, France's then-President François Hollande officially opened Orange Gardens, an innovation campus near Paris built to facilitate French telecom operator Orange’s R&D activities. Addressing around three thousand employees who roll out digital services like mobile banking or eSports tournaments for a living, Hollande, who at the time had a whopping 4% approval rating, tried to hit the right notes. "There is a great satisfaction in being able to say that French technology is among the best [in the world], especially in the services you present," he said. While I watched the 62-year-old President sweating in the midday sun, I could not shake the feeling that he had no idea how an undercurrent of innovation around mobile phone data is drastically changing relations between European operators, governments, and the general public.
The telecom industry is still surfing on the wave of the digital transformation that surged in 2007 with the popularization of the smartphone, and which is still entering our daily lives as an endless chain of digital services that are being developed, launched, altered, and killed by telecom giants all over the world. At the same time, innovation in the industry is backed by a continuous state of technical development, like the rise of 5G. But now, there is also a silent revolution underway in the way mobile phone data is being treated, analyzed, and marketed. It is an area of innovation that is so new that experts still aren’t sure exactly where it is heading or what consequences it might bear in time.
Mobile phone data are “digital breadcrumbs produced by the information technologies that humans use in their daily activities,” writes Luca Pappalardo, an Italian specialist in big data analysis, in his 2016 study that uses mobile phone data to nowcast well-being. Telecom operators have collected these “breadcrumbs” for years, using the information gathered (like how many phones are connected to a given cell tower at peak usage, or how many minutes an individual user has spent on the phone in a month) to improve their networks or to bill correctly. Concretely, every time you make or receive a call or text message, a record is created that contains information on the phone used, the intended receiver, the cell tower that handled the activity and its location, and the time at which the activity took place. These records are known as “Call Detail Records,” or CDR data.
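To make that description concrete, here is a minimal sketch of what a single record might look like. The field names, the identifiers, and the coordinates are assumptions made for illustration; real operators use their own, far more elaborate schemas.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One illustrative CDR line; every field name here is hypothetical."""
    caller_id: str       # pseudonymous identifier of the phone used
    receiver_id: str     # pseudonymous identifier of the intended receiver
    tower_id: str        # cell tower that handled the activity
    tower_lat: float     # tower latitude in decimal degrees
    tower_lon: float     # tower longitude in decimal degrees
    timestamp: datetime  # time at which the activity took place
    kind: str            # "call" or "sms"


# A made-up example record, not real data.
record = CallDetailRecord(
    caller_id="phone_A",
    receiver_id="phone_B",
    tower_id="tower_17",
    tower_lat=48.80,
    tower_lon=2.29,
    timestamp=datetime(2016, 4, 12, 8, 30),
    kind="call",
)
```

Note what the record does not hold: no name, no conversation content, only who contacted whom, through which tower, and when.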
CDR data collected from an individual mobile phone over time can be used to reconstruct movement patterns (by connecting the different cell towers a phone has been using over time) or even social networks (by connecting the initiating and receiving mobile phones with each other). Take into account the time of collection, and you’ll be able to distinguish among locations that were visited at different times of the day, the week, the year. Or you could analyze what other mobile phones were contacted during working hours or late at night. CDR data, in other words, give you a glance into what I like to call the thousand tales lived by the people who create the data. But they do not allow you to capture the content of these tales: the breadcrumbs are there, but it’s impossible to tell who left them, or what they were doing at the time.
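The reconstruction described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any operator or researcher actually does it: the records are invented tuples of (caller, receiver, tower, timestamp), and the function names and the 9-to-17 definition of working hours are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Invented records: (caller, receiver, tower, timestamp).
records = [
    ("phone_A", "phone_B", "tower_home", datetime(2016, 4, 12, 7, 45)),
    ("phone_A", "phone_C", "tower_work", datetime(2016, 4, 12, 10, 15)),
    ("phone_A", "phone_C", "tower_work", datetime(2016, 4, 12, 15, 0)),
    ("phone_A", "phone_B", "tower_home", datetime(2016, 4, 12, 22, 30)),
]


def trajectory(records, phone):
    """Time-ordered towers used by one phone, consecutive repeats collapsed."""
    own = sorted((r for r in records if r[0] == phone), key=lambda r: r[3])
    path = []
    for _, _, tower, _ in own:
        if not path or path[-1] != tower:
            path.append(tower)
    return path


def contact_counts(records):
    """Undirected contact graph: pair of phones -> number of interactions."""
    counts = Counter()
    for caller, receiver, _, _ in records:
        counts[frozenset((caller, receiver))] += 1
    return counts


def contacts_by_period(records, phone, start_hour=9, end_hour=17):
    """Split one phone's contacts into working hours and all other hours."""
    working, other = Counter(), Counter()
    for caller, receiver, _, ts in records:
        if caller != phone:
            continue
        bucket = working if start_hour <= ts.hour < end_hour else other
        bucket[receiver] += 1
    return working, other


path = trajectory(records, "phone_A")
graph = contact_counts(records)
working, other = contacts_by_period(records, "phone_A")
```

Even with four made-up records, a home-work-home pattern and a split between daytime and evening contacts emerge, which hints at what millions of real records can reveal.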
CDR data aren’t just collected for individual mobile phones, but for all phones under contract with a single operator, which, because of the widespread adoption of mobile phones, can easily amount to millions of mobile phones in a single country, or even a single city. But what could the data from millions of phones tell us? What value might it have for science, public policy, or commercial purposes? For Dr. Zbigniew Smoreda, a Polish-French sociologist who works at Orange Gardens, and who is considered one of the world’s main experts in mobile phone data, that hoard of information is “invaluable.”
“Imagine for example you want to study the day-to-day presence of the population from an entire country, you could turn to figures from official statistics,” he said. “These numbers normally cover the entire population, but they are about residential population, so they only tell us something about where people are supposed to live, not where they spend most of their time, what places they visit, who they interact with, and so forth. Mobile phone data might only cover a part of the population, but their [informational] richness is so much deeper compared to official statistics that it would be either impossible or extremely costly to gather similar information at such a large scale with other data collection methods like surveys or questionnaires.”
As a consequence, huge investments have been made over the past decade into analyzing mobile phone data. So far, it’s produced insights and estimates on domestic tourism patterns, long-distance trips (which are typically hard to collect in surveys), and commuting patterns. It has shed light on the daily functioning of contemporary cities, on their nightlife patterns, and on the effects of large-scale events on urban transport. And it has opened up ways to empirically verify old sociological theories like the six degrees of separation—a theory that states all people are six, or fewer, social connections away from each other.
If you think that sounds far-reaching, then you are in for another surprise. Mobile phone data is being incorporated everywhere, from urban planning to transport planning, political analysis, carbon emissions measurement, geo-marketing, official statistics, crime prevention, observation of informal economies, disaster response, psychological research, climate change research, migration research, the development of digital services like mobile banking, and the envisioning of future technological developments like electric or autonomous cars.
Current technologies allow the analysis of “individual and collective behavior at an unprecedented scale, detail, and speed,” according to Pappalardo. As a result, applications of mobile phone data are not only relevant at local, regional, and national scales, but are also becoming easier for telecom operators to deliver, opening up an entirely new market for services and products.
In a way, academic groundwork has paved the way for operators to appreciate the (economic) value of their own data. Simultaneously, it has also alarmed governing bodies and the public when it comes to privacy, ownership and the use of big data for the public good. “Privacy, by far, has been our main concern,” said Fernando Reis, a senior member of the Task Force Big Data set up at Eurostat, the official statistics office of the European Union. “After that we are actively supporting collaboration between official statistics, government bodies, and operators to ensure the added value of big data can also be used for the best of the general public.”
The public is right to be concerned about privacy, since operators are collecting more information, and at higher spatial and temporal resolutions, than ever before. A French CDR dataset from 2007 (the year the first iPhone came on the market), covering almost 18 million people, had an average of four data points per person per day. Newer mobile phone data, such as Data Detail Records (DDR), which capture roaming interactions with the cell tower network, or signalling data, which pings the location of cell phones on the network roughly every few hours, hold tens or even hundreds of data points per day per person.
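A quick back-of-the-envelope calculation shows what those figures mean in volume. The population and the four points per day come from the 2007 dataset mentioned above; the rates of 10 and 100 points per day are assumptions chosen only to stand in for "tens" and "hundreds", applied to the same population for comparison.

```python
# Figures from the 2007 French CDR dataset ("almost 18 million", so an upper bound).
people = 18_000_000
cdr_points_per_person_per_day = 4

cdr_points_per_day = people * cdr_points_per_person_per_day

# Assumed illustrative rates for newer data types: "tens" and "hundreds".
illustrative_rates = (10, 100)
newer_points_per_day = [people * rate for rate in illustrative_rates]

print(f"CDR, 2007: up to {cdr_points_per_day:,} data points per day")
for rate, total in zip(illustrative_rates, newer_points_per_day):
    print(f"At {rate} points per person: {total:,} data points per day")
```

Under these assumptions, the daily volume for the same population grows from roughly 72 million data points to somewhere between 180 million and 1.8 billion.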
Sure, none of these data types come close to the invasiveness of continuous GPS traces that are gathered by smartphone applications like Google Maps or Facebook, but they still form a clear threat to privacy. As a result, the EU has strictly regulated their use, at least for companies that fall within European jurisdiction. For example, the EU has long forbidden the coupling of mobile phone data with any individually identifying characteristics of a phone’s user, thus effectively rendering mobile phone data useless for consumer advertising or socioeconomic research at the individual level. This specific regulation means that Europeans are protected against practices like individual targeting by bounty hunters, or credit scoring, in a way that American and Chinese citizens are not.
However, recent rules set by national data protection authorities (the data watchdogs that enforce European regulations and guidelines) are having far-reaching consequences for innovation in ways that legislators probably didn’t consider. One example is restrictions on the length of time that information from individual phones can be saved under the same identifier in databases. In France, individual phone records can currently only be stored under the same identifier for a maximum of 24 hours. After that, CNIL, the French data protection authority, is forcing operators to change identifiers in their database so that no continuous tracing can be done. At first glance, this rule seems reasonable, because it prevents the long-term tracking of individual phone trajectories, which reduces the threat to privacy.
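One way such identifier rotation could work is sketched below. This is an assumption-laden illustration, not a description of how French operators or CNIL actually implement the rule: it treats the 24-hour window as a calendar day, derives pseudonyms with a keyed hash (HMAC-SHA256), and uses a hypothetical class name and phone number.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyPseudonymizer:
    """Hypothetical sketch: a phone keeps one pseudonym per calendar day."""

    def __init__(self):
        self._keys = {}  # day -> random secret key for that day

    def _key_for(self, day):
        # Generate a fresh random key the first time a day is seen.
        if day not in self._keys:
            self._keys[day] = secrets.token_bytes(32)
        return self._keys[day]

    def pseudonym(self, phone_number, day):
        # Keyed hash: stable within a day, unlinkable across days without the keys.
        digest = hmac.new(
            self._key_for(day), phone_number.encode("utf-8"), hashlib.sha256
        )
        return digest.hexdigest()[:16]

    def forget(self, day):
        # Discarding a day's key prevents re-deriving that day's pseudonyms.
        self._keys.pop(day, None)


pseudonymizer = DailyPseudonymizer()
monday_a = pseudonymizer.pseudonym("+33600000001", date(2016, 4, 11))
monday_b = pseudonymizer.pseudonym("+33600000001", date(2016, 4, 11))
tuesday = pseudonymizer.pseudonym("+33600000001", date(2016, 4, 12))
```

Within a day the same phone keeps the same identifier, so daily movement can still be studied; across days the identifiers differ, which is exactly what makes long-term trajectories impossible to follow.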
But these restrictions bear consequences. First, they sap the competitiveness of European companies vis-à-vis other global players that collect similar, or even more detailed, data from smartphone apps without facing similar restrictions. Second, they drastically reduce the scientific value of datasets, which is one of the reasons that researchers in the field are redirecting their work towards other continents where similar regulations are not (yet) in place, causing a soft brain drain. Third, and most importantly, restrictions on the use of mobile phone data have provided European telecom companies with an extra argument for selling pre-packaged, aggregated data to public parties, instead of providing them cooperative access to the raw datasets as, in their view, that would run a bigger risk of infringing privacy directives. That argument is paradoxical, as shared access to the raw data would actually demand a higher degree of transparency, which in the long term would almost certainly reduce risks of infringement both by private and public parties. But it turns out that current restrictions are actually driving companies away from cooperation and into service provision, which they happily deliver with a bill.
Take a step back and you’ll realize this raises some basic questions regarding ownership of data. When CDR data is purchased (as it frequently is) by local, regional, and national governmental agencies or academic institutions, are taxpayers unfairly paying for something that they themselves have produced? On top of that, blocking access to mobile phone data is not beneficial for the development of applications that serve the public good. Official statistics offices across Europe, for example, have widely recognized the potential for mobile phone data to complement and augment their services, such as a timelier production of figures that are useful to policymakers. The ESSnet Big Data (a joint project of national official statistics offices supported by Reis’ task force at Eurostat) has been trying to develop pilot projects to integrate big data and official statistics for years, but progress has been slow because of limited access to the data sources.
There is more than one way to reconcile the tensions between privacy, ownership, and the public good. One appealing solution is the construction of a centralized database that is filled with data from individual operators, but is governed by an external party—perhaps a government agency, or an official statistics office. Access to the centralized database could be granted only to parties who submit their algorithms for review by experts in privacy and legislation. Such a system of data access by governed algorithms would have several advantages—all analytical activity would be monitored and logged, European companies would regain their ability to compete (especially if global players were obliged to hand over data they collect on European citizens), and universities and other research organizations could gain access to these datasets. Finally, a centralized data set would increase the overall quality of all analysis performed, because it would combine datasets that currently belong to different operators. A ministry that buys data from a single operator to, say, better understand mobility in a region, shouldn’t create public policy based on that single dataset because it will be unrepresentative of the population at large.
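What access by governed algorithms might look like can be sketched in miniature. Everything here is hypothetical: the class, the method names, the reviewer and requester labels, and the toy records are assumptions for illustration, and a real system would need authentication, legal review workflows, and far stronger isolation.

```python
from datetime import datetime, timezone


class GovernedDatabase:
    """Hypothetical pooled database that only runs reviewed algorithms."""

    def __init__(self):
        self._records = []   # pooled records from all operators
        self._approved = {}  # algorithm name -> reviewed function
        self.audit_log = []  # (time, action, algorithm name, actor)

    def _log(self, action, name, actor):
        self.audit_log.append((datetime.now(timezone.utc), action, name, actor))

    def ingest(self, operator, records):
        # Tag each record with the operator that supplied it.
        self._records.extend({"operator": operator, **r} for r in records)

    def approve(self, name, algorithm, reviewer):
        # Called only after experts have reviewed the algorithm.
        self._approved[name] = algorithm
        self._log("approve", name, reviewer)

    def run(self, name, requester):
        if name not in self._approved:
            self._log("denied", name, requester)
            raise PermissionError(f"algorithm {name!r} has not been reviewed")
        self._log("run", name, requester)
        # Hand the algorithm copies so it cannot alter the stored records.
        return self._approved[name]([dict(r) for r in self._records])


def records_per_tower(records):
    """Example aggregate algorithm: number of records seen at each tower."""
    counts = {}
    for r in records:
        counts[r["tower"]] = counts.get(r["tower"], 0) + 1
    return counts


db = GovernedDatabase()
db.ingest("operator_1", [{"tower": "t1"}, {"tower": "t2"}])
db.ingest("operator_2", [{"tower": "t1"}])
db.approve("records_per_tower", records_per_tower, reviewer="privacy_board")
result = db.run("records_per_tower", requester="ministry_of_transport")
```

The requester never touches the raw records, only the output of a reviewed algorithm, and both the approval and the run leave a trace in the audit log. Because the data of both operators is pooled, the result also covers more of the population than either dataset alone.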
Establishing a system of centralized databases and governed algorithms will not be easy though. It will face numerous political, legal, and practical barriers, not to mention the pushback from businesses that have invested in—and are profiting from—the way data is currently stored, treated, and sold. But the potential public benefit is immense, and it would be a way for governments to extend the principles of democratically legitimate public action to what is now a data Wild West.
What is intriguing about the case of mobile phone data in contemporary Europe is that it reflects some of the very challenges the European Union is currently facing. Add a dash of abstraction and you’ll find that the problems around mobile phone data are not unlike those that occupy the minds of policymakers in Brussels or Strasbourg: the protection of individual privacy, the economic gains from integration and nation-wide cooperation, the tension between private actors and public offices, the need for global competitiveness (among other things, to prevent brain drain), and the importance of general education.
For years to come, big data is set to affect all of our lives, from the seasoned legislator, to the marketing-targeted consumer and the sub-populations that are going to be excluded by data-driven policy. And yet, general education around the topic is heavily skewed towards a few highly educated experts who either work for data businesses, or who (in the case of researchers) are often pushed to collaborate with them in exchange for access to their data. It is a rather uncanny observation that while European universities are training the first ever generation of big data masters, their knowledge is often very technical and lacks an ethical component. Financial incentives are steering talent towards generating profits, rather than towards doing what is best for the public at large. The consequence is that active, critical voices are limited, and expertise in big data is rare in fields that are nonetheless highly relevant to current debates, like politics, law, or sociology.
What is even worse is that we seem unaware of how the digital breadcrumbs we leave behind are influencing decisions that have a real impact on our lives. When it comes to informing the public on the use of their own data, companies remain silent out of fear of repercussions. Journalism hasn’t yet risen to the occasion, politicians either remain silent or ignorant, and general education is lagging behind by decades.
Taken together, there is real reason for concern. It will take bravery and cooperation from both the younger generation of tech talent and those who hold political power to ensure future developments around the use of European citizens’ mobile phone data are steered in the right direction. Peacocking and parading our technologies in innovation campuses won’t do the trick. As much as ignorance might be bliss, we shouldn’t be waiting around for yet another wake-up call on how important data is in our society, and how poorly we are currently managing it.
This article appears in Are We Europe #5: Code of Conscience