Ever since the Internet became a mass social phenomenon in the 1990s, people have worried about its effects on their privacy. From time to time, a major scandal has erupted, focusing attention on those anxieties; last year’s revelations concerning the U.S. National Security Agency’s surveillance of electronic communications are only the most recent example. In most cases, the subsequent debate has been about who should be able to collect and store personal data and how they should be able to go about it. When people hear or read about the issue, they tend to worry about who has access to information about their health, their finances, their relationships, and their political activities.
But those fears and the public conversations that articulate them have not kept up with the technological reality. Today, the widespread and perpetual collection and storage of personal data have become practically inevitable. Every day, people knowingly provide enormous amounts of data to a wide array of organizations, including government agencies, Internet service providers, telecommunications companies, and financial firms. Such organizations -- and many other kinds, as well -- also obtain massive quantities of data through “passive” collection, when people provide data in the act of doing something else: for example, by simply moving from one place to another while carrying a GPS-enabled cell phone. Indeed, there is hardly any part of one’s life that does not emit some sort of “data exhaust” as a byproduct. And it has become virtually impossible for someone to know exactly how much of his data is out there or where it is stored. Meanwhile, ever more powerful processors and servers have made it possible to analyze all this data and to generate new insights and inferences about individual preferences and behavior.
This is the reality of the era of “big data,” which has rendered obsolete the current approach to protecting individual privacy and civil liberties. Today’s laws and regulations focus largely on controlling the collection and retention of personal data, an approach that is becoming impractical for individuals, while also potentially cutting off future uses of data that could benefit society. The time has come for a new approach: shifting the focus from limiting the collection and retention of data to controlling data at the most important point -- the moment when it is used.
In the middle of the twentieth century, consumers around the world enthusiastically adopted a disruptive new technology that streamlined commerce and made it possible for ordinary people to do things that, until then, only businesses and large organizations could do. That technology was the credit card. In return for a line of revolving credit and the convenience of cashless transactions, credit card users implicitly agreed to give financial institutions access to large amounts of data about their spending habits. Companies used that information to infer consumers’ behavior and preferences and could even pinpoint users’ whereabouts on any given day. As the rapid global adoption of credit cards demonstrated, consumers mostly thought the tradeoff was worth it; in general, they did not feel that credit card companies abused their information.
In order to ensure that companies used all this new data responsibly, the Organization for Economic Cooperation and Development produced a set of guidelines on the protection of privacy and the flow of data across borders. Those principles, established in 1980, created the general framework that corporations still rely on when it comes to protecting individual privacy. The guidelines directed companies on the proper way to collect and retain personal data, ensure its quality and security, and provide meaningful opportunities for individuals to consent to the collection and have access to the data collected about them. According to the OECD, the guidelines helped increase the proportion of member countries with privacy laws from one-third to nearly all 34 of them, while also influencing EU privacy laws.
Thirty-four years later, these well-intentioned principles have come to seem ill suited to the contemporary world. They predated the mainstream adoption of personal computers, the emergence of the Internet, and the proliferation of cell phones and tablet computers. They were developed when only science-fiction writers imagined that someday soon more than a billion people would carry pocket-sized computers that could track their locations down to the meter, every second of every day; that nearly all communication and commerce would be mediated electronically; and that retailers would use data and computer modeling to determine what a particular consumer wants even before he is aware of it.
Such changes have exposed the limitations of the late-twentieth-century approach to protecting personal data, which focused almost entirely on regulating the ways such information was collected. Today, there is simply so much data being collected, in so many ways, that it is practically impossible to give people a meaningful way to keep track of all the information about them that exists out there, much less to consent to its collection in the first place. Before they can run an application on their smartphones or sign up for a service on a website, consumers are routinely asked to agree to end-user license agreements (EULAs) that can run to dozens of pages of legalese. Although most EULAs are innocuous, some contain potentially unwelcome clauses buried deep within them. In one recent example, a Web application included a clause in its EULA that granted the software maker permission to use the user’s spare computing power to generate Bitcoins, a form of digital currency, without any compensation for the user. Another example is a popular flashlight application called Brightest Flashlight Free, which collected its users’ location data and then sold it to marketing companies -- without revealing that it was doing so. (Last December, the U.S. Federal Trade Commission forced the application’s maker to abandon this deceptive practice.)
In both cases, users technically had an opportunity to consent to these practices. But that consent was effectively meaningless, since it did not offer a clear understanding of how, when, and where their personal data might be used. The Bitcoin-mining application’s dense, 5,700-word EULA was so vague that even a user who made the unusual choice to actually read it carefully might not have understood that it gave the application’s maker the right to virtually hijack the computing capacity of the user’s device. Although the flashlight application explicitly requested access to users’ location data (a request most people reflexively approved), that was more of a ruse than an honest business practice, since the company hid the fact that it provided the data to others.
Other forms of data collection can be even more challenging to regulate. More and more collection happens passively, through sensors and on servers, with no meaningful way for individuals to be made aware of it, much less consent to it. Cell phones continually share location data with cellular networks. Tollbooths and traffic cameras photograph cars (and their occupants) and read license plates. Retailers can track individuals as they move around a store and use computer-backed cameras to determine the gender and approximate age of customers in order to more precisely target advertising. In a scenario reminiscent of the 2002 film Minority Report, stores might soon be able to use facial-recognition technology to photograph shoppers and then match those images with data from online social networks to identify them by name, offer them discounts based on their purchasing histories, and suggest gifts for their friends.
Using powerful new computing tools and huge data sets gathered from many different sources, corporations and organizations can now generate new personal data about individuals by drawing inferences and making predictions about their preferences and behaviors based on existing information. The same techniques also make it harder to keep personal information anonymous. Companies that have access to multiple sources of partial personal information will find it increasingly easy to stitch pieces of data together and figure out whom each piece belongs to, effectively removing the anonymity from almost any piece of data.
GOOD INTENTIONS, BAD EFFECTS
Many people understandably find this state of affairs troubling, since it seems to suggest that their privacy has already been compromised beyond repair. But the real issue is not necessarily that their privacy has been violated -- just because the information is out there and could be abused does not mean that it has been. Rather, it is that people do not know who possesses data related to them and have no way to know whether the information is being used in acceptable ways.
One common reaction is to demand stricter controls on who can collect personal information and how they can collect it by building user consent into the process at every stage. But if an individual were given the opportunity to evaluate and consent to every single act of data collection and creation that happens, he would be forced to click “yes” or “no” hundreds of times every day. Worse, he would still have no way to easily verify what happened to his data after he consented to its collection. And yet it would be very hard for most people to opt out altogether, since they enjoy, or even rely on, services such as social networks that require personal data about their users in order to function, as well as applications and services (such as e-mail software, productivity tools, or games) that they use for free in exchange for agreeing to receive targeted advertising.
Officials, legislators, and regulators all over the world have yet to grasp this reality, and many well-meaning attempts to address public concerns about privacy reflect an outdated understanding of the contemporary data ecosystem. Consider, for instance, the EU’s General Data Protection Regulation, which is expected to take effect in 2016. This new regulation requires individual consent for the collection of data and the disclosure of the intended use at the time of collection. It also creates a “right to be forgotten” (the requirement that all data on an individual be deleted when that individual withdraws consent or his data is no longer needed), ensures the availability of personal data in a form people can easily access and use, and imposes fines on companies or organizations that fail to comply with the rules.
Although well intentioned, this new regulation is flawed in its focus on the collection and retention of data. It will help unify laws and practices regarding privacy, but it does not adequately address the practical realities of data collection today. It requires valid consent for collecting data, but it does not account for sensitive information that algorithms create from entirely public sources, such as inferences about an individual’s age, marital status, occupation, estimated income, and political leanings drawn from his posts to various social networks. Nor will the new rules apply when data is collected passively, without a meaningful opportunity for consent. And besides the “right to be forgotten,” the rules will not do much to address the crucial question of how data can and cannot be used.
Such efforts to restrict data collection can also produce unintended costs. Much of the information collected today has potential benefits for society, some of which are still unknown. The ability to analyze large amounts of aggregated personal data can help governments and organizations better address public health issues, learn more about how economies work, and prevent fraud and other crimes. Governments and international organizations should not prevent the collection and long-term retention of data that might have some as-yet-undiscovered beneficial use.
For instance, in 2011, researchers at the health-care giant Kaiser Permanente used the medical records of 3.2 million individuals to find a link between autism spectrum disorders in children and their mothers’ use of antidepressant drugs. They determined that if a mother used antidepressants during pregnancy, her child’s risk of developing such a disorder doubled. The researchers had access to those medical records only because they had been collected earlier for some other reason and then retained. The researchers were able to find a particularly valuable needle, so to speak, only because they had a very large haystack. They would almost certainly not have made the discovery if they had been able to conduct only a smaller, “opt-in” study that required people to actively consent to providing the particular information the researchers were looking for.
Further medical breakthroughs of this kind will become more likely with the emergence of wearable devices that track users’ movements and vital signs. The declining cost of genome sequencing, the growing adoption of electronic medical records, and the expanding ability to store and analyze the resulting data sets will also lead to more vital discoveries. Crucially, many of the ways that personal data might be used have not even been imagined yet. Tightly restricting this information’s collection and retention could rob individuals and society alike of a hugely valuable resource.
When people are asked to give a practical example of how their privacy might be violated, they rarely talk about the information that is being collected. Instead, they talk about what might be done with that information, and the consequences: identity theft or impersonation, personal embarrassment, or companies making uncomfortable and unwelcome inferences about their preferences or behavior. When it comes to privacy, the data rarely matters, but the use always does.
But how can governments, companies, and individuals focus more closely on data use? A good place to start would be to require that all personal data be annotated at its point of origin. All electronic personal data would have to be placed within a “wrapper” of metadata, or information that describes the data without necessarily revealing its content. That wrapper would describe the rules governing the use of the data it held. Any programs that wanted to use the data would have to get approval to “unwrap” it first. Regulators would also impose a mandatory auditing requirement on all applications that used personal data, allowing authorities to follow and observe applications that collected personal information to make sure that no one misused it and to penalize those who did. For example, imagine an application that sends users reminders -- about errands they need to run, say, or appointments they have to keep -- based on their location. Such an application would likely require ongoing access to the GPS data from users’ cell phones and would thus have to negotiate and acquire permission to use that data in accordance with each user’s preferences.
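The wrapper idea can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a description of any existing system: the names `UsePolicy`, `WrappedData`, and `unwrap`, along with the purpose strings, are hypothetical. The sketch shows personal data held behind metadata that lists permitted uses, with every attempt to unwrap it recorded for later audit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsePolicy:
    """Metadata describing permitted uses; it says nothing about the payload."""
    owner: str
    allowed_purposes: frozenset


@dataclass
class WrappedData:
    """Personal data sealed behind a policy, with an audit trail (hypothetical)."""
    policy: UsePolicy
    _payload: object = field(repr=False)  # kept out of repr() so logs do not leak it
    audit_log: list = field(default_factory=list)

    def unwrap(self, app: str, purpose: str):
        """Release the payload only for a permitted purpose; log every attempt."""
        granted = purpose in self.policy.allowed_purposes
        self.audit_log.append((app, purpose, granted))
        if not granted:
            raise PermissionError(f"{app} may not use this data for {purpose!r}")
        return self._payload


# The location-based reminder application from the text, plus a refused request.
location = WrappedData(
    UsePolicy(owner="alice", allowed_purposes=frozenset({"location_reminders"})),
    (47.61, -122.33),
)
coords = location.unwrap("reminder_app", "location_reminders")
try:
    location.unwrap("ad_network", "targeted_advertising")
except PermissionError:
    pass  # the refusal is still recorded in location.audit_log
```

In this sketch the audit log records refused requests as well as granted ones, which is the kind of trail a regulator would need in order to follow how an application actually used personal data.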
Such approaches are feasible because data and applications always work together. The raw materials of personal information -- a row of numbers on a spreadsheet, say -- remain inert until a program makes use of them. Without a computer program, there is no use -- and without use, there is no misuse. If an application were required to tell potential users what it intended to do with their data, people might make more informed decisions about whether or not to use that application. And if an application were required to alert users whenever it changed the way it used their data and to respond to their preferences at any time, people could modify or withdraw their consent.
A progenitor of this approach emerged in the past decade as consumers began listening to music and watching movies online -- and, in many cases, illegally downloading them. With profits threatened by such widespread piracy, the entertainment industry worked with technology firms to create digital rights management systems that encrypt content and add metadata to files, making it much harder for them to be illegally opened or distributed. For example, movies purchased from Apple’s iTunes store can be played on only a limited number of computers, which users must authorize by linking them to their Apple accounts. Such mechanisms were given legal weight by legislation, including the 1998 Digital Millennium Copyright Act, which criminalized the circumvention of copyright-protection systems. Although there was some resistance to early forms of digital rights management that proved cumbersome and interfered with legitimate and reasonable consumer behavior, the systems gradually matured and gained acceptance.
Digital rights management has also worked well in other areas. The rights protections built into Microsoft’s Office software use encryption and metadata to allow users to specify who can and who cannot read, edit, print, or forward a file. This control gives individuals and organizations a simple and manageable way to protect confidential or sensitive information. It is not difficult to imagine a similar but more generalized scheme that would regulate the use of personal data.
Focusing on use would also help secure data that is already out there. Software that works with personal data has a shelf life: it is eventually upgraded or replaced, and regulators could require that programmers build new protections into the code whenever that happens. Regulators could also require all existing applications to officially register and bring their data usage into compliance.
Any uniform, society-wide effort to control the use of data would rely on the force of law and a variety of enforcement regimes. Requiring applications to wrap data and make it possible to audit the way they use it would represent a major change in the way thousands of companies do business. In addition to the likely political fights such a system would engender, there are a number of technical obstacles that any new legal regime would have to overcome. The first is the issue of identity. Most people’s online identities are loosely connected to their e-mail addresses and social networking profiles, if at all. This works fine for digital rights management, in which a single entity owns and controls the assets in question (such as a digital copy of a film or a song). But personal data lives everywhere. A person’s expressed preferences cannot be honored if they cannot be attached to a verifiable identity. So privacy protections focused on the use of personal data would require governments to employ better systems for connecting legally recognized online identities to individual people.
The winds of online identity are already blowing in this direction. Facebook requires that people sign up for the service using their real names, and Twitter takes steps to verify that certain accounts, such as those connected to celebrities and public figures, actually represent the people they claim to represent. One intermediate step toward a more systemic creation of verified online identities would allow people to designate which of their online personas was the authoritative one and use that to specify their privacy preferences.
But even if governments could devise ways to more rigorously connect individuals with verifiable online identities, three additional kinds of validated “identities” would need to be created: for applications, for the computers that run them, and for each particular role that people play when they use an application on a computer. Only specific combinations of identities would be able to access any given piece of personal data. For example, a researcher working on an epidemiologic study might be allowed to use a specific application to work with a specific data set on his institution’s computer system, but an actuary would not be permitted to use that application or that data on his computer to set prices for health insurance. Or a physician attending to a patient in the emergency room might have access to information for treatment that he would not have access to in another role or circumstance.
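One way to picture the rule that only specific combinations of identities can reach a piece of data is as a lookup keyed on all four identities at once. The sketch below is an assumption-laden illustration: `AccessContext`, `GRANTS`, `may_access`, and every name in the grant table are invented for the example.

```python
from typing import NamedTuple


class AccessContext(NamedTuple):
    """The four validated identities the text describes (names are hypothetical)."""
    person: str
    role: str
    application: str
    machine: str


# Hypothetical grant table: each data set maps to the exact combinations allowed.
GRANTS = {
    "patient_records": {
        AccessContext("dr_lee", "epidemiologist", "study_tool", "university_cluster"),
    },
}


def may_access(dataset: str, ctx: AccessContext) -> bool:
    """Permit access only when the whole combination has been granted."""
    return ctx in GRANTS.get(dataset, set())


researcher = AccessContext("dr_lee", "epidemiologist", "study_tool", "university_cluster")
actuary = AccessContext("mr_kay", "actuary", "study_tool", "insurer_laptop")
same_person_other_role = researcher._replace(role="consultant")

researcher_ok = may_access("patient_records", researcher)
actuary_ok = may_access("patient_records", actuary)
other_role_ok = may_access("patient_records", same_person_other_role)
```

Because the check matches the full tuple, changing any single element (the person, the role, the application, or the machine) is enough to deny access, which mirrors the researcher, actuary, and physician examples above.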
Lawmakers would have to put in place significant penalties to make sure people played by the new rules. The only effective deterrents would be punishments strong enough to give pause to any reasonable person. Given the value that can be extracted from personal data, a fine -- even a significant one -- could be perceived by a bad actor (whether an individual or a corporation) as merely part of the cost of doing business. So privacy violations would have to be considered serious criminal offenses, akin to fraud and embezzlement -- not like mere “parking tickets,” which would not deter rogue operators and companies.
If someone suspected that his personal data had been misused, he could contact the appropriate regulator, which would investigate and prosecute the abuse, treating it as a crime like any other. A suspect incident could include receiving a targeted advertisement informed by something a person had not agreed to allow advertisers to know or noticing that one’s health insurance premiums had gone up after posting about participating in extreme sports on a social network.
Moving from the current model to this new way of controlling privacy would require political will and popular support. It would also require people to constantly reevaluate what kinds of uses of their personal data they consider acceptable. Whether a particular use is appropriate depends on the context of the use and the real or perceived value that individuals and society get in return. People might gladly consent to the use of their social networking activity by researchers in a population-wide health study, but not necessarily by insurers who want to use information about their health, recreation, and eating habits to determine their coverage or premiums.
Another challenge would be figuring out practical ways for individuals to express their preferences about personal data, given the wide range of possible uses and the many different contexts in which such data comes into play. It would be impossible to write all the rules in advance or to craft a law that would cover every class of data and every potential use. Nor would it be sensible to ask people to take a few hours and write down how they might feel about any current or theoretical future use of their information.
One potential solution might be to allow people to delegate some choices about their preferences to organizations they trust. These organizations would serve as watchdogs, keeping track of applications and uses as they emerge and change and helping guide regulators’ investigations and enforcement. If someone were concerned about the use of personal data by advertisers, for instance, he might choose to delegate the management of his preferences to an organization that specialized in keeping an eye on marketers’ evolving techniques and behaviors. If this user’s preferences changed or he was not satisfied with the organization’s work, he would be able to withdraw his specific consent or delegate this work to a different organization.
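Delegation of this kind could be modeled as a fallback: a person’s explicit choices win, and anything he has not decided is referred to the organization he trusts. The following is a minimal sketch under that assumption; `Watchdog`, `Preferences`, and the use labels are hypothetical and stand in for whatever a real scheme would define.

```python
class Watchdog:
    """A trusted organization that vets uses on a person's behalf (hypothetical)."""

    def __init__(self, name, approved_uses):
        self.name = name
        self.approved_uses = set(approved_uses)

    def approves(self, use):
        return use in self.approved_uses


class Preferences:
    """A person's own choices, with an optional delegate for everything else."""

    def __init__(self):
        self.explicit = {}    # use -> True or False, set directly by the person
        self.delegate = None  # a Watchdog, or None if nothing is delegated

    def allows(self, use):
        if use in self.explicit:            # explicit choices always take priority
            return self.explicit[use]
        if self.delegate is not None:       # otherwise defer to the trusted organization
            return self.delegate.approves(use)
        return False                        # no instruction at all: deny by default


prefs = Preferences()
prefs.delegate = Watchdog("ad_monitor", {"contextual_ads"})
contextual_before = prefs.allows("contextual_ads")

prefs.explicit["contextual_ads"] = False    # the person withdraws this specific consent
contextual_after = prefs.allows("contextual_ads")

prefs.delegate = Watchdog("other_monitor", {"health_research"})  # switch organizations
research_allowed = prefs.allows("health_research")
```

Denying by default when no preference exists is a design choice made for this sketch; the text does not say how undecided uses should be treated.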
A system of this sort would require a combination of innovative new national and international laws and regulations, since the infrastructure of the Internet does not stop at national borders. Here again, the example of digital rights management is instructive: the U.S. Digital Millennium Copyright Act put into law the provisions of two treaties signed by members of the World Intellectual Property Organization in 1996 and created mechanisms for the federal government to oversee and enforce the law.