Since nearly all smartphones in America use Google or Apple operating systems, the joint Google-Apple coronavirus Exposure Notification API was an important tool in the fight against the coronavirus. It took privacy seriously: the Bluetooth information it collected could only be stored locally on smartphones and could not be combined with GPS data.
Though over 20 state governments eventually used the API to build contact tracing applications, the applications never gained critical mass, partly due to surveillance concerns. This highlights an urgent need to improve public trust in technology by addressing the friction between privacy rights and data-hungry applications.
Americans are increasingly concerned about surveillance. In a 2018 opinion, Chief Justice John Roberts wrote: “when the government tracks the location of a cell phone it achieves near perfect surveillance”. Supreme Court rulings have extended Fourth Amendment protection to many kinds of cell phone information, requiring warrants for searches. While no comprehensive federal privacy legislation exists, sector-specific laws cover many kinds of sensitive information and some states are enacting laws based on the E.U. GDPR.
Existing regulations leave plenty of room for creative and controversial data practices. Robinhood – a commission-free stock trading service recently valued at $11.2 billion – reportedly makes about 40% of its revenue from directing trades to high-frequency trading (HFT) companies for execution. A growing number of commentators believe that the HFT companies are paying Robinhood for the right to execute the trades so that they can gain data about how Robinhood customers are trading. HFT companies can then use this data to inform their own trades.
The Existing Situation Is Untenable
Most organizations are seeking relevant and high-quality user data to turbocharge their AI efforts. Meanwhile, people are concerned about privacy, but they also want to benefit from better services.
Paradoxically, this often results in applications that are perceived as both prying and ineffective. Germany has a population of 84 million, but its relatively successful Corona-Warn-App – which is built on the Google-Apple API and stores data locally – only had 28 million downloads by December 2020, a full year into the coronavirus pandemic. Privacy concerns on the one hand (due to the application’s nature) and effectiveness concerns on the other (due to its privacy protections) prevented greater uptake.
Idealists have sought to reconcile privacy with data-hungry business models in new ways. The MIT-affiliated Solid Project recommends that each person control access to their data by carrying it around in a secure ‘pod’. Andrew Yang’s Data Dividend Project hopes to force social media companies to pay users for personal information. Though commendable, such efforts do not envision a paradigm shift that could spur economic growth while protecting privacy.
A Decentralized and Advanced Internet
A decentralized internet is one in which people control their information, but can easily, fully and remotely access cutting-edge internet capabilities – such as remote transmission of real-time health data from wearables to physicians. Only relevant bits of personal information are shared, consensually, with as few people as possible.
Self-sovereign identity and federated learning are two promising approaches to decentralizing the internet.
Self-sovereign identity envisions an internet in which people hold and control their data, using it to transact in a peer-to-peer fashion (on a network of interoperable distributed ledgers). For example, a person could store a proof from an issuer (such as a proof of coronavirus vaccination from a health authority) and share it with someone else (such as an airline staff member) for a specific purpose (such as boarding an international flight), without anyone having to consult the issuer again.
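The issue-hold-present flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular self-sovereign identity system works: it uses a Lamport one-time signature built from SHA-256 as a stand-in for whatever signature scheme a real system would use, it hands the issuer's public key to the verifier directly rather than publishing it on a distributed ledger, and the function names and the identifier `did:example:alice` are hypothetical.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Lamport one-time key: 256 pairs of random secrets; the public key is their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _digest_bits(message: bytes):
    """Return the 256 bits of the message's SHA-256 digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret from each pair, chosen by the corresponding digest bit."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check each revealed secret against the public hash for that bit."""
    if len(signature) != 256:
        return False
    bits = _digest_bits(message)
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))


def issue_credential(issuer_private, claims: dict) -> dict:
    """Issuer (e.g. a health authority) signs the claims once and hands them to the holder."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return {"claims": claims, "signature": sign(issuer_private, payload)}


def check_credential(issuer_public, credential: dict) -> bool:
    """Verifier (e.g. airline staff) checks the proof using only the issuer's public key."""
    payload = json.dumps(credential["claims"], sort_keys=True).encode()
    return verify(issuer_public, payload, credential["signature"])


if __name__ == "__main__":
    issuer_private, issuer_public = generate_keypair()
    # Hypothetical holder identifier and claim, for illustration only.
    credential = issue_credential(
        issuer_private, {"holder": "did:example:alice", "vaccinated": True}
    )
    print(check_credential(issuer_public, credential))
```

The point of the sketch is that the verifier never contacts the issuer: the holder carries the signed proof, and anyone with the issuer's public key can check it. A Lamport key should sign only one message, so a practical system would rely on a reusable signature scheme instead.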
One person could control multiple digital identities, each of which can hold multiple proofs. Each person’s sensitive data could thus be dispersed across multiple digital identities, improving privacy. Information would be stored and transmitted only in encrypted forms, but would be instantly and completely auditable with the right permissions.
Ironically, achieving a decentralized internet based on self-sovereign identity will likely require buy-in from a government or a major technology company, to drive user adoption and set standards for smaller players. Any such system would also require enough flexibility to comply with evolving laws – such as those that combat money laundering.
Federated learning is a decentralized approach to machine learning in which an algorithm trains on data stored on multiple local devices – such as on millions of smartphones – without data having to leave each device. This enables more privacy than traditional machine learning, which requires information to be aggregated from multiple devices onto one server. Federated learning could allow AI to tackle big problems – like developing drugs for rare diseases, preventing automobile accidents and fighting pandemics – while protecting privacy.
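To make the idea concrete, here is a minimal sketch of federated averaging, one common way to do federated learning, written in plain Python. It is a toy under stated assumptions: ten simulated "devices" each hold a handful of private points drawn from the hypothetical relationship y = 3x + 1 plus noise, the model is a simple line, and all function names and parameters are illustrative. Only the model weights travel between the devices and the server; the raw data points never leave the device that holds them.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one device's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average device weights, weighted by each device's data size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_device_data(rng, n):
    """Hypothetical private data for one device: y = 3x + 1 plus small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


def train(rounds=50, seed=0):
    rng = random.Random(seed)
    devices = [make_device_data(rng, rng.randint(5, 20)) for _ in range(10)]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each device trains locally, starting from the current global model.
        updates = [local_update(global_weights, d) for d in devices]
        # The server sees only the resulting weights, never the data points.
        global_weights = federated_average(updates, [len(d) for d in devices])
    return global_weights


if __name__ == "__main__":
    w, b = train()
    print(f"learned w={w:.2f}, b={b:.2f}")
```

Production systems typically add further protections, such as secure aggregation or added noise, because model updates themselves can leak information about the underlying data; this sketch omits them.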
In the short term, federated learning is more likely to gain widespread acceptance than self-sovereign identity, because it is already on policymakers’ radar, is supported by deep-pocketed companies and does not require major changes in consumer behavior. Both approaches are in their infancy and face many ethical, design, legal, technical and security challenges.
Even if one of these approaches gains traction, a decentralized internet can only achieve its potential in tandem with robust public institutions. For example, decentralized, smartphone-enabled contact tracing can help fight a pandemic only if combined with competent political and technocratic leadership, capable health care workers, skilled contact tracers and timely distribution of personal protective equipment, tests and vaccines.