Necessary and inherent limits to internet surveillance

Information technologies now play a huge role in both personal and institutional life, serving as a global communications medium. As our means of interaction increasingly centre on the internet, nation states desire to exercise control over, and obtain access to, the communications of citizens. The stated reasons for this access and control are to prevent or investigate crimes, and to protect national security. This article argues that mass untargeted surveillance of internet-based communications is an excessive tool with respect to its potential for abuse against both society and individuals, and that its ability to prevent crime or terrorism is limited. By looking at existing technologies and example cases where surveillance has been applied, this article demonstrates that there are both inherent mathematical and technical limits to the potential for surveillance to achieve broad-scale prevention of crime and terrorism. In addition, the potential of surveillance to result in real harm to society necessarily places severe limits on how this technique should be applied in a free and democratic society.

globally. At the same time, these revelations have demonstrated the ease with which surveillance policies can extend to a large proportion of the world population, and the difficulties of providing meaningful levels of oversight and transparency for these policies.
The technical capabilities of the internet not only allow for surveillance, they encourage us, through convenience, to place more and more of our lives into the spotlight. We now read news, search for information, talk to friends, organise social and business life, bank, and meet potential partners via the internet. There is no precedent that can even approximate a model for the pervasiveness of the internet in our lives: not the phone network, not post or telegraph, not CCTV surveillance. Equating the internet with historical technologies when making policy is not simply wrong, it is dangerously misleading.
Because the internet now underpins so many aspects of our lives, unprecedented levels of data can be collected, stored, and analysed, and are increasingly combined and controlled in a largely centralised manner. While sharing this data is, in some cases, contentious, we are increasingly unable to interact with internet services without explicitly or implicitly revealing personal information. The algorithms used to infer future behaviour from this data have improved; the computers that run these algorithms have become faster and more able to handle larger volumes of data; and the results of these inferences are being used to drive decisions, from customising search results to offering, or refusing to offer, tailored products and services. Despite this, there are both ethical and technological limits to data-driven surveillance. Simply because data is generated and can be stored does not mean that states should abandon fundamental principles and surveil entire populations rather than targeted individuals (United Nations Human Rights Council, 2013). Despite their improvement over recent years, predictive algorithms are still subject to fundamental limitations in their accuracy and applicability.
From the state's perspective, the desire for surveillance is easy to understand. Such a wealth of data seems to promise an oracle allowing security services not only to investigate, but also to detect, predict and prevent crimes, and ubiquitous surveillance can, certainly, achieve some of these goals. From the perspective of citizens, the balance between an abstract notion of privacy and a more direct feeling of security, whether actual or perceived, can lead to popular support for invasive surveillance measures (YouGov Poll, 2012).
The wealth of data that surveillance reveals, however, tips the balance decisively from its power to help towards its power to harm. Vast amounts of information can be handled by faster and faster computers, but the power and accuracy of predictive algorithms are not so scalable: when applied blindly to entire populations, the ability to identify suspicious patterns is lost in the flood and becomes either worthless or actively harmful.
Pervasive and detailed information on individuals is a powerful tool. When investigating a crime, the details of suspects' activities, communications, and habits can be highly valuable. This tool, however, can be used just as effectively against all those individuals who are not under suspicion: blackmail, fraud, stalking, and simple invasion of privacy are all enabled by such collections of data just as effectively as the investigation of crime. Whether such abuse comes from institutional malpractice, malicious insiders, or external attackers who gain access to stored data, it is the gathering of such data that enables abuse.

LAW AND TECHNOLOGY
There is a significant disparity between the rate at which technology develops and the rate at which new laws can be brought into force. The application of outdated laws to technologies that were inconceivable when the laws were drafted is now a common occurrence. The United Kingdom's Computer Misuse Act was passed in 1990, in response to a number of high-profile hacking incidents that took place in 1988, the year before the world wide web itself was first proposed. Even with significant amendments made fourteen years later, this legislation, which makes no mention of continuously connected smartphones reporting our location or of data and services moving to cloud computing, already appears worryingly outdated.
The inability of the law to keep pace with technology can lead to significant difficulties in appropriate interpretation of laws. The ongoing debates concerning the separation between metadata and content of communications are one example, but there has also been significant discussion regarding when emails are considered to have been delivered, rather than being in transit, as well as whether viewing an image on a website is considered to be 'viewing' or 'creating' an image.
It would arguably be far more dangerous for the law to attempt to track new developments too rapidly. A common principle in software engineering, particularly in the world of open source software, is summarised by the phrase 'release early, release often'; imperfect software is often made available as soon as it is even partially functional, allowing bugs to be discovered and fixed by the community. Laws, thankfully, are not typically created and tested in the same way. There is clearly a fine balance to be struck between applying increasingly out-of-date laws to new technologies and engaging in overly reactive lawmaking. This is, however, an ongoing and dynamic process that neither has, nor is amenable to, a fixed solution.
Surveillance, however, presents a particular concern within this landscape. Not only do the details of the technology have serious and subtle effects on the efficacy of the laws, the laws themselves are often driven by emotive and high-profile events such as terrorism or child abuse.
Where the discrepancy between the pace of technological change and the pace of legal change can have severe effects, such as the development of surveillance tools and methods far beyond the scope considered possible when laws are drafted, lawmakers must consider carefully the risks that arise from the future development and application of technologies. Where limitations to the power of a surveillance approach are inherent in the current state of technology, legal restrictions may be absent. As technology develops, the technological limitations erode, which can lead to the extension of the surveillance far beyond its original intent.
The erosion of implicit limitations is exemplified in the ongoing debates, both as part of the NSA PRISM revelations and in rhetoric related to surveillance programmes such as the UK's proposed Communications Data Bill, regarding the difference between content data and communications data. A traditional view has held that communications data, or metadata (data about communications without the content of those communications), is relatively harmless and so subject to much lower protections than content data. However, the expanded range of services for which metadata are available, the uncertain delineation between content and communications data, and the improved capacity to store and analyse such data have greatly increased the power of metadata analysis.
Crucially, and challengingly, it is necessary to differentiate between short-term limitations that exist in current technologies and that will disappear as technology develops, and those limitations that are fixed and inherent.

RISKS AND BENEFITS
The use of surveillance to investigate and prevent crime almost unavoidably carries with it the risk of infringing on individual rights to privacy and freedom of expression, such as those set out in the European Convention on Human Rights. It is crucial to weigh the benefits of the proposed approach, in terms of achieving its stated goals, against the risks of failure, abuse and misapplication.
The benefits of surveillance technologies are easily expressed: an improved ability to detect, investigate, and prevent crime. By gathering more data, it is argued, crime can be reduced.
While this notion is intuitively appealing, such a sweeping generalisation hides complexities.
What types of crime are prevented by increased surveillance? How effective is surveillance in solving crimes, and how does this effectiveness change as levels of surveillance increase? These factors have been studied in particular in the UK context of CCTV coverage; evidence has shown that the benefits are variable, and the effectiveness low when compared with the costs (Armitage, 2002). It is important both to understand the limitations of an increase in surveillance and to balance it against the risks to society of overly invasive surveillance practices.
When investigating crime, the ability of police and intelligence services to gain information regarding a suspect's communications and activities is clearly a great benefit. The parties with whom an individual has been communicating, the frequency and duration of those communications, the content, the websites accessed, all provide significant insight. To a large extent, the ability to gain such information concerning a suspect is already provided for in laws such as the UK's Regulation of Investigatory Powers Act. This information is not, generally, available for a suspect's past beyond the provisions of the EU Data Retention Directive, which is restricted largely to user account information and the email account provided by the internet service provider (ISP). There are, however, significant and growing powers to obtain data of individuals who have come to the attention of the relevant bodies, such as those set out in the French LOPPSI2 and German use of trojan horse programmes to gain access to the computers of suspects (Chaos Computer Club, 2011).
A separate issue is that of using blanket surveillance, as exemplified by the US PRISM and XKeyscore or UK Tempora programmes, to detect and prevent crime. Intuitively, a similar argument applies: by data mining large volumes of population data, indicators of criminal activity can be isolated and the suspicious parties can be placed under closer surveillance. This argument, however, overlooks an inherent limit on the ability to detect and identify such patterns in large populations, which suggests that such an approach is not as beneficial as it might first appear.
A well-known statistical effect, the base rate fallacy, shows that when applied to large-scale populations even extremely accurate detection methods quickly become unable to identify persons of interest meaningfully. This effect is due to high rates of false positives, in which an innocent individual is identified as suspicious, and of false negatives, in which a suspicious individual is not detected. A full discussion of this issue is outside the scope of this article; see Bar-Hillel (1980) for a useful summary of the problem.
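The arithmetic behind this effect can be illustrated with a short sketch. All of the figures used here (a population of 60 million, 1,000 persons of interest, a detector that flags 99% of genuine targets and wrongly flags 1% of innocent people) are hypothetical values chosen for illustration; none of them comes from this article or from any real programme, and the function name `screening_outcome` is likewise invented for the example.

```python
def screening_outcome(population, persons_of_interest,
                      detection_rate, false_alarm_rate):
    """Expected results of screening an entire population with one detector.

    All inputs are hypothetical illustration values, not measured figures.
    detection_rate   -- share of genuine persons of interest that are flagged
    false_alarm_rate -- share of innocent people that are wrongly flagged
    """
    innocent = population - persons_of_interest
    true_positives = persons_of_interest * detection_rate    # correctly flagged
    false_negatives = persons_of_interest - true_positives   # missed entirely
    false_positives = innocent * false_alarm_rate            # innocent but flagged
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Probability that a flagged individual really is a person of interest.
        "precision": true_positives / flagged,
    }


if __name__ == "__main__":
    result = screening_outcome(
        population=60_000_000,       # assumed national population
        persons_of_interest=1_000,   # assumed number of genuine targets
        detection_rate=0.99,         # assumed: 99% of targets are flagged
        false_alarm_rate=0.01,       # assumed: 1% of innocents are flagged
    )
    for name, value in result.items():
        print(f"{name}: {value:,.4f}")
```

Under these assumed inputs, roughly 600,000 innocent people are flagged alongside about 990 genuine persons of interest, so fewer than 2 in every 1,000 flagged individuals are of real interest, even though the detector is "99% accurate" in both directions.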
We therefore see a significant distinction between the power of increased surveillance to be of use in investigating crime, and its power to be of use in detecting crime. For the former, increased access to information is likely to be of use; in the latter case there is a strong argument to be made that too much information over too great a population is counterproductive. Even for the investigation of crime, however, the risks of the misapplication of technology, and the implications of creating an infrastructure that allows for the gathering of such information, must be seriously considered.

ELEMENTS OF SURVEILLANCE
The internet is a complex and partially decentralised network of networks that is largely dissimilar to older technologies such as the telephone network. The services that run on this network (websites, email, voice calls, instant messaging, peer-to-peer networking) present a range of challenges for interception and analysis. This is further complicated by the increasing use of strong encryption as a default for online services.
There are, therefore, two major sources of data concerning the activities of internet users: data stored by service providers, such as Google and Facebook as well as ISPs; and data stored on devices, such as smartphones and computers, of users themselves. Both of these sources of information are targeted by online surveillance, and present their own difficulties and risks. A similar approach has been implied by the discussions surrounding the United Kingdom's Communications Data Bill. The proposed legislation sought to provide a means for law enforcement and intelligence services to extend monitoring of communications data to services such as Gmail and Facebook, although the specific details of how this would have been achieved at a technical level were not revealed at the time. It was hypothesised that, due to the use of encryption, this would require direct cooperation of the service operators; this view has been supported by the revelations surrounding PRISM and Tempora.
What is clear, however, is that surveillance on the modern internet requires interaction with private corporations, who use strong encryption to protect their users' communications from hackers. With strong encryption on transfers, supported by strong authentication of the services to which users connect, the only realistic option is to require companies, including Facebook and Google but also smaller providers of similar services, to install infrastructure in order to perform interception. We must ask ourselves who is to purchase, install, control, and maintain this infrastructure. With whom does liability for misuse or misapplication of these technologies lie, and how can such a system provide transparency and safeguards against abuse? The recent details concerning these programmes have shown that there are few satisfactory answers in existing approaches.
A second approach targets user devices directly. This is an extremely powerful approach, as user devices are both a source and recipient of the users' communications and provide a single point of access for all services that a user accesses, as well as potentially providing further access to online services. Direct access to user devices is, of course, a difficult technical challenge and lends itself less readily to mass surveillance; software must be installed on user devices, presumably without the users' knowledge, which is infeasible for an entire population. This approach has been taken in Germany, as we shall see in §5.1, as well as being provided for in recent French security legislation.
Perhaps a greater concern than the specific approach taken to surveillance is the infrastructure that supports that approach. As has been discussed above, surveillance based in the service provider requires the installation of physical devices that allow access to communications, which must either be directly controlled by the state, or be managed by the provider in response to requests from the state. In either case, a technical capability has been created to log and make available large amounts of user data, which in turn creates the potential for such data to be accessed without the correct authorisation, whether by company employees, by members of the intelligence services, or by third parties that gain access.
In the second case, which targets user devices, appropriate invasive software must be developed, along with the means for deploying it on user devices. As such software must be installed without user knowledge, there must either be provision for physical access to the device or for remote compromise of the machine via the internet. Again, with such software and procedures in place the potential for abuse is great.
Most notably, where police and intelligence services are directly engaged in hacking into remote machines, as has been proposed in France, Germany, and the Netherlands, the means to do so must be obtained. A subject of increasing concern is that of governments, notably that of the United States of America, entering the market for "0-day exploits" (Reuters, 2013): newly discovered software vulnerabilities that enable remote compromise of machines. The existence and promotion of such a market is certainly a cause for concern (Reporters Without Borders, 2013). The European Parliament, in late 2012, adopted a resolution against the export trade in information technologies, particularly surveillance, that can be used to restrict or impinge on human rights (EU Parliament, 2012).
It becomes increasingly clear, therefore, that strict and independent audit of the means of surveillance, of surveillance requests, and of data handling should be a key element of any proposed surveillance framework. Due to the inherent lack of transparency associated with these approaches, this must be supported by stringent penalties for misuse of either powers or data, and by independent oversight.
A key concept, certainly as expressed by the rhetoric that surrounds these measures, is the need to maintain capabilities to observe and analyse the communications of suspected criminals. The capability for wiretapping telephones already exists and, it has been repeatedly argued, unless this ability is extended to the internet then important communications will be beyond the reach of investigators. The recent NSA leaks have shown that this extension of scope to the internet is far more developed and widely deployed than had been thought.

SCOPE OF SURVEILLANCE
As has been argued above, the telephone network cannot form a reasonable analogy for the internet in terms of surveillance policy. The number of daily activities and services, and the amount of data about individuals that is centralised in computers, make any access to this system inherently far more invasive than access simply to communications. Restricting the level of this access, so that reasonable levels of investigation into criminal activities can take place without violating the rights of internet users, is arguably the most important policy challenge in developing such schemes.
One important limitation of surveillance powers, which has rightly been the subject of much debate, is the distinction between content data and communications data. This has often been expressed as the ability to see who is emailing whom (the communications data), but without being able to read the content of those emails. This intuitive explanation, which seeks to alleviate concerns of invasion of privacy, hides a great deal of complexity. While communications and content can easily be separated in a simple model such as email, the distinction is by no means so clear in other forms of internet traffic.
For access to a website it is easy to see that the URL 'google.com' is distinct from the page of search results that you see when querying the site. In reality, however, a Google search for 'government surveillance' will result in a URL of the form 'http://www.google.com/search?q=government+surveillance'. With the search terms, and potentially other identifying information, so prominently highlighted, should this be considered communications or content data?
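The point can be made concrete with a minimal sketch that splits the example URL above into its parts using Python's standard `urllib.parse` module. The helper name `split_request` and the labelling of the host as the "who" and the query as the "what" are illustrative assumptions, not legal definitions of communications or content data.

```python
from urllib.parse import urlsplit, parse_qs


def split_request(url):
    """Separate a URL into the host contacted and the request details.

    The host/query split is only an illustration of how the same string
    carries both addressing information and user-supplied content.
    """
    parts = urlsplit(url)
    return {
        "host": parts.hostname,          # which service was contacted
        "path": parts.path,              # which resource on that service
        "query": parse_qs(parts.query),  # user-supplied terms, decoded
    }


if __name__ == "__main__":
    example = "http://www.google.com/search?q=government+surveillance"
    for name, value in split_request(example).items():
        print(f"{name}: {value}")
```

Anyone who logs the full URL, rather than only the host name, therefore also records the decoded search terms that the user typed.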
Within the UK, when these matters were first debated leading up to the Regulation of Investigatory Powers Act (Great Britain, 2000), the UK's Foundation for Information Policy
Necessary and inherent limits to internet surveillance Internet Policy Review | http://policyreview.info 3 August 2013 | Volume 2 | Issue 3
Focusing on internet-based surveillance of communications, many surveillance powers currently being debated or brought into force in Europe address, as has been noted above, two major approaches: increased powers to investigate the communications of targeted individuals, and increased blanket surveillance of large-scale populations without suspicion. Both of these concepts, which are by no means mutually exclusive, bring their own risks. It is important to consider why these proposals are being made.