Efforts by the US government to gain access to records from the world's leading search engines highlight the issue of holding onto huge amounts of internet data, argues law professor Michael Geist.
The internet community has been buzzing for the past 10 days about the US Department of Justice's demand for search data from the world's leading search engines.
Yahoo, AOL, and Microsoft have all reportedly complied with the request; Google, however, refused, paving the way for a major court battle.
While much of the focus has been on the privacy implications of the Department of Justice request, the story highlights a much bigger issue - the significant risks and rewards that arise from retaining enormous amounts of data.
People have become accustomed to protecting their personal information by safeguarding their identification cards, shredding bank statements, or trusting their health provider to protect their medical files.
But they have limited control over search engines, internet service providers, and e-commerce companies that retain an ever-expanding mountain of data that can reveal personal preferences, interests, and habits.
The US demand stems from an attempt to prove that legislation, rather than technologies such as content filtering, would be more effective at blocking children's access to "harmful" materials.
To prove its case, the Justice Department sought data from the leading search engines that would allow it to gauge the amount of available pornography on the internet, as well as the frequency with which users search for such content.
The authorities' initial data request was stunning for its sheer breadth. The Justice Department requested all web addresses contained in the Google database as well as a record of "all queries that have been entered into your company's search engine between June 1, 2005 and July 31, 2005".
In other words, it wanted a list chronicling every website in Google's database along with literally every search request over a two-month period.
When it faced resistance, it agreed to a narrower request that included a random sample of one million web addresses as well as a list of every search string during a one-week period.
Although none of this data relates to a specific individual, the request has still produced a chilling effect as many begin to question whether search requests thought to be anonymous could ultimately be tracked back to them.
In a broader context, the demand also highlights the growing challenge associated with data retention. All companies, particularly those operating online, recognize the value of retaining information about their users.
Some information is essential to providing customer service, while other data can be used to provide users with a customised experience by eliminating the need to re-enter passwords, automatically posting relevant content, or sending permission-based e-mail marketing that accurately reflects the users' interests.
The value of information extends beyond personal data. Once the data is aggregated, retailers can spot trends among demographic groups, internet providers can gauge usage patterns, and search engines can identify what is on the minds of the world's net users.
Given its value, it comes as little surprise to find that companies retain such data for lengthy periods, using sophisticated data mining technologies to analyse the information.
While these previous examples illustrate the rewards of data retention, significant risks also exist.
The same data can be mined for purposes that extend far beyond the reasons for which it was initially provided. The Google case provides a classic illustration in this regard as mere search terms take on a new significance in the hands of Department of Justice lawyers.
Some data is not consciously provided at all - it is simply gathered automatically with little thought given to its potential uses.
For example, private parties may demand automatically generated internet server logs to assist with new defamation or copyright lawsuits.
One of the biggest risks associated with data retention comes not from requests that proceed through the legal system, but from security vulnerabilities that put sensitive data into the hands of hackers.
Last year, more than 50 million people in North America received notifications that their personal information had been placed at risk due to a security breach.
Policy makers worldwide have scarcely begun to reconcile the risks and rewards of data retention.
In the immediate aftermath of the Google issue, at least one US politician has called for new legislation to set limits on data retention and establish a positive obligation to destroy data under certain circumstances.
In Europe, the debate has centred on mandating data retention to assist law enforcement. The European Parliament engaged in a contentious debate on the issue during the fall, though a compromise was ultimately reached that will result in the retention of a vast amount of data.
The Google case highlights the need for a clear legal framework that balances the risks and rewards associated with data retention. In light of recent events, it is time to search for some solutions.
Michael Geist holds the Canada Research Chair in Internet and E-commerce Law at the University of Ottawa, Faculty of Law.