Suppose a laptop were found at the apartment of one of the perpetrators of last year’s Paris attacks. It’s searched by the authorities pursuant to a warrant, and they find a file on the laptop that’s a set of instructions for carrying out the attacks.
The discovery would surely help in the prosecution of the laptop’s owner, tying him to the crime. But a junior prosecutor has a further idea. The private document was likely shared among other conspirators, some of whom are still on the run or unknown entirely. Surely Google has the ability to run a search of all Gmail inboxes, outboxes, and message drafts folders, plus Google Drive cloud storage, to see if any of its 900 million users are currently in possession of that exact document. If Google could be persuaded or ordered to run the search, it could generate a list of only those Google accounts possessing the precise file — and all other Google users would remain undisturbed, except for the briefest of computerized “touches” on their accounts to see if the file reposed there.
A list of users with the document would spark further investigation of those accounts to help identify whether their owners had a role in the attacks — all according to the law, with a round of warrants obtained from the probable cause arising from possessing the suspect document.
Jonathan Zittrain on the aggregator’s dilemma.
This, incidentally, is what is really meant by “big data”. And if this scenario freaks you out, keep in mind that this is essentially how Gmail advertising works.
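
To make the "briefest of computerized touches" concrete, here is a minimal sketch of an exact-file search. Nothing in Zittrain's piece says how Google would actually run it. Comparing SHA-256 fingerprints is my assumption, and the `fingerprint` and `accounts_holding` helpers and the toy account data are made up for illustration.

```python
import hashlib
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def accounts_holding(target: bytes, accounts: Dict[str, List[bytes]]) -> List[str]:
    """List the accounts that hold a byte-for-byte copy of `target`.

    Hypothetical helper: each account is modeled as a plain list of file
    contents, which is far simpler than any real mail or storage system.
    """
    wanted = fingerprint(target)
    return sorted(
        owner
        for owner, files in accounts.items()
        if any(fingerprint(f) == wanted for f in files)
    )


# Toy data (assumed): three accounts, one holding the exact document.
suspect_document = b"the suspect document"
accounts = {
    "alice": [b"grocery list", b"holiday photos"],
    "bob": [b"the suspect document"],
    # Carol's copy has one trailing space, so it is not the exact file.
    "carol": [b"the suspect document ", b"meeting notes"],
}

matches = accounts_holding(suspect_document, accounts)
print(matches)  # ['bob']
```

Note that Carol's near-copy is not flagged: a hash comparison only finds the precise file, which is what keeps the search narrow, and also what makes it trivial to evade by changing a single character.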