This post is inspired by a NYTimes article.
Something worth pondering is a method that could allow companies like Facebook to share their data without compromising anyone’s privacy.
There are a lot of problems with both sharing and not sharing the data held by Facebook and companies like it. As users, we don’t want people to look into these data and find things pertaining to us individually. As scientists, we want transparency and the release of research data for verification, for the sake of science.
While I was working at the Mobile Experience Lab (@MIT Media Lab) with Avea, a mobile company in Turkey, on methods of identifying “power users” in their mobile network, our research was severely limited by the lack of real data to analyze. Our collaboration also became very tedious: the workflow involved generating random data in order to test our algorithm, yet we would have nothing to look at and no patterns to find once the algorithm had run.
From a purely theoretical point of view, however, there needs to be a way to allow scientists to analyze user data without being able to pin down a particular individual.
At a very high level, I see two different ways to do this. One is to strip out information such as name, address, and phone number before the data are sent for analysis. The other is a method that prevents researchers from analyzing individual records and only allows experiments to run on batches of data. Any attempt to home in on a specific person or a small group of people would produce inaccurate results.
As for the first suggestion, there is a clear problem. For most of us, our name, address, and phone number are what identify us. But others may work at a company of one or a few people, so their employer becomes another identifier. The same goes for their network and their selection of “likes”, and their statuses may reveal things about them. Following this argument, we soon eliminate almost all information about a person and end up sharing no data at all. It is clear that this will not work.
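To make the problem concrete, here is a minimal sketch in Python. The records, field names, and the `unique_records` helper are all made up for illustration; the point is only that people can still be singled out after the obvious identifiers are gone.

```python
from collections import Counter

# Hypothetical records with name, address, and phone number already removed.
records = [
    {"employer": "Acme Corp", "city": "Boston", "likes": ("jazz", "chess")},
    {"employer": "Acme Corp", "city": "Boston", "likes": ("jazz", "chess")},
    {"employer": "Acme Corp", "city": "Boston", "likes": ("hiking",)},
    {"employer": "One-Person Studio", "city": "Boston", "likes": ("jazz",)},
]

def unique_records(rows, fields):
    """Return the rows whose combination of `fields` appears exactly once."""
    def key(row):
        return tuple(row[f] for f in fields)
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]

# The employer alone singles out the person at the one-person company.
singled_out_by_employer = unique_records(records, ["employer"])

# Adding more "harmless" fields singles out even more people.
singled_out_by_all = unique_records(records, ["employer", "city", "likes"])
```

In this toy data set the employer field exposes one person, and combining it with city and likes exposes two, even though no name appears anywhere.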
As for the second suggestion, there could be hope. Suppose we add a virtual layer between the data and the researchers, one that encrypts or smudges the data so that neither we nor our programs can line up individual records and find anything particular about a single person or a small group of people. Then we would be close to resolving the tension between companies, users, and scientists.
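Here is a rough sketch of what such a layer might look like, again in Python. Everything in it is an assumption of mine rather than a worked-out design: the `AggregateLayer` class, the `MIN_BATCH_SIZE` threshold, the `NOISE_SCALE` value, and the choice of Laplace-distributed noise as the "smudge" are all just one possible way to fill in the idea.

```python
import random

MIN_BATCH_SIZE = 50   # hypothetical threshold, chosen only for illustration
NOISE_SCALE = 2.0     # hypothetical noise level, chosen only for illustration

class AggregateLayer:
    """Sits between raw rows and researchers; exposes only noisy batch counts."""

    def __init__(self, rows, rng=None):
        self._rows = list(rows)
        self._rng = rng if rng is not None else random.Random()

    def noisy_count(self, predicate):
        """Count rows matching `predicate`, refusing small groups and adding noise."""
        true_count = sum(1 for row in self._rows if predicate(row))
        if true_count < MIN_BATCH_SIZE:
            raise ValueError("query matches too few people to answer")
        # The difference of two exponential draws is Laplace-distributed noise.
        rate = 1 / NOISE_SCALE
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        return true_count + noise

# Made-up data: 1000 people with ages spread evenly from 20 to 59.
rows = [{"age": 20 + i % 40} for i in range(1000)]
layer = AggregateLayer(rows, rng=random.Random(0))
estimate = layer.noisy_count(lambda r: r["age"] >= 40)
```

A researcher gets a usable estimate for the large group (the true count here is 500), while a query that narrows down to a handful of people is refused. This is not a real privacy guarantee: for one thing, someone who repeats the same query many times could average the noise away, so a serious version would need to limit or account for repeated queries.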
I haven’t thought enough to figure out if this is possible or not. I could totally be bluffing.