August 19, 2014

XRay Tool To Combat “Data Frenzy”

Eric Hopton for Your Universe Online

Researchers at Columbia Engineering are developing a new tool named XRay which aims to combat what they see as “data frenzy” and is “a first step in understanding how personal data is being used on web services.”

Data is potentially the most valuable commodity in the modern world. It is the lifeblood of web giants such as Google, Yahoo, Amazon and YouTube, and of countless other online businesses. Personal information to feed the data beast is collected every time we use a digital device. We know it happens, but we cannot see it working. As individuals, and collectively, we might expect or demand transparency, but in most cases we have no idea where our personal information goes or how it is used. We have no knowledge of the process and therefore no control. XRay’s creators hope to change that.

Roxana Geambasu and Augustin Chaintreau, assistant professors of computer science at Columbia Engineering, will be launching a prototype of XRay at the USENIX Security Symposium in San Diego on Wednesday, August 20th. They claim that XRay “reveals which data in a web account, such as emails, searches, or viewed products, are being used to target which outputs, such as ads, recommended products, or prices.”

Though they cite some ways in which this type of data use can be beneficial, such as “Amazon offerings, Netflix suggestions, and emergency response Tweets,” they are concerned that, if left uncontrolled, privacy abuse and unfair or dishonest commercial practices could flourish.

When Geambasu and Chaintreau began their research, they realized that this was almost virgin territory. There was no science of how personal data is captured and used at what they call the “fine grain” level: individual emails, photos, web posts and so on. They were starting from scratch but, encouraged by their initial theoretical results, they began to put XRay into practice and used the results to fine-tune the tool.

The current incarnation of XRay is set up only for use with Google, Amazon, and YouTube, but the developers describe it as “service-agnostic,” meaning it can easily be adapted to other web services.

XRay works like this. Emails are created containing keywords associated with specific topics, which might include, for example, certain types of illness, race, divorce or debt. XRay then collates and analyzes the “targeting associations” of the resulting ads. The initial results have been impressive, and the creators have summarized the findings in three main points:

1. It is definitely possible to target sensitive topics in users’ inboxes, including cancer, depression or pregnancy.

2. For many ads, targeting was extremely obscure and non-obvious to end users, which leaves users open to abuse.

3. The researchers have already seen signs of such abuse: for instance, a number of subprime loan ads for used cars that targeted mentions of debt in users’ inboxes.
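The keyword-to-ad matching described above can be illustrated with a toy calculation. The sketch below is not XRay’s actual algorithm, which the developers document in their paper; it is a simple differential comparison that assumes we can create several test accounts seeded with different combinations of keywords and record which ads each account is shown. For each ad, it compares how often the ad appears in accounts that contain a keyword versus accounts that don’t, and attributes the ad to the keyword with the biggest gap. The function names, the 0.5 threshold and the sample data are all hypothetical.

```python
# Toy illustration only: NOT XRay's real method. Assumes hypothetical test
# accounts, each seeded with some keywords, and a record of ads each one saw.

def appearance_rate(ad, ad_sets):
    """Fraction of accounts (given as sets of ads seen) in which `ad` appeared."""
    if not ad_sets:
        return 0.0
    return sum(ad in seen for seen in ad_sets) / len(ad_sets)


def infer_targeting(accounts, threshold=0.5):
    """Guess which keyword each ad targets.

    `accounts` is a list of (keywords_in_inbox, ads_seen) pairs, both sets.
    Returns {ad: (keyword, score)} for ads whose best score reaches `threshold`.
    The score is: rate of the ad in accounts WITH the keyword minus its rate
    in accounts WITHOUT it (a hypothetical scoring rule).
    """
    keywords = sorted(set().union(*(kws for kws, _ in accounts)))
    ads = sorted(set().union(*(seen for _, seen in accounts)))
    associations = {}
    for ad in ads:
        best_kw, best_score = None, float("-inf")
        for kw in keywords:
            with_kw = [seen for kws, seen in accounts if kw in kws]
            without_kw = [seen for kws, seen in accounts if kw not in kws]
            score = appearance_rate(ad, with_kw) - appearance_rate(ad, without_kw)
            if score > best_score:  # ties keep the alphabetically first keyword
                best_kw, best_score = kw, score
        if best_kw is not None and best_score >= threshold:
            associations[ad] = (best_kw, best_score)
    return associations


# Hypothetical sample data: five test accounts.
accounts = [
    ({"divorce", "debt"}, {"Law Firm Ad", "Car Loan Ad"}),
    ({"divorce"}, {"Law Firm Ad"}),
    ({"debt"}, {"Car Loan Ad"}),
    ({"alzheimer"}, {"Assisted Living Ad"}),
    (set(), set()),
]

for ad, (kw, score) in sorted(infer_targeting(accounts).items()):
    print(f"{ad!r} -> {kw!r} (score {score:.2f})")
```

Real ad systems are noisy and ads are shown probabilistically, so a single difference in rates like this would be easily fooled; it only conveys the intuition of comparing accounts that differ in what their inboxes contain.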

The developers hope that XRay will open up this obscure world and help promote greater “voluntary” transparency from data-collecting sites. If that doesn’t work, they say XRay could be a powerful and previously unavailable tool for investigators and watchdog bodies, enabling those organizations to delve far deeper into the opaque workings of data collection, use, and abuse.

Some of the results thrown up by the XRay demo version illustrate how Gmail targets some of its ads. Many of the results show a strong correlation between ads and keywords; others are less clear. You can view a table of some of these here. Emails containing the keyword “divorce” brought up ads for law firms. Using the word “Alzheimer” brought up predictable results such as “Adult Assisted Living” and “Affordable Assisted Living,” but also drew ads for “Black Mold Allergy Symptoms?” and “Expert to remove black mold.”

Is this an example of sophisticated targeting? The symptoms of mycotoxicosis (illness caused by toxins produced by molds) and of Alzheimer’s can be very similar, and there is some debate over whether one disease might be mistaken for the other and whether mycotoxicosis may be a potential cause of Alzheimer’s.

The keywords “debt” and “broke” attracted ads such as “Great Credit Card Search” and “Apply for Visa, Mastercard…”, while “loan” resulted in “Car Loan without Cosigner 100% Accepted” and “Car Loans w/Bad Credit 100% Acceptence!”

The demo version of XRay is available to view here but comes with a number of warnings about “limitations and caveats.” A detailed breakdown of the science and limitations of XRay is available as a PDF, but the developers sum it up more simply for the layman when they warn: “please use XRay’s data to gain intuition and not as absolute truth!”