Assume you’re an investigator. Or maybe a compulsive stalker. Or perhaps simply an advertiser. You want to figure out some personal attributes of an individual for… reasons (objectionable ones more often than not). So, you log in to Facebook, search them up, and scan their profile.

Bad luck, though. They have put every important bit of information behind a wall of privacy or have refrained from mentioning it at all. In a simple and boring world, the story would end here.

But ours is a world of intrigue and wonder, and you’re not someone who gives up easily. So you dig some more within the user’s profile and manage to acquaint yourself with some personal interests that they’ve openly disclosed on Facebook.

If you had the deductive capabilities of Sherlock Holmes, perhaps it would have been enough to figure out everything you need to know about the person in question based on their interests alone.

While none of us here are Sherlocks, Artificial Intelligence is a reasonable approximation of the great detective’s talents.

This is what researchers actually managed to do with the help of an NLP (Natural Language Processing) algorithm: they predicted hidden personal attributes of Facebook users using their publicly disclosed interests alone.

In particular, the researchers were able to predict the following personal attributes of users based solely on their music interests:

  • Age
  • Location
  • Gender
  • Relationship status

Although the paper in which the researchers published their findings is a little old now (published in 2012), it serves as an excellent demonstration of how private information can leak through social media networks even when you have made pretty much every piece of data about yourself invisible to third parties.

You Are What You Like

The privacy settings on social media websites today give users much greater control over their information visibility (the countless privacy violations of social media networks have had some use after all).

But is that really enough to prevent your hidden information from being exposed? Unfortunately, research by scientists at INRIA in France suggests otherwise.

The researchers showed that by analyzing a user’s musical preferences and interests as disclosed on Facebook, it is possible to predict their age, gender, relationship status, and location. 

Since most users’ interests, likes, and dislikes are publicly viewable, a potential stalker simply needs to log in to Facebook and dig through the target’s profile to get a glimpse of their interests. From there, it is a matter of statistical correlation and probability to figure out the user’s more personal attributes, such as relationship status and location, even if the user has not shared these publicly.

These statistical methods can be built into an AI program that predicts a user’s personal attributes from information about their likes on Facebook. But how exactly would such an analytical algorithm work?

From Music Taste to Personal Details

Suppose a person has taken an interest in finding out some details about you. For this example, let’s assume they want to figure out your age.

They log in to Facebook to try and find this information through your profile, but they see no hint about your age. However, this person can see that you’ve liked the official page of Metallica, and it is clear from your profile that you are a fan of the heavy metal band.

Since this person has a vast dataset of millions of Facebook users, they can filter through this dataset and single out those individuals who like Metallica, based on their profile activity. For simplicity’s sake, assume that the person has found 1,000 user profiles on Facebook that like Metallica, and that these users have also openly disclosed their age on their profiles.

Their analysis shows that 900 of the 1,000 profiles that share an interest in Metallica are in the 18-24 age group. Thus, by majority voting, there is a good probability that your age is also 18-24.

(This example is purely illustrative and has been simplified for clarity.)
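
For the curious, here is a minimal Python sketch of that majority vote. It is not the researchers’ actual code: the dataset is a made-up toy, the function name predict_by_majority is hypothetical, and the way the remaining 100 profiles are split across age groups is an assumption added purely so the example has something to count.

```python
from collections import Counter

# Hypothetical toy dataset: (liked_page, disclosed_age_group) pairs for
# users whose age is public. The 900-out-of-1,000 figure mirrors the
# example in the text; the split of the other 100 profiles is invented.
public_profiles = (
    [("Metallica", "18-24")] * 900
    + [("Metallica", "25-34")] * 70
    + [("Metallica", "35-44")] * 30
)


def predict_by_majority(liked_page, profiles):
    """Return the most common attribute value among users who like the
    page, together with the share of those users who hold that value."""
    values = [attr for page, attr in profiles if page == liked_page]
    if not values:
        # Nobody in the dataset likes this page, so there is nothing to vote on.
        return None, 0.0
    winner, count = Counter(values).most_common(1)[0]
    return winner, count / len(values)


age_group, share = predict_by_majority("Metallica", public_profiles)
print(age_group, share)  # 18-24 0.9
```

The second value returned is simply the fraction of matching profiles that agree with the winning answer, which gives a rough sense of how much weight the guess deserves.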

This is basically how the algorithm designed by the researchers works. Even though this approach appears too simple to be effective, the researchers obtained an inference accuracy of 72.5% when predicting some user attributes using this technique.

It bears mentioning that the researchers obtained this level of accuracy using only the single predictive factor of music interest. More sophisticated models that take multiple user interests into account could not only infer a larger range of personal attributes, but could plausibly do so with even higher accuracy.
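
To give a feel for what taking several interests into account might look like, here is one simple possibility sketched in Python. This is an assumption made for illustration, not the model from the paper: every liked page casts a vote in proportion to how its fans are distributed across age groups, and the votes are added up. All page counts and the function name predict_from_likes are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical counts: for each page, how many public profiles that like
# it fall into each age group. All numbers are invented for illustration.
page_age_counts = {
    "Metallica": Counter({"18-24": 900, "25-34": 70, "35-44": 30}),
    "Iron Maiden": Counter({"18-24": 600, "25-34": 300, "35-44": 100}),
    "Pink Floyd": Counter({"18-24": 200, "25-34": 300, "35-44": 500}),
}


def predict_from_likes(likes, counts):
    """Sum each liked page's age-group proportions and return the age
    group with the highest total, or None if no liked page is known."""
    totals = defaultdict(float)
    for page in likes:
        dist = counts.get(page)
        if not dist:
            continue  # skip pages that are missing from the dataset
        n = sum(dist.values())
        for age_group, c in dist.items():
            totals[age_group] += c / n
    if not totals:
        return None
    return max(totals, key=totals.get)


print(predict_from_likes(["Metallica", "Iron Maiden"], page_age_counts))  # 18-24
print(predict_from_likes(["Pink Floyd"], page_age_counts))  # 35-44
```

Using proportions rather than raw counts keeps a hugely popular page from drowning out a niche one, though a real model would likely weigh interests far more carefully than this.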

And that is a cause for alarm, because privacy-sensitive attributes of users on social media, even when hidden, are not as safe as conventional wisdom would have us believe.

The Consent Dilemma

The most powerful characteristic of this method of predicting personal attributes is that it relies solely on Facebook users’ self-disclosed information about music interests.

There is absolutely nothing dodgy involved in the approach, such as the use of malware or forced breaches of information. The technique simply makes the most of what’s available online, and information about our interests just happens to be something that’s profusely available in the age of social media networks.

This makes the technique exceptionally difficult to tackle with any privacy protection legislation, especially since user consent is implicit in information that’s publicly accessible. The law can protect our privacy if there’s evidence of a breach or abuse of user information, such as when your private information is accessed by third parties.

In the present case, the technique solely relies on sophisticated guesswork enabled entirely by publicly available user information, which carries with it the implicit consent of public accessibility.

To me, any legal defense against this seems inconceivable. To sue anyone for accurately predicting your relationship status from your musical preferences, which you happily disclosed of your own free will, you’d be asking to outlaw the practice of guessing itself.

Nonetheless, the researchers’ model can be used for purposes more nefarious and less legally defensible than mere guessing games.

Preparing for a Privacy Emergency

One possible abuse of the researchers’ technique is doxing. With the ability to piece together personal details of a user based on their interests, doxing will become easier for attackers, leaving a greater number of users vulnerable to information leaks and exposure online.

Spammers could also have a field day with the power to match a user’s Facebook profile with their email address and spam their inbox with targeted ads sculpted to be in line with a user’s predicted interests.

For advertisers, these predictive techniques might be just what they need to efficiently profile users for ad targeting, while interest-based predictions of individual attributes might become a crucial component of upcoming adtech, with browser cookies on the verge of extinction.

However you look at it, it is the user who gets the short end of the stick as our privacy is reduced to a loosely hanging thread that will blow away any direction the wind takes it.

And thus, just when you thought you couldn’t be more hard-pressed to maintain some semblance of privacy, our advancing technological capabilities deal another blow to ensure we remain naked online.

We are inching ever closer to a situation where the only way forward might be to accept a world stripped of digital privacy, a right we traded to feed our own growing technophilia. Privacy may very well be a necessary sacrifice for further technological progress, but how prepared are we to take a step into a world where shadows only exist to protect the aggressor and the invasive spotlights remain glued to the aggressed?