Assume you’re an investigator. Or maybe a compulsive stalker. Or perhaps simply an advertiser. You want to figure out some personal attributes of an individual for... reasons. So you log into Facebook, look them up, and scan their profile.

Bad luck, though. They have set every important bit of information behind a wall of privacy. In a simple and boring world, the story should end here.

But ours is a world of intrigue and wonder, and you’re not someone who gives up easily. So you dig some more within the user’s profile and manage to acquaint yourself with some personal interests that they’ve openly disclosed on Facebook.

If you had the deductive capabilities of Sherlock Holmes, perhaps it would have been enough to figure out everything you need to know about the person in question based on their interests alone.


While none of us are Sherlocks here, Artificial Intelligence is a reasonable approximation to the great detective’s talents.

This is what researchers actually managed to do with the help of an NLP (Natural Language Processing) algorithm: they predicted personal attributes of Facebook users using their publicly disclosed interests.

In particular, the researchers were able to predict the following personal attributes of users based solely on their music interests:

  • Age
  • Location
  • Gender
  • Relationship status

Although the paper in which the researchers published their findings is a little old now (published in 2012), it serves as an excellent demonstration of how private information can leak through social media networks even when you have made pretty much all data about yourself invisible to third parties.

You Are What You Like

The privacy settings on social media websites today give users much greater control over their information visibility (the countless privacy violations of social media networks have had some use after all).

But is that really enough to prevent your hidden information from being exposed? Unfortunately, research by scientists from INRIA France suggests otherwise.

The researchers showed that using a user’s musical preferences and interests as disclosed on Facebook, it is possible to predict their age, gender, relationship status, and location. 



Since most users’ interests, likes, and dislikes are publicly available, a potential stalker need not toil too hard to see what most users like on Facebook. From there, it is a matter of statistical correlation and probability to figure out your more personal attributes like relationship status and location.

So, how exactly does an AI program figure out personal details about you on the simplistic basis of what you like?

From Music Taste to Personal Details

Suppose a hidden personal attribute in your Facebook profile is your age (which is what we want to infer) and you’ve liked the official page of Metallica (which you’ve decided to make public). Further assume for simplicity’s sake that there are 5 other public Facebook profiles interested in Metallica who also happen to have openly disclosed their age for all to see. If 4 out of 5 of these are in the age group 18-24, then by majority voting your hidden attribute (in this case, your age) is also likely to be 18-24. 


(This example is purely illustrative and doesn’t necessarily correspond to reality.)
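
To make the Metallica example concrete, here is a toy Python sketch of the majority-vote idea. It is not the researchers' actual model, only an illustration of the reasoning above. The profile data, the `PUBLIC_PROFILES` list, and the `infer_attribute` function are all made up for this example.

```python
from collections import Counter

# Hypothetical toy data (made up for illustration): public profiles that
# disclose both their liked pages and their age group.
PUBLIC_PROFILES = [
    {"likes": {"Metallica", "Iron Maiden"}, "age_group": "18-24"},
    {"likes": {"Metallica"}, "age_group": "18-24"},
    {"likes": {"Metallica", "Nirvana"}, "age_group": "25-34"},
    {"likes": {"Metallica"}, "age_group": "18-24"},
    {"likes": {"Metallica", "Slayer"}, "age_group": "18-24"},
    {"likes": {"Mozart"}, "age_group": "45-54"},  # shares no interest, gets no vote
]


def infer_attribute(target_likes, profiles, attribute):
    """Guess a hidden attribute by majority vote among public profiles
    that share at least one liked page with the target.

    Returns (guessed_value, share_of_votes), or None if nobody votes.
    """
    votes = Counter(
        profile[attribute]
        for profile in profiles
        if attribute in profile and profile["likes"] & target_likes
    )
    if not votes:
        return None
    # On a tie, Counter.most_common keeps the value that was counted first.
    value, count = votes.most_common(1)[0]
    return value, count / sum(votes.values())


# The target hides their age but publicly likes Metallica.
print(infer_attribute({"Metallica"}, PUBLIC_PROFILES, "age_group"))
```

With this made-up data, four of the five profiles that like Metallica fall in the 18-24 group, so the guess is 18-24 with 80% of the vote. The profile that only likes Mozart shares no interest with the target and is ignored.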

While the approach appears too simple to be effective, the researchers reported an inference accuracy of 72.5% for some user attributes. This was achieved using music interests alone, a single predictive factor out of a much larger pool of publicly viewable personal interests. More sophisticated models that take multiple kinds of user interests into account could plausibly infer a wider range of personal attributes, and with better accuracy.

And that is a cause for alarm, because privacy-sensitive attributes of users on social media, even when hidden, are not as safe as conventional wisdom would have us believe.

The Consent Dilemma

The most powerful characteristic of this method of predicting personal attributes is that it relies solely on Facebook users’ self-disclosed information about music interests.

There is absolutely nothing dodgy involved in the approach, such as the use of malware or forced breaches of information. The technique simply makes the most of what’s available online, and information about our interests just happens to be something that’s profusely available in the age of social media networks.

This makes the technique exceptionally difficult to tackle with any privacy protection legislation, especially since user consent is implicit in information that’s publicly accessible. The law can protect our privacy if there’s evidence of a breach or abuse of user information, such as when your private information is accessed by third parties without your consent.


In the present case, the technique solely relies on sophisticated guesswork enabled entirely by publicly available user information, which carries with it the implicit consent of public accessibility.

Against this, any legal recourse seems inconceivable to me. To sue anyone for accurately predicting your relationship status from your musical preferences, which you happily disclosed of your own free will, you’d be asking to outlaw the practice of guessing itself.

Nonetheless, the researchers’ model can be used for purposes more nefarious and less legally defensible than mere guessing games.

Preparing for a Privacy Emergency

One possible abuse of the researchers’ technique is doxing. With the ability to piece together personal details of a user based on their interests, doxing will become easier for attackers, leaving a greater number of users vulnerable to leaks of information and exposure online.


Spammers could also have a field day with the power to match a user’s Facebook profile with their email address and spam their inbox with targeted ads sculpted to be in line with a user’s predicted interests.

For advertisers, these predictive techniques might be just what they need to efficiently profile users for ad targeting, while interest-based predictions of individual attributes might become a crucial component of upcoming tech, with browser cookies on the verge of extinction.

However you look at it, it is users who get the short end of the stick, as our privacy is reduced to a loosely hanging thread that will blow whichever way the wind takes it.

And thus, just when you thought you couldn’t be more hard-pressed to maintain some semblance of privacy, our advancing technological capabilities deal another blow to ensure we remain naked online.

We are inching ever closer to a situation where the only way forward might be to accept a world stripped of digital privacy, a right we traded to feed our own growing technophilia. Privacy may very well be a necessary sacrifice for further technological progress, but how prepared are we to step into a world where shadows only exist to protect the aggressor and the invasive spotlights remain glued to the aggressed?