Identifying People by Metadata

Interesting research: “You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information,” by Beatrice Perez, Mirco Musolesi, and Gianluca Stringhini.

Abstract: Metadata are associated to most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still categorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message.

In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter, but, more generally, for most social media platforms.
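
To make the two accuracy figures concrete, here is a minimal sketch of how "correct on the first guess" differs from "correct within the k most likely candidates." It is not the authors' method: the nearest-centroid classifier, the synthetic feature vectors, and every name and parameter below (`make_profiles`, `noise`, the user and feature counts) are assumptions chosen only for illustration.

```python
import math
import random


def make_profiles(n_users, n_features, rng):
    """Hypothetical per-user metadata profiles, each feature scaled to [0, 1]."""
    return {u: [rng.random() for _ in range(n_features)] for u in range(n_users)}


def sample_tweet(profile, noise, rng):
    """One synthetic tweet: the user's profile plus Gaussian noise per feature."""
    return [x + rng.gauss(0.0, noise) for x in profile]


def fit_centroids(train):
    """Average each user's training vectors into a single centroid."""
    cents = {}
    for user, vectors in train.items():
        n = len(vectors)
        cents[user] = [sum(col) / n for col in zip(*vectors)]
    return cents


def rank_users(vec, cents):
    """Users ordered from most to least likely (closest centroid first)."""
    return sorted(cents, key=lambda u: math.dist(vec, cents[u]))


def top_k_accuracy(test, cents, k):
    """Fraction of test tweets whose true author is among the k best guesses."""
    hits = sum(1 for user, vec in test if user in rank_users(vec, cents)[:k])
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
profiles = make_profiles(n_users=50, n_features=4, rng=rng)

# Five training tweets and two test tweets per user (arbitrary sizes).
train = {u: [sample_tweet(p, 0.1, rng) for _ in range(5)] for u, p in profiles.items()}
test = [(u, sample_tweet(p, 0.1, rng)) for u, p in profiles.items() for _ in range(2)]

cents = fit_centroids(train)
acc_top1 = top_k_accuracy(test, cents, k=1)
acc_top10 = top_k_accuracy(test, cents, k=10)
print(f"top-1: {acc_top1:.2f}  top-10: {acc_top10:.2f}")
```

Top-k accuracy can never be lower than top-1 accuracy, because the first guess is always among the k best guesses. That is why widening the search to a short list of candidates raises the reported figure.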

Posted on July 30, 2018 at 6:35 AM

