A Language Model for Compromised User Analysis

Accounts of social network users can be compromised in a number of ways. When malware infects a user account, that account can be used to spread spam and malware as well as to gather personal information. Current online systems employ authentication mechanisms to verify users’ identities and determine their corresponding privileges. Once authenticated, users are no longer considered a threat by the system they are using. However, attackers can take over an account and impersonate the original user.

In this study, Tien Phan and Nur Zincir-Heywood proposed a forensic analysis system that employs artificial neural networks to identify users based on their writing styles. The approach rests on the assumption that each person has their own writing style, so that, given enough data generated by different users, those styles can be told apart. Such a system would be capable of identifying a compromised account in which the attackers imitate the legitimate user. The researchers used three different datasets (Reuters Corpora, Enron and Twitter) to evaluate their system. The results showed that the proposed system outperformed the other classifiers tested (Naive Bayes, C5.0 and LibSVM). The results also indicate that the learned language model can be generalized to different datasets with an 85% accuracy.
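To make the idea of a per-user writing-style profile concrete, here is a minimal sketch in Python. It is not the authors' method: the paper uses artificial neural networks, whereas this sketch swaps in a much simpler stand-in, character trigram counts compared by cosine similarity. The function names, the trigram size and the 0.2 threshold are all hypothetical choices made for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(messages, n=3):
    """Aggregate n-gram counts over a user's known-legitimate messages."""
    profile = Counter()
    for message in messages:
        profile.update(char_ngrams(message, n))
    return profile


def is_suspicious(profile, message, threshold=0.2, n=3):
    """Flag a message whose style similarity to the profile falls below the threshold.

    The threshold is an arbitrary illustrative value, not one taken from the paper.
    """
    return cosine(profile, char_ngrams(message, n)) < threshold


if __name__ == "__main__":
    # Hypothetical toy data; a real system would need far more text per user.
    history = [
        "Hi team, please find the quarterly report attached.",
        "Thanks for the update, I will review the report tomorrow.",
    ]
    profile = build_profile(history)
    new_message = "CLICK HERE!!! win a free prize now >>>"
    print(cosine(profile, char_ngrams(new_message)))
    print(is_suspicious(profile, new_message))
```

A similarity score like this only captures surface-level habits, which is one reason a learned model such as the neural network described in the study may be preferable on real data.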

The results of this study are promising: the proposed model performed more consistently overall than the other learning systems evaluated.

Cite: Phan, T. D. & Zincir-Heywood, A. N. (2018). A language model for compromised user analysis. NOMS 2018 – 2018 IEEE/IFIP Network Operations and Management Symposium, Taipei, pp. 1-4.

Source: https://ieeexplore.ieee.org/document/8406317/