Cybercrime threat intelligence: A systematic multi-vocal literature review

Threat Intelligence focuses on the organization, analysis, and development of detailed information to protect an organization (governmental or private) and prevent it from suffering a cyber attack (Tounsi & Rais, 2018). The lifecycle of this discipline generally involves six phases: direction, collection, processing, analysis, dissemination, and feedback (see Fig. 1). Information can be gathered from both the clear web and the dark web. All public and open-source information is relevant, whether to prevent attacks or to inform a company of a data breach that has already occurred. Because threat intelligence is a recent discipline, its literature is still young and would benefit from further development.
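The lifecycle is cyclic: feedback from one iteration informs the direction of the next. As a minimal sketch, assuming a strictly linear ordering of the six phases (the `Phase` enum and the `next_phase` and `run_cycles` functions are hypothetical illustrations, not taken from the reviewed paper), the loop can be modeled in Python as follows:

```python
from enum import Enum


# Hypothetical model of the six threat intelligence lifecycle phases,
# listed in the order given in the text.
class Phase(Enum):
    DIRECTION = 1
    COLLECTION = 2
    PROCESSING = 3
    ANALYSIS = 4
    DISSEMINATION = 5
    FEEDBACK = 6


def next_phase(phase: Phase) -> Phase:
    """Return the phase that follows `phase`; FEEDBACK wraps back to DIRECTION."""
    members = list(Phase)
    idx = members.index(phase)
    return members[(idx + 1) % len(members)]


def run_cycles(start: Phase, steps: int) -> list:
    """Walk the lifecycle for `steps` transitions, returning every phase visited."""
    sequence = [start]
    current = start
    for _ in range(steps):
        current = next_phase(current)
        sequence.append(current)
    return sequence


if __name__ == "__main__":
    # Six transitions from DIRECTION complete one full cycle and return to DIRECTION.
    for phase in run_cycles(Phase.DIRECTION, 6):
        print(phase.name)
```

Real programs rarely move this rigidly (analysts often loop back from analysis to collection, for instance), so the modulo wrap-around here only captures the basic point that feedback restarts the cycle.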

Researchers Cascavilla, Tamburri, and Van Den Heuvel (2021) carry out a systematic review of articles and reports from gray and white literature. Specifically, they aimed to answer five research questions: 1) Which online depth levels are assessed and to what extent? 2) What degrees of anonymity exist for web crawling? 3) What policies exist for varying degrees of anonymity? 4) Which website features are most indicative of cyber threats? 5) Which risk assessment techniques exist today?


Figure 1. Threat intelligence lifecycle

To select relevant articles, the authors applied various inclusion and exclusion criteria, including criteria for gray literature texts. Three hundred seventy-four studies were selected through this inclusion/exclusion process.

Regarding the results, the researchers reached the following main conclusions for their research questions:

1) What online depth levels are assessed and to what extent?

Deep and dark web cyber threat engineering and management have relied largely on network-based analysis and low-level artifact mining (e.g., packet mining, code analysis). Higher-order and multi-vocal data remain largely unused and deserve further attention.

2) What degrees of anonymity is it possible to obtain in web crawling?

There is no conclusive anonymity procedure for exploration and analysis in the state of the art; this avenue remains open for further research and urgently needs to be addressed by leading law enforcement agencies across the European Union.

4) What website features are most indicative of cyber threats?

Surface web analysis literature favors software code features over appearance metrics for online source risk assessment; conversely, deep and dark web analysis literature seems to prefer appearance features, e.g., website text content mining. Little to no cross-fertilization between the two fields has been investigated so far, and this gap may require further attention.

This study answers several fundamental questions in the discipline of threat intelligence. The authors stress the importance of encouraging practitioners and researchers to pursue three goals: creating a holistic tool that helps law enforcement fight cybercriminals, fostering a community of analysts open to sharing knowledge, and developing tools that assess the risks associated with criminal activities across massive databases.

To cite: Cascavilla, G., Tamburri, D., & Van Den Heuvel, W. (2021). Cybercrime threat intelligence: A systematic multi-vocal literature review. Computers & Security, 105, 102258. https://doi.org/10.1016/j.cose.2021.102258