Chapter 4 - Facilitators and relevant factors

4.4 The future is already here

4.4.1 Big Data - Overview

The umbrella term Big Data has been used to describe very large data sets that are difficult to filter, analyse and interpret using standard database management tools or traditional data processing applications.

The driving factors behind Big Data [160] are not just technology and the widespread adoption of the Internet, but also large-scale projects to digitise information that was previously unavailable in electronic format [161], as well as datafication, i.e. the gathering of data about everything, including people, objects and locations. The concept of datafication is closely linked to the Internet of Everything, which is characterised by a steadily growing number of devices, sensors and actuators connected to the Internet. An example of datafication is the Quantified Self movement, a growing group of individuals who use wearable technology to constantly measure data about their lives and share it online [162].

Big Data is often collected as a by-product of people’s actions, movements, contacts and social interactions. The term used to describe this is ‘data exhaust’. From a cybercrime perspective, these are the electronic trails that criminals create as part of their online activities and interactions using electronic devices. These trails are often not immediately visible but may be ‘hidden’ in the data in the form of relationships and correlations. For example, credit card companies use Big Data analysis to identify potential credit card fraud. Whereas in the past these companies could monitor only a small number of aspects of a transaction at once, they can now analyse several hundred aspects at once and do so for a much larger set of data. As a result, they can now see patterns or anomalies that were not visible with smaller data sets [163]. Since these credit card transactions happen in real time, the analysis needs to happen in real time too. A simpler example of interest to LE is the geolocation data that is often automatically embedded in a picture when it is taken.
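The fraud-detection example above can be illustrated with a minimal sketch. It is not how any particular credit card company works; it simply assumes that each transaction is described by a handful of numeric features and flags a transaction when any feature lies far from the cardholder's usual values. The feature names, the sample history and the threshold of three standard deviations are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

# Hypothetical transaction history for one card. Real systems would look at
# several hundred features; three are enough to show the idea.
history = [
    {"amount": 25.0, "hour": 12, "km_from_home": 2.0},
    {"amount": 40.0, "hour": 18, "km_from_home": 5.0},
    {"amount": 18.5, "hour": 9, "km_from_home": 1.0},
    {"amount": 60.0, "hour": 20, "km_from_home": 8.0},
    {"amount": 32.0, "hour": 13, "km_from_home": 3.0},
]


def anomaly_scores(history, transaction):
    """Return, per feature, how many standard deviations the new
    transaction lies from the mean of the historical values."""
    scores = {}
    for feature, value in transaction.items():
        past = [record[feature] for record in history]
        centre = mean(past)
        spread = stdev(past)
        if spread == 0:
            # The feature never varied: any different value is maximally unusual.
            scores[feature] = 0.0 if value == centre else float("inf")
        else:
            scores[feature] = abs(value - centre) / spread
    return scores


def is_suspicious(history, transaction, threshold=3.0):
    """Flag the transaction if any single feature exceeds the threshold."""
    scores = anomaly_scores(history, transaction)
    return any(score > threshold for score in scores.values())


# Hypothetical new transactions checked against the history.
typical = {"amount": 30.0, "hour": 14, "km_from_home": 4.0}
unusual = {"amount": 950.0, "hour": 3, "km_from_home": 4200.0}

print(is_suspicious(history, typical))
print(is_suspicious(history, unusual))
```

Checking each feature separately is the simplest possible design. The patterns described above typically emerge only when many features are considered together and across a much larger set of transactions, which this sketch does not attempt.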

Big Data supports a different approach to gathering intelligence: one that is not necessarily targeted and hypothesis-based but focuses on gathering as much data from as many sources as possible. Although an organisation may not yet be able to process all of the data it holds, having the data already collected means that it can begin investigating a case or criminal immediately rather than having to start gathering the information from scratch.

Big Data supports a data-driven approach to identifying and exploiting patterns and correlations among the data collected, an approach known as predictive analytics. Predictive analytics reveals correlations – the what – but is usually not suitable for explaining causality – the why. While there are many cases where correlation is good enough, it is sometimes important to look for causal relationships, particularly when it comes to the investigation of crimes.
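The distinction between correlation and causality can be made concrete with a small sketch. The figures below are invented for illustration: two hypothetical monthly series, phishing reports and online purchases, both grow with a third quantity, the number of active Internet users. The Pearson correlation coefficient between the two series comes out very high, yet neither causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented monthly figures. Both series depend on the hidden common factor
# `active_users`, not on each other.
active_users = [10, 20, 30, 40, 50]
phishing_reports = [21, 39, 62, 80, 101]
online_purchases = [52, 98, 151, 203, 247]

r = pearson(phishing_reports, online_purchases)
print(f"correlation between reports and purchases: {r:.3f}")
```

The coefficient answers the what (the two series move together) but says nothing about the why. Here the explanation is the common underlying factor, which only becomes apparent when the third series is examined.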

Criminal exploitation

Criminals already use basic Big Data analytical approaches to increase the value of stolen data. This is done, for instance, by splitting the stolen data into better-quality sets before selling them underground. Big Data analysis will be increasingly used by cybercriminals to maximise the monetary value of stolen data, which will lead to a more competitive cybercriminal market [164].

Big Data, together with the Internet of Everything, provides cybercriminals with new attack vectors and an increased attack surface.

Big Data allows for better profiling, which is used by criminals and LE alike, for instance to identify potential targets or to improve social engineering attacks.