Like other domains, cybersecurity has been eyeing the promise of artificial intelligence (AI) and machine learning (ML) algorithms for some time.
Even back in 2019, nearly two-thirds of respondents to a Capgemini survey thought AI would ‘help identify critical threats’. Additionally, 73% of Australian organisations believed they would eventually be unable to respond to attacks without the assistance of AI.
Analysis by PwC since then confirms that AI is ‘making organisations more resilient to cybercrime’.
Since then, there has been a steady movement among cybersecurity functions to harness AI/ML. It is now particularly prevalent in malware detection, and increasingly used to automate analysis and decision-making in data-intensive areas like incident detection and response.
But cybersecurity is also a particularly challenging space for AI/ML use. It’s fast-moving, and the stakes are high. Algorithms put to work in security must themselves be trustworthy and secure, and that is no easy problem to solve.
Through our work in this space over the past seven-plus years, some ground rules have emerged. In particular, the keys to success include clean data, a strong business case, and the ongoing involvement of domain and technical experts.
Getting these right will solve many of the current challenges and bring cybersecurity functions closer to the AI-augmented operating model they need.
The deal on data
Data is obviously a critical input into AI and machine learning algorithms: first while those models are being trained, and then on an ongoing basis once they are in production and expected to keep getting better at what they do.
Most organisations don’t have clean, unified and consistent data to feed a model from the outset, and so many AI/ML projects begin with a period of data preparation before any of the actual AI work can proceed.
This has been understood for some time. The Capgemini survey, for example, noted that ‘buying or building a data platform to provide a consolidated view of data should be a first step for organisations that want to use AI in cybersecurity successfully.’ Still, that probably understates the level of effort involved.
In the cybersecurity world, it’s important to collect data from a range of different sources — from antivirus, firewalls, databases, cloud, servers and end-user devices — because that’s ultimately what will enrich data science and models for AI.
That data must then be treated: processed, analysed, and pulled into a format that can be consumed by the model.
Consistency matters at this stage, and in all likelihood, most organisations won’t have it. Not all logs contain the same metadata fields. Not all vendors even define log fields in the same manner.
Organisations will need to establish consistent definitions and contextualise the metadata present in their logs to understand what to train the model on and what it should be looking for.
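As an illustration, here is a minimal sketch of that kind of normalisation. The vendor names, field names and schema are hypothetical, invented for the example rather than drawn from any real product:

```python
# Minimal sketch: normalising logs from two hypothetical vendors into one schema.
# All field and vendor names here are illustrative assumptions.

COMMON_FIELDS = ["timestamp", "source_ip", "dest_ip", "action"]

# Per-vendor mapping from each product's field names to the common schema
FIELD_MAPS = {
    "vendor_a": {"time": "timestamp", "src": "source_ip", "dst": "dest_ip", "verdict": "action"},
    "vendor_b": {"event_time": "timestamp", "client": "source_ip", "server": "dest_ip", "outcome": "action"},
}

def normalise(record: dict, vendor: str) -> dict:
    """Map one raw log record into the common schema; missing fields become None."""
    mapping = FIELD_MAPS[vendor]
    unified = {field: None for field in COMMON_FIELDS}
    for raw_key, value in record.items():
        if raw_key in mapping:
            unified[mapping[raw_key]] = value
    return unified

# The same connection as reported by two different products
print(normalise({"time": "2024-05-01T09:30:00Z", "src": "10.0.0.5", "dst": "203.0.113.7", "verdict": "allow"}, "vendor_a"))
print(normalise({"event_time": "2024-05-01T09:30:01Z", "client": "10.0.0.5", "server": "203.0.113.7", "outcome": "permitted"}, "vendor_b"))
```

Only once records from every source land in a shared schema like this can they be used to train, and later evaluate, a model consistently.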
Just add domain expertise
That need for context raises another important point about AI/ML adoption in cybersecurity.
Domain expertise is a crucial input. The first step in an AI/ML project is to get a domain expert to define what is happening in the environment that gives rise to the need for algorithmic assistance. The need may stem from missed or poor-quality threat detections, or from too many false positives. Whatever it is, it needs to be written down.
If AI/ML is considered the best way to address this challenge and a model is then developed, domain expertise will still be needed while the model finds its feet. The expert understands what is normal and what is unusual, even if the model doesn’t yet.
For example, in the days before mass remote work, it was easier to define what normal and unusual might look like in the traffic logs and patterns of a workday. Large-scale work-from-home introduced considerable unpredictability and complexity. A model will not be able to detect some of this nuance immediately, and oversight will be required.
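A toy sketch makes the point. The baseline below learns a user's "normal" login hours from historical data and flags outliers; the data, thresholds and logic are deliberately simplistic assumptions, far cruder than a production model:

```python
# Toy sketch: a naive "normal working hours" baseline for login events.
# Historical data and z-score threshold are illustrative assumptions only.
from datetime import datetime
from statistics import mean, stdev

# Hypothetical pre-remote-work login hours for one user (mostly office hours)
historical_hours = [9, 9, 10, 9, 11, 14, 16, 9, 10, 17]

mu, sigma = mean(historical_hours), stdev(historical_hours)

def is_unusual(login_time: datetime, z_threshold: float = 2.0) -> bool:
    """Flag logins whose hour falls far outside the learned baseline."""
    z = abs(login_time.hour - mu) / sigma
    return z > z_threshold

# A 22:30 login looks anomalous against the office-hours baseline...
print(is_unusual(datetime(2024, 5, 1, 22, 30)))  # True
# ...but under large-scale remote work, late logins may be perfectly normal,
# so the baseline itself has to be re-learned, with expert oversight.
print(is_unusual(datetime(2024, 5, 1, 10, 0)))   # False
```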
A domain expert will be able to look at a model’s output and understand what data is missing or how the model is trying to detect a cybersecurity threat. This is important in understanding how to then tune the model.
That need for tuning will be ongoing. Attack tactics and techniques are constantly changing, so organisations that employ AI/ML for cybersecurity have to constantly look at what’s happening in the field to understand how that impacts the AI model and its detection ability.
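One way to keep that loop honest is to feed analyst verdicts on the model's alerts back into a simple health check. The sketch below assumes a hypothetical review workflow and an arbitrary false-positive tolerance; it shows the shape of the feedback loop, not a prescribed implementation:

```python
# Illustrative sketch of a tuning feedback loop: domain experts label the
# model's alerts, and a simple check decides whether re-tuning is warranted.
# The 30% false-positive tolerance is an assumption, not a recommendation.

def review_alerts(alerts: list[dict], max_false_positive_rate: float = 0.3) -> str:
    """alerts: model output annotated by an expert with a 'verdict' of
    'true_positive' or 'false_positive'."""
    if not alerts:
        return "no alerts to review"
    false_positives = sum(a["verdict"] == "false_positive" for a in alerts)
    fp_rate = false_positives / len(alerts)
    if fp_rate > max_false_positive_rate:
        return f"fp_rate={fp_rate:.0%}: retune or retrain on fresh, expert-labelled data"
    return f"fp_rate={fp_rate:.0%}: within tolerance, keep monitoring"

labelled = [
    {"alert_id": 1, "verdict": "true_positive"},
    {"alert_id": 2, "verdict": "false_positive"},
    {"alert_id": 3, "verdict": "false_positive"},
]
print(review_alerts(labelled))  # fp_rate=67%: retune or retrain ...
```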