AI-based cyber security for UK government networks

13th May 2020

Sam Cheadle
Data Scientist, Nominet

A recent report on the use of Artificial Intelligence (AI) for UK national security has highlighted its role in the next generation of cyber security.

To combat ‘adversarial AI’ techniques that are increasingly being used by cyber criminals to create adaptable and ever-changing malware, effective AI is needed for network defence. When looking at the Domain Name System (DNS), for example, machine learning techniques can detect patterns in the source and destination of internet queries and help to identify new and evolving threat types.

DNS-based cyber defence protects UK government networks today, notably through the Protective Domain Name Service (PDNS), which Nominet implements on behalf of the National Cyber Security Centre (NCSC). PDNS was built to obstruct the use of DNS for malware distribution and communication, ensuring that as malicious techniques evolve, so does network defence. Through close collaboration, Nominet and the NCSC are developing world-leading cyber analysis and incident management capabilities, helping to make the UK public sector and wider UK cyber ecosystem a safer place.

Detection via machine learning techniques 

Machine learning techniques are needed to learn what ‘normal’ behaviour looks like, which in turn allows us to identify unusual or suspicious behaviour that could indicate a cyber attack.

For example, a suspicious website could be identified based on the geographic spread of connections from across the globe, while malicious behaviour might be identified from a spike in queries to ‘known bad’ top level domains (TLDs) commonly used for malware communication. In this context, the value of AI lies in its ability to learn the rich variety of ‘normal’ behaviour, something that is much harder to do using traditional statistical techniques.
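To make the idea of a query spike concrete, here is a minimal sketch of the kind of traditional statistical baseline that a learned model would aim to improve on. It is not the method used in PDNS: the function names, the hourly counts and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def spike_score(history, current):
    """How many standard deviations `current` lies above the mean of `history`.

    `history` is a hypothetical list of past query counts for one TLD
    (at least two values are needed to compute a standard deviation).
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any increase counts as an extreme spike.
        return float("inf") if current > mu else 0.0
    return (current - mu) / sigma

def is_spike(history, current, threshold=3.0):
    """Flag the current count if it exceeds an assumed threshold."""
    return spike_score(history, current) > threshold

# Made-up hourly counts of queries to a watched TLD.
baseline = [12, 9, 11, 10, 13, 8, 12, 11]
print(is_spike(baseline, 14))  # a modest rise
print(is_spike(baseline, 60))  # a sharp jump
```

A single fixed threshold like this treats every network and every hour of the day alike, which is exactly the limitation that motivates learning ‘normal’ behaviour from the data instead.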

Specifically, for the UK Government, some of the machine learning techniques used in PDNS include:

Network threat detection: Patterns connected to the source, volume and destination of DNS queries are closely monitored in order to flag anomalous activity.
Highlighting of machine-generated domain names, which are commonly associated with botnets and domain generation algorithms (DGAs): Fast identification using natural language processing (NLP).
Fraudulent website detection through image analysis: Convolutional neural networks, an AI technique aimed at replicating the flexibility of the human visual system, are used for image recognition. For example, phishing websites attempting to steal usernames and passwords often masquerade as legitimate sign-in pages, reusing recognisable organisational logos.
Cluster analysis to find previously unknown associations between websites or clients within a network: For example, grouping malicious domains into ‘threat families’ by aggregating data from across the PDNS network. Recent work uses machine learning to link heterogeneous datasets (e.g. sinkhole logs) to parent DNS queries, enabling more accurate labelling of threats and faster remediation.
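As a rough illustration of why machine-generated names stand out, the sketch below scores a domain by the Shannon entropy of the characters in its first label. This is a crude lexical heuristic standing in for a proper NLP model, not the technique used in PDNS; the function names and the threshold of 3.0 bits are assumptions, and the domains use the reserved `.example` TLD.

```python
import math
from collections import Counter

def shannon_entropy(label):
    """Shannon entropy, in bits per character, of the characters in `label`."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def first_label(domain):
    """Leftmost label of a domain name, lower-cased (a simplifying assumption)."""
    return domain.split(".")[0].lower()

def looks_machine_generated(domain, threshold=3.0):
    """Flag a domain whose first label has unusually high character entropy.

    The threshold is an illustrative guess, not a tuned value.
    """
    return shannon_entropy(first_label(domain)) > threshold

for name in ["google.example", "xk3q9zv7tbw2.example"]:
    print(name, round(shannon_entropy(first_label(name)), 2),
          looks_machine_generated(name))
```

Entropy alone misclassifies plenty of legitimate names (long hostnames, for instance) and misses DGAs built from dictionary words, which is one reason a language model trained on real domain data is the better tool.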

Bringing AI and humans together 

While automation of the entire threat detection process is a central aim of most AI research, it is also important to consider scenarios in which human users may be placed ‘in the loop’, otherwise known as ‘augmented intelligence’ analysis. Machine learning algorithms can find associations between datapoints in vast quantities of data, even when the rules or characteristics that may link those datapoints are not well defined. AI is therefore being used to transform ‘high dimensional’ data into simpler representations that are more manageable for human decision making.

For example, Nominet has developed techniques for clustering newly observed domains (often used for malicious purposes), helping a human analyst to group together similar domain names based on a range of characteristics. By capturing the most important information and representing this in an intuitive way, AI can be used in combination with expert human analysts to speed up the process of threat detection and response. The example plot below groups newly observed domains into a number of distinct threat families, enabling the analyst to rapidly find associations.
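The grouping idea can be sketched in a few lines. The example below is not Nominet's technique: it uses a simple greedy "leader" clustering over three hypothetical lexical features (relative length, share of digits, share of vowels), with an assumed distance radius of 0.25 and made-up domain names under the reserved `.example` TLD.

```python
import math

def features(domain):
    """Hypothetical lexical features for the first label of a domain name."""
    label = domain.split(".")[0].lower()
    n = len(label) or 1  # avoid dividing by zero on an empty label
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    # 63 is the maximum length of a DNS label, used here to scale length to 0..1.
    return (len(label) / 63.0, digits / n, vowels / n)

def leader_cluster(domains, radius=0.25):
    """Greedy clustering: join the first cluster whose leader is within
    `radius` (Euclidean distance), otherwise start a new cluster."""
    clusters = []  # list of (leader_vector, members)
    for domain in domains:
        vec = features(domain)
        for leader, members in clusters:
            if math.dist(leader, vec) <= radius:
                members.append(domain)
                break
        else:
            clusters.append((vec, [domain]))
    return [members for _, members in clusters]

newly_observed = [
    "secure-login-update.example",
    "a8f3k2j9d7.example",
    "secure-login-verify.example",
    "q7w2z9x4c1.example",
]
for group in leader_cluster(newly_observed):
    print(group)
```

With these inputs the phishing-style names and the random-looking names fall into separate groups, giving an analyst a smaller number of families to review rather than a flat list of domains.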

AI and machine learning techniques have huge value for threat detection and response when balanced appropriately with the ethical considerations around data. As cyber criminals are constantly developing new ways of evading detection within a rapidly evolving threat landscape, AI will give us the opportunity not only to identify threats faster, but also to take action far more rapidly.