Technical Overview
Darktrace’s transformative approach to cyber defense
relies on probabilistic methods developed by Cambridge
mathematicians. Employing multiple unsupervised, supervised,
and deep learning techniques in a Bayesian framework, the
Enterprise Immune System can integrate a vast number
of weak indicators of anomalous behavior to produce a single,
clear measure of threat probability.
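The idea of integrating many weak indicators into one probability can be illustrated with a textbook naive-Bayes update in log-odds form. This is a minimal sketch, not Darktrace's published method. It assumes the indicators are conditionally independent, and the prior and per-indicator likelihood ratios in the example are hypothetical values chosen only for illustration.

```python
import math


def combine_weak_indicators(prior: float, likelihood_ratios: list[float]) -> float:
    """Fuse independent weak indicators into one posterior probability.

    Each likelihood ratio is P(observation | threat) / P(observation | benign).
    Assuming conditional independence, Bayes' rule in log-odds form simply
    adds the log of each ratio to the prior log-odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    # Map log-odds back to a probability with the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical: a rare prior (1 in 1,000) and five indicators, each only
    # mildly more likely under a threat than under normal activity.
    prior = 0.001
    indicators = [3.0, 2.5, 4.0, 2.0, 3.5]
    print(f"one indicator:  {combine_weak_indicators(prior, indicators[:1]):.4f}")
    print(f"all indicators: {combine_weak_indicators(prior, indicators):.4f}")
```

With these made-up numbers, no single indicator lifts the probability above about 0.3%. All five together raise it to roughly 17%, which is the sense in which weak evidence accumulates into a single measure.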
For each unique environment, Darktrace generates millions
of interrelated mathematical models which are correlated to
ensure that only truly anomalous behavior is detected without
a profusion of false positives. Unlike rule-based computation,
the results that probabilistic mathematics generates cannot
simply be categorized as ‘yes’ or ‘no’; instead, they indicate
degrees of certainty, reflecting the ambiguities that
inevitably exist in dynamic data environments.
Ranking threat
The Enterprise Immune System accounts for ambiguities by
distinguishing between the subtly differing levels of evidence
that characterize network data. Instead of generating the
simple binary outputs ‘malicious’ or ‘benign’, Darktrace’s
mathematical algorithms produce outputs marked with
differing degrees of potential threat. This enables users of
the system to rank alerts rigorously and prioritize
those that most urgently require action, while avoiding
the numerous false positives associated with a
rule-based approach.
At its core, Darktrace mathematically characterizes what
constitutes ‘normal’ behavior, based on the analysis of a
large number of different measures of a device’s network
behavior, including:
Server access
Data volumes
Timings of events
Credential use
Connection type, volume, and directionality
Directionality of uploads/downloads
File type
Admin activity
Resource and information requests
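One simple way to characterize ‘normal’ for each of these measures is to keep a running mean and variance per device and per metric, then score new observations by their distance from that baseline. This sketch uses Welford's online algorithm and a z-score as an illustrative stand-in. It is not Darktrace's model, and the class names, device name, and metric name are hypothetical.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm: running mean and variance of a stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 until two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class DeviceBaseline:
    """Hypothetical per-device, per-metric baseline of 'normal' behavior."""

    def __init__(self) -> None:
        self.stats = defaultdict(RunningStats)

    def observe(self, device: str, metric: str, value: float) -> None:
        self.stats[(device, metric)].update(value)

    def zscore(self, device: str, metric: str, value: float) -> float:
        """How many standard deviations `value` lies from the device's mean."""
        s = self.stats.get((device, metric))
        if s is None or s.std() == 0.0:
            return 0.0  # no usable history yet, so no judgment is made
        return (value - s.mean) / s.std()


if __name__ == "__main__":
    baseline = DeviceBaseline()
    for megabytes in [10.0, 12.0, 11.0, 9.0, 13.0]:  # hypothetical daily uploads
        baseline.observe("laptop-42", "upload_mb", megabytes)
    print(baseline.zscore("laptop-42", "upload_mb", 11.5))
    print(baseline.zscore("laptop-42", "upload_mb", 250.0))
```

A z-score is a crude, single-metric measure. It implicitly assumes roughly Gaussian behavior, which network metrics often violate, and it does not weigh metrics against each other the way a probabilistic framework would.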
Clustering devices
To model what should be considered normal for a
device, its behavior is analyzed in the context of other similar
devices on the network. Darktrace leverages the power of
unsupervised machine learning to algorithmically identify
significant groupings of devices, a task that is impossible
to perform manually on even modestly sized networks.
To create a holistic image of the relationships within the
network, Darktrace employs a number of different clustering
methods, including matrix-based clustering, density-based
clustering, and hierarchical clustering techniques. The
resulting clusters are then used to inform the modeling of
the normative behaviors of individual devices.
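Density-based clustering, one of the methods named above, can be illustrated with a minimal DBSCAN over per-device feature vectors. In this sketch the two features (for example, normalized connection count and data volume), the device coordinates, `eps`, and `min_pts` are all hypothetical. The code is plain textbook DBSCAN, not Darktrace's implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps` of it.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D behavioral features per device.
    devices = {
        "ws-01": (0.10, 0.20), "ws-02": (0.12, 0.22), "ws-03": (0.11, 0.18),
        "srv-01": (0.90, 0.85), "srv-02": (0.88, 0.90), "srv-03": (0.92, 0.87),
        "iot-99": (0.50, 0.05),
    }
    labels = dbscan(list(devices.values()), eps=0.1, min_pts=2)
    print(dict(zip(devices, labels)))
```

The workstations and the servers form two groups, and the lone device is labeled noise. Density-based methods are useful here because they do not require the number of groups to be fixed in advance.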
Network topology
A network is far more than the sum of its individual parts,
with much of its meaning contained in the relationships
among its different entities. Darktrace employs many
mathematical methods to model the multiple facets of a
network’s topology, allowing it to track subtle changes in
structure that are indicative of threats.
One approach is based on iterative matrix methods that
reveal important connectivity structures within the network,
in a similar way to advanced page-ranking algorithms.
In tandem with these, Darktrace has developed innovative
applications of models from the field of statistical physics,
which allow the modeling of a network’s ‘energy landscape’
to reveal anomalous substructures that could represent
the first symptoms of compromise.
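The iterative matrix methods compared above to page-ranking algorithms can be illustrated with PageRank computed by power iteration over a graph of observed connections. This is a sketch under stated assumptions: each “A connected to B” observation is treated as a directed edge, the damping factor of 0.85 is the conventional textbook choice, and the host names are hypothetical. It is not a description of Darktrace's own method.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration over a directed graph.

    edges: iterable of (source, destination) pairs, e.g. observed connections.
    Nodes with no outgoing edges spread their rank uniformly (dangling nodes).
    """
    edges = list(edges)  # iterated twice below
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank


if __name__ == "__main__":
    # Hypothetical: three workstations talk to a domain controller,
    # which exchanges traffic with a file server.
    edges = [("ws-1", "dc-01"), ("ws-2", "dc-01"), ("ws-3", "dc-01"),
             ("dc-01", "fs-01"), ("fs-01", "dc-01")]
    for node, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
        print(f"{node:6s} {score:.4f}")
```

The domain controller receives the highest score because many nodes route through it. Tracking how such scores shift over time is one way a change in connectivity structure could surface.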
Network structure
A further important challenge in modeling the behaviors of a
dynamically evolving network is the huge number of potential
predictor variables. When observing packet traffic and
host activity within an enterprise LAN or WAN, where both
input and output can contain many interrelated features
(protocols, source and destination machines, log changes,
rule triggers, etc.), learning a sparse and consistent
structured predictive function is crucial.
In this context, Darktrace employs a cutting-edge, large-scale
computational approach to understanding sparse structure
in models of network connectivity, based on
L1-regularization techniques (the lasso method). Because such
problems can be cast as efficiently solvable convex optimization
problems that yield parsimonious models, this approach allows the
Enterprise Immune System to discover true associations
between different elements of a network.
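The lasso objective, minimize (1/2n)·‖y − Xβ‖² + λ‖β‖₁, is convex. One standard solver is cyclic coordinate descent with soft-thresholding. The sketch below is a plain-Python illustration on a tiny, made-up orthogonal design, not Darktrace's large-scale implementation. The toy data and λ exist only to show how the L1 penalty drives a weakly relevant coefficient to exactly zero.

```python
def soft_threshold(z, gamma):
    """Proximal operator of the L1 norm: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p features each; y is a list of n targets.
    No intercept is fitted (center y and the columns of X beforehand if needed).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            change = new_bj - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                beta[j] = new_bj
            max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy orthogonal design: three hypothetical features, the third barely relevant
    # (y was built as 3*x0 + 0.5*x1 + 0.1*x2).
    X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    y = [3.6, 2.4, -2.6, -3.4]
    print(lasso_coordinate_descent(X, y, lam=0.2))
```

Here the result is approximately [2.8, 0.3, 0.0]. The two real effects are shrunk by λ, and the feature whose true weight (0.1) falls below λ is dropped entirely. That exact-zero behavior is what yields the parsimonious models described above.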
Recursive Bayesian Estimation
To combine these multiple analyses of network behavior and
generate a single comprehensive picture of the state of the
devices that comprise a network, Darktrace leverages the
power of Recursive Bayesian Estimation (RBE). Using RBE,
Darktrace’s mathematical models are able to constantly
adapt to new information as it becomes available to the
system. Continually recalculating threat levels in the light
of new data, the Enterprise Immune System can discern
significant patterns in data flows indicative of attacks, where
conventional signature-based methods see only chaos.
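Recursive Bayesian estimation alternates two steps for every new observation. First it predicts how the hidden state may have changed, then it updates the belief in proportion to how well each state explains the new data. The sketch below is a discrete Bayes filter over a hypothetical two-state device model (“normal” vs “compromised”). The state names, transition probabilities, and observation likelihoods are all illustrative assumptions, not Darktrace's models.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One recursive Bayesian update over a finite set of hidden states.

    belief:     dict state -> P(state | observations so far)
    transition: dict state -> dict next_state -> P(next_state | state)
    likelihood: dict state -> P(new observation | state)
    """
    # Predict: propagate the current belief through the transition model.
    predicted = {
        s_next: sum(belief[s] * transition[s][s_next] for s in belief)
        for s_next in belief
    }
    # Update: weight each state by how well it explains the new observation.
    unnormalized = {s: predicted[s] * likelihood[s] for s in predicted}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every state")
    return {s: p / total for s, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical two-state model of a single device.
    transition = {
        "normal": {"normal": 0.999, "compromised": 0.001},
        "compromised": {"normal": 0.01, "compromised": 0.99},
    }
    # Hypothetical likelihoods of a "usual" vs "unusual" observation per state.
    emission = {
        "usual": {"normal": 0.95, "compromised": 0.4},
        "unusual": {"normal": 0.05, "compromised": 0.6},
    }
    belief = {"normal": 0.999, "compromised": 0.001}
    for obs in ["usual", "unusual", "unusual", "unusual", "usual"]:
        belief = bayes_filter_step(belief, transition, emission[obs])
        print(f"{obs:8s} P(compromised) = {belief['compromised']:.4f}")
```

Because each step only needs the previous belief and the newest observation, the threat estimate can be recalculated continuously as data arrives rather than recomputed from the full history.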
Darktrace & Deep Learning
Darktrace also uses deep learning to enhance modeling
processes. Deep learning is a subset of machine learning
that uses the cascading interactions of layered mathematical
processes, known as neural nets, to give intelligent
systems a higher degree of insight. Multi-layered neural
nets can improve the detection and remediation of certain
threats, for example by identifying DNS anomalies,
which other machine learning methods track less
effectively. Darktrace’s deep learning system assigns a score
to all DNS data from a device in order to identify
suspicious activity even faster.
Darktrace also clusters devices into peer groups, based on
its own understanding of how those devices behave, and
uses supervised learning to uncover sequences of breaches
and unusual patterns, and to detect aberrant activity at a higher,
more holistic level. For example, the WannaCry ransomware
was easily detected by Darktrace because it breached a number of
different ‘pattern of life’ models. Using supervised learning,
Darktrace can replicate the process of a human interpreting
various sets of breaches for a device or network over time,
and so present correlated alerts instead of a multitude of separate ones.
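The supervised model that performs this correlation is not described here. To show only the shape of the output (many model breaches collapsed into a few incidents), the sketch below swaps in a plain time-window rule. Breaches on the same device that occur within a window of one another are merged. The one-hour window, device names, and model names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Breach:
    device: str
    model: str
    timestamp: float  # seconds since epoch


@dataclass
class Incident:
    device: str
    start: float
    end: float
    models: list = field(default_factory=list)


def correlate(breaches, window=3600.0):
    """Group breaches per device; a breach within `window` seconds of the
    previous one on the same device extends that device's current incident."""
    incidents = []
    open_by_device = {}
    for b in sorted(breaches, key=lambda x: (x.device, x.timestamp)):
        current = open_by_device.get(b.device)
        if current is not None and b.timestamp - current.end <= window:
            current.end = b.timestamp
            current.models.append(b.model)
        else:
            current = Incident(b.device, b.timestamp, b.timestamp, [b.model])
            incidents.append(current)
            open_by_device[b.device] = current
    return incidents


if __name__ == "__main__":
    breaches = [
        Breach("ws-7", "unusual SMB activity", 0.0),
        Breach("ws-7", "new external host", 600.0),
        Breach("ws-9", "unusual SMB activity", 0.0),
        Breach("ws-7", "beaconing", 1200.0),
        Breach("ws-7", "unusual SMB activity", 10000.0),
    ]
    for incident in correlate(breaches):
        print(incident)
```

A fixed window is far cruder than a learned model, because it cannot weigh which combinations of breaches resemble an attack. It does, however, show how five raw alerts can become three correlated incidents.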
Supervised learning is also used by Darktrace to understand
more about the environment, without a human having to label
it. By observing millions of different smartphones, for example,
Darktrace becomes increasingly fast at identifying a new device as a
‘smartphone’, and even at determining what type of smartphone it is.
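Once labeled examples exist, any supervised classifier can assign a type to a newly seen device. As a deliberately simple stand-in, the sketch below uses a nearest-centroid classifier over three hypothetical behavioral features. The feature definitions, labels, and training values are invented for illustration and do not come from Darktrace.

```python
import math
from collections import defaultdict

# Hypothetical labeled examples. Features: (share of traffic to app-store and
# push-notification services, share of activity over Wi-Fi,
# normalized count of internal hosts contacted).
TRAINING = [
    ([0.60, 0.90, 0.05], "smartphone"),
    ([0.55, 0.95, 0.08], "smartphone"),
    ([0.70, 0.85, 0.04], "smartphone"),
    ([0.05, 0.30, 0.40], "workstation"),
    ([0.10, 0.20, 0.50], "workstation"),
    ([0.08, 0.25, 0.45], "workstation"),
    ([0.01, 0.00, 0.90], "server"),
    ([0.02, 0.00, 0.85], "server"),
]


def fit_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, defaultdict(int)
    for x, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


if __name__ == "__main__":
    centroids = fit_centroids(TRAINING)
    print(predict(centroids, [0.58, 0.88, 0.06]))  # a newly seen device
```

More labeled examples refine the centroids. That is the simplest version of the effect described above, where observing many devices makes new ones easier to classify.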
Using deep and supervised techniques to complement its core
unsupervised machine learning algorithms, Darktrace builds
up unique, contextual knowledge about network activity and
integrates the insights of our global deployments to improve
threat detection.
Finally, Darktrace also uses deep learning techniques to
automate repetitive and time-consuming tasks carried out
during investigation workflows. By analyzing how seasoned
cyber analysts interact with the Threat Visualizer, triage
alerts, and leverage third-party sources, Darktrace is able
to replicate those expert behaviors and automate certain
analyst functions.
Darktrace’s technology has become a vital tool for security
teams attempting to understand the scale of their network,
observe levels of activity, and detect areas of potential
