Deep Learning for Cyber

By luck, I was born to a computing pioneer. When I was in kindergarten, my Dad completed his PhD in computer science focusing on tessellation automata (see graphic above). Perhaps your Dad fixed cars, or sold stocks, or taught English, but mine was an expert in massively parallel computation. It’s what I was born into – and as a kid, I thought everyone used the blank sides of automata theory reports as scrap paper.

It is thus with more than just passing interest that I now dive enthusiastically into the use of layered neural networks to detect malware. Parallel processing is like an old friend to me, and learning its role in cyber defense is like visiting part of my childhood. Advanced computing is ridiculously comfortable for me, and I feel uniquely qualified to comment on its application to our industry.

Last week, I sat down for whiskeys and tech talk in the dim bar room of Fraunces Tavern in Manhattan with Eli David and his team at Deep Instinct. I was keen to learn more about their advanced use of deep learning methods for cyber security. I’d been hearing much about the company and I wanted to know if this was the real deal. At the risk of wrecking any suspense, I can tell you that the evening was worth the time.

First recognize that it's beyond our scope here to provide a detailed tutorial on neural networks or artificial intelligence. Good resources are available for free, and Google can point you to some decent reads. But I would like to share what I learned last week about how deep learning has emerged as a promising alternative to traditional machine learning for the reliable detection of malware.

Most of you know that machine learning was invented to teach machines to recognize patterns. This is easier said than done, because it requires up-front work to develop hand-crafted cases and features that define what the machine should learn. In addition, the intense computational requirements can easily overwhelm platforms, resulting in weak models and lost information.

Several years ago, Geoffrey Hinton – the Henry Ford of this area – created improved algorithms that enabled training deep neural networks without signal loss. Meanwhile, the community had begun massively parallelizing machine learning using highly efficient graphics processing units (GPUs). The collective result was a quantum leap in capability that led to the emergence of the field we now refer to as Deep Learning.
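The "signal loss" mentioned above is what practitioners call the vanishing-gradient problem, and a toy calculation shows why it bites. This sketch illustrates the obstacle, not Hinton's actual algorithms: with classic sigmoid activations, each layer's derivative is at most 0.25, so the gradient signal reaching the early layers shrinks geometrically with depth.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25, so each layer shrinks the signal

def gradient_signal(n_layers, pre_activation=0.0):
    """Chain rule across n layers: per-layer derivatives multiply."""
    g = 1.0
    for _ in range(n_layers):
        g *= sigmoid_grad(pre_activation)
    return g

print(gradient_signal(2))   # 0.0625
print(gradient_signal(10))  # roughly 9.5e-07: the signal has all but vanished
```

A two-layer network trains fine; a ten-layer network built this way barely passes any learning signal back to its first layer, which is why deep networks stalled until better training methods arrived.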

Since then, deep learning has enabled solutions to complex problems in fields such as machine vision and speech recognition. It has also resulted in computers becoming more skilled at games like chess. As you would expect, it also became obvious to scientists such as Eli David that deep learning would be an excellent candidate for enabling world-class cyber security.

As historical context, Eli shared a conceptual maturity model in which signature-based anti-virus methods were supplanted by behavioral analytic tools. These solutions were then one-upped by machine learning, which has since been bested by the efficiency and accuracy of deep learning. While some might quibble with the details, this progression seemed roughly right to me.

Here is what a security professional must understand: Machine learning tools use specially designed training data to recognize malicious patterns in targeted software. As outlined above, the computational requirements for this method are high, and the training data is hard to build. This combination often leads to false negatives and slow processing, especially for malware variants.
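The traditional pipeline described above can be sketched in a few lines. Everything in this example is hypothetical: the feature names, the thresholds, and the stand-in decision rule. But it shows the shape of the approach, in which an analyst designs features up front and a classifier judges each file by them.

```python
import math
from collections import Counter

def extract_features(file_bytes: bytes) -> list:
    """Hand-crafted features an analyst must design up front (hypothetical)."""
    counts = Counter(file_bytes)
    n = len(file_bytes) or 1
    # Shannon entropy of the byte distribution, in bits per byte.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [
        float(len(file_bytes)),                # file size
        entropy,                               # high entropy: packed/encrypted?
        file_bytes.count(b"\x00") / n,         # fraction of zero bytes
    ]

def score(features: list) -> bool:
    """Stand-in for a trained classifier; this decision rule is illustrative,
    not a real detection heuristic."""
    size, entropy, zero_fraction = features
    return entropy > 7.0 and zero_fraction < 0.1

sample = bytes(range(256)) * 4  # toy "file" with a uniform byte distribution
print(score(extract_features(sample)))
```

The weakness the paragraph describes lives in `extract_features`: a malware author who understands the features can craft a variant that slides under every threshold, and the analyst must then go back and design new features.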

Using GPUs, massively parallel algorithms, and enormous neural networks, however, deep learning tools such as Deep Instinct's accept raw data directly, and lots of it. Hundreds of millions of cases are fed straight to the neural network as training input. This skips the time-intensive and error-prone pre-processing step, and makes harder problems, like detecting variants, tractable.

When we say that cases are fed to neural networks, this means that raw byte values comprising the file are directly ingested by the first layer of neurons. Each neuron accepts inputs, and produces an output, which is why many refer to them as little decision-makers. When you arrange a massive grid of little decision-makers (like my Dad's tessellation structures), what you get is a big decision-maker.
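To make the "grid of little decision-makers" concrete, here is a toy forward pass in plain Python. The file contents, the layer sizes, and the random weights are all illustrative assumptions; a real product learns its weights from those hundreds of millions of training cases.

```python
import random

random.seed(0)

def neuron(inputs, weights, bias):
    """A little decision-maker: weigh the inputs, add a bias, fire if positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, total)  # ReLU activation

def layer(inputs, weight_rows, biases):
    """A grid of neurons whose outputs feed the next layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Raw byte values of a (hypothetical) file, scaled to [0, 1] and fed
# directly to the first layer of neurons -- no hand-crafted features.
file_bytes = b"MZ\x90\x00hello"
x = [b / 255.0 for b in file_bytes]

# Untrained, randomly initialized weights: illustrative only.
w1 = [[random.gauss(0, 1) for _ in x] for _ in range(8)]
b1 = [0.0] * 8
w2 = [[random.gauss(0, 1) for _ in range(8)]]
b2 = [0.0]

hidden = layer(x, w1, b1)        # first layer ingests the raw bytes
verdict = layer(hidden, w2, b2)  # stacked layers form a big decision-maker
print(len(hidden), len(verdict))
```

Each neuron on its own does almost nothing; the capability comes from stacking many such layers and training the weights, which is exactly where the GPUs and the massive training sets earn their keep.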

The result of this, Eli explained, is that these crazily powerful neural networks are better than signatures (duh), better than behavioral heuristics, and better than traditional machine learning at detecting complex malware. The Deep Instinct endpoint solution implements this advanced recognition concept for enterprise teams hoping to get their arms around malware on PCs.

I don't know if everything plays out exactly as Eli proposes. Endpoint security trends have been tough to predict, and the possibility exists, as he was willing to admit, that an adversary might try to outwit a neural network by nefarious methods. This can include writing programs to evade learning, or creating phony malware that would be ingested during training. Neither would be easy, by the way.

I am truly optimistic that deep learning using neural networks will reduce risk in a consequential manner. What our community needs now is the benefit of real experience, live application, and scientific investigation. Just as advances in automata theory from my Dad’s generation led to compilers and programming languages, perhaps advances in deep learning from our generation might help us overcome cyber security attacks.

Let me know what you think.