Learn and Catch Hackers

Hackers in dark web forums rely heavily on reputation. Previous postings, technical knowledge, possession of code samples, and similar credentials are the currencies that allow them to manage and grow influence. The paradox, however, is that no hacker would ever consider exposing their identity to the group, regardless of any presumed anonymity controls. Participants in the dark web must therefore stake their positions some other way.

In most cases, this is done by showcasing skills – something Steven Levy explained back in the 1980s. While specific behaviors vary between forums, clear patterns can be identified in this collage of hacker activity, discussions, and sharing. From these patterns, insights can be gained on how the exploit winds are blowing. This makes it possible to determine which vulnerabilities are most likely to be used next – and that is super-valuable information.

A research team at Arizona State University has been examining this fascinating world and how it can be used as the basis for deriving useful cyber threat intelligence. Their excellent book Darkweb Cyber Threat Intelligence Mining (Cambridge University Press, 2017) has been my tour guide over the past month to this interesting discipline, and I can tell you that there are rich, useful ideas here – ones that can be applied to practical cyber security immediately.

Not surprisingly, a wonderful new company called CYR3CON has been recently organized to exploit this insight and to create a platform for intelligence gathering and prioritization of vulnerabilities. The company’s co-founder, Paulo Shakarian, is a principal of the academic research, and he was kind (and patient) enough to spend considerable time with me these past few weeks. I’m glad I made the effort, because there is something important here.

Here is how CYR3CON works: Dark web discussions are fed to the platform, with human experts directing and curating the best sources of useful intelligence. The text of each post is automatically parsed, and the post is tagged if it makes meaningful reference to a relevant and exploitable vulnerability. The best case, obviously, would be some capable hacker bragging about the details of an important exploit, which turns out to be a common occurrence.
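To make the parse-and-tag step concrete, here is a minimal sketch of my own, not CYR3CON's actual implementation. It assumes that a "meaningful reference" can be approximated by a CVE identifier appearing in the post, optionally alongside exploit-related wording. The names `tag_post`, `TaggedPost`, and the keyword pattern are hypothetical choices made for illustration only.

```python
import re
from dataclasses import dataclass

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits at the end.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

# Hypothetical keyword pattern; a real system would need far richer language handling.
EXPLOIT_HINT_PATTERN = re.compile(r"\b(exploit\w*|poc|payload\w*|rce)\b", re.IGNORECASE)


@dataclass
class TaggedPost:
    text: str
    cves: list          # normalized CVE identifiers found in the post
    exploit_hint: bool  # True if exploit-related wording also appears


def tag_post(text: str) -> TaggedPost:
    """Tag a forum post with the CVE identifiers it mentions."""
    # Normalize to upper case and de-duplicate so "cve-..." and "CVE-..." match.
    cves = sorted({match.upper() for match in CVE_PATTERN.findall(text)})
    hint = EXPLOIT_HINT_PATTERN.search(text) is not None
    return TaggedPost(text=text, cves=cves, exploit_hint=hint)


if __name__ == "__main__":
    post = "Selling working exploit for cve-2017-0144, PoC included"
    tagged = tag_post(post)
    print(tagged.cves, tagged.exploit_hint)
```

A post that mentions a CVE together with exploit wording would be a stronger candidate for follow-up than one that mentions the CVE alone. Pattern matching of this kind is only a first filter, which is presumably where the human curation described above comes in.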

“Predicting which vulnerabilities will be exploited is not an easy process,” explained Shakarian. “But we’ve developed a platform based on our research that combines data ingest, analytics, and automation – much of it based on hacker discussions held in the recesses of the dark web. We analyze the content of these discussions to make priority estimates about which vulnerabilities have the highest likelihood of exploitation.”

The platform performs correlative analysis to connect the tagged post to previously collected, categorized, and correlated information. As one would expect, this can be done directly by software that compares specific attributes such as the source and time of a post. Or it can involve machine learning, where tagged posts and other information become labeled training data used to teach the software what to look for in future posts.
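The machine learning path can be illustrated with a toy example. The sketch below is my own and assumes a simple naive Bayes text classifier with add-one smoothing. The source does not say which models CYR3CON uses, so treat the technique, the class name `NaiveBayesPostClassifier`, the labels, and the four training posts as invented placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesPostClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training posts
        self.vocab = set()

    def fit(self, posts, labels):
        for text, label in zip(posts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data: tagged posts serve as labeled examples.
posts = [
    "working exploit for the smb bug, payload attached",
    "selling rce exploit with shellcode",
    "anyone know a good vpn provider",
    "forum rules updated please read",
]
labels = ["threat", "threat", "chatter", "chatter"]

classifier = NaiveBayesPostClassifier()
classifier.fit(posts, labels)

if __name__ == "__main__":
    print(classifier.predict("new rce exploit payload for sale"))
    print(classifier.predict("what vpn provider do you know"))
```

Four posts are obviously nowhere near enough to train anything useful. The point is only to show the loop the paragraph describes: previously tagged posts become labeled training data, and the fitted model is then applied to posts it has not seen.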

“What we are doing,” explained Shakarian, “involves leveraging machine learning technology to create actionable information about exploits. Our goal is to enable better decisions based on a more realistic assessment of security posture. The presumption is that if you’ve made bad determinations of the relative importance of your existing vulnerabilities, then you will create incorrect estimates of aggregate cyber risk.”

I asked Shakarian about the technical challenges of this correlation and machine learning method, and he pointed to several, all stemming from the improving but still relatively immature state of artificial intelligence for security applications. Spurious causal links, for instance, can be inferred between unrelated factors if the overall process is not properly curated, but this should improve as commercial platforms learn to generate better conclusions.

I like to ask cyber start-ups about the foundational basis for their work – and the Arizona State research offers a convincing narrative. Did you know, for example, that finding a preferred attack strategy is NP-complete? This implies that hard work will always be required to prioritize exploitation paths – and helps justify the CYR3CON approach. If you are a mathematician, you will find this complexity result to be profound. I sure did.

My discussion with Shakarian shifted to business, and it was clear that two paths are being followed: First, we agreed that enterprise teams continue to perform this task of prioritizing vulnerabilities. Larger companies such as global banks, for instance, will typically include a team of experts doing this work – and as you’d expect, the CYR3CON platform will help them produce more accurate priority estimates. So, this is a great sales opportunity.

But the second business path for CYR3CON looked especially interesting. It recognizes the growing importance of application, cloud, and SDN-enabled service providers in the coming years. “Our strategy is to establish a best practice of ‘CYR3CON-inside’ for any service providers who must keep track of vulnerabilities,” Shakarian said. I liked this idea, and agreed that if providers prioritize vulnerabilities more effectively, then we all benefit.

So, if you assign priorities as part of your vulnerability management, or if you offer any type of service to customers who would benefit from your team operating at a higher and more accurate level for this essential security task, then give a call to Shakarian and his team. (And, by the way, if you’re excited at the prospect of poring through theorems on admissible heuristic functions in the context of tree-search variants, then I suggest you buy their book.)

As always, please let us know what you learned.