The Quest to Marry Human and Machine Learning

When I was a reporter at The National Law Journal, I covered the Enron scandal. A few years later I interviewed some of the lawyers who had worked there. One subject we discussed was attorney-client privilege. Very early on, the company had decided to waive the privilege. After that, the attorneys lamented, it was almost impossible to assert that anything was confidential.

That included all the emails employees had sent and received. In time, they all entered the public domain. Anyone can read them—not just the ones that incriminated executives who were eventually convicted of crimes, but highly personal emails written by people who were never accused of anything.

A few years later, companies that sold e-discovery software began using this trove of documents. The emails were a convenient exhibit for demonstrating how the vendors' products could sort and make sense of it all.

They’re still quite useful for this purpose—as I saw for myself the other day when a team from AppGate presented an analyst briefing for several of us from TAG Cyber.

AppGate, Inc. was the cybersecurity business of Cyxtera Technologies, which spun it off in January. And a big part of AppGate’s business is investigative analytics. If email was challenging to deal with in 2001, when Enron’s data was first being sifted, those days must look like child’s play compared to today, when (according to 2018 statistics) an estimated 130 billion emails are sent every day.

How do investigators analyze very large databases? AppGate’s answer is Brainspace, a product rooted in AI. But Nazir Shwayhat was quick to explain that in this instance, AI doesn’t stand for artificial intelligence. It stands for augmented intelligence.

There’s a reason they came up with a different term, said Shwayhat, who is AppGate’s product manager for Brainspace (which he’s been working on since 2013). Their software is used to augment the intelligence of subject matter experts, he said. They recognize that humans are the critical decision makers, and their expertise is required in order to maximize the value of AI. Brainspace doesn’t replace the human intelligence, Shwayhat said. It amplifies it.

But more important, he continued, that isn’t all that Brainspace runs on. Four elements make it go: the core technology, interactive visualizations, machine learning and human intelligence—usually supplied by investigators and attorneys.

What makes it “augmented,” he explained, is that humans help the machine learn what the humans are looking for.

Shwayhat gave us a demonstration using the Enron email dataset, which contains about a million records. He started with the visualizations, beginning with the Cluster Wheel.

The Cluster Wheel automatically organizes data into clusters, based on similar vocabulary, leading to topics of interest. What makes it particularly powerful, Shwayhat explained, is that investigators can search it in ways limited only by their knowledge and creativity.
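To make the idea of grouping by similar vocabulary concrete, here is a minimal sketch in Python. It is not Brainspace's algorithm, which the briefing did not describe. It assumes a simple bag-of-words comparison with cosine similarity and a greedy single-pass grouping. The sample messages, the stopword list, the 0.3 threshold and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from any product).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "for", "on", "in", "is", "at"}


def vectorize(text):
    """Turn a document into word counts, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy grouping: a document joins the first cluster whose seed
    document shares enough vocabulary, otherwise it starts a new cluster."""
    clusters = []  # each entry is (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Made-up messages, not drawn from the Enron dataset.
docs = [
    "Gas pipeline capacity trading schedule",
    "Trading desk schedule for gas pipeline",
    "Lunch on Friday at the usual place",
    "Friday lunch moved to the usual place downtown",
]
groups = cluster(docs)
print(groups)
```

The two trading messages land in one group and the two lunch messages in another. A production tool would work at far larger scale and with more sophisticated language models, but the principle of grouping by shared vocabulary is the same.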

In addition, visualizations such as the Communications Analysis can identify who was communicating with whom. They can also surface the subjects those people were talking about.
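The underlying bookkeeping for a who-talked-to-whom view can be sketched in a few lines of Python. This is an assumption about how such an analysis might be built, not a description of Brainspace's internals. The addresses, subjects and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (sender, recipients, subject).
messages = [
    ("alice@example.com", ["bob@example.com"], "Q3 forecast"),
    ("alice@example.com", ["bob@example.com", "carol@example.com"], "Q3 forecast"),
    ("bob@example.com", ["alice@example.com"], "Re: Q3 forecast"),
]


def communication_edges(msgs):
    """Count messages per (sender, recipient) pair and collect their subjects."""
    edges = Counter()
    subjects = defaultdict(set)
    for sender, recipients, subject in msgs:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
            subjects[(sender, recipient)].add(subject)
    return edges, subjects


edges, subjects = communication_edges(messages)
top_pair, top_count = edges.most_common(1)[0]
print(top_pair, top_count, subjects[top_pair])
```

The pair with the highest count is the busiest channel, and the subjects attached to it give a first hint of what the two parties were discussing.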

Using Concept Search, an investigator can enter a search term like Raptor, Shwayhat said, referring to a special purpose entity (named after velociraptors in the movie “Jurassic Park”) that was designed to hide and profit from weaknesses on Enron’s balance sheet. Enron CFO Andrew Fastow (who eventually went to prison) and his team used terms like Raptor to hide large amounts of debt from investors. Brainspace’s Concept Search automatically retrieves related concepts, allowing users to uncover valuable insights and intersections between specific terms.

That’s where human recognition guides machine learning. It can be as simple, Shwayhat demonstrated, as highlighting “Raptor” and then searching for “more like this.”
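One simple way to picture how a highlighted term can lead to related concepts is to count which words appear in the same documents. The Python sketch below does that. It is an illustration under stated assumptions, not Brainspace's Concept Search. The sample sentences are invented, and `related_terms` is a hypothetical helper.

```python
import re
from collections import Counter


def related_terms(term, docs, top_n=2):
    """Rank the words that appear in the same documents as `term`."""
    term = term.lower()
    counts = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        if term in words:
            # Count each co-occurring word once per document.
            counts.update(words - {term})
    return counts.most_common(top_n)


# Made-up sentences, not drawn from the Enron dataset.
docs = [
    "Raptor hedge needs more collateral",
    "Raptor hedge restructuring approved",
    "Raptor collateral shortfall after hedge losses",
    "Cafeteria menu for next week",
]
related = related_terms("Raptor", docs)
print(related)
```

Here "hedge" and "collateral" rise to the top because they keep turning up alongside the highlighted term. The investigator's choice of starting term is the human contribution, and the ranking is the machine's.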

Shwayhat told us that the Brainspace team had an opportunity to show off the application to investigators who had been involved in the actual Enron investigation. When the team showed them how fast it was able to find information they had struggled mightily to learn, “they were blown away,” he said.

Of course, visual depictions of data clusters aren’t new. The Enron dataset has been displayed in cluster tools for years. But AppGate argues that no competitor breaks down themes into so many topics, nor automates the process so thoroughly.

For email in particular, one of the program’s strengths is that investigators can literally traverse an email thread to follow the progress of a conversation. As they do, they can see when specific individuals joined the dialogue, and how they influenced and altered it. The investigators can also tag important documents, train the program to recognize important words in them, and focus the search on finding more of those.

Obviously the Enron investigators would have loved to have had this, but who are today’s buyers? Consulting companies, the big accounting firms, law firms, Fortune 500 companies doing self-policing and internal investigations, Shwayhat said. Government agencies also buy Brainspace. They use the software in proactive research, monitoring topics of interest around the world to aid in national defense and the fight against terrorism.

What many investigators particularly appreciate about Brainspace, he concluded, is that the program takes a huge and complicated dataset and makes it easy to explore through interactive data visualizations. These enable users to make smarter, faster and more informed decisions.