The earliest examples of modern data centers date back to the 1940s, when large rooms were built to house early computers and centralize computing processes. The first public concern about mass data processing was noted in the Lewiston Daily Sun[i] in June 1966. According to the article, members of a U.S. congressional task force opened an inquiry into the government’s plans to build a data center housing “all governmental records”—including Social Security numbers, fingerprints, tax records, birthdates, addresses, schooling, military service, and more—about private citizens. Then-representative Cornelius Gallagher is quoted as saying, “Comprehensive information of this kind, centralized in one agency, could constitute a highly dangerous dossier bank.”
The concern wasn’t so much about the collection of data; the U.S. government was already collecting and storing that data, albeit in separate data centers. Instead, the committee worried about the privacy of centralized data and how it could be used against an individual. Had the committee possessed prescience, their 1966 heads would have spun at the sight of today’s data centers and big data.
The term “big data” was first used by O’Reilly Media in 2005, right around the time Hadoop emerged as a platform for collecting and indexing data from across the World Wide Web. Shortly thereafter, private enterprises recognized the opportunity to use the huge amounts of data they were collecting to increase revenue. The intent, as with the centralization of citizens’ data in the mid-1960s, wasn’t to exploit people’s privacy. It was about getting ahead—making money through enhanced analysis of data. However, big data and mass data processing created a perfect storm of opportunity for unauthorized access, misuse, and abuse.
Over the years, the amount of data collected, processed, and shared has only grown. As a result, we’ve seen large-scale scandals (Facebook/Cambridge Analytica[ii]), blatant abuse (Sequoia One[iii]), and inappropriate sharing (DMV[iv]), along with the regular, old selling, sharing, and processing that goes on every time we transact with a company. It’s how we’re targeted for advertisements and upselling, and how organizations find trends, anomalies, and new opportunities to serve customers. Obviously, big data use cases aren’t all (or even mostly) malicious, but the possibility of exploitation is great.
Thus, as the use of big data has evolved, so too have the laws and regulations that govern it. Privacy is at the heart of the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), and thousands of cybersecurity regulations related to data protection have been passed—not all directly correlated to mass data collection, processing, and analysis, yet the laws still apply. Violations can result in multimillion-dollar fines, and breaches can result in even worse.
In 2016, the co-founders of Duality Technologies recognized a growing need to protect big data analysis and processing. As PhDs, researchers, inventors, and educators in data science and cryptography, this expert group wanted to build a solution that enabled trust through technology. They knew that most companies weren’t processing data at scale in-house—data owners often lack the skills or computational infrastructure, so they need to outsource—and the goal was to ensure private and secure data sharing across organizations. “You don’t need to sacrifice privacy for collaboration and analytical capabilities,” said Rina Shainski, co-founder and chairwoman of Duality Technologies, during a recent briefing. Yet the founders didn’t see any products on the market that could resolve the conflict.
The company’s impressive team had helped develop several revolutionary cryptographic technologies and felt the solution to private data sharing could be found in homomorphic encryption linked with analytics. What they developed are two products, SecurePlus™ Platform and SecurePlus™ Query, that allow organizations to share data with processors (model owners) without the need to ever decrypt the data anywhere along the chain. “Encryption,” said Shainski, “is like locking your data in a box. Currently, to use the data in the box, you have to open the box, and that exposes the data. But homomorphic encryption makes it possible to use the contents of the box while it’s encrypted. With our solutions, companies’ data is transmitted to the analyzer in an encrypted format, insights are derived from its encrypted form, and the results are then returned to the data owner, who can decrypt them and see the insights.”
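To make the “compute on the locked box” idea concrete, here is a toy sketch of an additively homomorphic scheme (textbook Paillier) in pure Python. This is an illustration of the general technique only—not Duality’s actual cryptography—and the primes are deliberately tiny demo values, far too small for real security:

```python
import math
import random

# --- Toy Paillier keypair (demo-sized, insecure primes) ---
p, q = 293, 433                      # hypothetical small primes for illustration
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1                            # standard simple choice of generator
mu = pow(lam, -1, n)                 # modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    """Encrypt integer m < n under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt using the private key (lam, mu); L(x) = (x - 1) // n."""
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# The holder of the ciphertexts can add the underlying plaintexts
# without ever decrypting them:
c1 = encrypt(17)
c2 = encrypt(25)
c_sum = (c1 * c2) % n2               # ciphertext multiplication = plaintext addition
print(decrypt(c_sum))                # -> 42, computed while "the box stayed locked"
```

In this model the analyzer would hold only `c1`, `c2`, and the public key, perform the aggregation on ciphertexts, and return `c_sum`; only the data owner, holding the private key, can read the result. Fully homomorphic schemes extend this idea to arbitrary computations, not just addition.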
Shainski is quick to point out, though, that the product isn’t the encryption itself; she calls it a data science platform optimized for encryption. Duality’s platform allows computing on encrypted data without ever revealing that data to the outside party, whether in use or in transit. This means organizations can extract insights using the most advanced analytics providers’ solutions without increasing their risk of data exposure, misuse, or abuse. It’s essentially a zero-trust data sharing solution: no one gets access to the data except the data owner, and no one gets access to the model except the model owner.
Duality offers four collaboration models: secure data analysis, secure model deployment, secure data linkage, and its most recent offering, secure querying. While highly regulated industries like healthcare and financial services might seem the most obvious buyers of Duality’s solution, Shainski says the offerings are relevant to any organization looking to analyze sensitive data. Given the number of potential vulnerabilities that could result from any data exchange, it’s wise to implement technology that ensures end-to-end protection and auditability and works in any computing environment. This is the promise of Duality. “With homomorphic encryption, we can host and use data in untrusted environments,” said Shainski. “It’s good for the world to be able to collaborate on sensitive data without exposing it.”