Five Things to Know about Databases that Leverage Partially Homomorphic Encryption
Lab41 is passionate about data protection. We shared our perspective on several emerging trends in our previous post about Technology's Opportunity to Ease the Burden of Bulk Data. This time we want to share findings from our recent survey on two databases that leverage partially homomorphic encryption. The math and terminology that form the foundation of these systems can be daunting and most people have never heard of partially homomorphic encryption. So, we whittled our findings down to five things to know about databases that leverage partially homomorphic encryption and aimed to explain everything in plain English.
Encrypted databases have the potential to provide solutions to long standing information security challenges associated with the insider threat and outsourcing sensitive data storage.
As both the frequency and severity of network intrusion attacks have increased in recent years, organizations have had to rethink their strategy to secure their sensitive data. Difficulties in protecting sensitive data clearly persists for many organizations and will likely continue to be a challenge in the future, especially as the vast amounts of data collected become even more valuable. Organizations face the challenge of trying to share this valuable data with those that should have access to it, while prohibiting access to those that want to steal and exploit it.
Data encryption is a logical solution to the problem. Even in the worse case where encrypted data is compromised through a breach, hypothetically it would be useless to anyone without the corresponding decryption key. Databases that have traditionally used encryption have only done so when the data is at rest or on disk. Once the database needs to operate on that data, it must decrypt it first, meaning that in memory, the data is in its plaintext form. This process is typically referred to as Transparent Data Encryption (TDE). Organizations may often favor TDE given its fairly contained functionality, requiring little to no changes to external applications or IT infrastructure, however TDE does not prevent malicious hackers or curious database administrators from observing the database contents.
In light of these challenges, one encryption scheme that stands out as a potential solution is homomorphic encryption. When applied to a database, homomorphic encryption distinguishes itself from TDE by enabling an environment where sensitive data remains confidential to the database tasked with operating on it. Under these conditions, not only could a curious database administrator be prevented from discovering sensitive data, but this sensitive data could also be stored in a completely untrusted environment.
In academia, the reference to homomorphic encryption is typically limited to multiplication and addition. A system which is capable of encrypting a value that can be added and multiplied by another encrypted value is known known as Fully Homomorphic Encryption (FHE). This cryptosystem was first proposed in 1978 by Rivest, Adleman, and Dertouzos, however a practical and efficient implementation has largely remained elusive, despite the FHE Craig Gentry proposed in 2009. As an alternative, researchers have leveraged Partially Homomorphic Encryption (PHE) algorithms, which unlike FHE, can only perform either multiplication or addition. In order for a plaintext value to accommodate both multiplication and addition, it must be encrypted separately by two different PHE schemes.
Because much of the execution that a database conducts relies on the primitive operations of multiplication and addition, PHE implementations combined with other cryptographic algorithms can resemble many of the critical operations of databases today.
The true magic of partially homomorphic encryption is that the mathematical operations do not require the encrypted values to be revealed or decrypted at any point in the process. The sensitive data remains confidential to the database tasked with operating on it and a compromise of the database would only reveal encrypted values. Partially homomorphic encryption schemes can accomplish addition or multiplication but not both.
The Paillier cryptosystem is an example of an approach that is partially homomorphic for addition. Two encrypted values can undergo a series of Paillier mathematical operations and the encrypted result can be decrypted to reveal the sum of the original plaintext values. The well-known encryption standard RSA is partially homomorphic for multiplication. In this case a series of RSA mathematical operations can be conducted to find an encrypted result that can be decrypted to reveal the product of the original plaintext values. Perhaps cryptography legend Bruce Schneier explained it best: "The RSA algorithm is different. Encrypt P to get C, multiply C by 2, and then decrypt 2C -- and you get 2P. That's a homomorphism: perform some mathematical operation to the ciphertext, and that operation is reflected in the plaintext." That's. Crazy.
Encrypted databases maintain a balance between cipher strength and query performance. These competing requirements can require database columns to use one or more encryption strategies depending on the expected query usage. As a result, multiple partially homomorphic encryption schemes such as Paillier and RSA might be combined with deterministic, random and order-preserving techniques.
Deterministic encryption is a scenario where the encryption of a given keyword always results in the same encrypted value. Database indexing leverages sorting strategies that require a predictable structure to the all of the column contents. Therefore deterministic encryption is ideal for database columns that will be indexed for content matching or equality queries (ex: WHERE X='10'). Unfortunately this predictability can allow adversaries to analyze the distribution and frequency of the plaintext values for columns with a limited range of values.
Random encryption is designed for a given keyword to be combined with random padding before encryption so that equal plaintext values map to different ciphertexts, typically through deterministic encryption of plaintext values that are injected with random padding or a random initialization vector. The variation in ciphertexts significantly complicates cryptanalysis through frequency and pattern analysis but prevents searchable indexing in database formats for the same reason.
Order-preserving encryption (OPE) preserves the numerical ordering properties of the associated plaintexts. More specifically, Encryption(x) > Encryption(y) for a given key k if and only if x > y. OPE is particularly valuable for encrypted database comparison, sorting and range operations (ex: MAX, MIN, ORDER BY, and SORT). An ideal OPE scheme reveals only the order of the ciphertexts. OPE is considered less secure than standard deterministic encryption because OPE reveals element order and potentially some information about their relative distance.
We reviewed the research publications and associated prototypes for two systems that leverage partially homomorphic encryption techniques, both of which hail from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). We previously touched on CryptDB in our Technology's Opportunity to Ease the Burden of Bulk Data post.
CryptDB is an encrypted database prototype that enables data to be stored securely on an untrusted SQL database by employing an intermediate trusted proxy which is responsible for encryption and decryption of user queries and subsequent results. The publication of the CryptDB research in 2011 was significant because it was one of the first implementations of an encrypted database that used Paillier partially homomorphic encryption techniques in a SQL environment.
CryptDB uses onions of encryption such that the database adds or removes layers of deterministic, order-preserving and random encryption based on whether the SQL operation is conducting a join, equality, ordering, searching or addition. The onion layers are not equal in terms of cryptographic privacy, so CryptDB maintains the highest strength onion layer for each column as long as possible. Google developed the Encrypted BigQuery extension of their BigQuery client based on CryptDB and several other companies have integrated portions of CryptDB's design into their systems.
(Onion Layers diagram from CryptDB)
Monomi was developed as a system for analytical query processing on encrypted databases. Monomi distinguishes itself from other encrypted databases by optimally splitting query execution between the client and server and its use of Paillier partially homomorphic encryption techniques. Monomi was built on CryptDB's foundation and the research was published in 2013. Monomi encrypts the entire database, assumes the database server is untrusted and runs queries over the encrypted data. Monomi's split query process partitions a user's SQL statement into distinct segments that are designated to be executed either on the client or the server. The database operates on an initial query, while queries against the subsequent results are executed on the client in order to prevent information leakage. This approach also offers significant advantages when Monomi is confronted with SQL analytics that involve incompatible encryption schemes or inefficient encryption schemes.
(Encryption schemes diagram from Monomi)
Perfomance and design requirements take on increased importance when implementing an encrypted database that leverages partially homomorphic encryption. Both the CryptDB and Monomi publications cited approximately 20% performance impact and neither offers full, standard SQL functionality.
(TPC-C Benchmark from CryptDB, Confidentiality for Database Apps)
Analysis should be conducted to determine important queries and common analytics before implementing an encrypted database. The current state of partial homomorphic encryption imposes significant limitations on certain SQL operations. The original prototype of CryptDB could execute only 4 of the 22 TPC-H benchmarking queries. The Monomi publication indicated that it could execute 19 of the 22 TPC-H benchmarking queries due to its split query execution strategy.
(Example of TPC-H benchmarking query 11 from Monomi)
Encrypted databases can also limit or significantly complicate database administration operations such as building indexes and altering table structure. Monomi's original implementation had cumbersome support for database design changes and required that each table to be updated had to be downloaded and decrypted via a "SELECT *" query, and then each encrypted row would be re-inserted using the new encrypted design.
Security considerations are shifted but still exist in an encrypted database architecture. If an adversary compromises CryptDB's proxy server, the adversary can recover the encryption keys of a logged-in user and access any data that user is authorized to view. Adversaries may be driven to expand targeting of user devices (computers, tablets, phones) that interface with the encrypted database instead of focusing on the database server itself. Perhaps most importantly, misconfiguration by the data owner and common user error can cause serious headaches.
Data owners and administrators should carefully review any planned use of deterministic encryption for highly sensitive fields in property-preserving encrypted databases. The sameness of ciphertext values within a non-unique column can reveal the distribution and frequency of the plaintext values for that column. When combined with the knowledge of the type of data stored in a column and publicly available auxiliary information, research has shown that scarily accurate statistical inferences can be made about the plaintext data of that column.
CryptDB uses a key chaining strategy to manage access to sensitive data. Decrypting a data item in the database requires a chain of keys that cascade from a data owner's password. If an adversary does not know the data owner's password, the adversary cannot decrypt the data even if the database and server are fully compromised. However, this also creates a scenario where a lost user password might cause encryption keys and associated data to be unrecoverable. That would put a new spin on the usual password reset request to the IT help desk!
The CryptDB and Monomi prototypes signal that optimizations around partial homomorphic encryption might be able to provide most of the required functionality at an acceptable performance penalty for some use cases going forward. Ongoing rigorous review of proposed encrypted data processing techniques and defining best practices for handling both sensitive and non-sensitive data fields in these systems will be critical to success. We expect that continuing regulatory pressure around privacy from local and national governments, the compelling benefits of securely sharing sensitive data such as electronic medical records, and high profile media coverage of data exposures will spur additional commercial investment in this space in the coming years.
Get more insight into how we workLets go