The pros and cons of MongoDB CSFLE

Looking ahead at the next horizon

Introduction

It was my privilege to once again present at a MongoDB User Group meeting yesterday (this time in Wellington). It was a brilliant event, and it was wonderful to meet some incredible engineers working for Kiwi companies who are doing amazing things with MongoDB.

On this occasion, I presented a deep dive into MongoDB CSFLE (Client Side Field Level Encryption).

MongoDB CSFLE is a client application-level encryption mechanism that gives engineers the ability to encrypt data in a way that can deliver a key-shredding capability, if implemented correctly.

So, what is MongoDB CSFLE?

  • It's a Data Encryption Mechanism
  • It's NOT a MongoDB database server-level capability
  • It's a MongoDB driver-level capability
  • It solves real-world problems but does introduce a few challenges...

How does it work?

In short, CSFLE utilizes a master key to encrypt Data Encryption Keys, which in turn are used to encrypt the data itself.

KMS options from AWS, Azure and GCP are all supported, or if you have a compelling reason to (more on why you might need to do that later), you can implement a local master key yourself.
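To make the key hierarchy concrete, here is a minimal conceptual sketch using only Node's built-in `crypto` module. It is not how the driver implements CSFLE (CSFLE uses AEAD_AES_256_CBC_HMAC_SHA_512, whereas this sketch uses AES-256-GCM for brevity), and the `seal` and `open` helpers are hypothetical names invented for illustration.

```typescript
import { Buffer } from 'node:buffer';
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Ciphertext plus the values AES-GCM needs to decrypt and authenticate it.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Hypothetical helper: encrypt `plaintext` under a 32-byte key.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Hypothetical helper: decrypt; throws if the key is wrong or the data was altered.
function open(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv('aes-256-gcm', key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}

// 1. The master key stands in for the key held by a KMS (or a local master key).
const masterKey = randomBytes(32);

// 2. A Data Encryption Key (DEK) is generated and stored only in wrapped form.
const dek = randomBytes(32);
const wrappedDek = seal(masterKey, dek);

// 3. Field values are encrypted with the DEK, not with the master key.
const encryptedField = seal(dek, Buffer.from('jane@example.com', 'utf8'));

// Reading reverses the chain: unwrap the DEK, then decrypt the field.
const unwrappedDek = open(masterKey, wrappedDek);
const roundTripped = open(unwrappedDek, encryptedField).toString('utf8');
```

The point of the indirection is that the master key only ever protects small DEKs, so each Data Encryption Context can have its own DEK without multiplying master keys.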

Understanding the Data Encryption Context is key to a successful implementation. In short, it is the boundary that YOU draw around the data in your database that should be encrypted with a single Data Encryption Key. A good example might be all the data related to a single customer.

Defining your Data Encryption Context well is the key to success.

Please note: This is one of those measure-twice-cut-once situations. Altering your Data Encryption Context in the future will be difficult, so think it through carefully, and if you're still unsure, give us a call.

MongoDB, and in fact, CSFLE itself, does not impose any restrictions on how you do this (which might be why people struggle with the concept), so it's really up to you. However, your decision on what defines a Data Encryption Context within your application may impact your ability to implement a key-shredding strategy if that is something you want to do.

What is key shredding?

Before I define key shredding, let's consider the problem it aims to solve. Consider a database full of customer data, which gets backed up regularly and where, for business reasons, those backups use a long-lived backup retention policy (by way of example, let's say several months' worth of daily backups). Now consider the scenario where the customer has the right to be forgotten, perhaps because you offer that to them in your terms of service or, alternatively, because you are required to provide that capability by law (e.g. GDPR).

Now, in that case, if a customer exercises the right to be forgotten, you have a huge problem. You can obviously delete all the customer's data from your live database, but you may also be required to delete all the customer's data from all of your long-lived backups.

That is a nightmare!!

Now, suppose the customer's data (at least the PII parts of it) is encrypted using CSFLE, and your Data Encryption Keys are held separately under a short-lived backup retention policy (by way of example, let's say a week of daily backups). When a customer exercises the right to be forgotten, you can once again delete all the customer's data from your database and, in addition, delete the Data Encryption Key associated with the customer. Once the Data Encryption Key is no longer accessible (because the short-lived backup retention policy has rolled over), the customer's data in the primary long-lived backups becomes entirely unreadable, and there is no need to remove it from those backups to comply with the customer's right to be forgotten.

It is an elegant and powerful solution to a complex problem, but it requires you to define the Data Encryption Context carefully upfront.
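The "forget" operation described above might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes one Data Encryption Key per customer, registered under a `keyAltNames` entry equal to the customer's identifier, and a `customerId` field on the customer's documents. The `KeyVault` and `CustomerStore` interfaces are hypothetical; they are shaped after the `getKeyByAltName` and `deleteKey` methods of the Node.js driver's `ClientEncryption` class and the `deleteMany` method of `Collection`, so the real objects should fit, but check the signatures against your driver version.

```typescript
// Hypothetical minimal shape of the key vault operations this sketch needs.
interface KeyVault {
  getKeyByAltName(altName: string): Promise<{ _id: unknown } | null>;
  deleteKey(id: unknown): Promise<unknown>;
}

// Hypothetical minimal shape of the collection holding the customer's documents.
interface CustomerStore {
  deleteMany(filter: { customerId: string }): Promise<unknown>;
}

// Deletes the customer's live data, then shreds the customer's key.
// Returns true if a Data Encryption Key was found and deleted.
async function forgetCustomer(
  keyVault: KeyVault,
  customers: CustomerStore,
  customerId: string,
): Promise<boolean> {
  // Remove the live documents first; backups are dealt with by the key deletion.
  await customers.deleteMany({ customerId });

  // Assumption: the key's alt name is the customer identifier.
  const key = await keyVault.getKeyByAltName(customerId);
  if (key === null) {
    return false;
  }

  // Once this key also ages out of the short-lived key vault backups,
  // the customer's encrypted fields in long-lived backups are unreadable.
  await keyVault.deleteKey(key._id);
  return true;
}
```

Note that the shredding only becomes final once the key vault's own backups have rolled over, which is why the two retention policies need to differ.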

Implementation

A correctly indexed key vault is the first thing needed to implement CSFLE successfully. The key vault is where the MongoDB Driver will manage Data Encryption Keys. It's simply a collection, and here is an example of what that index should look like:

const db = new Database(databaseName, connectionParams, logger);
await db.EnsureIndex(
  '__keyVault',
  'key_idx',
  { keyAltNames: 1 },
  {
    unique: true,
    partialFilterExpression: { keyAltNames: { $exists: true } },
  },
);
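The `Database` class and `EnsureIndex` method above belong to our own wrapper library. If you are working directly with the Node.js driver, a roughly equivalent sketch uses the collection's `createIndex` method. The `IndexableCollection` interface and the `ensureKeyVaultIndex` function are hypothetical names introduced here so the example stands alone; a driver `Collection` is expected to satisfy the interface.

```typescript
// Hypothetical minimal shape of a collection, modelled on the driver's
// Collection.createIndex(indexSpec, options) method.
interface IndexableCollection {
  createIndex(
    keys: Record<string, 1 | -1>,
    options: {
      name?: string;
      unique?: boolean;
      partialFilterExpression?: Record<string, unknown>;
    },
  ): Promise<string>;
}

// Creates the unique partial index on keyAltNames and returns the index name.
async function ensureKeyVaultIndex(keyVault: IndexableCollection): Promise<string> {
  return keyVault.createIndex(
    { keyAltNames: 1 },
    {
      name: 'key_idx',
      // No two keys may share an alternate name...
      unique: true,
      // ...but keys without any alternate names are left out of the index.
      partialFilterExpression: { keyAltNames: { $exists: true } },
    },
  );
}
```

With the driver, this would be called with something like `client.db('encryption').collection('__keyVault')`, where the database name is an assumption for illustration.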

Encryption

In terms of actually encrypting data, there are two ways this can be implemented:

  • The automatic approach
    • utilises a JSON-Schema definition within the database
    • requires MongoDB Enterprise Server or MongoDB Atlas
  • and the explicit approach
    • works on any version of MongoDB *(including MongoDB Community Server)*

Key shredding can really only be implemented using the explicit approach, so we suggest that customers adopt that approach.

Here is an example of what the code might look like for an explicit encryption implementation.

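import { Binary } from 'mongodb';

// The two CSFLE algorithms. Deterministic yields the same ciphertext for the
// same input and key; Random yields a different ciphertext every time.
enum EncryptionAlgorithm {
  Deterministic = 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
  Random = 'AEAD_AES_256_CBC_HMAC_SHA_512-Random',
}

// Excerpt from a class method: `this` supplies the connection pool and the
// key lookup helper, neither of which is shown here.
async function EncryptValue(
  resourceName: string,
  subjectId: string,
  value: any,
  algorithm: EncryptionAlgorithm,
): Promise<Binary> {
  const db = await this.LeaseDatabaseConnection(resourceName);
  const encryptor = await db.encryptor;
  // Resolve the Data Encryption Key for this subject (the Data Encryption Context).
  const keyId = await this.EncryptionSettingsHelper.GetEncryptionKeyId(db, subjectId);
  return encryptor.encrypt(value, { keyId, algorithm });
}

const unencryptedValue = 'SOMETHING THAT SHOULD BE ENCRYPTED';
document.encryptedField = await EncryptValue(
  'Customers',
  'MY-TENANT-IDENTIFIER-OR-ENCRYPTION-CONTEXT',
  unencryptedValue,
  EncryptionAlgorithm.Random,
);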

So, what will the encrypted data in the database look like? Well, here is an example based on the code above:

{
  _id: ObjectId('66d51a6ace2d5406c6a6e62d'),
  encryptedField: Binary.createFromBase64('Ain8tImUkk+2rPPGVr8xCGMCGW7uwYYkqpDUGN9jM0WqFKiCT5nZ
                  Y4jEmqs35Y+tw9Oohp7Zb5fTEdPVBvL0wjprnADjox5Fod0yfW55aKbF/DP4cY+Zqpmr3bP8m4FCb
                  5GN8XSMexeRunkm6bogzIFXzcObed/PeF/F997RsFduA9+PRI+oMe1qc/MiJnTYsxEAWEBD8YQB
                  5KofQcjV9uKXqLkH7t2plb+o2JVx+z7CEHkKqTsBMpvFHLuc65vMs5e+tN6O3nwfPD/iZd1CErPH', 6)
}
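The `GetEncryptionKeyId` helper used in the encryption example is not shown in this post. One plausible shape for such a helper, sketched here as an assumption rather than as our actual implementation, is a get-or-create lookup keyed on the Data Encryption Context: look the key up by alternate name and create it if it does not exist yet. The `DataKeyVault` interface and `getOrCreateKeyId` function are hypothetical names; the interface is modelled on the `getKeyByAltName` and `createDataKey` methods of the Node.js driver's `ClientEncryption` class. Cloud KMS providers also need a `masterKey` option on `createDataKey`, which is omitted here for brevity.

```typescript
// Hypothetical minimal shape of the key vault operations this sketch needs.
// TKeyId stands in for the driver's key identifier type.
interface DataKeyVault<TKeyId> {
  getKeyByAltName(altName: string): Promise<{ _id: TKeyId } | null>;
  createDataKey(provider: string, options: { keyAltNames: string[] }): Promise<TKeyId>;
}

// Returns the Data Encryption Key id for a subject, creating the key on first use.
// Assumption: the subject identifier doubles as the key's alternate name.
async function getOrCreateKeyId<TKeyId>(
  vault: DataKeyVault<TKeyId>,
  kmsProvider: string,
  subjectId: string,
): Promise<TKeyId> {
  const existing = await vault.getKeyByAltName(subjectId);
  if (existing !== null) {
    return existing._id;
  }
  // First use of this Data Encryption Context: create its key.
  return vault.createDataKey(kmsProvider, { keyAltNames: [subjectId] });
}
```

If two callers race to create the same key, the unique index on `keyAltNames` should cause the second insert to fail rather than silently produce two keys for one context, so a production version would likely catch that error and retry the lookup.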

Decryption

So, what about decryption? Well, this is where I introduce you to a wonderful CSFLE superpower. Automatic decryption!!

How is that even possible?

Well, CSFLE doesn't just encrypt your data; it also wraps the encrypted data in a metadata structure that includes the identifier of the Data Encryption Key used to encrypt it. The driver can therefore (and does) automatically decrypt the data by transparently fetching that key from the key vault and using its key material to decrypt the data, all without you having to lift a finger.
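You can see that metadata for yourself by peeking at the first bytes of an encrypted value. The sketch below assumes the payload layout described in MongoDB's client-side encryption specification for Binary subtype 6, as I understand it: one byte for the algorithm (1 for deterministic, 2 for random), sixteen bytes for the Data Encryption Key's UUID, one byte for the original BSON type, and then the ciphertext. The `readEncryptedFieldHeader` function and `EncryptedFieldHeader` type are hypothetical names for illustration.

```typescript
import { Buffer } from 'node:buffer';

interface EncryptedFieldHeader {
  algorithm: 'Deterministic' | 'Random';
  keyId: string; // hex form of the 16-byte Data Encryption Key UUID
  originalBsonType: number; // BSON type code of the plaintext value
  ciphertextLength: number; // bytes remaining after the 18-byte header
}

// Reads the unencrypted header that precedes the ciphertext.
// Assumes the layout: [algorithm: 1][key UUID: 16][BSON type: 1][ciphertext].
function readEncryptedFieldHeader(payload: Buffer): EncryptedFieldHeader {
  if (payload.length < 18) {
    throw new Error('payload is too short to contain a header');
  }
  const marker = payload.readUInt8(0);
  if (marker !== 1 && marker !== 2) {
    throw new Error(`unexpected algorithm marker: ${marker}`);
  }
  return {
    algorithm: marker === 1 ? 'Deterministic' : 'Random',
    keyId: payload.subarray(1, 17).toString('hex'),
    originalBsonType: payload.readUInt8(17),
    ciphertextLength: payload.length - 18,
  };
}

// The first 24 base64 characters (18 bytes) of the example value shown earlier.
const header = readEncryptedFieldHeader(Buffer.from('Ain8tImUkk+2rPPGVr8xCGMC', 'base64'));
```

For the example value, the header reports the Random algorithm and an original BSON type of 2 (string), which matches the code that produced it. Everything after the header is opaque without the key.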

That is enormously useful!!

Important things you should know about CSFLE!

Firstly, and most importantly:

Losing your master key will have a very negative influence over your day / your week / your life

In addition:

  • Some MQL features cannot work on encrypted data.
    • Remember, the database server has no idea how to read the contents of encrypted fields, so asking it to do anything that requires it to understand a field's contents will not work.
  • Multi-region applications may require a local master key.
    • KMS services are regional, so a multi-regional application would effectively have multiple master keys (which may, and probably will, be problematic).
  • Moving an application to another region or cloud provider may be impossible if you have used their KMS service for the master key.
  • An encryption-enabled database connection will throw when reading encrypted data if any of the keys required for decryption are not available.
    • This is a classic key-shredding gotcha.
    • Mitigations are to design your application to only read data within a single Data Encryption Context or alternatively to use an unencrypted connection when reading across many contexts (obviously, encrypted fields will remain encrypted, but this might not be a problem, depending on your application).

Final thoughts

I really like CSFLE. It is genuinely awesome, but it's something that you want to implement correctly and with careful consideration. This is not something you want to get wrong.

Here be dragons. Use with care.

If you're uncertain or require validation of your thinking, reach out; we'll be glad to help.
