It was my privilege to once again present at a MongoDB User Group meeting yesterday (this time in Wellington). It was a brilliant event, and it was wonderful to meet some incredible engineers working for Kiwi companies who are doing amazing things with MongoDB.
On this occasion, I presented a deep dive into MongoDB CSFLE (Client Side Field Level Encryption).
MongoDB CSFLE is an application-level encryption mechanism that lets engineers encrypt data on the client and that, if implemented correctly, can deliver a key-shredding capability.
In short, CSFLE utilizes a master key to encrypt Data Encryption Keys, which in turn are used to encrypt the data itself.
KMS options from AWS, Azure and GCP are all supported, or if you have a compelling reason to (more on why you might need to do that later), you can implement a local master key yourself.
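To make the two-level key hierarchy concrete, here is a minimal sketch of the envelope pattern using Node's built-in `crypto` module. It is not how the driver does it internally: CSFLE uses its own AEAD_AES_256_CBC_HMAC_SHA_512 algorithms and talks to your KMS, whereas this sketch assumes AES-256-GCM and an in-memory master key purely for brevity. The names `seal`, `open` and `Sealed` are hypothetical helpers made up for the illustration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Ciphertext plus the values AES-256-GCM needs to decrypt and authenticate it.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Encrypt `plaintext` under a 32-byte key (hypothetical helper).
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Decrypt a sealed value; throws if the key is wrong or the data was altered.
function open(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// Assumption: in a real deployment the master key lives in a KMS, not in memory.
const masterKey = randomBytes(32);

// The Data Encryption Key is what actually encrypts the field value.
const dataKey = randomBytes(32);
const encryptedField = seal(dataKey, Buffer.from("jane@example.com", "utf8"));

// Only the wrapped (encrypted) form of the Data Encryption Key is stored.
const wrappedDataKey = seal(masterKey, dataKey);

// Reading the value back: unwrap the Data Encryption Key, then decrypt the field.
const unwrapped = open(masterKey, wrappedDataKey);
const recovered = open(unwrapped, encryptedField).toString("utf8");
console.log(recovered);
```

The point of the indirection is that the master key never touches your data directly, so it can stay inside the KMS while many Data Encryption Keys are created, stored and deleted independently.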
Understanding the Data Encryption Context is key to a successful implementation. In short, it is the boundary that YOU draw around the data in your database that should be encrypted with a single Data Encryption Key. A good example might be all the data related to a customer.
MongoDB, and in fact, CSFLE itself, does not impose any restrictions on how you do this (which might be why people struggle with the concept), so it's really up to you. However, your decision on what defines a Data Encryption Context within your application may impact your ability to implement a key-shredding strategy if that is something you want to do.
Before I define key shredding, let's consider the problem it aims to solve. Consider a database full of customer data, which gets backed up regularly and where, for business reasons, those backups use a long-lived backup retention policy (by way of example, let's say several months' worth of daily backups). Now consider the scenario where the customer has the right to be forgotten, perhaps because you offer that to them in your terms of service or, alternatively, because you are required to provide that capability by law (e.g. GDPR).
Now, in that case, if a customer exercises the right to be forgotten, you have a huge problem. You can obviously delete all the customer's data from your live database, but you may also be required to delete all the customer's data from all of your long-lived backups.
Now, if the customer's data (at least the PII parts of it) were encrypted using CSFLE, and if your Data Encryption Keys are held separately and on a separate short-lived backup retention policy (by way of example, let's say a week of daily backups), then when a customer exercises the right to be forgotten, you can once again delete all the customer's data from your database and, in addition, delete the Data Encryption Key associated with the customer. Once the Data Encryption Key is no longer accessible (because the short-lived backup retention policy has rolled over), the customer's data in the primary long-lived backups becomes entirely unreadable, and there is no need to remove the customer data from those backups to comply with the customer's right to be forgotten.
It is an elegant and powerful solution to a complex problem, but it requires you to define the Data Encryption Context carefully upfront.
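The key-shredding flow described above can be sketched in a few lines. This is a toy model under loud assumptions: the key vault is an in-memory `Map`, the "long-lived backup" is just a pair of variables, and AES-256-GCM from Node's `crypto` module stands in for the CSFLE algorithms. The names `keyFor`, `encryptFor` and `decrypt` are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface EncryptedRecord {
  subjectId: string;
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Stand-in for the key vault: one Data Encryption Key per subject (customer).
const keyVault = new Map<string, Buffer>();

// Return the subject's key, creating it on first use.
function keyFor(subjectId: string): Buffer {
  let key = keyVault.get(subjectId);
  if (!key) {
    key = randomBytes(32);
    keyVault.set(subjectId, key);
  }
  return key;
}

function encryptFor(subjectId: string, value: string): EncryptedRecord {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", keyFor(subjectId), iv);
  const data = Buffer.concat([cipher.update(value, "utf8"), cipher.final()]);
  return { subjectId, iv, tag: cipher.getAuthTag(), data };
}

function decrypt(record: EncryptedRecord): string {
  const key = keyVault.get(record.subjectId);
  // Without the key, the ciphertext is just opaque bytes.
  if (!key) throw new Error(`no key for ${record.subjectId}: data is unreadable`);
  const decipher = createDecipheriv("aes-256-gcm", key, record.iv);
  decipher.setAuthTag(record.tag);
  return Buffer.concat([decipher.update(record.data), decipher.final()]).toString("utf8");
}

// These records play the role of data sitting in a long-lived backup.
const janeBackup = encryptFor("customer-1", "Jane Doe");
const johnBackup = encryptFor("customer-2", "John Roe");

// customer-1 exercises the right to be forgotten: shred only their key.
keyVault.delete("customer-1");

console.log(decrypt(johnBackup)); // other customers are unaffected
try {
  decrypt(janeBackup);
} catch (err) {
  console.log((err as Error).message); // the backed-up ciphertext can no longer be read
}
```

Notice that the backup itself is never modified. That only works because each customer has their own key, which is exactly why the choice of Data Encryption Context matters.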
A correctly indexed key vault is the first thing needed to implement CSFLE successfully. The key vault is where the MongoDB Driver will manage Data Encryption Keys. It's simply a collection, and here is an example of what that index should look like:
const db = new Database(databaseName, connectionParams, logger);
await db.EnsureIndex(
  '__keyVault',
  'key_idx',
  { keyAltNames: 1 },
  {
    unique: true,
    partialFilterExpression: { keyAltNames: { $exists: true } },
  },
);
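In terms of actually encrypting data, there are two ways this can be implemented: automatic encryption, where the driver encrypts fields for you based on a schema, and explicit encryption, where your application code encrypts each value itself.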
Key shredding can really only be implemented using the explicit approach, so we suggest that customers adopt that approach.
Here is an example of what the code might look like for an explicit encryption implementation.
enum EncryptionAlgorithm {
  Deterministic = 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
  Random = 'AEAD_AES_256_CBC_HMAC_SHA_512-Random',
}

// Excerpt: `this` is assumed to be the surrounding service object (not shown).
async function EncryptValue(
  resourceName: string,
  subjectId: string,
  value: any,
  algorithm: EncryptionAlgorithm,
): Promise<Binary> {
  const db = await this.LeaseDatabaseConnection(resourceName);
  const encryptor = await db.encryptor;
  // Look up the Data Encryption Key that belongs to this subject.
  const keyId = await this.EncryptionSettingsHelper.GetEncryptionKeyId(db, subjectId);
  return encryptor.encrypt(value, { keyId, algorithm });
}

const unencryptedValue = 'SOMETHING THAT SHOULD BE ENCRYPTED';
document.encryptedField = await EncryptValue(
  'Customers',
  'MY-TENANT-IDENTIFIER-OR-ENCRYPTION-CONTEXT',
  unencryptedValue,
  EncryptionAlgorithm.Random,
);
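The code above leans on a helper that maps a subject to its Data Encryption Key. The real helper is not shown here, so what follows is only a sketch of one way such a lookup might work: find the key by its alternate name, and create it if it does not exist yet. The `KeyVaultClient` interface is hypothetical. It is intended to mirror the `getKeyByAltName` and `createDataKey` methods of the Node.js driver's `ClientEncryption` class, so that the sketch stays self-contained; `getOrCreateKeyId` and the default `"local"` provider are likewise assumptions.

```typescript
// Hypothetical structural subset of the driver's ClientEncryption class.
interface KeyVaultClient<Id> {
  getKeyByAltName(keyAltName: string): Promise<{ _id: Id } | null>;
  createDataKey(provider: string, options: { keyAltNames: string[] }): Promise<Id>;
}

// Return the id of the Data Encryption Key for `subjectId`, creating it if needed.
async function getOrCreateKeyId<Id>(
  vault: KeyVaultClient<Id>,
  subjectId: string,
  provider = "local", // assumption: swap for "aws", "azure" or "gcp" as appropriate
): Promise<Id> {
  const existing = await vault.getKeyByAltName(subjectId);
  if (existing) return existing._id;

  try {
    return await vault.createDataKey(provider, { keyAltNames: [subjectId] });
  } catch (err) {
    // Another writer may have created the key first. With a unique index on
    // keyAltNames that race surfaces as an error, so re-read before giving up.
    const raced = await vault.getKeyByAltName(subjectId);
    if (raced) return raced._id;
    throw err;
  }
}
```

This is also where the unique index on `keyAltNames` earns its keep: without it, two concurrent requests for the same subject could each create a key, and shredding one of them would leave data encrypted with the other still readable.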
So, what will the encrypted data in the database look like? Well, here is an example based on the code above:
{
_id: ObjectId('66d51a6ace2d5406c6a6e62d'),
encryptedField: Binary.createFromBase64('Ain8tImUkk+2rPPGVr8xCGMCGW7uwYYkqpDUGN9jM0WqFKiCT5nZ
Y4jEmqs35Y+tw9Oohp7Zb5fTEdPVBvL0wjprnADjox5Fod0yfW55aKbF/DP4cY+Zqpmr3bP8m4FCb
5GN8XSMexeRunkm6bogzIFXzcObed/PeF/F997RsFduA9+PRI+oMe1qc/MiJnTYsxEAWEBD8YQB
5KofQcjV9uKXqLkH7t2plb+o2JVx+z7CEHkKqTsBMpvFHLuc65vMs5e+tN6O3nwfPD/iZd1CErPH', 6)
}
So, what about decryption? Well, this is where I introduce you to a wonderful CSFLE superpower. Automatic decryption!!
How is that even possible?
Well, CSFLE doesn't just encrypt your data; it also wraps the encrypted data in a metadata structure that includes the identifier of the Data Encryption Key used to encrypt it. The driver can therefore decrypt the data automatically (and it does): it transparently fetches that key from the key vault and uses its key material to decrypt the data, all without you having to lift a finger.
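You can see that metadata for yourself by peeking at the first few bytes of an encrypted value. The sketch below assumes the payload layout as I understand it from the driver specification: one byte for the algorithm subtype (1 for deterministic, 2 for random), sixteen bytes for the key's UUID, one byte for the original BSON type, and then the ciphertext. The function name `readEncryptedHeader` is hypothetical, and the input is the first 24 base64 characters of the example document shown earlier, which is all that is needed for the header.

```typescript
interface EncryptedHeader {
  algorithm: "Deterministic" | "Random" | "Unknown";
  keyId: string; // UUID of the Data Encryption Key
  originalBsonType: number; // BSON type code of the plaintext value
}

// Read the metadata that precedes the ciphertext in an encrypted value.
function readEncryptedHeader(base64: string): EncryptedHeader {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length < 18) throw new Error("payload too short to contain a header");

  const subtype = bytes.readUInt8(0);
  const hex = bytes.subarray(1, 17).toString("hex");
  const keyId = [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");

  return {
    algorithm: subtype === 1 ? "Deterministic" : subtype === 2 ? "Random" : "Unknown",
    keyId,
    originalBsonType: bytes.readUInt8(17),
  };
}

const header = readEncryptedHeader("Ain8tImUkk+2rPPGVr8xCGMC");
console.log(header);
// algorithm: "Random"
// keyId: "29fcb489-9492-4fb6-acf3-c656bf310863"
// originalBsonType: 2 (the BSON type code for a string)
```

That matches the example: the value was a string, encrypted with the Random algorithm. The key id is what the driver looks up in the key vault when it decrypts, which is also why deleting that key is enough to make the value unreadable.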
I really like CSFLE. It is genuinely awesome, but it's something that you want to implement correctly and with careful consideration. This is not something you want to get wrong.