MongoDB Field Level Encryption
Hello. In this tutorial, we will explore Field Level Encryption in MongoDB. I will be using Docker to set up the MongoDB and mongo-express (web UI) containers.
1. Introduction
MongoDB is a popular open-source NoSQL (non-relational) database management system that is designed to store, manage, and retrieve large volumes of unstructured or semi-structured data. MongoDB was first released in 2009 and has since gained significant adoption in the web development community due to its flexibility, scalability, and ease of use.
Key features of MongoDB include:
- Document-Oriented: MongoDB stores data in flexible, JSON-like documents, serialized in a binary format called BSON (Binary JSON), which allows for a dynamic schema. This means each document in a collection can have different fields and data types, making it easy to evolve the data structure over time.
- Scalability: MongoDB can handle large-scale applications and high volumes of data through horizontal scaling. It supports sharding, a technique that distributes data across multiple servers, enabling horizontal scaling to meet increasing demands.
- Replication: MongoDB allows for the creation of replica sets, which are synchronized copies of a MongoDB database. Replica sets provide high availability and fault tolerance by automatically promoting a new primary node if the current primary fails.
- Indexing: MongoDB supports various types of indexes, including single field indexes, compound indexes, and geospatial indexes. Indexes help optimize query performance and improve the speed of data retrieval.
- Aggregation Framework: MongoDB offers a powerful aggregation framework that allows users to perform complex data transformations and analytics operations on the data stored in the database.
- Ad hoc Queries: MongoDB supports ad hoc queries, which means developers can query the database without needing to predefine the schema or structure of the data.
- ACID Transactions: Starting from version 4.0, MongoDB introduced support for multi-document ACID (Atomicity, Consistency, Isolation, Durability) transactions, providing data integrity and consistency for complex operations involving multiple documents.
MongoDB is commonly used in various applications, such as content management systems, e-commerce platforms, real-time analytics, and mobile applications, where flexible data structures and horizontal scaling are essential. It has extensive community support and a robust ecosystem of drivers and tools for integration with different programming languages and frameworks.
1.1 Field-Level Encryption
MongoDB Field Level Encryption is a powerful feature designed to enhance the security of sensitive data stored in a MongoDB database. It allows developers to encrypt specific fields within a document, ensuring that only authorized parties can access and decrypt the sensitive information. By encrypting data at the field level, even if an attacker gains unauthorized access to the database, they will only see encrypted values, making it significantly harder for them to extract meaningful information.
Key takeaways from MongoDB Field Level Encryption:
- Granular Data Protection: Field Level Encryption enables developers to selectively encrypt sensitive fields, leaving non-sensitive data in plaintext. This flexibility allows for better performance while maintaining a high level of security.
- Secure Key Management: MongoDB provides options for key management, including integration with Key Management Systems (KMS) and the ability to use local master keys. Properly managing encryption keys is critical to ensure data security.
- Transparent to Applications: MongoDB Field Level Encryption operates transparently at the database driver level, meaning that applications interact with encrypted data as if it were plaintext. This ease of use minimizes the impact on existing applications and simplifies the adoption of encryption.
- Minimal Performance Overhead: While encryption introduces some computational overhead, MongoDB’s Field Level Encryption is designed to minimize the impact on database performance. With careful key management and appropriate algorithm choices, the performance overhead can be managed effectively.
- Multi-layer Security: Field Level Encryption is just one layer of security. It is essential to follow other best practices, such as network security, user authentication, role-based access control (RBAC), and secure application design to create a robust security posture.
- Careful Implementation is Key: Implementing Field Level Encryption requires careful planning and consideration. Developers must define which fields should be encrypted and determine the appropriate encryption algorithms and key management strategies based on the sensitivity of the data.
- Compliance and Data Privacy: Field Level Encryption can be an essential tool for organizations dealing with sensitive or regulated data, helping them meet compliance requirements and data privacy regulations.
1.2 Encryption algorithms
1.2.1 Deterministic Encryption
Deterministic encryption is a type of encryption that always produces the same ciphertext for a given plaintext input. In other words, if you encrypt the same value multiple times using the same encryption key, you will get the same encrypted output each time. This property is useful when you need to perform equality searches or exact matches on encrypted data without decrypting it.
1.2.1.1 How MongoDB’s Deterministic Encryption Works
In MongoDB, deterministic encryption is one of the modes that can be used for field-level encryption. When you configure a field to be encrypted with deterministic encryption, the same plaintext value will always produce the same ciphertext using the same encryption key.
This behavior allows you to run equality queries on the encrypted data without decrypting it first: the driver encrypts the query value, and the server compares ciphertexts. Range queries and sorting are not supported on these fields, because the ciphertext does not preserve the order of the plaintext.
However, it’s important to understand that deterministic encryption has some limitations. Since the same plaintext always produces the same ciphertext, it can potentially lead to data correlation. If two documents have the same value in the encrypted field, they will have the same ciphertext, which could reveal some information to an attacker. To mitigate this risk, it’s essential to carefully choose which fields to encrypt deterministically and how you structure your data.
1.2.1.2 Use Cases
Deterministic encryption is suitable for scenarios where you need to perform exact matches on sensitive data without decrypting it. Some common use cases include:
- Encrypted search: Allowing the encrypted values to be used in search queries without revealing the actual data.
- Joining encrypted data: Enabling joins between encrypted fields in different collections or documents.
- Indexing: Supporting the indexing of encrypted fields for improved query performance.
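The equality-matching property can be illustrated with a few lines of Node.js. This is only a sketch under stated assumptions: it derives the IV from the plaintext with an HMAC so that equal inputs give equal ciphertexts, and it is not MongoDB's actual AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic construction. The function name deterministicEncrypt and the two keys are hypothetical.

```javascript
const crypto = require("crypto");

// Illustration only: a synthetic-IV scheme, not MongoDB's real algorithm.
function deterministicEncrypt(plaintext, encKey, macKey) {
  // Derive the IV from the plaintext so equal inputs yield equal ciphertexts.
  const iv = crypto
    .createHmac("sha256", macKey)
    .update(plaintext)
    .digest()
    .subarray(0, 16);
  const cipher = crypto.createCipheriv("aes-256-cbc", encKey, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body]).toString("base64");
}

const encKey = crypto.randomBytes(32); // hypothetical encryption key
const macKey = crypto.randomBytes(32); // hypothetical key for IV derivation

const first = deterministicEncrypt("901-01-1234", encKey, macKey);
const second = deterministicEncrypt("901-01-1234", encKey, macKey);
const other = deterministicEncrypt("901-01-9999", encKey, macKey);

console.log(first === second); // true: same plaintext, same ciphertext
console.log(first === other);  // false: different plaintext
```

Because equal values produce equal ciphertexts, a server can match them byte for byte, which is also exactly what lets an observer see that two documents share a value.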
1.2.2 Randomized Encryption
Randomized encryption is designed to provide stronger security compared to deterministic encryption. Unlike deterministic encryption, where the same plaintext always produces the same ciphertext, randomized encryption ensures that each encryption of the same plaintext results in a different ciphertext. This adds an extra layer of security, making it more difficult for attackers to correlate encrypted values.
1.2.2.1 How MongoDB’s Randomized Encryption Works
In MongoDB, randomized encryption is one of the modes available for field-level encryption. When you configure a field to be encrypted with randomized encryption, each time you encrypt the same value using the same encryption key, you will get a different ciphertext. This property prevents data correlation and adds a level of protection against certain types of attacks, such as frequency analysis.
Because randomized encryption generates a different ciphertext for the same plaintext each time, the server has nothing stable to compare against, so fields encrypted this way cannot be used in equality or range query filters with client-side field level encryption. You can still find such documents by querying on other fields, and the driver decrypts the randomized fields when the documents are read. If a sensitive field must be queryable, use deterministic encryption for equality matches, or look at MongoDB's separate Queryable Encryption feature.
1.2.2.2 Use Cases
Randomized encryption is a more secure choice for scenarios where data correlation and frequency analysis could pose potential risks. Some common use cases include:
- Sensitive data protection: Encrypting sensitive fields like personally identifiable information (PII), financial data, or healthcare records.
- Compliance requirements: Meeting regulatory and compliance standards that mandate strong encryption practices.
- Privacy preservation: Safeguarding data in scenarios where multiple instances of the same plaintext value are expected.
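The difference from deterministic encryption can be seen in a short Node.js sketch. It is an illustration only, assuming plain AES-256-CBC with a fresh random IV per call; it is not MongoDB's actual AEAD_AES_256_CBC_HMAC_SHA_512-Random construction, and the function names randomizedEncrypt and decrypt are hypothetical.

```javascript
const crypto = require("crypto");

// Illustration only: AES-256-CBC with a random IV, not MongoDB's real algorithm.
function randomizedEncrypt(plaintext, key) {
  const iv = crypto.randomBytes(16); // fresh IV on every call
  const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body]); // IV is stored in front of the ciphertext
}

function decrypt(payload, key) {
  const iv = payload.subarray(0, 16);
  const decipher = crypto.createDecipheriv("aes-256-cbc", key, iv);
  const plain = Buffer.concat([decipher.update(payload.subarray(16)), decipher.final()]);
  return plain.toString("utf8");
}

const key = crypto.randomBytes(32); // hypothetical encryption key
const firstCipher = randomizedEncrypt("901-01-1234", key);
const secondCipher = randomizedEncrypt("901-01-1234", key);

console.log(firstCipher.equals(secondCipher)); // false: ciphertexts differ
console.log(decrypt(firstCipher, key) === decrypt(secondCipher, key)); // true
```

Both ciphertexts decrypt to the same value, yet nothing in the stored bytes reveals that they are equal.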
2. Docker
Docker is a widely used platform for building and running containerized applications:
- It packages an application together with its dependencies and runs it inside a container, and is often used in CI/CD pipelines
- It is a de facto standard for Linux containers
- A container is an isolated runtime that shares the host's Linux kernel and gives the application a private, machine-like space
2.1 Setting up Docker
If someone needs to go through the Docker installation, please watch this video.
2.2 Setting up MongoDB on Docker
To set up MongoDB and a MongoDB GUI (mongo-express), I will be making use of Docker. For that, I have prepared a simple Docker Compose file, stack.yml, that starts MongoDB with a default database named myDatabase.
stack.yml
services:
  mongodb:
    container_name: mongodb
    image: mongo
    environment:
      MONGO_INITDB_DATABASE: myDatabase
    ports:
      - "27017:27017"
  express:
    container_name: express_1
    image: mongo-express
    ports:
      - "9001:8081"
    environment:
      - ME_CONFIG_MONGODB_SERVER=mongodb
      - ME_CONFIG_MONGODB_PORT=27017
      - ME_CONFIG_MONGODB_ENABLE_ADMIN=true
    depends_on:
      - mongodb
version: "3"
To get MongoDB up and running, run the following command: docker-compose -f /stack.yml up -d. If the images are not present in the host environment, they will be downloaded from Docker Hub, and the whole process might take a minute or two. Once done, you can use the docker ps command to confirm whether the containers are running, as shown in the below image.
To clean up the created environment, use the following command: docker-compose -f /stack.yml down.
3. MongoDB Field Level Encryption
Let us create an application to demonstrate field-level encryption.
3.1 Import Required Modules
The script imports the MongoClient class from the MongoDB Node.js driver and the crypto module from the Node.js standard library.
Snippet 1
const { MongoClient } = require("mongodb");
const crypto = require("crypto");
3.2 Define the MongoDB Connection URI and Create MongoClient
The MongoDB connection URI (uri) points to the MongoDB server running on localhost on port 27017. A new MongoClient instance is created from this URI. Encryption settings are client options, so the autoEncryption block is passed to the MongoClient constructor here; client.connect() does not accept them.
Snippet 2
const uri = "mongodb://localhost:27017/";

// Auto-encryption is configured when the client is constructed.
const client = new MongoClient(uri, {
  autoEncryption: {
    // 96-byte local master key; regenerated on every run in this demo
    kmsProviders: { local: { key: crypto.randomBytes(96) } },
    // <database>.<collection> that holds the data encryption keys
    keyVaultNamespace: "myDatabase.__keystore"
  }
});
3.3 Connect to MongoDB with Auto-Encryption
The connectToMongo() function establishes the connection to the MongoDB server with await client.connect(). Because the client was created with the autoEncryption option, field-level auto-encryption is enabled for it. The "local" provider in kmsProviders means that the master key is managed locally within the client application. In this mode, you are responsible for generating, storing, and managing the key outside of MongoDB. The key is then provided to the MongoDB driver for the encryption and decryption operations.
- crypto.randomBytes(96) generates a 96-byte random key, which is used as the local master key. Because this demo generates a new key on every run, data keys created in one run cannot be decrypted in the next; a real application must persist the key
- keyVaultNamespace specifies the namespace (database.collection) of the key vault collection, which stores the data encryption keys
- Automatic encryption also requires the mongodb-client-encryption package to be installed alongside the driver, and MongoDB documents it as a feature of Enterprise and Atlas deployments
Snippet 3
async function connectToMongo() {
  try {
    await client.connect();
    console.log("Connected to MongoDB");
  } catch (err) {
    console.error("Error connecting to MongoDB:", err);
    throw err;
  }
}
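Because crypto.randomBytes(96) produces a different master key on every run, data keys written to the key vault in an earlier run can no longer be decrypted. One way around this, sketched below under the assumption that a local file is an acceptable place for a development key, is to generate the key once and reuse it. The helper name loadOrCreateLocalMasterKey and the file name local-master-key.bin are hypothetical.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hypothetical helper: returns the 96-byte local master key stored at keyPath,
// creating and saving a new random key the first time it is called.
function loadOrCreateLocalMasterKey(keyPath) {
  if (fs.existsSync(keyPath)) {
    const existing = fs.readFileSync(keyPath);
    if (existing.length !== 96) {
      throw new Error(`Expected a 96-byte key in ${keyPath}, found ${existing.length} bytes`);
    }
    return existing;
  }
  const fresh = crypto.randomBytes(96);
  // Mode 0o600 restricts the file to the current user on POSIX systems.
  fs.writeFileSync(keyPath, fresh, { mode: 0o600 });
  return fresh;
}

const localMasterKey = loadOrCreateLocalMasterKey("./local-master-key.bin");

// The same key can now be supplied on every run.
const kmsProviders = { local: { key: localMasterKey } };
console.log(kmsProviders.local.key.length); // 96
```

A key file on disk is still only suitable for development; for anything else, a managed KMS is the safer home for the master key.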
3.3.1 Enable Selected Fields Encryption
To encrypt only selected fields, add a schemaMap to the autoEncryption block. The previous configuration is modified to encrypt only the ssn field of the document while leaving all other personal details unencrypted, as shown below. The keyId entry must reference a data encryption key that already exists in the key vault; dataKeyId is a placeholder for that key's UUID.
Enabling encryption at selected fields
const client = new MongoClient(uri, {
  autoEncryption: {
    // 96-byte local master key
    kmsProviders: { local: { key: crypto.randomBytes(96) } },
    keyVaultNamespace: "myDatabase.__keystore",
    schemaMap: {
      "myDatabase.myCollection": {
        bsonType: "object",
        // dataKeyId: UUID of an existing data key (placeholder, assumed to be defined)
        encryptMetadata: { keyId: [dataKeyId] },
        properties: {
          ssn: {
            encrypt: {
              bsonType: "string",
              algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
            }
          }
        }
      }
    }
  }
});

// connectToMongo() then only needs to open the connection.
async function connectToMongo() {
  try {
    await client.connect();
    console.log("Connected to MongoDB");
  } catch (err) {
    console.error("Error connecting to MongoDB:", err);
    throw err;
  }
}
- kmsProviders specifies the Key Management Service (KMS) provider used for managing encryption keys. In this case, a local provider is used
- encryptMetadata sets encryption options that every encrypt block in this schema inherits. Its keyId identifies the data encryption key in the key vault that is used to encrypt and decrypt the field
- The properties object defines the schema for the field to be encrypted (ssn). In this case, it specifies that the field's value is a string encrypted with the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm
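The schema structure described above can be generated by a small helper when several fields or collections need it. The following is a minimal sketch, not part of the driver: the function name buildSchemaMap, the choice of email as a second encrypted field, and the "<data-key-uuid>" placeholder are all assumptions made for illustration. In a real configuration the key ID must be the UUID of a data key stored in the key vault.

```javascript
const ALGORITHM_PREFIX = "AEAD_AES_256_CBC_HMAC_SHA_512-";
const MODES = ["Random", "Deterministic"];

// Hypothetical helper: builds a schemaMap entry that encrypts only the listed
// string fields of one collection with the given mode per field.
function buildSchemaMap(namespace, fieldModes, dataKeyId) {
  const properties = {};
  for (const [field, mode] of Object.entries(fieldModes)) {
    if (!MODES.includes(mode)) {
      throw new Error(`Unknown encryption mode for ${field}: ${mode}`);
    }
    properties[field] = {
      encrypt: { bsonType: "string", algorithm: ALGORITHM_PREFIX + mode }
    };
  }
  return {
    [namespace]: {
      bsonType: "object",
      encryptMetadata: { keyId: [dataKeyId] }, // shared by all fields above
      properties
    }
  };
}

const schemaMap = buildSchemaMap(
  "myDatabase.myCollection",
  { ssn: "Random", email: "Deterministic" },
  "<data-key-uuid>" // placeholder; pass the real UUID from the key vault
);
console.log(JSON.stringify(schemaMap, null, 2));
```

Fields that are not listed, such as name, stay out of the schema and are therefore stored as plaintext.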
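Note: MongoDB field-level encryption also supports explicit encryption, in which the application calls the driver's encryption helpers itself instead of relying on a schema. Independently of that, the master key can be held in a cloud KMS rather than in a local key. To use one, update the kmsProviders attribute of the autoEncryption block as shown below (in this case, AWS KMS is used as the provider).
Snippet
"kmsProviders": {
  "aws": {
    "accessKeyId": "your_aws_access_key_id",
    "secretAccessKey": "your_aws_secret_access_key"
  }
}
The customer master key itself (for example arn:aws:kms:us-east-1:123456789012:key/my-mongodb-key) is not part of kmsProviders. It is supplied, together with its region, as the master key when the data encryption key is created.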
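Credentials like these are usually kept out of source code. The sketch below shows one possible way to choose the kmsProviders block from environment variables. It is an assumption-laden example rather than a driver feature: the helper name buildKmsProviders and the variable LOCAL_MASTER_KEY_BASE64 are made up for illustration, while AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY follow the usual AWS naming.

```javascript
// Hypothetical helper: picks the kmsProviders block from an environment object.
function buildKmsProviders(env) {
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return {
      aws: {
        accessKeyId: env.AWS_ACCESS_KEY_ID,
        secretAccessKey: env.AWS_SECRET_ACCESS_KEY
      }
    };
  }
  if (env.LOCAL_MASTER_KEY_BASE64) {
    // Development fallback: a base64-encoded 96-byte local master key.
    const key = Buffer.from(env.LOCAL_MASTER_KEY_BASE64, "base64");
    if (key.length !== 96) {
      throw new Error("The local master key must be 96 bytes");
    }
    return { local: { key } };
  }
  throw new Error("No KMS credentials found in the environment");
}

// Example with explicit values; an application would pass process.env instead.
const exampleProviders = buildKmsProviders({
  AWS_ACCESS_KEY_ID: "your_aws_access_key_id",
  AWS_SECRET_ACCESS_KEY: "your_aws_secret_access_key"
});
console.log(Object.keys(exampleProviders)); // [ 'aws' ]
```

Failing fast when no credentials are present avoids silently starting the application without a usable master key.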
3.4 Insert Encrypted Data
The insert() function inserts a document into the MongoDB collection named “myCollection” using the collection.insertOne() method. The document contains the fields id, name, email, and ssn. Fields covered by the encryption configuration are encrypted automatically by the driver before the document is sent to MongoDB.
Note: Only the fields named in the encryption configuration supplied to the MongoDB client are encrypted; all other fields are stored as plaintext. With the configuration from section 3.3.1, only the ssn field is encrypted.
Snippet 4
async function insert() {
  try {
    const db = client.db("myDatabase");
    const collection = db.collection("myCollection");
    await collection.insertOne({
      id: 1234,
      name: "Peter Parker",
      email: "peter.parker@example.net",
      ssn: "901-01-1234"
    });
    console.log("Data inserted successfully and in encrypted form");
  } catch (err) {
    console.error("Error inserting encrypted data:", err);
    throw err;
  }
}
3.5 Retrieve and Decrypt Data
The findAndGet() function queries the MongoDB collection and retrieves one document using collection.findOne({}). Because the client was set up with auto-encryption, the driver decrypts the encrypted fields automatically, and the document is printed to the console in plaintext.
Snippet 5
async function findAndGet() {
  try {
    const db = client.db("myDatabase");
    const collection = db.collection("myCollection");
    const data = await collection.findOne({});
    console.log("Data fetched successfully and decrypted data:", data);
  } catch (err) {
    console.error("Error finding and decrypting data:", err);
    throw err;
  }
}
3.6 Execute the Main Function
The main() function is responsible for calling the other functions in the correct sequence. It connects to MongoDB, inserts encrypted data, retrieves and decrypts the data, and then closes the MongoDB connection.
Snippet 6
async function main() {
  try {
    await connectToMongo();
    await insert();
    await findAndGet();
  } catch (err) {
    console.error("An unexpected error occurred:", err);
  } finally {
    await client.close();
    console.log("MongoDB connection closed");
  }
}

main();
That is all for this tutorial and I hope the article served you with whatever you were looking for. Happy Learning and do not forget to share!
3.7 Output
To run the code, open the terminal and enter the following command: node main.js. If everything goes well, the code will be executed and the following logs will be generated.
Console output
Connected to MongoDB
Data inserted successfully and in encrypted form
Data fetched successfully and decrypted data: {"firstName":"Peter","lastName":"Parker","ssn":"901-01-1234"}
Once the data is successfully inserted into MongoDB, navigate to the collection in the mongo-express UI to verify that the data is in encrypted form, as shown in the below image.
The interesting part to understand here is that if encryption is enabled on selected fields, the document in the UI will show only the selected fields as encrypted, while all the other fields stay unencrypted.
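The same check can be done in code by reading the document through a second client that has no autoEncryption option, so nothing is decrypted. The sketch below assumes that the driver returns encrypted values as BSON Binary objects with subtype 6 (the subtype the BSON specification reserves for encrypted values) and that these objects expose the _bsontype and sub_type properties. The helper name encryptedFields is hypothetical, and the sample document uses stand-in objects instead of real driver output.

```javascript
// BSON binary subtype 6 marks an encrypted value.
const ENCRYPTED_SUBTYPE = 6;

// Hypothetical helper: lists the top-level fields of a raw document that are
// stored as ciphertext.
function encryptedFields(rawDocument) {
  return Object.keys(rawDocument).filter((field) => {
    const value = rawDocument[field];
    return (
      value != null &&
      value._bsontype === "Binary" &&
      value.sub_type === ENCRYPTED_SUBTYPE
    );
  });
}

// Stand-in shaped like a raw document; a real one would come from
// collection.findOne() on a client created without autoEncryption.
const rawSample = {
  id: 1234,
  name: "Peter Parker",
  email: "peter.parker@example.net",
  ssn: { _bsontype: "Binary", sub_type: 6 }
};

console.log(encryptedFields(rawSample)); // [ 'ssn' ]
```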
4. Conclusion
Field Level Encryption (FLE) in MongoDB is a powerful feature that provides an additional layer of security to protect sensitive data stored in the database. It allows individual fields within a document to be encrypted, ensuring that even if unauthorized access occurs, the data remains unreadable and useless without the proper decryption keys. The points below summarize the benefits, considerations, and potential challenges of implementing MongoDB Field Level Encryption.
- Data Security Enhancement: FLE enhances data security by adding an extra layer of protection to the sensitive fields within a document. Even if an attacker gains access to the database or backups, the encrypted data will remain unintelligible, reducing the risk of data breaches and leaks.
- Granular Control: With FLE, organizations can selectively encrypt specific fields, providing granular control over which data requires the highest level of protection. This ensures that only essential information is encrypted, minimizing any potential performance impact.
- Regulatory Compliance: FLE is a valuable feature for organizations that must comply with strict data privacy regulations, such as GDPR, HIPAA, or CCPA. By encrypting sensitive data, organizations can demonstrate a commitment to protecting user information, reducing potential legal liabilities.
- No Impact on Application Logic: From an application perspective, FLE is transparent. Encryption and decryption of fields are handled at the driver level, eliminating the need for developers to implement complex encryption logic within their codebase.
- Key Management: Effective key management is crucial for a successful FLE implementation. MongoDB provides options for key management systems (KMS) integration, allowing organizations to manage their encryption keys securely.
- Performance Considerations: While FLE offers robust security, it is essential to consider its potential impact on performance. Encrypting and decrypting data can introduce overhead, but MongoDB’s native encryption support is designed to minimize this impact.
- Backup and Restore: When using FLE, organizations must carefully manage backup and restore processes to ensure that encrypted data remains secure. Storing encryption keys separately from the database backups is essential to prevent unauthorized access.
- Trade-offs in Indexing and Querying: Field Level Encryption has some trade-offs when it comes to indexing and querying encrypted fields. Deterministically encrypted fields support only equality matches, and randomly encrypted fields cannot be queried at all, so range queries, sorting, and text search on encrypted fields require additional considerations and optimizations.
- Migration Considerations: Organizations planning to implement FLE need to consider the migration of existing data. Pre-existing data may need to be encrypted and reinserted into the database, which requires a well-planned migration strategy.
- Data Recovery: In the event of key loss, recovering encrypted data can be challenging. Organizations should have a robust backup and key management strategy in place to ensure data can be recovered in emergencies.
In conclusion, the provided code demonstrates the implementation of Field Level Encryption (FLE) in MongoDB using the Node.js driver. The code showcases how to connect to the MongoDB instance with encryption settings, insert encrypted data, and retrieve decrypted data.
- Encryption Configuration: The code sets up the MongoDB connection with auto-encryption enabled using the autoEncryption option. It defines a local key management provider (kmsProviders.local) and generates a random encryption key (crypto.randomBytes(96)) for securing the data.
- Key Vault Namespace: The code specifies the key vault namespace as "myDatabase.__keystore". This namespace will store the encryption key used to protect the sensitive data.
- Data Insertion: The insert() function adds a new document to the collection named “myCollection” in the “myDatabase” database. The document contains sensitive fields like ssn (Social Security Number) for which encryption is enabled.
- Data Retrieval: The findAndGet() function retrieves the first document from the collection “myCollection” and prints the decrypted data. When querying the data, the decryption is handled automatically by the MongoDB driver, allowing the user to work with plain, readable data.
It’s essential to highlight that the code exemplifies a simple setup for encryption, using a locally generated random key. In a production environment, it’s recommended to use a more robust key management system (KMS) for secure key handling.
Overall, the code demonstrates how to leverage MongoDB Field Level Encryption to secure sensitive data in the database while ensuring seamless encryption and decryption operations at the application level. When implemented correctly, FLE provides an extra layer of security, protecting data even in the event of unauthorized access to the database.
5. Download the Project
This was a tutorial to understand field-level encryption in mongodb and create a small practical example.
You can download the full source code of this example here: MongoDB – Field Level Encryption