Partition Keys: Azure Cosmos DB vs AWS DynamoDB
This is a collection of links and details from various articles on what a partition key is in Azure Cosmos DB and AWS DynamoDB. As far as I can tell, the two are very similar, and you choose the right partition key with a similar approach in both cases. How you actually configure the partition key varies with the cloud provider's UI or CLI commands.
“Now to elaborate.
In Cosmos DB and DynamoDB partition is the transaction boundary. An operation performed over records with N partition keys (and so in N partitions) is split into N separate transactions. Each partition transaction can fail or succeed independently with no rollback of the primary transaction.” -Source: indexoutofrange.com (see url below)
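To make the quoted idea concrete, here is a minimal sketch in plain Python (no Cosmos DB or DynamoDB SDK involved). The helper names `split_by_partition_key` and `apply_batches`, the `customerId` key, and the validation rule are all made up for illustration. It only simulates the behavior described above: one operation over records with N partition keys becomes N independent units of work.

```python
from collections import defaultdict


def split_by_partition_key(items, key):
    """Group items into one batch per distinct partition key value."""
    batches = defaultdict(list)
    for item in items:
        batches[item[key]].append(item)
    return dict(batches)


def apply_batches(batches, store, validate):
    """Apply each batch as its own all-or-nothing unit.

    A failing batch is skipped entirely, and batches that already
    succeeded are NOT rolled back.
    """
    results = {}
    for pk, batch in batches.items():
        if all(validate(item) for item in batch):
            store.setdefault(pk, []).extend(batch)
            results[pk] = "committed"
        else:
            results[pk] = "failed"
    return results


# Hypothetical data: two partition key values, one invalid record.
orders = [
    {"id": "1", "customerId": "a", "total": 10},
    {"id": "2", "customerId": "a", "total": 20},
    {"id": "3", "customerId": "b", "total": -5},  # fails validation
]

store = {}
results = apply_batches(
    split_by_partition_key(orders, "customerId"),
    store,
    validate=lambda order: order["total"] >= 0,
)
print(results)
```

In this sketch the batch for customer "a" is committed even though the batch for customer "b" fails, which is the "no rollback of the primary transaction" point from the quote.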
The rest of this blog post quotes directly from this great Azure tutorial about creating the partition key.
- Cosmos DB allows you to store a huge amount of data
- Querying this much data may impact performance
- Partitioning allows you to group data in partitions and provides better performance.
- Partition key is the JSON property (or path) within your documents that can be used by Cosmos DB to distribute data among multiple partitions
- Partition key decides the placement of documents
- All the documents with the same partition key value are grouped together into a shared logical partition
- Once you set the partition key, you cannot change it
- It’s a best practice to have a partition key with many distinct values (hundreds to thousands at a minimum).
- For example, let’s say that you’re storing JSON data about employees and your partition key is “department.” Then all documents with the value of “department” equal to “engineering” will be stored in the same partition. Similarly, all documents with “department” of “marketing” will be stored in the same partition.
- Azure Cosmos DB stores data in a number of physical partitions
- Collection is a logical container of physical partitions
- Every partition in Azure Cosmos DB has a fixed amount of SSD-backed storage associated with it and is replicated for high availability.
- Partition management is fully handled by Azure Cosmos DB, so there is no need to write any code.
- Each partition hosts one or more Partition Keys
How Does Partition Work?
- By default Azure creates one default partition
- While inserting a new document, Azure Cosmos DB hashes the partition key value and uses the hashed result to determine which partition to store the item in.
- Once the size of a partition reaches the threshold, Azure creates another physical partition and moves large logical partitions to the newly created partition
- The developer can provide a partition key while performing CRUD operations to optimize query performance.
- Data belonging to the same partition key value is always logically grouped together and stored in a particular physical partition.
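The hashing step described above can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `physical_partition_for` is hypothetical, and SHA-256 with a modulo is an assumption chosen for simplicity, not the hash function or placement scheme that Cosmos DB actually uses internally.

```python
import hashlib


def physical_partition_for(partition_key_value, partition_count):
    """Map a partition key value to a partition index via a stable hash.

    Assumption: SHA-256 plus modulo stands in for the real (internal)
    Cosmos DB hashing scheme.
    """
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical employee documents partitioned on "department".
docs = [
    {"id": "1", "department": "engineering"},
    {"id": "2", "department": "marketing"},
    {"id": "3", "department": "engineering"},
]

placement = {}
for doc in docs:
    index = physical_partition_for(doc["department"], partition_count=4)
    placement.setdefault(index, []).append(doc["id"])

print(placement)
```

Because the hash depends only on the partition key value, documents 1 and 3 (both "engineering") always land in the same partition, whichever index that turns out to be.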
How to Choose the Right Partition Key?
- Choosing a partition key purely depends on the structure of data
- It is important to choose a partition key property that has a large number of distinct values
- An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
- If the chosen partition key doesn’t have many distinct values, then all queries will be sent to a single partition, which may slow down performance.
- If you are working on a multi-tenant application, then choosing TenantId as a partition key is a good choice.
- If you are creating an application for families, then using the zip code as the partition key is a good choice
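One rough way to compare candidate partition keys against the guidance above is to measure, on a sample of your data, how many distinct values each candidate has and how much of the data the largest value would hold. The sketch below assumes in-memory sample documents. The function name `partition_key_stats` and the field names `tenantId` and `country` are illustrative only.

```python
from collections import Counter


def partition_key_stats(docs, key):
    """Return the distinct-value count and the share held by the largest value."""
    counts = Counter(doc[key] for doc in docs)
    total = sum(counts.values())
    return {
        "distinct_values": len(counts),
        "largest_share": max(counts.values()) / total,
    }


# Hypothetical sample: 1000 documents, 50 tenants, 2 countries.
sample = [
    {"tenantId": f"tenant-{i % 50}", "country": "US" if i % 10 else "CA"}
    for i in range(1000)
]

tenant_stats = partition_key_stats(sample, "tenantId")
country_stats = partition_key_stats(sample, "country")
print(tenant_stats)
print(country_stats)
```

In this made-up sample, `tenantId` spreads the documents evenly over 50 values, while `country` has only two values and puts 90% of the documents under one of them, so `tenantId` would be the better candidate by the criteria above.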
How to Create a Partitioned Collection
- Login to Azure Portal
- Go to Cosmos DB account
- Select storage capacity as Unlimited (Partitioning is not allowed for fixed storage)
- Give partition key value or path (e.g. /address/zipcode)
- Select throughput
- Click Ok”
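To show what a path such as /address/zipcode refers to, here is a small Python sketch that follows the path into a document to find its partition key value. The helper `partition_key_value`, the `families` collection name, and the sample document are hypothetical. The dictionary is my understanding of how a partition key definition is shaped (a list of paths plus a kind), so treat that layout as an assumption rather than an exact API payload.

```python
def partition_key_value(document, path):
    """Follow a path such as /address/zipcode through nested dicts."""
    value = document
    for segment in path.strip("/").split("/"):
        value = value[segment]
    return value


# Assumed shape of a collection definition with a partition key path.
container_definition = {
    "id": "families",
    "partitionKey": {"paths": ["/address/zipcode"], "kind": "Hash"},
}

# Hypothetical document stored in that collection.
family = {
    "id": "smith",
    "address": {"city": "Seattle", "zipcode": "98101"},
}

pk_path = container_definition["partitionKey"]["paths"][0]
value = partition_key_value(family, pk_path)
print(value)
```

Every document whose `address.zipcode` is "98101" would share this partition key value and therefore the same logical partition.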