Amazon MSK was selected as the target for this solution. The primary reasons were:
- Reduced TCO by eliminating the operational overhead of managing the cluster.
- Seamless application migration with no code changes.
- Highly available and secure cluster provisioning within minutes with automatic cluster scaling.
The team also focused on the following points:
Producer/Consumer Dependency Mapping:
A dependency mapping chart was put together to identify the producers and consumers of each topic. This exercise was key to preventing data loss: it ensured that the producers of a topic were migrated only after all of that topic's consumers had been migrated to AWS.
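The ordering rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the team's actual tooling: the topic and service names, the shape of the `DEPENDENCIES` map, and the function `producers_ready_to_migrate` are all hypothetical.

```python
# Hypothetical dependency map: topic -> the services that produce to it
# and the services that consume from it. All names are illustrative.
DEPENDENCIES = {
    "orders": {
        "producers": ["checkout-svc"],
        "consumers": ["invoice-svc", "shipping-svc"],
    },
    "payments": {
        "producers": ["payment-gateway"],
        "consumers": ["ledger-svc"],
    },
}


def producers_ready_to_migrate(dependencies, migrated_consumers):
    """Return the producers that may be migrated now.

    A producer is ready only if, for every topic it writes to,
    all consumers of that topic have already been migrated.
    """
    ready = set()
    blocked = set()
    for roles in dependencies.values():
        all_consumers_moved = all(
            consumer in migrated_consumers for consumer in roles["consumers"]
        )
        for producer in roles["producers"]:
            if all_consumers_moved:
                ready.add(producer)
            else:
                blocked.add(producer)
    # A producer blocked on any one of its topics must wait.
    return sorted(ready - blocked)


if __name__ == "__main__":
    print(producers_ready_to_migrate(DEPENDENCIES, {"invoice-svc", "shipping-svc"}))
```

With only the two `orders` consumers migrated, the function reports `checkout-svc` as ready while `payment-gateway` stays blocked until `ledger-svc` has moved.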
Amazon MSK Capacity Planning:
The workload on the on-premises Apache Kafka clusters was analyzed, and the target-state MSK architecture was designed with the MSK service limits taken into consideration.
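As a rough illustration of this kind of sizing exercise, the sketch below estimates a broker count from the partition count and peak write throughput. It is a back-of-the-envelope model under stated assumptions, not the method the team used. The per-broker limits are passed in as parameters because the real values depend on the broker type and must be taken from the current MSK documentation. The function name and all numbers in the usage example are hypothetical.

```python
import math


def estimate_broker_count(
    total_partitions,
    replication_factor,
    max_partitions_per_broker,
    peak_ingress_mbps,
    max_ingress_mbps_per_broker,
    az_count=3,
):
    """Estimate the number of brokers for a target cluster.

    Assumptions (all illustrative):
    - every partition replica counts against the per-broker partition limit;
    - write traffic absorbed by the cluster is peak ingress times the
      replication factor (leader writes plus follower replication);
    - brokers are spread evenly across availability zones, so the result
      is rounded up to a multiple of az_count.
    """
    replicas = total_partitions * replication_factor
    by_partitions = math.ceil(replicas / max_partitions_per_broker)
    by_throughput = math.ceil(
        peak_ingress_mbps * replication_factor / max_ingress_mbps_per_broker
    )
    needed = max(by_partitions, by_throughput, az_count)
    return math.ceil(needed / az_count) * az_count


if __name__ == "__main__":
    # Hypothetical workload: 1200 partitions, RF 3, 50 MB/s peak ingress.
    print(estimate_broker_count(1200, 3, 1000, 50, 40))
```

The larger of the partition-driven and throughput-driven estimates wins, which makes it easy to see which limit is the binding constraint for a given workload.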
The order of migration was critical to ensuring that the customer experience was not impacted. This meant that some consumers were migrated to Amazon MSK while their producers remained on the existing Kafka cluster. Replication was key, because the messages produced on the existing Kafka clusters still had to be replicated to Amazon MSK so that the migrated consumers could consume them.
The following design principles were implemented to avoid any performance impact on either the AWS Direct Connect link between the data center and AWS or the MSK cluster itself:
- Topics were migrated in a planned order so that the Kafka migration process did not consume all of the available Direct Connect bandwidth.
- All topics on Amazon MSK were created with the same configurations, such as replication factor and number of partitions, as on the existing Kafka clusters.
- Consumer offsets were replicated so that consumers could resume processing from the next unprocessed message when the services were started in AWS.
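One simple way to reason about the first principle is to group topics into migration waves whose combined replication throughput stays under a fixed share of the link. The sketch below is a hypothetical illustration of that idea only. The function `plan_waves`, the topic names, and the throughput and budget figures are assumptions, and the source does not describe how the actual ordering was derived.

```python
def plan_waves(topic_throughput_mbps, budget_mbps):
    """Group topics into sequential waves without exceeding a bandwidth budget.

    topic_throughput_mbps: list of (topic, estimated MB/s) pairs, already
        sorted in the desired migration order.
    budget_mbps: the share of link bandwidth reserved for migration traffic.

    The input order is preserved; a new wave starts whenever adding the
    next topic would push the current wave over the budget.
    """
    waves = []
    current = []
    used = 0
    for topic, mbps in topic_throughput_mbps:
        if mbps > budget_mbps:
            raise ValueError(f"topic {topic!r} alone exceeds the budget")
        if current and used + mbps > budget_mbps:
            waves.append(current)
            current = []
            used = 0
        current.append(topic)
        used += mbps
    if current:
        waves.append(current)
    return waves


if __name__ == "__main__":
    # Hypothetical topics and throughput estimates.
    topics = [("orders", 40), ("payments", 30), ("clicks", 50), ("audit", 10)]
    for number, wave in enumerate(plan_waves(topics, budget_mbps=80), start=1):
        print(number, wave)
```

Keeping the input order intact matters here, because the order itself is driven by the producer and consumer dependencies rather than by throughput alone.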