We’re here to work with you at all stages.

View all services
Build products using the latest engineering practices and designs that aren’t just functional but beautiful with Launch.
Learn more
Rethink how your product delivery teams build and design your products. Architect to build the building blocks that allow experimentation with Amplify.
Learn more
Gain market share by designing and building product features. Gain velocity by embedding our experts in your team with Catalyse.
Learn more
Take control of your cloud costs and technical debt, and add coverage for DevOps with Control.
Learn more

Articles

From user research, digital strategy to solving bold engineering problems. Our team specialises in providing a suite of services that can take an idea from a rough sketch to a secure and fault tolerant digital product.
View all articles

Tutorials

Learning new technologies and frameworks ensures we are ahead of the curve. Here is a collection of step by step tutorials about things we've learnt. Learn with us!
View all tutorials

Culture

We believe the best digital products are built by a diverse and skilled team. We’ve created a safe inclusive workspace, and we believe in diversity. We are a group that believes in software development and design is a craft. This is what unites us.
Learn more

Mission, Vision & Purpose

Our team is diverse. Each coming from a different background and beliefs. We think of product development & design as a craft. We love to learn new ways of improving our craft - be it learning new frameworks, or adding new specialties.
Learn more
View all tutorials
Creating and managing an AWS MSK Cluster and Configuration
May 25, 2022
Vishnu Prasad
Software Engineer
Contents

Apache Kafka allows for asynchronous communication in a distributed ecosystem. It allows producers to publish messages on topics that are then ingested by consumers interested in those topics. As a concept, pub-sub models have been around for ages. However, the beauty of Kafka is in the how — using partitions and consumer groups, Kafka can scale the rate of consumption of messages with minimal dev and economic overhead. In this tutorial, I’ll take you through how to provision a managed Kafka cluster using the AWS Managed Stream for Kafka (MSK) service. We’ll use the serverless framework to create and maintain the infrastructure for MSK and the supporting VPCs, subnets, etc.

This tutorial assumes a good understanding of Kafka and how to configure it. We will go over deploying & updating an Apache Kafka service (clusters and configuration).

Prerequisites

  • Apache Kafka
  • Basics of the Serverless framework
  • An active AWS account

MSK(Managed Streaming for Kafka)

Amazon provides a managed Apache Kafka service called MSK (Managed Streaming for Kafka). MSK allows you to build scalable production-ready Kafka applications. It abstracts the provisioning of infrastructure for scaling and management of the cluster. The Kafka cluster has an MSK configuration attached to it.

In this tutorial, we will use the serverless framework for the following:

  • Creating an MSK Cluster
  • Creating an MSK Cluster Configuration
  • Attaching a configuration revision to the cluster
  • Deploying this to an AWS account and verifying it

1. Starting with a Serverless project

Please clone this starter project. This comes with out of the box support for provisioning a VPC, subnets, IGWs, route tables, etc. I’d highly recommend going through the resources/config directory to get familiar with how things have been set up.

Clone the repository, and install the dependencies using npm.


git clone git@github.com:wednesday-solutions/msk-kafka-starter.git
cd msk-starter-template
npm i

💡 Pro tip: Subnets for MSK needs to be setup carefully. You need two subnets in two different Availability Zones (AZs), if you are in the US West (N. California) Region. In all other regions, you can specify either two or three subnets. Your subnets must all be in different AZs. When a cluster is created, MSK distributes broker nodes evenly.

For this tutorial, we create three private subnets in the ap-south-1 region, and each subnet has 256 IPs within them. I have also taken special care to ensure that subnets are in different AZs. We also have one public subnet that routes traffic to an Internet Gateway (IGW) that allows the components within the VPC to communicate with the internet.

2. Create the MSK configuration

Let’s start by creating the Kafka configuration. We create a configuration in the server.properties file.


vi assets/server.properties

Add the following to the file.


auto.create.topics.enable = true #1
zookeeper.connection.timeout.ms = 1000 #2
log.roll.ms = 604800000 #3

The above represents a minimalistic configuration with the following settings:

  1. Enables auto-creation of topics on the server
  2. The max time that the client waits to establish a connection with zookeeper
  3. The period of time after which Kafka will force the log to roll even if the segment file isn't full

Here is a complete list of properties that can be configured.

Now let’s write the serverless resource for the creation of the cluster configuration.


vi resources/apache-kafka/msk-config.yml


Resources:
  ServerlessMSKConfiguration:
    Type: AWS::MSK::Configuration
    Properties:
      Description: cluster for msk cluster-${sls:stage}
      Name: msk-cluster-config-${sls:stage}-config
      ServerProperties: ${file('./assets/server.properties')} #1

The ServerProperties key references the newly created server.properties file

That’s it. That’s all you need to create a revision of your configuration. Subsequent changes in the configuration would create a new revision. The updated revision number has to be referenced in the MSK.

3. Create the Kafka cluster

Let’s create the Kafka cluster, which will reference the cluster configuration. Open the serverless.yml file and add the following.


#serverless.yml
resources:
  ...
	- ${file('./resources/apache-kafka/msk.yml')}

Next, run the following command.


vi resources/apache-kafka/msk.yml

Copy the section below into the file.


Resources:
  ServerlessMSK:
    Type: AWS::MSK::Cluster
    Properties:
      ClusterName: ${self:service}-${self:provider.stage}-msk #1
      KafkaVersion: 2.6.2 #2
      BrokerNodeGroupInfo:
        InstanceType: kafka.t3.small #3
        ClientSubnets: #4
          - !Ref ServerlessPrivateSubnet1 
          - !Ref ServerlessPrivateSubnet2 
          - !Ref ServerlessPrivateSubnet3
        SecurityGroups: #5
          - !GetAtt ServerlessMSKSecurityGroup.GroupId
        StorageInfo: #6
          EBSStorageInfo:
            VolumeSize: 10
      NumberOfBrokerNodes: 3 #7
      EncryptionInfo:
        EncryptionInTransit:
          ClientBroker: TLS
          InCluster: true
      EnhancedMonitoring: PER_TOPIC_PER_BROKER
      ConfigurationInfo: #8
        Arn: !GetAtt ServerlessMSKConfiguration.Arn
        Revision: 1

  1. The cluster name suffixes the stage, allowing for easy deployment across environments
  2. The Kafka version that we use is 2.6.2
  3. For the tutorial, we’ve used a kafka.t3.small instance, please configure this to best suit your application needs
  4. The previously created subnets are associated with the cluster
  5. The previously created security groups are associated with the cluster
  6. You can tweak the storage provisioned with this parameter
  7. The number of broker nodes, which is a critical factor for high availability and redundancy, can be tweaked using the NumberoOfBrokerNodes property
  8. ConfigurationInfo property governs the default Kafka configuration. We reference the previously created Configuration and the associated revision number

Deployments

Deploy your application by using the following command


npm run deploy:development

Before running the script, please make sure that the following environment variables are set with the correct values

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Alternatively, you can fork the repo, set the above as Github secrets, push it to your main branch and let the already setup CD pipeline take care of the rest. Once your pipeline runs successfully, check out your newly created cluster in the console!

Cluster Configuration Property
Deployed MSK Cluster in the AWS Console

Where to go from here?

If at any point, you feel stuck, a complete example with a working CI/CD can be found here

I hope you enjoyed reading and following through with this tutorial as much as I did when writing it.

The amazing folks at Wednesday Solutions have created a repository of example applications using the serverless framework. I’d love to get your thoughts on it and perhaps stories of how it eased/improved your development workflow.

Alternatively would love to hear your thoughts on where it can be improved. Thanks, and happy coding!

Wednesday is a boutique consultancy based in India & Singapore.

Let's talk

Wednesday is a boutique consultancy based in India & Singapore.

Let’s talk

2023