Azure Databricks & Kafka Enabled Event Hubs

I recently configured a Kafka-enabled Event Hub in Azure and used a Spark Scala cluster in Azure Databricks to stream its events. The following are my findings.

Kafka Enabled Event Hub

First things first: Kafka-enabled Event Hubs DO NOT work on the Basic pricing tier. You need Standard at least. I was working in a dev environment and naturally wanted to keep costs to a minimum, so I initially started with the Basic tier. It's interesting that Azure allows you to configure this at all.

Once you are aware of the above, configuring the Event Hub is standard.

Azure Databricks

The next step was to start consuming events.

Databricks Runtime Version

I already had an existing 4.3 (Apache Spark 2.3.1, Scala 2.11) cluster in my Databricks Workspace, so I started working with this. However, this led to problems.

I could not launch my streams on 4.3. The Kafka client expects to find a JAAS config file containing the SASL configuration. I made several attempts to supply this file, including placing it on the Databricks File System and writing it directly to the underlying VM, but nothing worked for me.

So I upgraded to the 5.0 runtime. The Kafka version supported there allows all configuration, including SASL, to be applied as options when defining the streams. This worked for me immediately, without any issue.

Kafka Library

Azure Databricks uses a shaded Kafka library, so prepend all Kafka imports with “kafkashaded”. For example:

import kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule

Configuration

Configuration is straightforward. You only need the following information:

  • The Event Hub name for reading/writing streams
  • The consumer group for each Event Hub
  • The server name
    • {event-hubs-namespace}.servicebus.windows.net:9093
  • The connection string
    • kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR_CONNECTION_STRING}";

The connection string is located in the Azure portal under Shared access policies -> RootManageSharedAccessKey.

In my Notebook, I captured these values as a set of configuration variables.
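As a sketch, such notebook variables might look like the following in Scala. The hub name, consumer group, namespace, and connection string are illustrative placeholders; substitute your own values:

```scala
// Illustrative placeholders -- substitute your own Event Hub details.
val topicName        = "my-event-hub"                                // Event Hub name
val consumerGroup    = "$Default"                                    // consumer group
val bootstrapServers = "my-eh-namespace.servicebus.windows.net:9093" // server name
val connectionString = "Endpoint=sb://my-eh-namespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey={YOUR_KEY}"

// SASL config for the shaded Kafka client. The username is the literal
// string "$ConnectionString"; the password is the full connection string.
val saslJaasConfig =
  "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required " +
  s"""username="$$ConnectionString" password="$connectionString";"""
```

Note the `kafkashaded` prefix on the login module class, matching the shaded library mentioned earlier.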

Stream Configuration

My Kafka streams for reading and writing use the configuration variables from the previous section.
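A hedged sketch of what such streams can look like on the 5.0 runtime, where the SASL settings are passed as stream options rather than a JAAS file. It assumes the server name, SASL connection string, and hub names from the Configuration section are held in notebook variables; the variable names, output topic, and checkpoint path are illustrative:

```scala
// Assumes these notebook variables hold the values from the Configuration section:
//   bootstrapServers        -- "{event-hubs-namespace}.servicebus.windows.net:9093"
//   saslJaasConfig          -- the PlainLoginModule connection string
//   inputTopic, outputTopic -- Event Hub names (illustrative)
val kafkaOptions = Map(
  "kafka.bootstrap.servers" -> bootstrapServers,
  "kafka.security.protocol" -> "SASL_SSL",
  "kafka.sasl.mechanism"    -> "PLAIN",
  "kafka.sasl.jaas.config"  -> saslJaasConfig
)

// Read stream: subscribe to the input Event Hub via the Kafka source.
val input = spark.readStream
  .format("kafka")
  .options(kafkaOptions)
  .option("subscribe", inputTopic)
  .option("startingOffsets", "latest")
  .load()

// Write stream: the Kafka sink expects string/binary key and value columns,
// and a checkpoint location is required for fault tolerance.
val query = input
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("kafka")
  .options(kafkaOptions)
  .option("topic", outputTopic)
  .option("checkpointLocation", "/tmp/eventhub-checkpoint")
  .start()
```

Because the SASL options live on the stream definitions themselves, no JAAS file needs to exist on the cluster, which is what made this work where runtime 4.3 did not.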

Summary

Hopefully this will help you avoid some of the pitfalls that I encountered. The links in the References section are worth a look; they give plenty of information on setting all of this up. However, I felt they were missing some of the issues I hit above, such as the Basic tier limitation and the runtime version.

References

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-create-kafka-enabled

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string

https://docs.azuredatabricks.net/spark/latest/structured-streaming/kafka.html
