Azure Event Hubs is a real-time data ingestion service that allows events to be streamed and stored in partitions. According to Microsoft documentation, an Event Hub "represents the front door for an event pipeline, often called an event ingestor in solution architectures" (Microsoft, 2018). From a data engineer's perspective, it is a treasure trove of data that you can use with an existing Splunk environment for insights, analytics, and correlation with other data. As any Splunkie would ask, "Why don't you Splunk the data?"
This blog introduces an Azure Event Hub Connector developed by Solsys Corp, which pulls raw data from the partitions of an Event Hub it subscribes to and ingests it into your Splunk environment. The use case driving this requirement was the need to correlate data from different cloud environments in Splunk.
The problem we faced was that there was no streamlined solution for retrieving raw data from the hub partitions. Existing solutions can fetch diagnostics, activity, or metrics (telemetry) information from the hub; however, we faced a growing demand to retrieve the raw data itself from the Event Hubs.
The Event Hub Connector was installed on a Heavy Forwarder, following Splunk's best practices. Before using the Connector, a few prerequisites must be in place to ensure it runs smoothly:
- Installation of Python 3.6 or above: needed to use the Azure Event Hub SDK for Python
- Installation of the Azure Event Hub Software Development Kit (SDK) for Python: required to communicate with the Azure portal and the Event Hub when retrieving data. This also allows us to stay up to date with Microsoft's updates on the Azure side.
- Installation of the Splunk SDK for Python: allows the Connector to interact with Splunk through Python
Once the prerequisites were installed, we proceeded to install the Azure Event Hub Connector. The logic diagram below shows the software components needed on the Heavy Forwarder. Ignore the "Splunk – Azure EventHub TA" for now; that is a custom Technology Add-on (TA) we developed to assist with sourcetypes and parsing the data.
With the prerequisites in place, we installed the Event Hub Connector using its setup script. The prerequisite installations ensure that all the necessary Python libraries and packages are available and work together to retrieve data from your Event Hub partitions.
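The Connector itself is a Solsys product, but to give a feel for the mechanics, here is a minimal sketch of a consumer loop using the azure-eventhub Python SDK. The connection string, consumer group, output directory, and the record_lines helper are illustrative assumptions for this sketch, not the Connector's actual code:

```python
import json
import os

def record_lines(event_body):
    """Split one Event Hub event body into one JSON line per record.

    Azure diagnostic exports typically batch multiple log records into a
    single event under a top-level "records" array; writing one record
    per line keeps Splunk's line breaking simple.
    """
    payload = json.loads(event_body)
    return [json.dumps(r) for r in payload.get("records", [payload])]

def run_consumer(conn_str, eventhub_name, out_dir):
    # Imported lazily so record_lines() is usable without the Azure SDK installed.
    from azure.eventhub import EventHubConsumerClient

    def on_event(partition_context, event):
        # One output file per partition, which a Splunk monitor input can tail.
        path = os.path.join(out_dir, "partition_%s.log" % partition_context.partition_id)
        with open(path, "a") as f:
            for line in record_lines(event.body_as_str()):
                f.write(line + "\n")
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        conn_str, consumer_group="$Default", eventhub_name=eventhub_name
    )
    with client:
        # starting_position "-1" reads from the earliest event in each partition.
        client.receive(on_event=on_event, starting_position="-1")
```

Splitting the batched "records" array before writing to disk is what lets each Azure log record become its own Splunk event later on.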
The next step after the Connector installation is to invoke the wrapper script contained in the Connector directory. The wrapper script was copied from the Connector into the $SPLUNK_HOME/bin directory. To invoke the wrapper, a scripted input was created by updating inputs.conf in Splunk as shown below.
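The exact stanza depends on your environment, but a scripted input of that shape would look roughly like this (the script name, interval, index, and sourcetype below are placeholders, not the Connector's actual values):

```
[script://$SPLUNK_HOME/bin/eventhub_wrapper.py]
interval = 60
sourcetype = azure:eventhub:raw
index = azure
disabled = 0
```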
We restarted Splunk once the input was defined and observed the data being written to disk. Woot! You can see a snippet of raw data from our 'Operational Insights Logs' Event Hub. More information and the steps to export Azure Activity logs to Event Hubs can be found at this link. In our lab, we exported Azure Monitoring Logs to the operational-insights-logs hub, informing us whenever objects were changed in each of the services Microsoft Azure provides.
If I deleted a virtual machine, the activity log entry would be stored as "/MICROSOFT.COMPUTE/VIRTUALMACHINE/DELETE" in the raw logs. This is the red highlighted text in the screenshot below. We will see later in the blog how this data can be displayed in a dashboard.
Our next step involved creating an input to monitor the files written to disk by the wrapper script, while ensuring that the data is onboarded with the correct sourcetype for time extraction.
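A monitor input of that kind would look roughly like this, assuming the wrapper writes one file per partition (the path, index, and sourcetype here are illustrative placeholders):

```
[monitor:///opt/eventhub/partition_*.log]
sourcetype = azure:eventhub:operational-insights
index = azure
disabled = 0
```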
Hopefully, the steps have not lost your train of thought. To streamline this onboarding process for a user, we created an Azure Event Hub TA which contains sample scripted input and monitoring stanzas in inputs.conf that one can use once the Connector is installed. The TA can be installed from Splunkbase.
The TA also contains pre-defined sourcetypes that significantly reduce the effort a Splunk admin undertakes to onboard data into Splunk. Instead of writing a props.conf or custom regexes, a user can review the pre-defined sourcetypes to determine which one matches the data extracted from the Event Hub. The props.conf for each sourcetype was also created with best practices in mind 😊!
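For JSON records carrying an ISO-8601 "time" field, such a best-practice props.conf stanza generally pins line breaking and timestamp extraction explicitly. A hedged sketch (the sourcetype name is a placeholder, not necessarily one of the TA's actual sourcetypes):

```
[azure:eventhub:operational-insights]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
KV_MODE = json
TIME_PREFIX = "time"\s*:\s*"
MAX_TIMESTAMP_LOOKAHEAD = 40
TZ = UTC
```

Disabling line merging and anchoring the timestamp with TIME_PREFIX keeps indexing predictable and avoids Splunk guessing timestamps from values elsewhere in the JSON.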
Once the data onboarding process was completed, our next step was to analyze the information in Splunk. We created a few dashboards, now packaged as the "Azure EventHub Insights App". This app can be found on Splunkbase and reduces the time it takes to analyze meta information on the data pulled from the Event Hubs and written to disk.
Using the Insights App and the dashboards we developed, we can find information such as: the total partitions in each Event Hub, the latency of each Event Hub sourcetype, and a timeline of data ingested. These features allow a Splunk admin to actively monitor metadata related to the wrapper script.
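To give a sense of the searches behind such panels, per-sourcetype latency can be approximated in SPL by comparing index time with event time (the index and sourcetype names are illustrative; the App's actual searches may differ):

```
index=azure sourcetype=azure:eventhub:*
| eval latency = _indextime - _time
| stats avg(latency) AS avg_latency_secs BY sourcetype
```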
The use case of extracting raw data, and the lack of available options in the open-source market, pushed us to develop an in-house connector to solve this problem. Once we were able to connect to the hub and extract data from the partitions, we created two Splunk applications to give us insights and to provide sourcetypes for time extraction from the data. The chart below shows the products we have created and their descriptions.
If you would like to see a live demo of our product or talk to a technical consultant, please email firstname.lastname@example.org. This is the first iteration of the Azure Event Hub Connector we have developed; we would love to hear any feedback on improving our product. For any licensing queries regarding the Azure Event Hub Connector, please email email@example.com.
Microsoft. (2018). Why use Event Hubs? Microsoft.