Validating Splunk Dashboards – A Testing Automation Approach to Quality, Reliability, and Assurance
Published: July 29, 2024
Testing and validation practices are a cornerstone of robust software development. Anyone can write code, but a key part of security, maintainability, and completeness is writing a comprehensive set of unit and acceptance tests for your product. These tests should cover every component of the project and should ideally run after every build, as well as on a set schedule, to ensure that everything is working properly. For much software this is straightforward: unit test frameworks like JUnit and acceptance testing frameworks like Serenity, plus some custom code, will get you most of the way. But what happens when these out-of-the-box tools don’t work right away? What happens when you need a more creative solution? These are the challenges you can face when creating acceptance tests and validation for Splunk Dashboards. What follows is an approach to solving those challenges.
The intuitive method would be to use Serenity’s built-in web testing capabilities; after all, Splunk dashboards are designed around usability and visual clarity and are meant to be used by humans directly in the browser. Built on the Selenium WebDriver, Serenity can drive virtually any browser and interact with websites in much the same way a human would. For example, once we define a webpage in code (using little more than its URL), we can open that page and replicate basic interactions through built-in functions, such as clicking on elements, entering text into specific fields, and inspecting the results. Serenity even helpfully takes a screenshot at every step, so the generated reports provide a visual walkthrough of the steps it took, for later debugging or review. Applying this method to Splunk Dashboard testing does work: we can log in to Splunk, navigate to a dashboard, and interact with it just as a user would in a browser, complete with screenshots and logs. But this approach runs into an immediate problem – the dynamic nature of the pages makes elements difficult to pinpoint and interact with, and waiting for searches to populate means long run times and plenty of false positives for errors and failures. The result is slow, inconsistent tests.
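To make the browser-driven approach concrete, here is a minimal sketch using plain Selenium WebDriver, the engine underneath Serenity. The host, credentials, and CSS selectors are hypothetical placeholders, and the two-minute explicit wait illustrates exactly the kind of delay that makes this route slow and brittle.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

public class DashboardUiSmokeTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Log in to Splunk Web (host, credentials, and selectors are placeholders)
            driver.get("https://splunk.example.com:8000/en-US/account/login");
            driver.findElement(By.id("username")).sendKeys("admin");
            driver.findElement(By.id("password")).sendKeys("changeme");
            driver.findElement(By.cssSelector("input[type='submit']")).click();

            // Open the dashboard under test
            driver.get("https://splunk.example.com:8000/en-US/app/my_app/my_dashboard");

            // Wait (potentially a long time) for the first panel's results to render.
            // Dynamic element IDs and slow searches are what make this approach brittle.
            new WebDriverWait(driver, Duration.ofMinutes(2)).until(
                    ExpectedConditions.visibilityOfElementLocated(
                            By.cssSelector(".dashboard-panel .viz")));
        } finally {
            driver.quit();
        }
    }
}
```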
So what’s the solution? Splunk provides a helpful and well-documented API for its products, and we’ve written hundreds of API tests with Serenity as the driver. If the front end won’t cooperate, we’ll just have to recreate the dashboards on the ‘back end’.
Using some custom methods and a few well-placed API calls, it’s possible to pull a JSON representation of each Splunk dashboard to be tested, parse the searches inside it, and then call the API again to run those searches and verify their results. The first step is a quick API call to authenticate and obtain a session key; after that, the process really begins with a GET request for all of the dashboards in a given Splunk app. This returns an inordinate amount of data, but with some JSON parsing we can filter down to only the Splunk dashboards we want to test. Each dashboard definition is returned as XML with a JSON array inside, so a bit more string manipulation is needed to extract that data on its own.
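As a rough sketch of those first two calls, the snippet below uses REST Assured (the HTTP library behind Serenity’s API testing support) against Splunk’s REST endpoints on the management port. The host, app name, and credentials are placeholders, and error handling is omitted for brevity.

```java
import io.restassured.RestAssured;
import io.restassured.response.Response;

import java.util.List;
import java.util.Map;

public class SplunkDashboardFetcher {

    // Placeholder management-port host; app name and credentials below are also placeholders
    private static final String BASE = "https://splunk.example.com:8089";

    public static void main(String[] args) {
        // Step 1: authenticate; the login endpoint returns XML containing a <sessionKey>
        String sessionKey = RestAssured.given()
                .relaxedHTTPSValidation()
                .formParam("username", "admin")
                .formParam("password", "changeme")
                .post(BASE + "/services/auth/login")
                .xmlPath().getString("response.sessionKey");

        // Step 2: list the dashboards (views) in one app, asking for JSON output
        Response views = RestAssured.given()
                .relaxedHTTPSValidation()
                .header("Authorization", "Splunk " + sessionKey)
                .queryParam("output_mode", "json")
                .get(BASE + "/servicesNS/-/my_app/data/ui/views");

        // Each entry's content carries "eai:data": the dashboard's full XML definition
        List<Map<String, Object>> entries = views.jsonPath().getList("entry");
        for (Map<String, Object> entry : entries) {
            Map<String, Object> content = (Map<String, Object>) entry.get("content");
            String dashboardXml = (String) content.get("eai:data");
            System.out.println(entry.get("name") + ": " + dashboardXml.length() + " characters of XML");
        }
    }
}
```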
Within this JSON array are all the searches used on the dashboard, but it isn’t enough to simply copy those searches and run them. Not only are there time spans to consider, but most Splunk dashboard searches are also littered with variables, and some need additional commands added to work properly outside the context of their dashboard panel. Luckily, the dashboard XML we already retrieved contains the dropdowns and their possible values, as well as the default values for those variables. By building a dictionary of those key-value pairs and looping over all the searches to replace each token with its value, we end up with a literalized version (multiple versions, in fact) of every search present in the Splunk dashboard being tested. The timespan for a search, similarly, is usually present in the JSON itself; if not, the dashboard’s default time range can be used instead.
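Here is one way that substitution step might look. This sketch assumes classic Simple XML dashboards, where each panel’s search sits in a <query> element and each input declares a token with a <default> value; the exact parsing would differ for the XML-wrapped JSON format described above, and the hard-coded fallback time range is purely illustrative.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DashboardXmlParser {

    /** A dashboard search with its tokens substituted and its time range resolved. */
    public record ResolvedSearch(String query, String earliest, String latest) {}

    public static List<ResolvedSearch> literalizeSearches(String dashboardXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(dashboardXml.getBytes(StandardCharsets.UTF_8)));

        // Build a token -> default value map from the dashboard's <input> elements
        Map<String, String> defaults = new HashMap<>();
        NodeList inputs = doc.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            NodeList def = input.getElementsByTagName("default");
            if (input.hasAttribute("token") && def.getLength() > 0) {
                defaults.put(input.getAttribute("token"), def.item(0).getTextContent().trim());
            }
        }

        // Walk each <search>, substitute $token$ placeholders, and resolve its time range
        List<ResolvedSearch> searches = new ArrayList<>();
        NodeList searchNodes = doc.getElementsByTagName("search");
        for (int i = 0; i < searchNodes.getLength(); i++) {
            Element search = (Element) searchNodes.item(i);
            NodeList query = search.getElementsByTagName("query");
            if (query.getLength() == 0) continue;

            String spl = query.item(0).getTextContent();
            for (Map.Entry<String, String> e : defaults.entrySet()) {
                spl = spl.replace("$" + e.getKey() + "$", e.getValue());
            }

            // Fall back to a dashboard-wide default window (hard-coded here for simplicity)
            String earliest = textOrDefault(search, "earliest", "-24h");
            String latest = textOrDefault(search, "latest", "now");
            searches.add(new ResolvedSearch(spl, earliest, latest));
        }
        return searches;
    }

    private static String textOrDefault(Element parent, String tag, String fallback) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : fallback;
    }
}
```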
With our fully resolved dashboard searches in hand, we can now call the Splunk API a third time to run each search and write the results, along with any messages, to a dictionary we create. This lets us run all the Splunk dashboard searches first and test the results afterwards, rather than running and checking each in sequence. Checking each search as it runs would not only break up the flow of the testing steps but also show us only one error per report – the first or the last, neither of which is particularly helpful in diagnosing an issue on its own.
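A sketch of that third round of API calls might look like the following, again using REST Assured and the ResolvedSearch objects from the previous sketch. It dispatches each resolved search as a Splunk search job, polls until the job completes, and files the raw results (which include any messages) into a map keyed by the search string; the placeholder host, the one-second poll interval, and the lack of handling for failed jobs are simplifications for illustration.

```java
import io.restassured.RestAssured;
import io.restassured.specification.RequestSpecification;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DashboardSearchRunner {

    private static final String BASE = "https://splunk.example.com:8089"; // placeholder host

    /** Dispatches each resolved search as a Splunk search job and collects its raw results. */
    public static Map<String, String> runSearches(String sessionKey,
                                                  List<DashboardXmlParser.ResolvedSearch> searches)
            throws InterruptedException {
        Map<String, String> resultsBySearch = new LinkedHashMap<>();

        for (DashboardXmlParser.ResolvedSearch s : searches) {
            // The search jobs endpoint expects SPL that starts with "search" or a pipe
            String spl = s.query().trim();
            if (!spl.startsWith("|") && !spl.startsWith("search ")) {
                spl = "search " + spl;
            }

            // Create the search job; the JSON response contains the job's sid
            String sid = auth(sessionKey)
                    .formParam("search", spl)
                    .formParam("earliest_time", s.earliest())
                    .formParam("latest_time", s.latest())
                    .formParam("output_mode", "json")
                    .post(BASE + "/services/search/jobs")
                    .jsonPath().getString("sid");

            // Poll until the job is done (a production version would also handle FAILED jobs)
            while (!"DONE".equals(auth(sessionKey)
                    .queryParam("output_mode", "json")
                    .get(BASE + "/services/search/jobs/" + sid)
                    .jsonPath().getString("entry[0].content.dispatchState"))) {
                Thread.sleep(1000);
            }

            // Store the raw results JSON (which also carries any messages) for later validation
            String results = auth(sessionKey)
                    .queryParam("output_mode", "json")
                    .get(BASE + "/services/search/jobs/" + sid + "/results")
                    .asString();
            resultsBySearch.put(s.query(), results);
        }
        return resultsBySearch;
    }

    private static RequestSpecification auth(String sessionKey) {
        return RestAssured.given()
                .relaxedHTTPSValidation()
                .header("Authorization", "Splunk " + sessionKey);
    }
}
```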
Once all the dashboard searches have been run, the final step is to loop back through the results dictionaries to ensure that a response was received for each search and that no errors occurred – keeping in mind that a Splunk dashboard search can legitimately return no results, such as a predict command with no data in its timespan or an alert that hasn’t fired. Further validations can be applied if we know what characteristics or results to expect from the data a search returns, such as the format and type of the data, the ranges of its values, or a minimum or maximum volume of results. Any reliable characteristic of the response can be checked to ensure that a particular Splunk dashboard is covered and that its searches continue to operate within well-known parameters.
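Finally, the validation pass over that map of results might look something like this, using JUnit 5 assertions. The “no results can be legitimate” allowance and the suggestion of dashboard-specific checks mirror the criteria described above; any stricter assertions on counts, ranges, or formats would be tailored to the individual dashboard.

```java
import io.restassured.path.json.JsonPath;

import java.util.List;
import java.util.Map;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class DashboardResultValidator {

    /** Checks that every search produced a response and reported no error messages. */
    public static void validate(Map<String, String> resultsBySearch) {
        for (Map.Entry<String, String> entry : resultsBySearch.entrySet()) {
            String rawResponse = entry.getValue();

            // Every search must at least have produced some response
            assertTrue(rawResponse != null && !rawResponse.isEmpty(),
                    "No response received for search: " + entry.getKey());

            JsonPath json = new JsonPath(rawResponse);

            // Splunk attaches warnings and errors to the results as "messages" entries
            List<Map<String, Object>> messages = json.getList("messages");
            if (messages != null) {
                for (Map<String, Object> message : messages) {
                    assertFalse("ERROR".equalsIgnoreCase(String.valueOf(message.get("type"))),
                            "Search reported an error: " + message.get("text")
                                    + " (search: " + entry.getKey() + ")");
                }
            }

            // Zero results can be legitimate (e.g. an alert that hasn't fired), so it is only
            // reported here; dashboard-specific tests could assert counts, ranges, or formats.
            List<Map<String, Object>> results = json.getList("results");
            if (results == null || results.isEmpty()) {
                System.out.println("No results returned for search: " + entry.getKey());
            }
        }
    }
}
```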
In this undertaking, we’ve effectively created a virtual version of each Splunk dashboard we wanted to test, replicating and verifying all the functionality of the original purely through API interactions. The method can be extended to check for the presence of buttons, login forms, dropdown values, and more – all without navigating the slow and imprecise visual interface. Even better, it can be applied to any Splunk dashboard, not just hardcoded ones. As long as the response format of the API stays the same, the testing suite will keep working, though it is of course possible to write more stringent passing criteria based on the functionality of your particular dashboard.
With these Splunk Dashboard tests in place, we can detect dashboard failures as early as possible, help prevent outages, and validate our Splunk app updates. Dashboard testing like this makes the app and add-on changes we make more reliable, reduces risk, and can catch issues before they interrupt the operators and users who rely on Splunk dashboards for security, alerting, and problem diagnosis. In addition, by breaking down each component of the dashboard in the testing suite, we gain not only a better understanding of the product but also comprehensive living documentation that can be reused going forward as we make further changes or as others come to support our Splunk ecosystem. With testing being such a key standard of secure and robust software development, we must ensure that the same high standard extends to every corner of what we do, including Splunk Dashboard, app, and add-on validation and testing.
If you’d like help with starting to implement Splunk Dashboard testing, Serenity testing, or need advice on how to increase your code quality and observability, please contact our team of experts at Solsys. We can help you optimize the investment you’ve made in your Splunk environment and help you unleash the power of your data.
Kiran McCulloch is a Software Engineer at Solsys, where he has led the development effort for creating a synthetic API suite for testing security vulnerability detection on a variety of platforms. His versatility and aptitude for picking up new technologies have allowed Solsys to demonstrate to their customers the critical importance of solid API security and testing.