Lessons Learned from a Splunk Cluster Hardware Migration
Published: April 9, 2024
There are many reasons to migrate hardware within your Splunk cluster, including aging hardware, failing machines, or an increased budget. In this two-part blog series, we lay out a comprehensive plan for migrating a Splunk cluster deployment to new hosts and show how to execute each step, providing example code, diagrams, and reference documentation wherever possible. The existing infrastructure we migrated included four indexers, three search heads, a heavy forwarder, a deployment server, and a cluster master. Every one of these machines had to be decommissioned because support for it had been discontinued.
This post focuses on the planning stage of the process, with an emphasis on what, in hindsight, went wrong.
Plan Overview
The first step in planning for any migration is to assess the current architecture and identify the instances that need to be migrated. This includes evaluating the performance and health of the existing instances, documenting the current configurations and settings, and identifying any potential issues that may arise during the migration process.
Migrating a Splunk cluster deployment can be a complex and challenging process, and careful planning is essential to a smooth transition. This blog post is based on our experience with a real client’s Splunk deployment. We share that experience and our expertise to help you identify key considerations and develop a comprehensive plan for your migration. Whether you are migrating to new hardware, upgrading your Splunk version, or consolidating your infrastructure, this post offers guidance for planning a successful migration.
Preparation
Defining Migration Goals and Objectives
Before beginning a Splunk migration, it is important to identify the reasons for the migration and set clear goals and objectives. This will help ensure that the migration is aligned with your organization’s needs and priorities and that the process is well planned and executed.
Goals may include minimal downtime, a seamless transition, and improved performance. Defining them up front lets you measure the success of the migration afterward and identify areas for improvement.
Assessing the Current Architecture
It is important to assess the current architecture and identify the instances that need to be migrated. These may include the search heads, indexers, cluster master, deployment server, universal forwarders, and heavy forwarders. Drawing an architecture diagram is very useful here. Splunk can help you determine the layout of your deployment: use the monitoring console to determine your topology and the Splunk Validated Architectures for guidance. Splunk icon sets are also available if you need them for your diagram. For example, below is the architecture diagram for our client’s Splunk deployment.
To evaluate the current performance and health of these instances, use Splunk’s monitoring tools to gather data on system performance, resource utilization, and data ingestion rates. This will help you identify any bottlenecks or issues that may need to be addressed before the migration. Below is a step-by-step guide for benchmarking your Splunk system health:
How do I benchmark system health before a Splunk Enterprise upgrade?
It is also important to document the existing configurations and settings for each instance including the version of Splunk, all configuration files, and any customizations or add-ons. This documentation will be useful during the migration process to ensure that the new instances are configured correctly and that data is properly migrated. Ensure that you adhere to the following guidelines:
Splunk products version compatibility matrix,
Compatibility between forwarders and Splunk Enterprise indexers,
Compatibility between the manager and the peer nodes and search heads.
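One way to keep the documented versions usable is to record them in a small inventory and compare them against the version the new hosts are expected to run. The sketch below is only an illustration: it assumes each instance exposes its version in a simple KEY=VALUE text file (for example etc/splunk.version under the Splunk home directory), and the host names and version numbers are hypothetical.

```python
def parse_version_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed file layout) into a dictionary."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '8.2.6' into (8, 2, 6) for comparison."""
    return tuple(int(part) for part in version.split("."))


def hosts_not_matching(inventory: dict[str, str], expected: str) -> list[str]:
    """Return the hosts whose recorded version differs from the expected one."""
    want = version_tuple(expected)
    return sorted(
        host for host, version in inventory.items()
        if version_tuple(version) != want
    )


# Hypothetical inventory collected from the existing hosts.
inventory = {"idx1": "8.2.6", "idx2": "8.2.6", "sh1": "8.2.5"}
print(hosts_not_matching(inventory, expected="8.2.6"))  # ['sh1']
```

A check like this does not replace the compatibility guidelines listed above. It only flags hosts that need a closer look.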
By assessing the current architecture and documenting the existing configurations and settings, you can ensure that any potential issues are identified and addressed before the migration begins.
Developing a Migration Timeline
Developing a migration timeline is an essential step in planning for a successful Splunk migration. A well-planned timeline will help ensure that the migration is completed on time and with minimal disruption to your organization’s operations. In our case, we laid out the migration over the course of several weeks, allocating enough time for each step depending on the type of Splunk instance. Search heads might take only a few hours to sync their configurations and data, while indexers can take considerably longer depending on the volume of data to be migrated and your network speed.
To develop a migration timeline, you should estimate the time required for each migration step, including the migration of search heads, indexers, cluster master, deployment server, and any heavy forwarders in your deployment. Be sure to allocate sufficient time for testing and troubleshooting. Due to the unforeseen networking issues described later in this post, we had to postpone the migration for several weeks. We recommend performing these changes after hours and over weekends so that any potential issues are identified and addressed with minimal disruption to end users.
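For the indexer steps, a rough transfer-time estimate can help size the maintenance window. The following is a minimal sketch, not a measurement: the data volume, link speed, and the efficiency factor (an assumed allowance for protocol overhead and competing traffic) are all hypothetical inputs you would replace with your own numbers.

```python
def estimate_transfer_hours(
    data_gb: float, link_mbps: float, efficiency: float = 0.7
) -> float:
    """Estimate hours needed to copy data_gb gigabytes over a link_mbps link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually achieved (0 < efficiency <= 1).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, link speed, or efficiency")
    data_megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = data_megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical example: 2 TB of indexed data over a 1 Gbps link at 70% efficiency.
print(round(estimate_transfer_hours(2000, 1000, 0.7), 1))  # about 6.3 hours
```

An estimate like this covers only the raw copy. Time for testing and troubleshooting still needs to be added on top.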
Ensuring Data and Configuration Backups
Ensuring data and configuration backups is one of the most important steps in planning for a Splunk migration. Backups and snapshots of the current configuration and settings will help ensure that data is not lost during the migration process and that the new instances are properly configured. In the event that something goes wrong, having a backup on hand is crucial to mitigate losses.
To ensure that data and configurations are backed up correctly, you should create copies of the current configuration and settings for each instance, including any customizations or add-ons. At a minimum, you should back up the “etc” directory, which holds all the common and customized configurations of your Splunk instance. Refer to the following documentation for additional guidance:
Managing backup and restore processes
Backup configuration information
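As an illustration of backing up the “etc” directory, the sketch below archives it into a timestamped tarball using only the Python standard library. The function names, archive naming scheme, and paths are assumptions made for this example, not part of any Splunk tooling.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_etc(splunk_home: Path, dest_dir: Path) -> Path:
    """Archive <splunk_home>/etc into a timestamped .tar.gz under dest_dir."""
    etc_dir = splunk_home / "etc"
    if not etc_dir.is_dir():
        raise FileNotFoundError(f"no etc directory under {splunk_home}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"splunk-etc-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to "etc" so the archive restores cleanly.
        tar.add(etc_dir, arcname="etc")
    return archive


def list_backup(archive: Path) -> list[str]:
    """Return the member names of a backup archive, for a quick sanity check."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Calling backup_etc(Path("/opt/splunk"), Path("/backups")) would produce one archive per run. The /opt/splunk path is only a common install location and may differ in your environment. Listing the archive afterward is a cheap way to confirm the backup is readable before you rely on it.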
It is also important to ensure that the replication factor and search factor requirements are met throughout the migration. This keeps data properly replicated and searchable across the new instances so that none is lost along the way.
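The replication and search factor constraints can be checked with simple arithmetic before any peer is taken offline. The sketch below assumes two rules: the search factor cannot exceed the replication factor, and the number of available peers must be at least the replication factor. The peer counts and factor values are hypothetical.

```python
def check_cluster_factors(
    peer_count: int,
    replication_factor: int,
    search_factor: int,
    peers_offline: int = 0,
) -> list[str]:
    """Return a list of problems; an empty list means the factors can be met."""
    if min(peer_count, replication_factor, search_factor) < 1 or peers_offline < 0:
        raise ValueError("counts and factors must be positive")
    problems: list[str] = []
    if search_factor > replication_factor:
        problems.append("search factor exceeds replication factor")
    available = peer_count - peers_offline
    if available < replication_factor:
        problems.append(
            f"only {available} peers available for replication factor "
            f"{replication_factor}"
        )
    return problems


# Hypothetical: four indexers, RF=3, SF=2, two peers offline at once.
print(check_cluster_factors(4, 3, 2, peers_offline=2))
```

In this example, taking two of four peers offline at the same time leaves too few peers for a replication factor of 3, which suggests migrating indexers one at a time.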
Finally, it is important to establish a rollback plan in case of migration issues. This will help ensure that any issues that arise during the migration process can be quickly addressed and that the migration can be rolled back if necessary.
Identifying Resource Requirements
Identifying resource requirements is another important step in planning for a Splunk migration. This includes determining the hardware and software requirements for the new hosts, assessing the need for additional resources, and evaluating the costs and benefits of the migration.
To determine the hardware and software requirements for the new hosts, start from the system health benchmark described above: assess the current performance and health of the existing instances and identify any bottlenecks or issues that need to be addressed. The results will tell you how much CPU, memory, storage, and network bandwidth the new hosts need. Refer to the Splunk documentation for recommended hardware requirements:
System requirements and other deployment considerations for indexer clusters
System requirements and other deployment considerations for search head clusters
Finally, it is important to evaluate the costs and benefits of the migration. This includes assessing the costs of the new hardware and software, as well as the costs of the migration process itself. You should also evaluate the benefits of the migration, such as improved performance, increased reliability, and cost savings. Perhaps migrating to a cloud solution or Splunk Observability is a better option.
Preparing the New Hosts
Preparing the new hosts includes setting up the required hardware and software, configuring the new hosts according to the documented settings, and ensuring proper network connectivity and security measures.
To ensure that the required hardware and software are installed and configured correctly, you should install the appropriate operating system, Splunk version, and any required add-ons or customizations. You can validate this against the requirements identified in the previous step. One of the issues we faced was that we could not find or download the installation file for the Splunk version running in the current architecture, because that version was no longer supported by Splunk and the decision had been made to update the hardware before the software. Fortunately, we were able to locate the installation file on one of the existing hosts. You can also reach out to Splunk support in this situation.
Once the new hosts are set up, you should configure them according to the documented settings. This includes configuring the Splunk instance to match the settings of the existing instance, including the configuration files, add-ons, and customizations. This will help ensure that the new hosts are ready to be deployed when the time comes.
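One way to confirm that a new host matches the documented settings is to compare file hashes between a copy of the old configuration directory and the new one. This is a minimal sketch using only the standard library; the function names are made up for this example, and it assumes both directory trees are reachable from the machine running the script.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Report files missing on the new host, extra on it, or with new content."""
    old, new = hash_tree(old_root), hash_tree(new_root)
    return {
        "missing_on_new": sorted(set(old) - set(new)),
        "extra_on_new": sorted(set(new) - set(old)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

Some differences are expected, since host-specific settings such as the server name legitimately vary between machines. The report is a starting point for review, not a pass/fail gate.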
It is critical to ensure proper network connectivity and security measures between the old hosts and the new hosts on the correct ports. This includes configuring the network settings for the new hosts, ensuring that they are properly connected to the network, and implementing any necessary security measures, such as firewalls and access controls. After we set up our new hosts, we ran into a network issue: our old Splunk instances were using management IPs and could not communicate with the new hosts over the management IP network. The new hosts, however, could communicate with the old hosts over traffic IPs, so we had to update all of the relevant Splunk configurations on the old hosts to communicate over traffic IPs.
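Connectivity problems of this kind can often be caught early with a simple TCP reachability test run from each old host toward each new host, and in the reverse direction. The sketch below is only an illustration: the port numbers are commonly used Splunk values (8089 for management, 9997 for forwarder traffic, 8000 for Splunk Web) and should be treated as assumptions to replace with the ports configured in your own deployment.

```python
import socket

# Commonly used ports; these are assumptions, verify against your configuration.
COMMON_PORTS = {"management": 8089, "receiving": 9997, "web": 8000}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_ports(
    host: str, ports: dict[str, int], timeout: float = 3.0
) -> dict[str, bool]:
    """Check each named port on host and report which ones are reachable."""
    return {name: can_connect(host, port, timeout) for name, port in ports.items()}
```

For example, check_ports("new-idx1.example.com", COMMON_PORTS) would return a reachability flag for each named port, where the host name is hypothetical. A successful TCP connection shows only that the path is open, not that Splunk is configured to use that address, so a test like this complements rather than replaces a review of the Splunk configuration.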
We highly recommend assessing your Splunk network setup in the current environment and developing a network diagram including all the ports that will need to be open in your new environment. Make sure to collaborate with your system or network administrator to identify the current network setup and test it in the new environment. Below is a reference diagram from Splunk documentation.
Establishing a Communication and Collaboration Plan
Establishing communication and collaboration includes identifying the stakeholders involved in the migration, establishing clear lines of communication and collaboration among stakeholders, and providing regular updates on the migration progress and addressing any concerns.
The first step in the process is to identify the stakeholders involved in the migration. This may include the IT team, management, project coordinator, end-users, and any other relevant stakeholders. You should also identify the roles and responsibilities of each stakeholder, and ensure that everyone is aware of their role in the migration process.
Once the stakeholders have been identified, you should establish clear lines of communication and collaboration among them. This may include setting up regular meetings or conference calls to discuss the migration progress, sharing documentation and updates on the migration process, and addressing any concerns or issues that arise. We had multiple Splunk Solution Engineers collaborating with the client-side stakeholders to make sure that we were following the right steps and procedures throughout the migration project. We recommend reaching out to Splunk support or Splunk experts before the migration so that your questions are answered in advance and you are prepared to identify and address potential issues in a timely manner.
Developing a Testing and Verification Strategy
To develop a testing and verification strategy, you should first define the criteria for a successful migration. These may include performance, functionality, and data integrity. The monitoring console and the system health benchmark documented in previous steps are integral to this step.
For testing purposes, a lab environment is advised. If possible, we recommend deploying a similar Splunk architecture in your lab to test each migration step.
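A data integrity criterion can be made concrete by comparing per-index event counts captured before and after a migration step. The sketch below assumes the counts were collected over the same fixed, closed time range, for example with a search along the lines of `| tstats count where index=* by index`. The index names and numbers are hypothetical.

```python
def compare_event_counts(
    before: dict[str, int], after: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {index: (before, after)} for every index whose counts differ.

    An index missing from one side is treated as having zero events there.
    """
    mismatches: dict[str, tuple[int, int]] = {}
    for index in sorted(set(before) | set(after)):
        old_count = before.get(index, 0)
        new_count = after.get(index, 0)
        if old_count != new_count:
            mismatches[index] = (old_count, new_count)
    return mismatches


# Hypothetical counts for the same closed time range on both sides.
before = {"main": 120_000, "security": 45_000, "web": 9_800}
after = {"main": 120_000, "security": 44_950, "web": 9_800}
print(compare_event_counts(before, after))  # {'security': (45000, 44950)}
```

An empty result is the success condition. Any mismatch points to an index worth investigating before moving on to the next step.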
Conclusion
In conclusion, migrating a Splunk cluster deployment to new hosts can be a complex and challenging process, but careful planning goes a long way toward a smooth execution. Besides assessing the current architecture, system health, and configurations of the existing deployment, we highly recommend evaluating your current network infrastructure to avoid networking-related issues during the migration.
In the next blog post, we will guide you through the execution stage of the process, providing example code and reference documentation wherever possible. We will cover topics such as migrating search heads, indexers, cluster master, deployment server, and any heavy forwarders in your deployment. We encourage you to share your experiences, insights, and any feedback on planning for Splunk cluster migration.