How to Proactively Detect Attacks: Using Splunk Attack_Data to Simulate Attacks

With data breaches driving up business costs year over year, many companies are turning to advanced techniques to anticipate future breaches. Splunk provides a repository of attack data, Attack_Data, for simulating a variety of attacks in your Splunk Enterprise Security (ES) environment. Using Attack_Data with Splunk ES helps companies of all sizes better understand real threats, confirm that those threats are detected, and take remedial action against them.

When starting with Splunk ES, one of the first priorities is configuring attack detections. However, the logs you currently ingest often contain no data representative of actual attacks. In this situation, Splunk Attack_Data is especially helpful because it provides logs representative of the tactics and techniques of many types of attacks.

The instructions below provide an overview of how to set up attack data within your Splunk ES environment.

Installing Attack_Data

Download the Attack_Data repository by following its installation instructions. Ensure that you have git and git-lfs installed and more than 22G of drive space available. In the following examples, Attack_Data has been installed in the /mydir directory.
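The prerequisites above can be checked with a short script before cloning. This is a minimal sketch, not part of Attack_Data itself: the function names `enough_space`, `free_kb`, and `preflight` are hypothetical, and the 22G threshold and the /mydir location are taken from this article.

```shell
#!/bin/sh
# Pre-flight check before downloading Attack_Data (illustrative sketch only).

# Succeed if avail_kb (KiB) is greater than min_gb (GiB).
enough_space() {
    avail_kb=$1
    min_gb=$2
    [ "$avail_kb" -gt $((min_gb * 1024 * 1024)) ]
}

# Print the free space, in KiB, of the filesystem that holds the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report any missing prerequisite; return non-zero if something is missing.
preflight() {
    target_dir=$1
    status=0
    command -v git >/dev/null 2>&1 || { echo "git not found"; status=1; }
    git lfs version >/dev/null 2>&1 || { echo "git-lfs not found"; status=1; }
    enough_space "$(free_kb "$target_dir")" 22 ||
        { echo "need more than 22G free in $target_dir"; status=1; }
    return $status
}

# Usage (install location assumed from this article):
#   preflight /mydir
```

Running `preflight /mydir` prints one line per missing prerequisite and prints nothing when all three checks pass.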

To use Attack_Data as described here, you must also install Splunk Eventgen, a tool that modifies logs based on a configuration file. To install it, follow the Splunk Eventgen installation instructions. I recommend following the section “2. Install/ Use Eventgen as a Python (PyPI) package” or the section “Install Direct From GitHub”.

Please note that there are two versions of Eventgen: a Python version and a TA version. For the purposes of this discussion, I will reference the Python version. The TA version, if enabled in your Splunk environment, will run against any eventgen.conf file in your $SPLUNK_HOME/etc/apps directory. Using the Python version, however, gives you more control over the data ingested into Splunk.

Splunk Eventgen can be configured to write to a log file or to send the data directly to your Splunk Server. In practice, I have found that writing to a monitored log file is the most straightforward way to ingest data. For a Splunk Cloud instance, using a Heavy Forwarder (HF) for this function is most convenient. Just make sure the destination index is defined on the HF as well as on the Splunk Server.

Using Splunk Eventgen

Below is an example of an eventgen.conf. I have shown only one stanza ([windows-sysmon.log]) in this example, but in practice there would be a separate stanza for each log file. There are many options for the token.<integer> lines; in this case, the first regex replaces each timestamp with a current timestamp. You can add further token.<integer> lines with different regex targets in the log file, such as IP addresses and hostnames. You can replace anything in the log.

[windows-sysmon.log]
interval = 60
earliest = -10m
latest = now
disabled = 0
sampletype = raw
timeMultiple = 2
outputMode = file
fileName = /mydir/output/windows-sysmon.log.attack_data_datasets_attack_techniques_T1105_atomic_red_team
sampleDir = /mydir/attack_data/datasets/attack_techniques/T1105/atomic_red_team
count = 2296

token.0.token = \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
token.0.replacementType = replaytimestamp
token.0.replacement = %Y-%m-%dT%H:%M:%S

token.2.token = backend_http_status="(.*?)"
token.2.replacementType = random
token.2.replacement = list["200","201","400","404","500"]

The three most important settings are fileName, sampleDir, and count. fileName is the output file that Eventgen writes; this is the file you monitor with Splunk. Including the specific attack name in the file name helps distinguish it from the output of other attacks written to the same directory. sampleDir is the directory in which the source log file is found. count is the number of lines in the log file that the stanza modifies. Note that the YAML file in the log file directory lists the sourcetypes for the log files; be sure to use the correct sourcetype when setting up the Splunk monitor.
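Because each log file needs its own stanza with its own count, generating the stanzas can save some typing. The following is a minimal sketch that reuses only the settings shown in the example above; the function name `eventgen_stanza` and the output file naming in the usage comment are hypothetical, and the timestamp token assumes the log uses the same timestamp format as the example.

```shell
#!/bin/sh
# Print an eventgen.conf stanza for one sample log, with count set to its line count.
eventgen_stanza() {
    sample_dir=$1   # directory that holds the sample log (sampleDir)
    log_name=$2     # sample file name, used as the stanza name
    out_file=$3     # output file that Splunk will monitor (fileName)
    # wc -l counts newline characters, so a last line without a newline is not counted
    count=$(wc -l < "$sample_dir/$log_name" | tr -d ' ')
    cat <<EOF
[$log_name]
interval = 60
earliest = -10m
latest = now
disabled = 0
sampletype = raw
timeMultiple = 2
outputMode = file
fileName = $out_file
sampleDir = $sample_dir
count = $count

token.0.token = \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
token.0.replacementType = replaytimestamp
token.0.replacement = %Y-%m-%dT%H:%M:%S

EOF
}

# Usage (paths assumed from this article): one stanza per .log file in the dataset.
#   dir=/mydir/attack_data/datasets/attack_techniques/T1105/atomic_red_team
#   for f in "$dir"/*.log; do
#       name=$(basename "$f")
#       eventgen_stanza "$dir" "$name" "/mydir/output/$name.T1105_atomic_red_team"
#   done > "$dir/eventgen.conf"
```

Review the generated file before running Eventgen, since logs with other timestamp formats need a different token.0.token regex.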

Using Attack_Data

The datasets provided in Attack_Data are organized by the MITRE ATT&CK Framework, the Lockheed Martin Cyber Kill Chain, and CIS Controls. For example, to simulate the T1105 (Ingress Tool Transfer) technique, Attack_Data provides the following logs representative of this MITRE ATT&CK technique:



Once Eventgen is configured correctly, you can create an eventgen.conf for an Attack_Data dataset and run splunk_eventgen against it.

splunk_eventgen generate /mydir/attack_data/datasets/attack_techniques/T1105/atomic_red_team/eventgen.conf

This command will not return results immediately. The process runs until you stop it, so it keeps creating additional log data over time, based on the timing settings in the eventgen.conf file. Check the output file in the output directory to confirm that data is being written.


If you want to limit how long this process runs, you can use the Linux “timeout” command as follows, which stops Eventgen after 120 seconds by sending it signal 1 (SIGHUP).

timeout -s 1 120 splunk_eventgen generate \

Results of Running Attack_Data
If you updated your eventgen.conf for MITRE ATT&CK technique T1105 and included stanzas for each of the log files, you should then see the following in the /mydir/output directory:


These are copies of the Attack_Data log files, modified with the token replacements configured in your eventgen.conf.

You can now ingest these files using a Splunk monitor configuration. The YAML file in the dataset directory contains the relevant sourcetypes for the log files. Unfortunately, the sourcetypes are not matched to individual log files, so you will need to pair each log file with the correct sourcetype yourself when setting up the monitor.

- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Security
- sysmon_linux
- bit9:carbonblack:json

The sourcetypes listed above are only a small sample of the potential sourcetypes available for Attack_Data logs.
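To pull the sourcetype list out of a dataset's YAML file from the command line, something like the following may help. It is a rough sketch that assumes the YAML file has a top-level `sourcetypes:` key followed by `- value` list items, as in the list above; the function name `list_sourcetypes` and the file name in the usage comment are hypothetical, and a different YAML layout would need a different pattern.

```shell
#!/bin/sh
# Print the list items found under a top-level "sourcetypes:" key (assumed layout).
list_sourcetypes() {
    awk '
        # Start collecting at the assumed "sourcetypes:" key.
        /^sourcetypes:/ { in_list = 1; next }
        # While collecting, strip the leading "- " from each item and print it.
        in_list && /^[[:space:]]*-[[:space:]]/ {
            sub(/^[[:space:]]*-[[:space:]]*/, "")
            print
            next
        }
        # Any other line ends the list.
        in_list { in_list = 0 }
    ' "$1"
}

# Usage (file name is a placeholder):
#   list_sourcetypes /mydir/attack_data/datasets/attack_techniques/T1105/atomic_red_team/dataset.yml
```

This avoids needing a YAML parser, at the cost of handling only the simple list layout described above.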

Many files will be in WinEventLog format:

02/16/2022 09:49:34 PM

Many will be in XMLWinEventLog format:

<Event xmlns=''> 

Some will be in Linux Sysmon format:

For this particular sourcetype, the YAML file will use Syslog:Linux-Sysmon/Operational as the sourcetype. Remember to substitute sysmon:linux when creating the monitor if you have the current Linux Sysmon TA installed.

For other sourcetypes, such as bit9:carbonblack:json, some investigation is required, although the match may be obvious from the log file name itself (carbon_black_events.log).
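Once each output file has been matched to a sourcetype, the monitor configuration can be written out. The following is a minimal sketch that prints an inputs.conf monitor stanza using the standard `sourcetype`, `index`, and `disabled` settings; the function name `monitor_stanza` and the index name `attack_data` in the usage comment are hypothetical, so substitute an index that exists in your environment.

```shell
#!/bin/sh
# Print an inputs.conf monitor stanza for one generated log file.
monitor_stanza() {
    log_path=$1     # absolute path of the Eventgen output file
    sourcetype=$2   # sourcetype chosen from the dataset YAML file
    index=$3        # destination index (must already exist)
    printf '[monitor://%s]\n' "$log_path"
    printf 'sourcetype = %s\n' "$sourcetype"
    printf 'index = %s\n' "$index"
    printf 'disabled = 0\n\n'
}

# Usage (output path from the example stanza in this article, index name assumed):
#   monitor_stanza \
#       /mydir/output/windows-sysmon.log.attack_data_datasets_attack_techniques_T1105_atomic_red_team \
#       "XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" \
#       attack_data >> inputs.conf
```

Append the output to the inputs.conf of an app on the instance that reads the files (the Heavy Forwarder in the Splunk Cloud case), then restart or reload the inputs.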

If these steps have been followed correctly, you should now have T1105 logs in your Splunk Enterprise Security environment and can check that notable events have been created for T1105. If none have been created, configure the related correlation searches so that this attack is detected.


Once you have generated the log data, you can monitor it in Splunk using the correct sourcetype and index. With this process, you can simulate an attack in your environment and test your Splunk Enterprise Security detections.

Overall, configuring detections in Splunk Enterprise Security can be challenging when your ingested logs contain no data representative of an attack. Splunk provides a repository of “Attack Data” to simulate an attack in your environment, and you can use Splunk Eventgen to replay these logs to your Splunk Server. By following the instructions above, you can configure Splunk Eventgen to generate log data for a specific Attack_Data dataset and test whether your Splunk Enterprise Security detections catch the attack.

What’s your business waiting for?