Detecting DNS implants: Old kitten, new tricks – A Saitama Case Study

Max Groot & Ruud van Luijk

TL;DR

A recently uncovered malware sample dubbed ‘Saitama’ was uncovered by security firm Malwarebytes in a weaponized document, possibly targeted towards the Jordan government. This Saitama implant uses DNS as its sole Command and Control channel and utilizes long sleep times and (sub)domain randomization to evade detection. As no server-side implementation was available for this implant, our detection engineers had very little to go on to verify whether their detection would trigger on such a communication channel. This blog documents the development of a Saitama server-side implementation, as well as several approaches taken by Fox-IT / NCC Group’s Research and Intelligence Fusion Team (RIFT) to be able to detect DNS-tunnelling implants such as Saitama. The developed implementation as well as recordings of the implant are shared on the Fox-IT GitHub.

Introduction

For its Managed Detection and Response (MDR) offering, Fox-IT is continuously building and testing detection coverage for the latest threats. Such detection efforts vary across all tactics, techniques, and procedures (TTP’s) of adversaries, an important one being Command and Control (C2). Detection of Command and Control involves catching attackers based on the communication between the implants on victim machines and the adversary infrastructure.

In May 2022, security firm Malwarebytes published a two¹-part² blog about a malware sample that utilizes DNS as its sole channel for C2 communication. This sample, dubbed ‘Saitama’, sets up a C2 channel that tries to be stealthy using randomization and long sleep times. These features make the traffic difficult to detect even though the implant does not use DNS-over-HTTPS (DoH) to encrypt its DNS queries.

Although DNS tunnelling remains a relatively rare technique for C2 communication, it should not be ignored completely. While focusing on Indicators of Compromise (IOC’s) can be useful for retroactive hunting, robust detection in real-time is preferable. To assess and tune existing coverage, a more detailed understanding of the inner workings of the implant is required. This blog will use the Saitama implant to illustrate how malicious DNS tunnels can be set-up in a variety of ways, and how this variety affects the detection engineering process.

To assist defensive researchers, this blogpost comes with the publication of a server-side implementation of Saitama on the Fox-IT GitHub. This can be used to control the implant in a lab environment. Moreover, ‘on the wire’ recordings of the implant that were generated using said implementation are also shared as PCAP and Zeek logs. This blog also details multiple approaches towards detecting the implant’s traffic, using a Suricata signature and behavioural detection.

Reconstructing the Saitama traffic

The behaviour of the Saitama implant from the perspective of the victim machine has already been documented elsewhere³. However, to generate a full recording of the implant’s behaviour, a C2 server is necessary that properly controls and instructs the implant. Of course, the source code of the C2 server used by the actual developer of the implant is not available.

If you aim to detect the malware in real-time, detection efforts should focus on the way traffic is generated by the implant, rather than the specific domains that the traffic is sent to. We strongly believe in the “PCAP or it didn’t happen” philosophy. Thus, instead of relying on assumptions while building detection, we built the server-side component of Saitama to be able to generate a PCAP.

The server-side implementation of Saitama can be found on the Fox-IT GitHub page. Be aware that this implementation is a Proof-of-Concept. We do not intend on fully weaponizing the implant “for the greater good”, and have thus provided resources to the point where we believe detection engineers and blue teamers have everything they need to assess their defences against the techniques used by Saitama.

Let’s do the twist

The usage of DNS as the channel for C2 communication has a few upsides and quite some major downsides from an attacker’s perspective. While it is true that in many environments DNS is relatively unrestricted, the protocol itself is not designed to transfer large volumes of data. Moreover, the caching of DNS queries forces the implant to make sure that every DNS query sent is unique, to guarantee the DNS query reaches the C2 server.

For this, the Saitama implant relies on continuously shuffling the character set used to construct DNS queries. While this shuffle makes it near-impossible for two consecutive DNS queries to be the same, it does require the server and client to be perfectly in sync for them to both shuffle their character sets in the same way.

On startup, the Saitama implant generates a random number between 0 and 46655 and assigns this to a counter variable. Using a shared secret key (“haruto” for the variant discussed here) and a shared initial character set (“razupgnv2w01eos4t38h7yqidxmkljc6b9f5”), the client encodes this counter and sends it over DNS to the C2 server. This counter is then used as the seed for a Pseudo-Random Number Generator (PRNG). Saitama uses the Mersenne Twister to generate a pseudo-random number upon every ‘twist’.

To encode this counter, the implant relies on a function named ‘_IntToString’. This function receives an integer and a ‘base string’, which for the first DNS query is the same initial, shared character set as identified in the previous paragraph. Until the input number is equal or lower than zero, the function uses the input number to choose a character from the base string and prepends that to the variable ‘str’ which will be returned as the function output. At the end of each loop iteration, the input number is divided by the length of the baseString parameter, thus bringing the value down.

*Function used by Saitama client to convert an integer into an encoded string*.

To determine the initial seed, the server has to ‘invert’ this function to convert the encoded string back into its original number. However, information gets lost during the client-side conversion because this conversion rounds down without any decimals. The server tries to invert this conversion by using simple multiplication. Therefore, the server might calculate a number that does not equal the seed sent by the client and thus must verify whether the inversion function calculated the correct seed. If this is not the case, the server literately tries higher numbers until the correct seed is found.

Once this hurdle is taken, the rest of the server-side implementation is trivial. The client appends its current counter value to every DNS query sent to the server. This counter is used as the seed for the PRNG. This PRNG is used to shuffle the initial character set into a new one, which is then used to encode the data that the client sends to the server. Thus, when both server and client use the same seed (the counter variable) to generate random numbers for the shuffling of the character set, they both arrive at the exact same character set. This allows the server and implant to communicate in the same ‘language’. The server then simply substitutes the characters from the shuffled alphabet back into the ‘base’ alphabet to derive what data was sent by the client.

*Server-side implementation to arrive at the same shuffled alphabet as the client*.

Twist, Sleep, Send, Repeat

Many C2 frameworks allow attackers to manually set the minimum and maximum sleep times for their implants. While low sleep times allow attackers to more quickly execute commands and receive outputs, higher sleep times generate less noise in the victim network. Detection often relies on thresholds, where suspicious behaviour will only trigger an alert when it happens multiple times in a certain period.

The Saitama implant uses hardcoded sleep values. During active communication (such as when it returns command output back to the server), the minimum sleep time is 40 seconds while the maximum sleep time is 80 seconds. On every DNS query sent, the client will pick a random value between 40 and 80 seconds. Moreover, the DNS query is not sent to the same domain every time but is distributed across three domains. On every request, one of these domains is randomly chosen. The implant has no functionality to alter these sleep times at runtime, nor does it possess an option to ‘skip’ the sleeping step altogether.

*Sleep configuration of the implant. The integers represent sleep times in milliseconds*.

These sleep times and distribution of communication hinder detection efforts, as they allow the implant to further ‘blend in’ with legitimate network traffic. While the traffic itself appears anything but benign to the trained eye, the sleep times and distribution bury the ‘needle’ that is this implant’s traffic very deep in the haystack of the overall network traffic.

For attackers, choosing values for the sleep time is a balancing act between keeping the implant stealthy while keeping it usable. Considering Saitama’s sleep times and keeping in mind that every individual DNS query only transmits 15 bytes of output data, the usability of the implant is quite low. Although the implant can compress its output using zlib deflation, communication between server and client still takes a lot of time. For example, the standard output of the ‘whoami /priv’ command, which once zlib deflated is 663 bytes, takes more than an hour to transmit from victim machine to a C2 server.

Transmission between server implementation and the implant

The implant does contain a set of hardcoded commands that can be triggered using only one command code, rather than sending the command in its entirety from the server to the client. However, there is no way of knowing whether these hardcoded commands are even used by attackers or are left in the implant as a means of misdirection to hinder attribution. Moreover, the output from these hardcoded commands still has to be sent back to the C2 server, with the same delays as any other sent command.

Detection

Detecting DNS tunnelling has been the subject of research for a long time, as this technique can be implemented in a multitude of different ways. In addition, the complications of the communication channel force attackers to make more noise, as they must send a lot of data over a channel that is not designed for that purpose. While ‘idle’ implants can be hard to detect due to little communication occurring over the wire, any DNS implant will have to make more noise once it starts receiving commands and sending command outputs. These communication ‘bursts’ is where DNS tunnelling can most reliably be detected. In this section we give examples of how to detect Saitama and a few well-known tools used by actual adversaries.

Signature-based

Where possible we aim to write signature-based detection, as this provides a solid base and quick tool attribution. The randomization used by the Saitama implant as outlined previously makes signature-based detection challenging in this case, but not impossible. When actively communicating command output, the Saitama implant generates a high number of randomized DNS queries. This randomization does follow a specific pattern that we believe can be generalized in the following Suricata rule:

alert dns $HOME_NET any -> any 53 (msg:"FOX-SRT - Trojan - Possible Saitama Exfil Pattern Observed"; flow:stateless; content:"|00 01 00 00 00 00 00 00|"; byte_test:1,>=,0x1c,0,relative; fast_pattern; byte_test:1,<=,0x1f,0,relative; dns_query; content:"."; content:"."; distance:1; content:!"."; distance:1; pcre:"/^(?=[0-9]+[a-z]\|[a-z]+[0-9])[a-z0-9]{28,31}\.[^.]+\.[a-z]+$/"; threshold:type both, track by_src, count 50, seconds 3600; classtype:trojan-activity; priority:2; reference:url, https://github.com/fox-it/saitama-server; metadata:ids suricata; sid:21004170; rev:1;)

This signature may seem a bit complex, but if we dissect this into separate parts it is intuitive given the previous parts.

Content Match	Explanation
00 01 00 00 00 00 00 00	DNS query header. This match is mostly used to place the pointer at the correct position for the byte_test content matches.
byte_test:1,>=,0x1c,0,relative;	Next byte should be at least decimal 25. This byte signifies the length of the coming subdomain
byte_test:1,<=,0x1f,0,relative;	The same byte as the previous one should be at most 31.
dns_query; content:”.”; content:”.”; distance:1; content:!”.”;	DNS query should contain precisely two ‘.’ characters
pcre:”/^(?=[0-9][a-z]\|[a-z][0-9])[a-z0-9] {28,31} \.[^.]\.[a-z]$/”;	Subdomain in DNS query should contain at least one number and one letter, and no other types of characters.
threshold:type both, track by_src, count 50, seconds 3600	Only trigger if there are more than 50 queries in the last 3600 seconds. And only trigger once per 3600 seconds.

Table one: Content matches for Suricata IDS rule

The choice for 28-31 characters is based on the structure of DNS queries containing output. First, one byte is dedicated to the ‘send and receive’ command code. Then follows the encoded ID of the implant, which can take between 1 and 3 bytes. Then, 2 bytes are dedicated to the byte index of the output data. Followed by 20 bytes of base-32 encoded output. Lastly the current value for the ‘counter’ variable will be sent. As this number can range between 0 and 46656, this takes between 1 and 5 bytes.

Behaviour-based

The randomization that makes it difficult to create signatures is also to the defender’s advantage: most benign DNS queries are far from random. As seen in the table below, each hack tool outlined has at least one subdomain that has an encrypted or encoded part. While initially one might opt for measuring entropy to approximate randomness, said technique is less reliable when the input string is short. The usage of N-grams, an approach we have previously written about⁴, is better suited.

Hacktool	Example
DNScat2	35bc006955018b0021636f6d6d616e642073657373696f6e00.domain.tld⁵
Weasel	pj7gatv3j2iz-dvyverpewpnnu–ykuct3gtbqoop2smr3mkxqt4.ab.abdc.domain.tld
Anchor	ueajx6snh6xick6iagmhvmbndj.domain.tld⁶
Cobalt Strike	Api.abcdefgh0.123456.dns.example.com or post. 4c6f72656d20697073756d20646f6c6f722073697420616d65742073756e74207175697320756c6c616d636f20616420646f6c6f7220616c69717569702073756e7420636f6d6d6f646f20656975736d6f642070726.c123456.dns.example.com
Sliver	3eHUMj4LUA4HacKK2yuXew6ko1n45LnxZoeZDeJacUMT8ybuFciQ63AxVtjbmHD.fAh5MYs44zH8pWTugjdEQfrKNPeiN9SSXm7pFT5qvY43eJ9T4NyxFFPyuyMRDpx.GhAwhzJCgVsTn6w5C4aH8BeRjTrrvhq.domain.tld
Saitama	6wcrrrry9i8t5b8fyfjrrlz9iw9arpcl.domain.tld

Table two: Example DNS queries for various toolings that support DNS tunnelling

Unfortunately, the detection of randomness in DNS queries is by itself not a solid enough indicator to detect DNS tunnels without yielding large numbers of false positives. However, a second limitation of DNS tunnelling is that a DNS query can only carry a limited number of bytes. To be an effective C2 channel an attacker needs to be able to send multiple commands and receive corresponding output, resulting in (slow) bursts of multiple queries.

This is where the second step for behaviour-based detection comes in: plainly counting the number of unique queries that have been classified as ‘randomized’. The specifics of these bursts differ slightly between tools, but in general, there is no or little idle time between two queries. Saitama is an exception in this case. It uses a uniformly distributed sleep between 40 and 80 seconds between two queries, meaning that on average there is a one-minute delay. This expected sleep of 60 seconds is an intuitive start to determine the threshold. If we aggregate over an hour, we expect 60 queries distributed over 3 domains. However, this is the mean value and in 50% of the cases there are less than 60 queries in an hour.

To be sure we detect this, regardless of random sleeps, we can use the fact that the sum of uniform random observations approximates a normal distribution. With this distribution we can calculate the number of queries that result in an acceptable probability. Looking at the distribution, that would be 53. We use 50 in our signature and other rules to incorporate possible packet loss and other unexpected factors. Note that this number varies between tools and is therefore not a set-in-stone threshold. Different thresholds for different tools may be used to balance False Positives and False Negatives.

In summary, combining detection for random-appearing DNS queries with a minimum threshold of random-like DNS queries per hour, can be a useful approach for the detection of DNS tunnelling. We found in our testing that there can still be some false positives, for example caused by antivirus solutions. Therefore, a last step is creating a small allow list for domains that have been verified to be benign.

While more sophisticated detection methods may be available, we believe this method is still powerful (at least powerful enough to catch this malware) and more importantly, easy to use on different platforms such as Network Sensors or SIEMs and on diverse types of logs.

Conclusion

When new malware arises, it is paramount to verify existing detection efforts to ensure they properly trigger on the newly encountered threat. While Indicators of Compromise can be used to retroactively hunt for possible infections, we prefer the detection of threats in (near-)real-time. This blog has outlined how we developed a server-side implementation of the implant to create a proper recording of the implant’s behaviour. This can subsequently be used for detection engineering purposes.

Strong randomization, such as observed in the Saitama implant, significantly hinders signature-based detection. We detect the threat by detecting its evasive method, in this case randomization. Legitimate DNS traffic rarely consists of random-appearing subdomains, and to see this occurring in large bursts to previously unseen domains is even more unlikely to be benign.

Resources

With the sharing of the server-side implementation and recordings of Saitama traffic, we hope that others can test their defensive solutions.

The server-side implementation of Saitama can be found on the Fox-IT GitHub.

This repository also contains an example PCAP & Zeek logs of traffic generated by the Saitama implant. The repository also features a replay script that can be used to parse executed commands & command output out of a PCAP.

References

[1] https://blog.malwarebytes.com/threat-intelligence/2022/05/apt34-targets-jordan-government-using-new-saitama-backdoor/
[2] https://blog.malwarebytes.com/threat-intelligence/2022/05/how-the-saitama-backdoor-uses-dns-tunnelling/
[3] https://x-junior.github.io/malware%20analysis/2022/06/24/Apt34.html
[4] https://atomic-temporary-13954993.wpcomstaging.com/2019/06/11/using-anomaly-detection-to-find-malicious-domains/

Share this:

Related