Programming/Project Euler

[C/C++] Project Euler #59 - XOR Decryption

NoxLuna 2025. 5. 9. 10:29

This problem is rated at a 5% difficulty level, but if solved in a standard way, I believe it would be more difficult than that.

The method of decrypting this kind of cipher often involves brute-force repeated substitution. While simple repeated substitution can crack the cipher, it often takes thousands or even tens of thousands of years, even with the help of tens of thousands of computers. Things might change with the advent of quantum computers, though.

In this problem, the encryption is cracked using a three-letter lowercase key. If we were to solve it in line with the original intent (i.e., at 5% difficulty), we would generate all possible three-letter lowercase keys and check if the resulting decrypted text forms natural English words—such as “a”, “this”, “is”, “are”, “you”, and so on.

decryption



However, I attempted a classical approach, assuming that the encryption key is hard to deduce. Of course, the key length must be known. Generally, most simple repeated XOR-encrypted documents can be cracked using this method, unless the document is too short relative to the key length. I used a statistical approach to solve the problem.

Step 1: I identified letters that frequently appear in English words—such as i, a, e, o, c, d, g, h, f. You must also not forget the space character.

Step 2: Then I created a frequency distribution (histogram) of the repeating characters based on the key length. Using that distribution, I tried substituting the frequently occurring characters from Step 1 to deduce the key. This way, I was able to identify the encryption key.

The actual execution output is as follows:

12 6 48 8 5 5 18 0 18 28 6 12 2 0 18 25 7 4 5 28 17 15 0 3 0 0 0 0 0 0
3 2 4 1 0 0 0 1 0 0 0 1 0 0 0 3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 70 0 11 0 5 0 0 1 1 0 0 0 1 1 0 2 1 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 16 8 18 0 0 28 35 8 6 31 16 5 3 20 0 0 0 0 0 0 0 5 0 9 6 6 21 20 6
0 2 0 0 0 0 0 0 1 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 2 0 0 4 1 0 0 0 0
2 0 2 0 0 0 0 1 0 0 0 0 0 0 85 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 41 4 11 0 16 3 6 8 7 23 26 24 21 0 1 31 5 5 7 5 0 17 14 0 0 0 0 1 5
1 0 0 1 0 3 0 1 1 0 1 0 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 77 1 0 0 4 0 4 0 0 0 0 0 0 1 0 1 0 5 2 1 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

keyword = $%&
(The Gospel of John, chapter 1) 1 In the beginning the Word already
existed. He was with God, and he was God. 2 He was in the beginning
with God. 3 He created everything there is. Nothing exists that he
didn't make. 4 Life itself was in him, and this life gives light to
everyone. 5 The light shines through the darkness, and the darkness
can never extinguish it. 6 God sent John the Baptist 7 to tell
everyone about the light so that everyone might believe because of
his testimony. 8 John himself was not the light; he was only a
witness to the light. 9 The one who is the true light, who gives
light to everyone, was going to come into the world. 10 But
although the world was made through him, the world didn't recognize
him when he came. 11 Even in his own land and among his own people,
he was not accepted. 12 But to all who believed him and accepted him,
he gave the right to become children of God. 13 They are reborn!
This is not a physical birth resulting from human passion or plan,
this rebirth comes from God.14 So the Word became human and lived
here on earth among us. He was full of unfailing love and
faithfulness. And we have seen his glory, the glory of the only
Son of the Father.
ans = ******


The first three lines are the frequency distributions (histograms) of the characters. The keyword obtained through statistical methods was ***. The decrypted result was displayed afterward.

Below is the source code I wrote.

//------------------------------------------------
//    Project Euler #59 - XOR Decryption
//        - by Aubrey Choi
//        - created at 2015-04-02
//------------------------------------------------
#include <stdio.h>
#include <memory.h>

int main()
{
    int histo[3][256];
    char cipher[] = 
    {
        #include "p059_cipher.txt"
    };
    
    memset(histo, 0, sizeof(histo));
    for( int i = 0 ; i < sizeof(cipher) ; i++ )
        histo[i%3][cipher[i]]++;
    
    for( int i = 0 ; i < 3 ; i++ )
    {
        for( int j = 0 ; j < 128 ; j++ ) printf("%d ", histo[i][j]);
        putchar('\n');
    }

    char keyword[3] = { 0, 0, 0 };

    for( int i = 0 ; i < 3 ; i++ )
    {
        char *guess = " iaeocdghf";
        int maxh = histo[i][0];
        int maxv = 0;
        for( int j = 1 ; j < 128 ; j++ )
            if( maxh < histo[i][j] ) { maxh = histo[i][j]; maxv = j; }
        for( int j = 0 ; *(guess+j) ; j++ )
        {
            char s = *(guess+j)^maxv;
            if( s < 'a' || s > 'z' ) continue;
            int k;
            for( k = 0 ; k < 128 ; k++ )
            {
                if( histo[i][k] == 0 ) continue;
                if( (k^s) < ' ' || (k^s) > 'z' ) break;
            }
            if( k < 128 ) continue;
            keyword[i] = s;
            break;
        }
    }
    printf("keyword = %c%c%c\n", keyword[0], keyword[1], keyword[2]);
    int sum = 0;
    for( int i = 0 ; i < sizeof(cipher) ; i++ )
    {
        putchar(cipher[i]^keyword[i%3]);
        sum += cipher[i]^keyword[i%3];
    }
    putchar('\n');
    printf("ans = %d\n", sum);
}


I created a variable called histo to store the frequency distribution. If you’re only working within the ASCII range, you only need to store 128 values.

The frequency distribution changes depending on the length of the encryption key. If the key length were infinite, statistical decryption becomes practically impossible. You would need an extremely large amount of encrypted data using the same key. (In the movie The Imitation Game, there is a sequence where they crack a German cipher, and the algorithm used to find the key is very similar to this.)

Imitation Game(Movie)



Although the protagonist appeared a bit foolish and the historical accuracy was off in some parts, I still enjoyed the movie. (Oops, went off on a tangent there.)

Afterwards, using the guessed English characters stored in the guess variable, the decryption key is identified.

반응형