
Parsing OpenPGP Key Export Format with JavaScript

If you tuned in last time to my post about scalar numbers and multiprecision integers, you might think that I'm just writing a book report for the OpenPGP spec. That’s not actually the case. My goal is to write a JavaScript app that allows encryption and decryption of OpenPGP messages, and I’m trying to document my progress as it happens. Reading the spec and obsessing over every definition will only get us so far. I’m tired of reading, so now it’s time to write some code.

In this post, I will be walking through the process of reading OpenPGP public key files with JavaScript. Before we can get on to the more glamorous work of encrypting and decrypting OpenPGP messages, we need a way to get public and private keys into our app. And even before we can do this, we need a way of getting JavaScript to simply read data from the files themselves.

OpenPGP data files and JavaScript

OpenPGP data files, which can include public keys, private keys, or encrypted messages (among other things), can be stored as raw binary or as what the spec calls ASCII Armored data. “ASCII Armor” (see section 6.2 in the spec) is a fancy way of saying that the binary data is Base64-encoded and saved as ASCII along with a checksum and some particular headers. When you decode an ASCII Armored data file, you should get the exact same binary data as if you had opened an equivalent binary file. (That’s part of what we’re going to prove today.)

Unfortunately, JavaScript makes it a bitch to work with binary files, so that’s why I’m devoting a post solely to reading files into JavaScript and Base64 decoding ASCII Armor into binary. We’ll actually parse the binary into meaningful data structures at a later time. In order to properly decode ASCII Armored data, we’ll need JavaScript functions to Base64 decode/encode as well as a CRC24 checksum function. The spec helpfully describes how all of this will work, and we’re going to implement it!

Export your Public Key files with GnuPG

GnuPG is what all the cool kids use for PGP encryption these days, and ensuring interoperability with their export files and messages is a big priority for my project (although they can keep their license). If you don’t already have it set up, check out this excellent intro to OpenPGP by Zachary Voase, which includes installation instructions and usage tips for GnuPG. Once you have it set up and you have at least one entry in your public keyring (probably your own personal public key/subkey), export your keyring to both binary and ASCII Armored files using the following commands:

gpg2 --export > pubkeys.gpg
gpg2 --armor --export > pubkeys.asc

Opening ASCII Armored keyfiles with JavaScript’s File API

The previous commands exported two files: pubkeys.gpg, which is binary, and pubkeys.asc, which is ASCII Armored. Using a relatively new “HTML5” feature in JavaScript (namely the File API), we can read the data from these files directly in the browser without first posting them to a server or anything ugly like that. This might not work on older browsers, so everyone using those browsers can cry me a river.

In order to open a file with JavaScript, we need one of those file uploader form input elements. I shall give it the id ascii_keyfile.

<input type="file" id="ascii_keyfile"/>

Next we use JavaScript to bind an event to the file input so we can call a function when it changes. Note that I’m using MooTools for the event binding and document.getElementById shortcutting, but you can easily substitute your own favorite JavaScript framework as needed.

window.addEvent('domready', function() {
    $('ascii_keyfile').addEvent('change', open_ascii_keyfile);
});

var open_ascii_keyfile = function() {
    var file = $('ascii_keyfile').files[0];
    read_file(file, parse_ascii_keyfile);
}

var read_file = function(file, callback) {
    var reader = new FileReader();
    reader.onload = function(evt) {
        var binary = evt.target.result;
        callback(binary);
    };
    reader.readAsBinaryString(file);
}

var parse_ascii_keyfile = function(data) {
    // ... ACTUALLY DECODE THE FILE HERE ...
}

This code binds a change event to the ascii_keyfile input. When you open your file via the uploader button, this event fires and triggers the open_ascii_keyfile function. This pulls the File object for the attachment out of the input and sends it into the read_file function along with the parse_ascii_keyfile callback function. Since reading data from the file may take a while, it happens asynchronously. The read_file function starts this process and, when it completes, it sends the file’s data into the parse_ascii_keyfile callback.
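Some references flag readAsBinaryString as a legacy method, so here is a rough sketch of the same idea built on readAsArrayBuffer instead. The names bytes_to_binstring and read_file_as_binstring are my own inventions for illustration, and I'm assuming the browser supports Uint8Array. The result is the same kind of one-character-per-byte string the rest of this post works with.

```javascript
// Hypothetical helper: turn raw bytes into a "binary string", one character
// per byte, with character codes in the range 0-255.
var bytes_to_binstring = function(bytes) {
    var chunks = [];
    // Convert in slices so we never hand a huge argument list to apply().
    for (var i = 0; i < bytes.length; i += 8192) {
        chunks.push(String.fromCharCode.apply(null, bytes.subarray(i, i + 8192)));
    }
    return chunks.join('');
};

// Hypothetical variant of read_file: same callback contract, but it reads
// the file as an ArrayBuffer and converts the bytes itself.
var read_file_as_binstring = function(file, callback) {
    var reader = new FileReader();
    reader.onload = function(evt) {
        callback(bytes_to_binstring(new Uint8Array(evt.target.result)));
    };
    reader.readAsArrayBuffer(file);
};
```

Swapping read_file for read_file_as_binstring in open_ascii_keyfile should leave everything downstream untouched, since the callback still receives a string.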

If all goes well, the file will be read and passed into parse_ascii_keyfile as ASCII text. It should look something like this:

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2.0.19 (Darwin)

mQENBFEN6EoBCADChZ+c6Q84tJ+WLTKYfhdN49OTUlxmoZD8cou6Bdi/EKXvpciA
ydnD+SmlYf4pjAOwEiEsKJ6swLORAam4q0pnW9gAALbclhwDf9J4sLwUkh4F4D9P
6TJX2vPEk4WRkudkj2TW3H2Wn1d7fQ3zlwLtK/bC5YeajuAIAk1m5zCtMbeZoYGc
FWU+Max2G4Xr1/5JmUzfVtVSlxdJj7SX1FtJ/zj/eWklKNtl05yBWA+NyFpkgkzR
DP+oJYBPdNoyS5mqNNIEnIIjDAUiufhGzkk2+865gIOH9X2WWCB5p0EGsR8ZzZA6
H379WPca+GTlu5JncEi7lLcg+eQRwxQu9S6XABEBAAG0QEplZmYgTCAoaHR0cDov
L3J1YmJpbmdhbGNvaG9saWMuY29tKSA8amVmZkBydWJiaW5nYWxjb2hvbGljLmNv
bT6JATgEEwECACIFAlEN6EoCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJ
EJmhufkmy7JniVYH/3Mjgo2gDDsc8tTPaIsBbYacB40pMOMX7+KxSQktrUZkGqwJ
TlGnfBB4R8jz+32dBjX/OmeGYFTl9xFMBx+MuQlHW9Sl0ffV+Gpk9YbebBZaPn7Y
5OpinF9e7zuFH7MyI72SIM7S1CvvfP3QrYj7viBitddJ+eW3Vx3ANgpkr8Bj9auH
oT043dlfm/xpqozOLwbVM0BADJge0zQvNKGpoZjoHU2mNSSGhWhXAPCRp9wVOCCE
SxbL+2Wi++ZUQUXO9DIxAQJy6HJfPx+PvBeedGAisfovNwtT0tDfJvyyPnRTKtEz
TWiWDYwNUY80A6o/KkmSxSs5OeSh4t08phBIzWu5AQ0EUQ3oSgEIAMZf+w8pVqj/
ZUQtacxzDe52kz+HtljJq4ltxulxQtoln5VkP5vWGq3uF1RFBoLVZ0OE/61yZixG
8pOPMiGzHWJtidtQk7GxT/Z/b34voeTeruZjpfm3ty14sQvmApaRpjEQaNFTPy7d
DiJKqGkD7teb/Mx8rtWJpN60hTiww1cOP5VjBvC82mn6uZ9DU2vJ6VwBTmwYnZMa
XLiGRIpEAOqtLag4XwYrHS04H7No3asxSGhlyVN2KnxvlIMwoTZ+bTVaOr2ivCIC
el1dY2kC5LsfMa04z2Ne7fme+pnGM62ufC+l/T9H58vsw1VFl5vanYmJugtFzxHF
HzU3atdbHzEAEQEAAYkBHwQYAQIACQUCUQ3oSgIbDAAKCRCZobn5JsuyZ138CACu
mdutchMDVE7V8ewhzsOCHgSMQjnmkB0HFCll2RxbhLz6x8SmzcQK107XbHQwFCdF
A5v4JgFtwb6b9W9WShemNvC7tNx/loo2C+EiUKA9tURo/rJORu6S1jR79BaaOUUj
MsB/jxxF2eRzE86SzgWXj34pYyoqJeMaiLSdXcCNW8eyN1i3gf8XpMlM7Ldv0Bq7
vqbU2sDXBQvPDbNyhVIZjqfjTOBJl54NWHYRXlybFaSrXb7Qg/9ac+54TPpgCBTs
1kR/HSZDujWE891NqlKGpSN4MDyi3WRL2RVbW0s5+8f8odNJuswIo1tWiNXBHVXs
2/eCtlrSbyoTGYj0ErY0
=a7UR
-----END PGP PUBLIC KEY BLOCK-----

There are two parts to this data that we care about. First we have the big block of ASCII characters grouped 64 to a line (the spec allows lines of up to 76 characters). That’s the actual data encoded as Base64. Then, the second to last line is 4 more Base64 characters preceded by an equal sign; in this case it’s =a7UR. That’s our CRC24 checksum (stay tuned for more on that).

Isolating and decoding our Base64 data

The first thing to do is to isolate the Base64-encoded payload data from the block above. We can do that with regular expressions:

var parse_ascii_keyfile = function(data) {
    // Our data begins right after the first blank line (LF, CRLF or CR endings).
    var blank_line          = data.match(/\r?\n\r?\n|\r\r/);

    // Our data ends right before the checksum line which starts with an "="
    var body_end_index      = data.search(/^\=/m);

    // Both of these need to exist for the file to be readable.
    if (blank_line === null || body_end_index == -1) {
        alert('This is not a valid ASCII-Armored OpenPGP export file.');
        return false;
    }
    var body_begin_index    = blank_line.index + blank_line[0].length;

    // Pull the body out of the data and strip all newlines from it
    var body        = data.substring(body_begin_index, body_end_index);
    body            = body.replace(/(\r\n|\n|\r)/gm, '');

    // Grab the checksum while we're at it...
    var body_checksum   = data.substr(body_end_index + 1, 4);
    ...

Now we’ve isolated our Base64-encoded data into the body variable. It’s time to decode it! Some browsers natively support Base64 encoding and decoding respectively via the btoa(data) and atob(text) functions, but I don’t trust them. I’ve seen Base64 implementations that try to “helpfully” UTF8-encode/decode the incoming/outgoing data, and this will actually break our binary data. Plus I was geeking out on this awesome guide to Base64 in JavaScript (if you don’t know what Base64 is or roughly how it works, check it out), so I decided to write my own:

var base_64 = {
    chars: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',

    encode: function(data) {
        var output = '';
        for (var i=0, c=data.length; i<c; i += 3)
        {
            var char1 = data.charCodeAt(i) >> 2;
            var char2 = ((data.charCodeAt(i) & 3) << 4) | data.charCodeAt(i+1) >> 4;
            var char3 = ((data.charCodeAt(i+1) & 15) << 2) | data.charCodeAt(i+2) >> 6;
            var char4 = data.charCodeAt(i+2) & 63;

            output  +=  this.chars.charAt(char1)
                        +   this.chars.charAt(char2)
                        +   this.chars.charAt(char3)
                        +   this.chars.charAt(char4);
        }
        if (c % 3 == 1)
            output = output.substr(0, output.length - 2) + '==';
        else if (c % 3 == 2)
            output = output.substr(0, output.length - 1) + '=';

        return output;
    },

    decode: function(str) {
        var data = '';

        for (var i=0, c=str.length; i<c; i += 4)
        {
            var char1 = this.chars.indexOf(str.charAt(i));
            var char2 = this.chars.indexOf(str.charAt(i+1));
            var char3 = this.chars.indexOf(str.charAt(i+2));
            var char4 = this.chars.indexOf(str.charAt(i+3));

            data += String.fromCharCode(char1 << 2 | char2 >> 4);
            if (char3 != -1)
                data += String.fromCharCode((char2 & 15) << 4 | char3 >> 2);
            if (char4 != -1)
                data += String.fromCharCode((char3 & 3) << 6 | char4);
        }
        return data;
    }
}

Now, getting back to our parse_ascii_keyfile function, Base64-decoding the data is as simple as:

var decoded_body = base_64.decode(body);
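Before trusting any Base64 codec with key material, it's worth running it against known answers. Here's a small sketch of a checker, using the test vectors from RFC 4648 plus a round trip of every possible byte value. The names check_base64_codec and native_codec are hypothetical, made up for this example. The checker accepts any object with encode and decode methods, so the base_64 object above should fit, as should a thin wrapper around btoa and atob in environments that provide them.

```javascript
// Test vectors from RFC 4648, section 10: [plain text, Base64 text].
var BASE64_VECTORS = [
    ['', ''], ['f', 'Zg=='], ['fo', 'Zm8='], ['foo', 'Zm9v'],
    ['foob', 'Zm9vYg=='], ['fooba', 'Zm9vYmE='], ['foobar', 'Zm9vYmFy']
];

// Hypothetical helper: returns a list of failure descriptions (empty = pass)
// for any object exposing encode(binstring) and decode(base64string).
var check_base64_codec = function(codec) {
    var failures = [];
    BASE64_VECTORS.forEach(function(pair) {
        if (codec.encode(pair[0]) !== pair[1]) failures.push('encode ' + JSON.stringify(pair[0]));
        if (codec.decode(pair[1]) !== pair[0]) failures.push('decode ' + JSON.stringify(pair[1]));
    });
    // Every possible byte value must survive a round trip untouched.
    var all_bytes = '';
    for (var i = 0; i < 256; i++) all_bytes += String.fromCharCode(i);
    if (codec.decode(codec.encode(all_bytes)) !== all_bytes) failures.push('round trip of bytes 0-255');
    return failures;
};

// Example wrapper around the built-ins, assuming btoa and atob exist.
var native_codec = {
    encode: function(data) { return btoa(data); },
    decode: function(str) { return atob(str); }
};
```

Calling check_base64_codec(base_64) should hand back an empty array if the codec behaves. Anything else is a list of the cases that went wrong.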

Computing the CRC24 checksum

The previous code should return a string of 8-bit binary characters. But how do we know this is the correct string of 8-bit binary characters? That’s where our CRC24 checksum function comes into play. The CRC24 checksum is sort of like a cheap hash function. Given a certain input (in our case the binary data), it should provide an output which is somewhat uniquely mapped to the input and consistent (non-random). This output is in the form of 24 binary bits. If we then Base64 encode the output, we should get 4 characters which precisely match the CRC24 checksum near the bottom of our ASCII Armored keyfile. If you’re confused, check out Section 6 of the OpenPGP spec. It explains this pretty well.

The spec also includes a very helpful example of a CRC24 function written in C, which translates very easily to JavaScript:

var crc24 = function(data) {
    var crc = 0xb704ce;
    var len = data.length;
    while (len--) {
        crc ^= (data.charCodeAt((data.length-1) - len)) << 16;
        for (var i=0; i<8; i++) {
            crc <<= 1;
            if (crc & 0x1000000)
                crc ^= 0x1864cfb;
        }
    }
    return number_to_binstring(crc, 24);
}

var number_to_binstring = function(bin, bits) {
    bits || (bits = 32);
    var text = [];
    var i = (bits < 32 && bits > 0 && bits % 8 == 0) ? (bits / 8) : 4;
    // Emit every octet, most significant first. Leading zero octets are kept
    // so that a 24-bit checksum always comes out as exactly 3 characters.
    while (i--) {
        text.push(String.fromCharCode((bin>>(i*8))&255));
    }
    return text.join('');
}

The major difference between this and the spec is my addition of the number_to_binstring function. In the crc24 function, JavaScript is performing a bunch of bitwise operations on the crc integer variable as it iterates over the 8-bit character codes associated with each byte of the input data. These operations are performed numerically, even though the data itself is in the form of a string (JavaScript is kind of janky here with its spotty support for ByteArrays). The number_to_binstring function simply converts the resulting 24-bit crc number value back to a string composed of three 8-bit characters, one per byte.
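For a sanity check that doesn't depend on any key file, here's a sketch of the same CRC24 computed over a plain array of byte values and kept as a number. The names crc24_bytes, crc24_to_octets and ascii_to_bytes are hypothetical, invented for this example. The value 0x21CF02 for the ASCII string "123456789" is the check value commonly published for this CRC variant.

```javascript
// CRC-24 with the OpenPGP parameters (init 0xB704CE, polynomial 0x1864CFB),
// computed over an array of byte values and returned as a plain number.
var crc24_bytes = function(bytes) {
    var crc = 0xb704ce;
    for (var n = 0; n < bytes.length; n++) {
        crc ^= bytes[n] << 16;
        for (var i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000) crc ^= 0x1864cfb;
        }
    }
    return crc & 0xffffff;
};

// Always three octets, most significant first, even if the top octet is zero.
var crc24_to_octets = function(crc) {
    return [(crc >> 16) & 255, (crc >> 8) & 255, crc & 255];
};

// Hypothetical helper for building test input from ASCII text.
var ascii_to_bytes = function(text) {
    return text.split('').map(function(ch) { return ch.charCodeAt(0); });
};

var check = crc24_bytes(ascii_to_bytes('123456789'));
console.log(check.toString(16)); // 21cf02
```

With no input at all, the function returns the initial value 0xB704CE, which makes for another quick test.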

We previously read the ASCII Armor checksum into the body_checksum variable. Now we can take our decoded data and compute its checksum, then Base64-encode that checksum and compare to body_checksum. If it’s a match then we can do a victory dance, because the data was not corrupted and it decoded properly!

var decoded_checksum    = base_64.encode(crc24(decoded_body));

if (body_checksum != decoded_checksum) {
    alert('Checksum mismatch! (Expected '+body_checksum+', got '+decoded_checksum+')');
    return false;
}
// Our data decoded successfully
...

Opening binary files is easier. The checksums should match!

We can repeat the same basic steps as far as the read_file function to read a binary file into JavaScript. There’s no special decoding required once you get the file data, but one useful result is to compute the checksum. If you’re working with the binary version of the same key export as you opened previously in ASCII Armored format, then the checksum on the data should be the same. This is useful for proving this whole process is sane.

After all this work, all we got was some lousy binary data in JavaScript. Actually reading the data and doing something useful with it is a whole ‘nother can of worms. But not to despair—we have to crawl before we can walk, and walk before we can run, and I really need some whiskey so I’m done for now.

Learning about scalar numbers and multiprecision integers

In pursuit of a JavaScript OpenPGP Implementation

Introduction

Recently I’ve been studying the OpenPGP specification in hopes of developing a JavaScript implementation. I’m aware that someone has already done this, but I’m not a fan of using GPL-licensed code (I prefer more permissive MIT-style licensing). Furthermore, I have an academic curiosity towards learning and doing this myself, and thus I set out to learn all there is to know about PGP encryption.

Shortly into reading the spec, I realized that there is a lot of math and number theory involved in OpenPGP, and found that I’ve become rusty since dropping out of my math major (I left halfway through my undergrad cryptography course—d'oh!). Additionally, there is a lot of technical notation within the spec that is over and above anything I’ve encountered before. This isn’t the sort of stuff you can just search for on Google and quickly have the knowledge piped into your brain. There is much learning to be done, and not a lot of existing hand-holding online to help, and that’s why I’m writing this. Consider this an online version of my math notebook. If I can help you learn PGP in the process of helping myself, then great.

Disclaimer: I am not an authority on the subject of cryptography. I’m doing my best, but the information contained herein may be inaccurate, incomplete, or simply wrong. I encourage you to fact-check and work through the examples on your own. Please let me know if you find any errors in my posts.

Scalar and big-endian numbers

I assume you’ve read up to section 3 in the OpenPGP specification. Despite it being an ugly .txt file, it’s a surprisingly good read. You have a basic conceptual understanding of what PGP encryption is and how it works. Great. Well, section 3.1 is where it really starts getting technical, so let’s start off with a quote:

3.1 Scalar Numbers

Scalar numbers are unsigned and are always stored in big-endian format. Using n[k] to refer to the kth octet being interpreted, the value of a two-octet scalar is ((n[0] << 8) + n[1]). The value of a four-octet scalar is ((n[0] << 24) + (n[1] << 16) + (n[2] << 8) + n[3]).

Let’s pick this apart. An unsigned number means that it’s never negative (zero or above), but what the hell is “big-endian format?” According to Wikipedia, a big-endian number has its most significant digits first. That’s easy: we use big-endian decimal numbers all the time in our daily lives. For example, if you find a sack containing $1,024 in a dark alley, you know that 1 represents the thousands digit, 0 represents the hundreds digit, 2 represents the tens, and 4 represents the ones. Simple enough.

Now, we have to start thinking like a computer. An octet is a more specific way of saying a byte, or 8 binary bits. Be familiar with binary. It’s the most basic way a computer represents information and is a base-2 numeric notation (also in big-endian format) where each digit represents a power of two.

For example, decimal 187 can be represented by binary octet 10111011 as we see below:

187 represented in binary:

Place value:   128  64  32  16   8   4   2   1
Binary digit:    1   0   1   1   1   0   1   1

Summing 128 + 32 + 16 + 8 + 2 + 1 = 187. Make sense? Getting back to the specification, we need to learn some notation. Specifically, how do we interpret:

the value of a two-octet scalar is ((n[0] << 8) + n[1])

This is easiest to explain by way of example. Suppose you have a scalar number consisting of two octets 10111011 and 01000101 (69, lol). Respectively, these may be referred to by n[0] and n[1]. With n[0] << 8, we perform the operation of shifting n[0]’s binary representation over to the left by 8 binary places, thus multiplying n[0]’s value by 2^8. This can best be visualized:

10111011 << 8:

10111011 00000000

Summing 32768 + 8192 + 4096 + 2048 + 512 + 256 = 47872 (which is 187 * 2^8). Now by adding in our value for n[1], we get 47872 + 69 = 47941. This can be represented in binary:

(10111011 << 8) + 01000101:

10111011 01000101

In other words, we compute the value of a scalar number by taking the binary representations of each of its octets and smushing them together into one big long binary number. Whew.
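To make the smushing concrete, here's a minimal sketch of a scalar reader. The name read_scalar is hypothetical, and I'm assuming the octets arrive as an array of numbers between 0 and 255. It multiplies by 256 instead of shifting, because JavaScript's << works on signed 32-bit integers and a four-octet scalar with its top bit set would otherwise come out negative.

```javascript
// Hypothetical helper: interpret an array of octets as one big-endian
// unsigned number. Intended for the two- and four-octet scalars in the spec.
var read_scalar = function(octets) {
    var value = 0;
    for (var k = 0; k < octets.length; k++) {
        // Same effect as (value << 8) + octets[k], minus the sign problem.
        value = value * 256 + octets[k];
    }
    return value;
};

// The two octets from the example above: 10111011 and 01000101.
console.log(read_scalar([0xBB, 0x45])); // 47941
```

For two octets this gives exactly the spec's ((n[0] << 8) + n[1]).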

Multiprecision Integers

3.2. Multiprecision Integers

Multiprecision integers (also called MPIs) are unsigned integers used to hold large integers such as the ones used in cryptographic calculations.

An MPI consists of two pieces: a two-octet scalar that is the length of the MPI in bits followed by a string of octets that contain the actual integer.

Section 3.2 gives a couple of examples, which I will explain, but first you’ll need a basic understanding of hexadecimal numbers. With binary, each digit (between 0 and 1) represents a power of two. With decimal, each digit (between 0 and 9) represents a power of 10. With hexadecimal, each digit (between 0 and …15?) represents a power of 16. Since we don’t have any numeric characters to represent numbers above 9, we start using letters of the alphabet! So we count 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. lol

Here’s why we care about hexadecimal: it easily converts to binary for computer applications, since 16 is a power of 2, and it gives us a much more efficient way for notating octets than writing out their full binary representation. For example, we can write 249 in binary as:

249 in binary:

11111001

But it's a hell of a lot more concise in hexadecimal:

249 in hexadecimal:

F9
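JavaScript can do these conversions for you, which is handy for checking your own arithmetic. A quick sketch follows. The helper names octet_to_binary and octet_to_hex are made up for this example, and they assume the input is a whole number between 0 and 255.

```javascript
// Hypothetical helper: an octet as exactly 8 binary digits.
var octet_to_binary = function(n) {
    return ('00000000' + n.toString(2)).slice(-8);
};

// Hypothetical helper: an octet as exactly 2 uppercase hexadecimal digits.
var octet_to_hex = function(n) {
    return ('00' + n.toString(16).toUpperCase()).slice(-2);
};

console.log(octet_to_binary(249)); // 11111001
console.log(octet_to_hex(249));    // F9

// And back again with parseInt and an explicit radix.
console.log(parseInt('11111001', 2)); // 249
console.log(parseInt('F9', 16));      // 249
```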

Now, consider our first example using hexadecimal numbers in section 3.2:

The string of octets [00 01 01] forms an MPI with the value 1.

In this simple example, the first two octets 00 01 indicate the length of the multiprecision integer in bits, whereas the final 01 represents the actual integer. If we convert hexadecimal octets 00 01 01 to binary, we have 00000000 00000001 00000001. So the first two octets tell us that the length of our MPI is 1 bit, and the final octet is the binary representation for the number 1 (which is indeed just 1 bit).

The string [00 09 01 FF] forms an MPI with the value of 511.

If we convert this to binary octets, we have 00000000 00001001 00000001 11111111. The first two octets, again, form a scalar number indicating the length of the MPI in bits. Converting 00000000 00001001 to decimal, we see that our MPI is 9 bits long. The final two octets 00000001 11111111 represent the integer itself. We can smush these octets together into one binary scalar number 0000000111111111 which has decimal representation 511. Notice that all significant digits occur in the first 9 bits (starting from the right), as indicated by our length scalar.

With the notation for multiprecision integers, we can represent extremely large integers (up to 65,535 bits, the largest length a two-octet scalar can hold), far beyond the typical 32-bit memory limit that a computer imposes on a regular integer value. This will become important later on in our cryptographic calculations, where we work with very large prime numbers.
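Here's a rough sketch of how reading an MPI might look in code. The name read_mpi and the shape of its return value are my own assumptions, not anything from the spec. I'm also assuming a JavaScript engine with BigInt, since ordinary numbers can't hold integers this large exactly. The octets are taken as an array of numbers between 0 and 255.

```javascript
// Hypothetical helper: read one MPI from an array of octets, starting at
// the given offset. Returns the declared bit length, the value as a BigInt,
// and the offset of whatever follows the MPI.
var read_mpi = function(octets, offset) {
    offset = offset || 0;
    // Two-octet big-endian scalar: the length of the integer in bits.
    var bit_length = (octets[offset] << 8) + octets[offset + 1];
    // Round up to whole octets: 9 bits still occupy 2 octets.
    var byte_length = (bit_length + 7) >> 3;
    if (offset + 2 + byte_length > octets.length) {
        throw new Error('MPI is longer than the available data');
    }
    var value = BigInt(0);
    for (var k = 0; k < byte_length; k++) {
        value = value * BigInt(256) + BigInt(octets[offset + 2 + k]);
    }
    return { bits: bit_length, value: value, next: offset + 2 + byte_length };
};

// The two examples from section 3.2 of the spec.
console.log(read_mpi([0x00, 0x01, 0x01]).value.toString());       // 1
console.log(read_mpi([0x00, 0x09, 0x01, 0xFF]).value.toString()); // 511
```

The next field is there because MPIs usually sit back to back inside a larger structure, so the caller needs to know where the following one starts.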
