Techniques for verifying shasums conveniently

Scenario: You’ve downloaded a file. It contains software that you’re itching to install and use. But wait! There’s a SHA256 checksum on the webpage and you should verify it first.

Easy, just run shasum -a 256 myfile.tar.xz.

Ugh, hang on. How do you know if the hash on the webpage is the same as the hash printed in the terminal? Maybe you’ll copy them into a text file and eyeball them? Stuff it. It’s probably fine, right?

I come up against these problems all the time:

  • I would like to pass the correct hash as a command line argument and have the computer tell me if it’s right
  • The correct hash is in a file or stdin but it’s not in that fancy SHASUMS format
  • I have one of those SHASUMS files but my filename got messed up when I downloaded it and no longer matches what’s in the check file

Yes, they can all be worked around easily enough, but why does this friction have to exist?

In this post I’m going to briefly describe the hash-checking software we have out-of-the-box today, current techniques for verifying hashes from the command line, and my efforts to improve the situation. The conclusion is that doing this awesomely will require a new CLI tool, which I am currently developing, because then at least I’ll get to use it.

Existing Techniques - Mac/Linux

The two main tools are the Perl script shasum, and md5sum and friends from GNU coreutils. This was a recent surprise for me: for years I’d been using sha256sum and shasum -a 256 interchangeably without realising they were different programs.

md5sum/sha1sum/sha256sum coreutils

These are the GNU utilities, which are not available by default on Mac. They are actually all derived from a single source file called md5sum.c, using different compiler options to select the variant.

Running it on a file produces the output you normally expect for a SHASUMS file:

$ sha1sum test.tar.xz
0eed545d3abfecb9e3f606571a4a9a6fa71e4bc3  test.tar.xz

If you happen to have a SHASUMS file like that in the same directory, you can check it:

$ sha1sum -c SHASUMS
test.tar.xz: OK

What if we want to paste in the checksum? The only option we have using this tool is to create a SHASUMS file manually, or pass the same information via stdin:

$ echo "0eed545d3abfecb9e3f606571a4a9a6fa71e4bc3  test.tar.xz" | sha1sum -c -
test.tar.xz: OK

This sucks for two reasons.

  1. You have to type all this out precisely, including two spaces between the checksum and the filename. (There are reasons for the double space and you probably won’t like them. Very interesting history though.)
  2. The double space means you have to wrap the echo in double quotes, so tab completion won’t work on the filename.

There are countless other ways you could manipulate and compare the output but I think this technique is about as good as it gets.
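A small shell function can hide the formatting. The following is only a sketch, assuming GNU coreutils is installed; check_sha256 is a made-up name for illustration, not something coreutils provides.

```shell
# check_sha256 is a hypothetical helper name, not part of coreutils.
# Usage: check_sha256 EXPECTED_HASH FILE
check_sha256() {
    # printf builds "<hash><two spaces><filename>", the line format
    # that sha256sum -c expects, and feeds it in on stdin.
    printf '%s  %s\n' "$1" "$2" | sha256sum -c -
}
```

Called as check_sha256 followed by the hash and test.tar.xz, it prints the same "test.tar.xz: OK" line as above. Because the filename is now an ordinary argument, there are no quotes to get in the way of tab completion. Unusual filenames (for example ones containing newlines) are not handled.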

shasum

shasum is a Perl script based on libraries authored by Mark Shelor. Available on Mac and GNU/Linux, it mimics the coreutils implementation while offering some fancy features that GNU doesn’t, like calculating digests of input data that isn’t a multiple of 8-bit bytes. (I hope I never find myself in a situation where I need to do that.)

Usage is similar to coreutils except we choose the algorithm with the -a option:

$ shasum -a 256 test.tar.xz
8e884b4af6426788dfa31370cbca0202fda0baacdf89ca8363a3edceb7a20b20  test.tar.xz

The workaround to verify a hash on the command line is basically the same. Unless we’re using the default of SHA-1, we’re required to specify the algorithm with -a. However, the Perl options parser is a bit more permissive than GNU’s and we can omit the spaces in our short options.

$ echo "8e884b4af6426788dfa31370cbca0202fda0baacdf89ca8363a3edceb7a20b20  test.tar.xz" | shasum -a256 -c-
test.tar.xz: OK

openssl

If you’ve used the openssl utility before it will come as no surprise that among its thousands of functions it can generate MD5/SHA-1/SHA-256 hashes. In fact it has the simplest syntax of the lot.

$ openssl md5 test.tar.xz
MD5(test.tar.xz)= 5962124bde1c2f74d03d9efcb6f7b276

It can also produce “coreutils format”:

$ openssl sha1 -r test.tar.xz
0eed545d3abfecb9e3f606571a4a9a6fa71e4bc3 *test.tar.xz

(Yes, the second space is a star now. Here on Mac/Linux they are interchangeable. Unfortunately, using a * doesn’t fix our earlier problems.)

Unlike the other two tools, openssl doesn’t have a way to verify the result against a known checksum, so it doesn’t make life any easier than before.
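The comparison can be bolted on by hand. Here is a minimal sketch, assuming the -r output format shown above (digest first, then a space); verify_openssl is a hypothetical name, not an openssl feature.

```shell
# verify_openssl is a hypothetical helper name, not an openssl feature.
# Usage: verify_openssl EXPECTED_SHA256 FILE
verify_openssl() {
    # Lowercase the expected hash so pasted uppercase hex still compares equal.
    expected=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    # -r prints "<hash> *<filename>"; keep only the first field.
    actual=$(openssl dgst -sha256 -r "$2" | cut -d' ' -f1)
    # An empty digest means openssl failed, which must never count as a match.
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "$2: OK"
    else
        echo "$2: FAILED"
        return 1
    fi
}
```

This only handles SHA-256 and does the comparison in the shell rather than in openssl itself, so it is a workaround, not a fix.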

Existing Techniques - Windows

I had to research this section specially, since I have limited Windows experience and even less with PowerShell. Best I can tell, the situation is broadly similar to Mac and Linux, but the PowerShell option is actually my favourite out of all of them.

PowerShell

PowerShell 4.0 provides a cmdlet called Get-FileHash which handles all your regular MD5 and SHA needs.

PS D:\> Get-FileHash test.tar.xz -A Md5 |Format-List

Algorithm : MD5
Hash      : 5962124BDE1C2F74D03D9EFCB6F7B276
Path      : D:\test.tar.xz

This doesn’t directly offer a way to compare the hash with something we have on the clipboard, but being PowerShell it’s easy to take this output and use it.

PS D:\> (Get-FileHash test.tar.xz -A MD5).hash -eq "5962124BDE1C2F74D03D9EFCB6F7B276"
True
PS D:\> (Get-FileHash test.tar.xz -A MD5).hash -eq "fish"
False

That’s almost easy to remember. Very cool.

certutil

certutil is another tool and this one can be used from the normal Command Prompt. It’s easy enough to get a hash out of it. Specify -hashfile with the file you want to checksum and the algorithm.

D:\>certutil -hashfile test.tar.xz SHA1
SHA1 hash of test.tar.xz:
0eed545d3abfecb9e3f606571a4a9a6fa71e4bc3
CertUtil: -hashfile command completed successfully.

Unfortunately it doesn’t offer any options for automatic verification.

Improving the situation

It would be awesome if every major OS shipped with a CLI utility for verifying a hash on the command line. If I’m remotely configuring some server it’s likely to be Linux, and likely to have both shasum and sha256sum installed.

First idea: could we just add this feature to an existing tool? Then when distros pull the update from upstream everybody will get this functionality automatically.

To that end I emailed Mark Shelor and asked if he would consider adding a new option, something like --check-hash, that would take a hash as a command line argument. (I offered to provide the requisite patch.) He did consider it and decided no, it’s outside the main function of shasum, which is to compute and display a hash value.

The more I’ve thought about it, the more I’ve come to agree. The UNIX philosophy of having a tool that does one thing well appeals to me a lot, and adding verification would dilute shasum’s purpose. Flip that around, though, and it raises an interesting challenge: if I had a tool whose one job was to easily verify downloaded files, what would it look like?

I realised that a dedicated tool opens lots of possibilities:

  • Read the hash as a command line argument
    • Or from standard input, or a raw file. Why the heck not?
  • It’s an interactive tool—use ANSI colours! Maybe even emoji! (or not)
  • Auto-select algorithms to save the operator time—as of 2019 we only care about MD5/SHA-1/SHA-256 for this purpose. Use binary mode.
  • Drop-in support for SHASUMS-style files
  • Fuzzy-matching so you can get a “possible match” if the hash was right but the filename was wrong
  • Human-readable output that clearly explains what “fuzzy” actions have been taken
  • Clipboard integration
  • Maybe even a simple GPG key verification wrapper for projects which use that? (Here be dragons… let’s call that version 2)
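
Two of these ideas, auto-selecting the algorithm and fuzzy filename matching, can be roughed out in shell to show the intent. This is an illustrative sketch only, assuming the coreutils tools are installed; pick_tool and fuzzy_check are hypothetical names and this is not how the finished tool works.

```shell
# pick_tool and fuzzy_check are hypothetical names for illustration only.

# Print the coreutils command that matches the length of a hex digest.
pick_tool() {
    case ${#1} in
        32) echo md5sum ;;
        40) echo sha1sum ;;
        64) echo sha256sum ;;
        *)  return 1 ;;   # not a length we recognise
    esac
}

# Usage: fuzzy_check FILE SHASUMS_FILE
# Reports an exact match, a possible match (hash right, name different),
# or a failure if no line in the check file has the file's hash.
fuzzy_check() {
    file=$1
    sums=$2
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while read -r hash name || [ -n "$hash" ]; do
        tool=$(pick_tool "$hash") || continue
        actual=$("$tool" "$file" | cut -d' ' -f1)
        [ "$actual" = "$(printf '%s' "$hash" | tr 'A-F' 'a-f')" ] || continue
        name=${name#\*}   # drop the binary-mode star if present
        if [ "$name" = "$file" ]; then
            echo "$file: OK"
        else
            echo "$file: possible match (listed as $name)"
        fi
        return 0
    done < "$sums"
    echo "$file: FAILED"
    return 1
}
```

The message for a renamed file spells out which entry matched, in the spirit of the human-readable output bullet. Guessing the algorithm from digest length only works because MD5, SHA-1 and SHA-256 digests have different lengths.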

So that’s what I’m working on now. Apart from being a tool that I will actually use it’s a good opportunity to see some parts of the Rust ecosystem I haven’t had a chance to play with yet. If you’re itching for an easier hash tool, it’ll end up on GitHub once I’m relatively happy with it.

Update: My tool hashgood is now available.