Alex Mitchell

Checksums and Gaming – Pt. 1: What’s a Checksum?

I’m going to say something so blindingly obvious, so nearly tautological, that I might as well be telling you that water is wet: data integrity is important for retro gaming. As our current storage media age and decay, and we begin the arduous process of migrating our data to newer media with better reliability and support, it’s important that users understand some best practices and computing fundamentals that will help our community preserve gaming for future generations. One tool that will help us do just that is the “checksum”: a short value, calculated from a piece of data by an algorithm, that data archivists use to check whether the data they have is still the same as it was when it was first created. For those of you out there screaming at me through your computer screens: yes, this is about as broad a simplification as I can make, and I know it’s a little more complicated than that. Still, we’re starting from square one here, and I plan to explore this concept in greater detail in subsequent articles.

For part one of this series, let’s make the concept of a checksum slightly more “digestible” by going through a thought experiment. Say that I want to make some soup for a date tonight, but I don’t have a recipe on hand. I call Bob from RetroRGB, and he reads me the ingredients list for one of his recipes, a simple chicken soup:

  • (1) tbsp vegetable oil
  • 500g chicken breast
  • (2) Celery sticks
  • (1) Large carrot
  • (1) Medium sized yellow onion
  • (1) Knob of ginger
  • (2) Cloves of garlic
  • (1) Cup frozen peas
  • (1) Packet of noodles
  • (4.5) Cups chicken broth

Like most folks probably would, I write this ingredients list down on a sheet of paper and put it in my pocket. In addition to writing the list down, though, I also decide to count all the characters in the entire ingredients list (not including spaces or the bullet points) and keep that number on a separate piece of paper. In this instance, I get a result of 177. That’ll be important later.
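If you’d like to play along at home, the counting step is easy to sketch in Python. This is a toy version of the idea, not a real checksum algorithm: character_count is a name I made up for illustration, and I’m assuming only the text of each ingredient gets counted, not the bullet points.

```python
# A toy "checksum": count every character in the list except whitespace.
# Assumption: the bullet points are not part of the strings, so they aren't counted.
ORIGINAL_LIST = [
    "(1) tbsp vegetable oil",
    "500g chicken breast",
    "(2) Celery sticks",
    "(1) Large carrot",
    "(1) Medium sized yellow onion",
    "(1) Knob of ginger",
    "(2) Cloves of garlic",
    "(1) Cup frozen peas",
    "(1) Packet of noodles",
    "(4.5) Cups chicken broth",
]


def character_count(lines):
    """Return the number of non-whitespace characters across all lines."""
    return sum(1 for line in lines for ch in line if not ch.isspace())


print(character_count(ORIGINAL_LIST))  # → 177
```

Any smudge that makes a word gain or lose letters will change that number, which is exactly what the walk to the store is about to put to the test.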

So, I’m walking to the store, and it’s a gorgeous day, almost too gorgeous. In fact, it’s so hot out that I’m sweating like a hog the whole way. By the time I get to the store, I’ve completely soaked my grocery list, and it looks very different now. I think I can make it out, but the ingredients seem kinda… odd:

  • (1) tbsp Canola oil
  • 500lbs chicken breast
  • (2) Cinnamon sticks
  • (1) Large parsnip
  • (1) Medium sized red onion
  • (1) Knob of ginger
  • (2) Cloves of garlic
  • (1) Cup snap peas
  • (1) Cup of rice
  • (4.5) Cups chicken broth

Even though the ingredients seem similar, it’s pretty clear that this isn’t the same list I wrote down earlier. Then I remember the number I wrote down before I left my house: 177. In theory, if I count the characters in this list and they add up to 177, then nothing has changed, right?

Well, you can guess where this is going. I count up the characters in the list and get 168. Darn! Something changed, but because I don’t really know how to make soup, I’m not sure what it could be. However, because I know something is wrong, I decide to just call Bob back and ask him to read me the ingredients list again. I copy it down again, and because I’m kinda nervous, I count all the characters in it to make doubly sure that it matches the list from the first time I called Bob. Lo and behold, the new list has a character count of 177, just like the original! I now feel confident that I’m buying the right ingredients for this recipe, and my soup is saved. Sure, my date still thinks I’m a total weirdo for making chicken soup in the blistering heat, but another sweaty catastrophe has been averted.
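Here’s the same toy count applied to the whole misadventure, again as a sketch with made-up helper names (character_count and looks_unchanged). It also shows the catch I glossed over: a mismatch proves the list changed, but a match only means no change was detected. The “Turnip” copy at the end is a hypothetical smudge I invented to show that, not something from Bob’s recipe.

```python
# The number written down at home, before the sweaty walk.
SAVED_COUNT = 177

ORIGINAL_LIST = [
    "(1) tbsp vegetable oil", "500g chicken breast", "(2) Celery sticks",
    "(1) Large carrot", "(1) Medium sized yellow onion", "(1) Knob of ginger",
    "(2) Cloves of garlic", "(1) Cup frozen peas", "(1) Packet of noodles",
    "(4.5) Cups chicken broth",
]

SWEATY_LIST = [
    "(1) tbsp Canola oil", "500lbs chicken breast", "(2) Cinnamon sticks",
    "(1) Large parsnip", "(1) Medium sized red onion", "(1) Knob of ginger",
    "(2) Cloves of garlic", "(1) Cup snap peas", "(1) Cup of rice",
    "(4.5) Cups chicken broth",
]


def character_count(lines):
    """Count every character in the list except whitespace."""
    return sum(1 for line in lines for ch in line if not ch.isspace())


def looks_unchanged(lines, saved_count):
    """Compare a copy's count with the saved count.

    False means the copy has definitely changed. True only means no change was
    detected: a character count can't see a swap between same-length words.
    """
    return character_count(lines) == saved_count


# Hypothetical bad copy: "Celery" smudged into "Turnip", which is also six letters long.
TURNIP_LIST = [line.replace("Celery", "Turnip") for line in ORIGINAL_LIST]

print(character_count(SWEATY_LIST), looks_unchanged(SWEATY_LIST, SAVED_COUNT))  # → 168 False
print(character_count(ORIGINAL_LIST), looks_unchanged(ORIGINAL_LIST, SAVED_COUNT))  # → 177 True
print(looks_unchanged(TURNIP_LIST, SAVED_COUNT))  # → True, even though the list is wrong
```

That last line is the reason real archival tools use algorithms like MD5 instead of a simple count: with those, even a one-letter change to the input is overwhelmingly likely to produce a completely different result.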

To loop back to the title of this article, that little number I wrote down (177) was, broadly speaking, a checksum. Checksums are short strings of data generated by algorithms that take my information (in this case, the ingredients list) as their input. I used that short string to compare my sweaty list to my dry list, and then to compare my dry list to the replacement list I got from my friend the second time. By doing so, I was able to see not only that the sweaty copy of my list had been compromised by the ravages of time, but also that the replacement list I got from my friend matched my original list. I do this very same thing with copies of my video games all the time, which is possible because groups like No-Intro and Redump have taken on the Herculean task of dumping and cataloguing entire libraries of video games, checksums included. I run my game files through a checksum algorithm, like MD5, and then compare the output to that game’s listing in an archival database. If the strings match, I can be confident that the game in my possession is bit-for-bit identical to the game as it was released, and, as long as my hardware is working properly, the experience I have playing it should be exactly as the developer intended.
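For the MD5 step, Python’s standard hashlib module does the heavy lifting. The sketch below is a minimal example, not a replacement for dedicated ROM-management tools: the script name verify_md5.py and both helper functions are hypothetical, and the expected hash is whatever you copy from the game’s database listing.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file as a lowercase hex string.

    The file is read in 1 MiB chunks so large disc images don't need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_database(path, expected_md5):
    """True if the file's MD5 equals the hash from the database listing (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python verify_md5.py GAME_FILE EXPECTED_MD5")
    else:
        game_file, expected = sys.argv[1], sys.argv[2]
        actual = md5_of_file(game_file)
        verdict = "MATCH" if actual == expected.strip().lower() else "MISMATCH"
        print(f"{game_file}: {actual} ({verdict})")
```

Run it as python verify_md5.py followed by the game file and the MD5 from the listing. If the listing you’re checking against gives a SHA-1 value instead, swapping hashlib.md5 for hashlib.sha1 works the same way.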

If you’ve stayed with me through this whole article, I’m super grateful. Again, checksums are a little more complicated than all this, but if you tune in for the next article, I’ll be sure to have some more real-world examples of how checksums can be used by retro gamers like you and me.