Recent work: sindresorhus/file-type and hex skills

Yesterday my pull request for sindresorhus/file-type was accepted. Hoorah!


I’ve decided to spend more time working with OSS developers for fun and profit. Sindre Sorhus is hugely popular as a developer, in large part for the numerous high-quality packages he’s published on npm and GitHub. All in all, he’s a stand-up guy who deserves the success and recognition for all the hard work he’s put in for the JS community.

Lo and behold, a few days ago the call for contributors was made!

 

I landed on this issue, opened to expand sindresorhus/file-type with more compressed file types. file-type is a Node.js module for apps that need to identify files by their actual binary signature, similar to threatstack/libmagic for the C language. It has a straightforward and concise structure: a series of if statements, each comparing the first bytes of the file buffer (typically only a few) against a known signature. Here’s how it checks whether a file is a GIF:

if (buf[0] === 0x47 && buf[1] === 0x49 && buf[2] === 0x46) {
    return {
        ext: 'gif',
        mime: 'image/gif'
    };
}
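The same idea can be written with a small helper that compares a run of bytes at once. This is only a sketch: `matches` and `detectGif` are hypothetical names chosen for illustration and are not part of file-type's API.

```javascript
// Hypothetical helper, not part of file-type's API: returns true when
// `buf` contains the given signature bytes starting at `offset`.
function matches(buf, signature, offset = 0) {
    if (buf.length < offset + signature.length) {
        return false;
    }
    return signature.every((byte, i) => buf[offset + i] === byte);
}

// Same GIF test as above: the ASCII bytes "GIF".
function detectGif(buf) {
    if (matches(buf, [0x47, 0x49, 0x46])) {
        return {ext: 'gif', mime: 'image/gif'};
    }
    return null;
}

const sample = Buffer.from('GIF89a', 'ascii');
console.log(detectGif(sample)); // { ext: 'gif', mime: 'image/gif' }
```

The length check up front keeps a buffer that is shorter than the signature from being compared at all.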

Digging in…

I’ve messed with hex dumps and debugging binary files before, and this simply involved cross-referencing the first few bytes of a file with already-known signatures.

The first task was to list the file types that needed signatures and gather the relevant data from sources around the net, mostly from the File Signatures Table compiled by Gary Kessler.

I created a quick spreadsheet to keep things in order, too.

To help with quick inspections, I created a tiny utility, calling it fsig:

#!/bin/bash
# fsig: show the first lines of a hex dump of the given file

clear
xxd "$1" | head

And for inspecting temporary directories of common file types, I used this bash one-liner:

clear; for n in *; do echo "$n"; xxd "$n" | head; done

 

For extracting signatures cleanly, I found Hex Fiend for OS X. I highly recommend it, since it shows the cursor’s byte offset and allows for clean copy-paste.

Making it work

One problem I ran into was files with similar signatures. For example, deb files are nearly identical to files made with the compress utility. That meant care was needed when ordering the logic.

[Image: care was needed in ordering the logic for comparing file signatures]
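To illustrate why the order of the checks matters, here is a minimal sketch that assumes one signature is a prefix of another. The byte values and format names are made up for illustration; they are not the real deb or compress signatures, and `startsWith` and `detect` are hypothetical helpers.

```javascript
// Both signatures are made up for illustration; they are NOT the real
// deb or compress byte sequences.
const SHORT_SIG = [0xAA, 0xBB];             // hypothetical "generic" format
const LONG_SIG = [0xAA, 0xBB, 0xCC, 0xDD];  // hypothetical "specific" format

function startsWith(buf, signature) {
    return buf.length >= signature.length &&
        signature.every((byte, i) => buf[i] === byte);
}

function detect(buf) {
    // The longer, more specific signature has to be tested first. If the
    // short one came first, it would also match every "specific" file.
    if (startsWith(buf, LONG_SIG)) {
        return 'specific';
    }
    if (startsWith(buf, SHORT_SIG)) {
        return 'generic';
    }
    return null;
}

console.log(detect(Buffer.from([0xAA, 0xBB, 0xCC, 0xDD, 0x00]))); // specific
console.log(detect(Buffer.from([0xAA, 0xBB, 0x01])));             // generic
```

Swapping the two if blocks would make the first input report "generic", which is exactly the kind of misidentification the ordering guards against.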

To make converting the signature hex into code simple and quick, I used Vim’s handy macro recording mode.

I love Vim macros. Check out how to use them with Vim 101: A Gentle Introduction to Macros.
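For anyone who doesn't use Vim, the same hex-to-code conversion can be sketched in a few lines of JavaScript. This is an alternative to the macro, not a transcription of it, and `signatureToCondition` is a hypothetical name.

```javascript
// Hypothetical helper: turn a hex string copied from a hex editor,
// e.g. "47 49 46", into a chain of byte comparisons.
function signatureToCondition(hex, bufName = 'buf') {
    return hex.trim().split(/\s+/)
        .map((byte, i) => `${bufName}[${i}] === 0x${byte.toUpperCase()}`)
        .join(' && ');
}

console.log(signatureToCondition('47 49 46'));
// buf[0] === 0x47 && buf[1] === 0x49 && buf[2] === 0x46
```

The result can be pasted straight into an if statement like the GIF check shown earlier.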

All set

Check out sindresorhus/file-type, and for that matter, maybe you can help with OSS by working on open issues from sindresorhus’ other projects.

Let me know what you think on Twitter @lintuxvi!