Normalise all files with one command

Problem

I was after a single command to normalise my code files (trailing spaces, newline at EOF, etc.). I had to do it for about 15 repos, and I've wanted to do it several times in the past, so I decided to find a comprehensive way of doing it.

Solution

The result I came up with is OK. It's complex and nasty, and it originally relied on dos2unix† (it now uses tr; see the update below), but it's flexible and it works. You can pass it a directory or a list of files (space-separated, as usual in the shell). It will:

  • strip trailing whitespace,
  • expand all tabs to spaces, with a tab stop every 2 columns (configurable),
  • ensure the line endings are unix, and
  • ensure there is exactly one newline at the end of the file.

Another side effect, which is a benefit or not depending on your point of view, is that it resets file permissions to your default (probably 644). This is because it writes the result to a new file, then renames that file over the original. Because of this, I'd recommend only running it on files that are already tracked in a VCS. The only issue I've had is needing to chmod 755 script/rails afterwards. You could add that at the end if you like ([[ -f script/rails ]] && chmod 755 script/rails will even check that it exists first).
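If you'd rather keep the original permissions, one option is to copy the normalised content back into the existing file instead of renaming the temporary file over it, since redirecting into a file that already exists keeps its mode. This is a minimal sketch of that idea, not what the script does; replace_keeping_mode is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the original script): overwrite "$1"
# with the contents of "$1.tmp" while keeping the permissions of "$1".
replace_keeping_mode() {
  local file="$1"
  # Redirecting into an existing file truncates and rewrites it in place,
  # so the file itself (and therefore its mode) is never replaced.
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}
```

The trade-off is that this is not atomic: if it's interrupted mid-copy you can be left with a truncated file, which is one more reason to stick to files tracked in a VCS.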

UPDATE 2014-06-13: I tried so many regexen with Mac sed to get the other line endings out of my files to no avail. I did use dos2unix, but it didn't handle Mac line endings. Turns out tr works much better anyway.

The relevant part is:

printf '%s' "$(cat "$file")" | sed $'s/\r$//' | tr '\r' '\n' | sed -E 's/[[:blank:]]+$//' | expand -t 2 > "$file.tmp"

  1. printf prints the contents of $file exactly as received from cat, without adding anything. This leverages the fact that command substitution strips trailing newlines, so the output ends with no newline at all (I inherited a bunch of files with several trailing newlines).
  2. The line endings are then converted to Unix ones in two passes, one for Windows and one for old Mac. I originally filtered the content through dos2unix, but it didn't handle Mac line endings (\r); my testing was incomplete. My next attempt used tr with the -s switch, which was worse: -s squeezes every run of "\n", so it removed all the blank spacing lines from the code. Note also that tr maps single characters, not sequences, so tr '\r\n' '\n' turns each Windows ending into "\n\n". Instead, sed first deletes the "\r" at the end of each line (the $'...' quoting is how bash hands sed a literal carriage return), which fixes Windows endings, and tr then turns any remaining lone "\r" into "\n", which fixes Mac endings. Files that already have Unix line endings are not affected.
  3. It's then passed on to sed, which matches one or more "blanks" (spaces, tabs) at the end of each line and replaces them with nothing. Incidentally, the BSD sed that ships with the Mac makes sure its output ends with a newline, so this is where it gets added back in. GNU sed leaves a missing final newline missing, so on Linux you'd need to add it yourself.
  4. Then expand replaces tabs with spaces. Change the 2 to a 4 if you've been using 4-space indentation.
  5. The output is then saved to a new file with the extension ".tmp".
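The steps above can be sketched as a small function. This is a variant under stated assumptions rather than the exact command: it converts the line endings first and strips the trailing newlines last, then adds the final newline explicitly with printf so that it doesn't depend on which sed you have. The name normalise_file and the optional width argument are made up for illustration.

```shell
# Hypothetical sketch: write a normalised copy of "$1" to "$1.tmp".
# The optional second argument is the tab width (default 2).
normalise_file() {
  local file="$1" width="${2:-2}" cr content
  cr=$(printf '\r')   # a literal carriage return to embed in the sed script

  # Command substitution drops every trailing newline from the result.
  content=$(
    sed "s/${cr}\$//" "$file" |   # Windows: drop the \r before each \n
      tr '\r' '\n' |              # old Mac: remaining lone \r become \n
      sed 's/[[:blank:]]*$//' |   # strip trailing spaces and tabs
      expand -t "$width"          # expand tabs to spaces
  )

  # Put back exactly one newline at the end of the file.
  printf '%s\n' "$content" > "$file.tmp"
}
```

For example, normalise_file app/models/user.rb 4 would use 4-column tab stops (the path is only an example). One difference to be aware of: this sketch turns a completely empty file into a file containing a single newline.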

This is great for one file, but I want it to do all the files, so I added a flexible support system. Let me know if you've got an easier way to achieve the same outcome.

Explanation

The first function, exclude_patterns, builds a list of patterns to pass to find's exclude filters. Because this selectively excludes things, only use it on files that are tracked by a VCS. If you find any files or folders that you don't want to normalise, add them in on the appropriate line.
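To give a feel for the kind of exclude filters involved, here is a minimal sketch of a find call that skips some directories and file types. It is not the script's actual exclude_patterns function; list_code_files and the particular names being excluded are assumptions for illustration.

```shell
# Hypothetical sketch: list regular files under "$1" (default: .),
# skipping a few directories entirely and ignoring some binary file types.
list_code_files() {
  find "${1:-.}" \
    \( -name .git -o -name node_modules -o -name vendor \) -prune -o \
    -type f ! -name '*.png' ! -name '*.jpg' -print
}
```

Using -prune stops find from descending into the excluded directories at all, which is quicker than filtering their contents out afterwards.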

The second function, code_file_list, returns the actual list of files. You can run it independently to see which files will be normalised, but change -print0 to -print first, so that you can read its output. (Then change it back after.)

The third function, normalise_files, uses a nasty loop that splits its input on null characters (from -print0) so that file names containing spaces don't cause problems. It also echoes each file name so that we have a log of which files were normalised.
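The null-separated loop looks roughly like the sketch below. It assumes bash, because read -d is not available in plain POSIX sh, and for_each_file is a hypothetical name rather than the script's own function.

```shell
# Hypothetical sketch (bash): run a command once per regular file under
# a directory, safely handling file names that contain spaces.
for_each_file() {
  local dir="$1"
  shift
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' file; do
      echo "$file"                  # log which file is being processed
      "$@" "$file" < /dev/null      # keep the command away from find's output
    done
}
```

For example, for_each_file . wc -l prints each file name followed by its line count. IFS= and -r stop read from trimming whitespace or interpreting backslashes in the names.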