Normalise all files with one command
I was after a single command to normalise my code files (trailing spaces, newline at EOF, etc.). I had to do it for about 15 repos, and I've wanted to do it several times in the past, so I decided to find a comprehensive way of doing it.
The result I came up with is OK. It's complex and nasty, and it relies on `tr` (previously `dos2unix`)†, but it's flexible and it works. You can pass it a directory or a list of files (space-separated, as per shell norm). It will:
- strip trailing whitespace,
- replace all tabs with 2 spaces (configurable),
- ensure the line endings are unix, and
- ensure there is exactly one newline at the end of the file.
Another benefit, depending on your point of view, is that it will also reset file permissions to your default (probably 644). This is because it outputs the results to a new file, then renames it to the original file. Because of this, I'd recommend only doing this with files that are already tracked in a VCS. The only issue I've had with this is needing to `chmod 755 script/rails`, and you could add that in at the end if you like (`[[ -f script/rails ]] && chmod 755 script/rails` will even check if it exists first).
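If you'd rather not hard-code `script/rails`, one option is to note whether the file was executable before the rename and restore that afterwards. This is a minimal sketch, not part of the original script: `replace_keeping_exec` is a hypothetical helper name, and it only restores the execute bit, not the full mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): move the normalised
# temp file over the original, then restore the execute bit if it was set.
replace_keeping_exec() {
  local file=$1 tmp=$2 was_exec=no
  if [ -x "$file" ]; then
    was_exec=yes
  fi
  mv "$tmp" "$file"
  if [ "$was_exec" = yes ]; then
    chmod +x "$file"
  fi
}

# Usage (assuming the normalised output is already in script/rails.tmp):
#   replace_keeping_exec script/rails script/rails.tmp
```

Everything else about the mode still falls back to your default, so a file that was 755 comes out executable but otherwise follows your umask.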
† UPDATE 2014-06-13: I tried so many regexen with Mac `sed` to get the other line endings out of my files, to no avail. I did use `dos2unix`, but it didn't handle Mac line endings. Turns out `tr` works much better anyway.
The relevant part is:
printf '%s' "`cat "$file"`" | sed $'s/\r$//' | tr '\r' '\n' | sed -E 's/[[:blank:]]+$//' | expand -t 2 > "$file.tmp"
- `printf` prints the contents of `$file` without touching newlines, as received from `cat`. This leverages the fact that backticks strip trailing newlines, which ensures the content ends with no newlines (I inherited a bunch of files with several trailing newlines).
- Originally the content was filtered through `dos2unix`, which converts all line endings to unix-y ones. It actually didn't handle Mac line endings (\r); my testing was incomplete.
- The next version first replaced all "\r" with "\n", which eliminates Mac line endings but turns Windows endings into "\n\n", and relied on the `-s` switch to compress these to just "\n". Ugh, I did it again: the `-s` switch actually compresses all occurrences of \n\n in your code, so it removes all blank spacing lines. I've updated it to do two passes, one for Windows, one for Mac. Now it should not affect files that already have Unix line endings.
- It's then passed on to `sed`, which matches one or more "blanks" (spaces, tabs) at the end of each line and replaces them with nothing. Incidentally, `sed` ensures there's at least one trailing newline, so this is where it gets added back in. (That holds for the BSD `sed` on a Mac; GNU `sed` leaves a missing final newline as it is.)
- `expand` replaces tabs with spaces. Change the 2 to a 4 if you've been using 4-space indentation.
- The output is then saved to a new file with the extension ".tmp".
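Put together, the steps above might look like the function below. It's a sketch under a few assumptions rather than the script itself: `normalise_one` and its optional tab-width argument are names made up for illustration. It handles Windows endings with `sed` and a literal carriage return (since `tr` translates single characters, not strings), and it adds the final newline explicitly with `printf` instead of relying on `sed`, because GNU `sed` doesn't append a missing one.

```shell
#!/usr/bin/env bash
# Sketch only: normalise_one and its tab_width argument are hypothetical names.
normalise_one() {
  local file=$1 tab_width=${2:-2}
  local cr content
  cr=$(printf '\r')   # a literal carriage return, for seds that lack \r

  # Command substitution strips every trailing newline from the result.
  content=$(
    sed "s/${cr}\$//" "$file" |     # Windows: drop the CR before each LF
      tr '\r' '\n' |                # Mac: turn any remaining lone CR into LF
      sed -E 's/[[:blank:]]+$//' |  # strip trailing spaces and tabs
      expand -t "$tab_width"        # replace tabs with spaces
  )

  # Write exactly one trailing newline (nothing at all for an empty file),
  # then replace the original via the ".tmp" file.
  if [ -n "$content" ]; then
    printf '%s\n' "$content" > "$file.tmp"
  else
    : > "$file.tmp"
  fi
  mv "$file.tmp" "$file"
}

# Usage:
#   normalise_one app/models/user.rb      # 2-space tabs
#   normalise_one legacy/report.py 4      # 4-space tabs
```

Doing the line-ending conversion before the command substitution means that surplus blank lines at the end of a Windows file are stripped too, since by then they're plain "\n".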
This is great for one file, but I want it to do all the files, so I added a flexible support system. Let me know if you've got an easier way to achieve the same outcome.
The first function, `exclude_patterns`, builds a list of patterns to pass to `find`'s exclude filters. Because this selectively excludes things, only use it on files that are tracked by a VCS. If you find any files or folders that you don't want to normalise, add them in on the appropriate line.
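As a rough illustration of the idea (a guess at the shape, not the script's actual code), the patterns can be collected in a bash array of `! -path` tests and spliced into a `find` command. The array name `EXCLUDES` and the specific patterns (`.git`, `tmp`, `log`, image files) are assumptions; swap in whatever your repos need.

```shell
#!/usr/bin/env bash
# Sketch only: the EXCLUDES array and these patterns are assumptions.
exclude_patterns() {
  EXCLUDES=()
  local pattern
  for pattern in '*/.git/*' '*/tmp/*' '*/log/*' '*.png' '*.jpg'; do
    # -path matches against the whole path, and * also matches slashes
    EXCLUDES+=( ! -path "$pattern" )
  done
}

# Usage: list regular files under the current directory, minus the exclusions.
#   exclude_patterns
#   find . -type f "${EXCLUDES[@]}"
```

Because the patterns are matched against the whole path, a pattern like `*/tmp/*` also excludes everything reached through an absolute path that contains a `tmp` directory, so relative paths are the safer choice here.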
The second function, `code_file_list`, returns the actual list of files. You can run it independently to see which files will be normalised, but change
The third function, `normalise_files`, uses a nasty loop that separates on null characters (from `-print0`) to ensure we don't have problems with files that have spaces in them. It also echoes the file so that we have a log of which files were normalised.
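A null-separated loop of that kind is usually written with `read -d ''`. The sketch below shows the pattern under some assumptions: it requires bash, the plain `find` call stands in for `code_file_list` (with no exclude patterns), and `strip_trailing_blanks` is a hypothetical placeholder for the full per-file pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the per-file pipeline: trailing whitespace only.
strip_trailing_blanks() {
  sed -E 's/[[:blank:]]+$//' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Sketch of the loop; the real function would get its list from code_file_list.
normalise_files() {
  local file
  # IFS= and -r keep the name byte-for-byte; -d '' splits on NUL, so names
  # containing spaces (or even newlines) arrive intact.
  while IFS= read -r -d '' file; do
    echo "$file"                   # log which file is being normalised
    strip_trailing_blanks "$file"
  done < <(find "$@" -type f -print0)
}

# Usage:
#   normalise_files app lib "docs/read me.txt"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, which matters if you later want to count files or set a status variable inside it.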