M-x all-things-emacs

Quick Tip: dos2unix, et al

April 30th, 2007 by Ryan McGeary · 4 Comments

I despise the fact that we live in a world with different end-of-line file formats. Windows/DOS uses CRLF, Unix uses LF, and Macs used to use CR[1]. Thankfully, Macs adopted the Unix format with the release of OS X; if only Windows would do the same.

What I despise even more is that some editors seem incapable of telling a DOS file from a Unix file. There’s nothing worse than finding a once-perfect Unix file corrupted by a small section of lines ending in CRLF while the rest of the file uses plain LF. Most of the time the blame lies with one’s editor configuration, but I also blame editors whose defaults don’t at least preserve the format the file was opened in. To be fair, most power editors like emacs, vim, TextMate, etc., behave “correctly” by default and keep the file’s original format, but many others (unnamed) do not.
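If you want to check whether a file actually has this mixed-ending problem before opening it, one approach is to count each kind of line break in its raw bytes. Here is a minimal Python sketch of that idea; the function name `classify_line_endings` is purely illustrative and not part of any existing tool.

```python
def classify_line_endings(data: bytes) -> str:
    """Report which end-of-line convention(s) a file's raw bytes use.

    Returns "dos" (CRLF only), "unix" (LF only), "mac" (bare CR only),
    "mixed" (more than one style), or "none" (no line breaks at all).
    """
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains exactly one LF and one CR, so subtract
    # those to get the LFs and CRs that stand on their own.
    bare_lf = data.count(b"\n") - crlf
    bare_cr = data.count(b"\r") - crlf
    styles = [name for name, n in (("dos", crlf), ("unix", bare_lf), ("mac", bare_cr)) if n]
    if not styles:
        return "none"
    if len(styles) > 1:
        return "mixed"
    return styles[0]
```

For example, `classify_line_endings(open("myfile.txt", "rb").read())` returns `"mixed"` for exactly the kind of half-corrupted file described above. Reading in binary mode matters here: text mode in Python would translate the line endings before you could count them.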

There’s not a whole lot we can do to avoid these problems without hounding our peers, but there are ways to fix them once they’re found.

Let’s fix the nastier problem first. When you find a file corrupted with a mix of LF and CRLF line endings, strip out the ^M (CR) characters with a quick search and replace. Move to the top of the buffer with M-< (query-replace only acts on text after point), run M-% (query-replace), and replace C-q C-m with nothing. C-q runs quoted-insert, which is useful for inserting control characters (e.g. ^M, typed as C-m). When prompted, hit the exclamation point (!) to tell query-replace to replace all remaining matches without asking.
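Outside emacs, the same fix amounts to deleting every CR byte. A minimal Python sketch of that operation, assuming the file really is a mix of LF and CRLF lines (the name `strip_carriage_returns` is just illustrative):

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Delete every CR (^M, byte 0x0D), like replacing C-q C-m with nothing.

    This repairs a file with a mix of LF and CRLF endings. Don't use it on
    old Mac files: they use a bare CR as the line break, so deleting it
    would join their lines together.
    """
    return data.replace(b"\r", b"")
```

As with the query-replace approach, this removes CRs anywhere in the file, not just at line ends, which is what you want for this particular repair.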

Other times, you will run into DOS-formatted files and just want to convert them to Unix format for consistency’s sake. To do this, open the file, run C-x <RET> f, enter unix or undecided-unix when prompted for the new coding system, and save the buffer. C-x <RET> f runs set-buffer-file-coding-system, and the result is very similar to running dos2unix myfile.txt at the command line.
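If you need the same conversion in a script rather than in emacs, a rough stand-in for dos2unix myfile.txt might look like the Python sketch below. The names `crlf_to_lf` and `dos2unix_file` are made up for illustration; the real dos2unix utility has many more options and safety checks than this.

```python
import sys
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Convert DOS (CRLF) line endings to Unix (LF).

    Only a CR immediately followed by LF is touched; a lone CR is left as is.
    """
    return data.replace(b"\r\n", b"\n")


def dos2unix_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))


if __name__ == "__main__":
    # Usage: python dos2unix_sketch.py myfile.txt [more files...]
    for name in sys.argv[1:]:
        dos2unix_file(name)
```

The conversion is kept separate from the file handling so it can be reused or tested on plain bytes.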

[1] CR is Carriage Return; LF is Line Feed (aka Newline).

Tags: osx · quick · tips · unix · windows

4 responses so far

  • 1 James // Apr 30, 2007 at 3:27 pm

    Fantastic. I never knew that ‘!’ turns a q-r-r into a replace-regexp. I’ve always just done a C-g, then run a replace-regexp reusing the last replacement.

  • 2 Peter // Apr 30, 2007 at 3:50 pm

    Have you seen the package: http://centaur.maths.qmul.ac.uk/Emacs/files/eol-conversion.el

  • 3 Christoph // May 1, 2007 at 9:03 pm

    C-x <RET> C-f in fact is C-x <RET> f.

  • 4 Ryan McGeary // May 2, 2007 at 12:39 am

    Thanks Christoph. I fixed the typo. I’m sorry if this caused anyone else unnecessary confusion.
