I am uniq

[Republished from my previous site, October 2005]

Two of the things I love most about unix are learning new tools and tricks, and teaching them to others. One of unix’s real gems is uniq(1): “report or filter out repeated lines in a file”. It is extremely simple and powerful, probably a page or two of easy code, but completely indispensable. I usually use it to get the intersection or difference of two lists, via the -d and -u options. Intersection is pretty straightforward:

  cat list_one.txt list_two.txt | sort | uniq -d

Basically: cat the two lists together (incidentally creating the union, with duplicates), sort the result (because uniq only compares adjacent lines, so duplicates have to be next to each other), then have uniq print only the duplicated lines with -d. (Yes, for this to be a true intersection, you need to be sure neither list contains duplicates to start with.)
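Here is a concrete run of that pipeline. The file names and contents are made up for illustration; any plain text lists work the same way:

```shell
# Two made-up sample lists, one item per line
printf 'apple\nbanana\ncherry\n' > list_one.txt
printf 'banana\ncherry\ndate\n'  > list_two.txt

# Concatenate, sort so duplicates become adjacent, keep only duplicated lines
cat list_one.txt list_two.txt | sort | uniq -d
# banana
# cherry
```

If either input might already contain duplicates, running each file through sort -u first (e.g. `sort -u list_one.txt`) before concatenating guards against false positives.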

Getting a strict difference is a little trickier: you actually have to get the intersection first and then remove those elements in a second pass:

  cat list_one.txt list_two.txt | sort | uniq -d | cat list_one.txt - | sort | uniq -u

This will return the elements of "list_one.txt" that do not appear in "list_two.txt". (Note the - argument to the second cat, which stands for standard input and so receives the output of the pipe; cat is worthy of its own entry sometime, though.)
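The two-pass pipeline can be seen in action with a pair of made-up lists (the file names and contents here are just for illustration):

```shell
# Made-up sample lists
printf 'apple\nbanana\ncherry\n' > list_one.txt
printf 'banana\ncherry\ndate\n'  > list_two.txt

# Pass 1 produces the intersection (banana, cherry); pass 2 appends
# list_one.txt again and keeps only lines that appear exactly once,
# i.e. the lines of list_one.txt that were NOT in the intersection.
cat list_one.txt list_two.txt | sort | uniq -d \
  | cat list_one.txt - | sort | uniq -u
# apple
```

The second pass works because every element of list_one.txt that is also in list_two.txt now appears twice (once from the file, once from the intersection), so uniq -u filters it out.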
