Sunday, October 7, 2012

'unsort', the missing UNIX filter


The UNIX filter that should have been there since the beginning!

Unsort is the natural opposite of the standard 'sort': it takes lines and shuffles them. I wonder why it is not part of every standard UNIX system. How the hell did they test 'sort'?
Sample run:
$ echo -e "1\n2\n3\n4\n5\n6" | unsort
stderr: Using seed: 542941369
6
4
1
5
2
3
stderr: Jumps: 4 left, 4 right, balance: 1.000000
This 'unsort' implementation works as follows:
  1. reads lines from standard input
  2. assigns them random, unique indexes
  3. sorts lines by their random index
  4. when input finishes, dumps the lines in that sorted order.
The only limitation is input size, since the whole input is kept in memory until it has been read completely.
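
The four steps above can also be tried out with standard tools. What follows is only a rough sketch of the same idea, not the C code from unsort-1.0.1.c: the function name unsort_sketch and its optional seed argument are made up for this example, and the random keys come from awk's rand(), so unlike the real thing they are not guaranteed to be unique (a collision is just very unlikely).

```shell
# Sketch only: shuffles stdin by decorating each line with a random key.
# This is NOT the C code from unsort-1.0.1.c; the name unsort_sketch and
# the optional seed argument are assumptions made for this example.
unsort_sketch() {
    # Use the caller's seed if given, otherwise pull 4 random bytes.
    seed=${1:-$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')}
    # Steps 1-2: read lines and prefix each with a random key and a tab.
    # Step 3:    sort numerically on that key.
    # Step 4:    strip the key and print the lines in their new order.
    LC_ALL=C awk -v seed="$seed" 'BEGIN { srand(seed) }
                                  { printf "%.17f\t%s\n", rand(), $0 }' |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1n |
        cut -f2-
}

# Usage: printf '1\n2\n3\n' | unsort_sketch        (random order)
#        printf '1\n2\n3\n' | unsort_sketch 42     (repeatable order)
```

Passing an explicit seed makes a run repeatable, which is handy when a shuffled input has triggered a bug and needs to be replayed.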

Download 'unsort' here: unsort-1.0.1.c. Simple build & install instructions are within the file, along with a shell script for testing the statistical distribution, like this one:

Statistical test #2 "Permutations of 3 elements":
$ i=9999;while [ $((i--)) -ge 0 ];do echo -e "a\nb\nc"|unsort -q|xargs echo;done|sort|uniq -c
1644 a b c
1681 a c b
1680 b a c
1633 b c a
1709 c a b
1653 c b a
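
To put a number on how even those counts are, one option is a chi-square statistic against a uniform expectation. This is a small sketch of my own, not part of the script shipped with unsort: the function name chi_square is hypothetical, it reads `uniq -c` style lines (count in the first column), and it assumes every possible permutation shows up at least once in the input.

```shell
# Sketch only: chi-square statistic for `uniq -c` output against a
# uniform distribution. The name chi_square is an assumption of this example.
chi_square() {
    awk '{ obs[NR] = $1; total += $1 }        # first column is the count
         END {
             if (NR == 0) exit 1              # no input, nothing to test
             expected = total / NR            # uniform: same count everywhere
             for (i = 1; i <= NR; i++)
                 chi2 += (obs[i] - expected)^2 / expected
             printf "chi2=%.4f df=%d\n", chi2, NR - 1
         }'
}

# Usage: append "| chi_square" to the test pipeline above.
```

Fed the six counts listed above, it prints chi2=2.4056 df=5. The 5% critical value for 5 degrees of freedom is about 11.07, so this run gives no reason to doubt that the six permutations are equally likely.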
