Unicode sorting in Perl

Just normalize. Assuming file.txt is a UTF-8 encoded list of words, one per line, then

cat file.txt | perl -CS -e 'use Unicode::Normalize; my @w;
while (<>) { chomp; push @w, $_ }    # -CS decodes STDIN and encodes STDOUT as UTF-8
@w = sort { NFD($a) cmp NFD($b) } @w;
print join("\n", @w), "\n"'

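will output the sorted list (well, sorted by code point after NFD normalization, which for Spanish is usually enough: accented vowels stay within their base letter's group instead of landing after z, and ñ comes after n). Note that this is not full collation: uppercase still sorts before lowercase, and an accented letter sorts after every unaccented continuation, so "ábaco" comes after "azul".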
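If the NFD ordering is not close enough, a locale-aware collator is one possible alternative. The following is a minimal sketch, not a definitive recipe: it assumes Unicode::Collate::Locale is available (it ships with Perl 5.14 and later), and the helper names sort_nfd and sort_es, as well as the sample word list, are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # this script contains UTF-8 literals
use Unicode::Normalize qw(NFD);
use Unicode::Collate::Locale;

# Hypothetical helper: sort by code point after canonical decomposition,
# which is the approach of the one-liner above.
sub sort_nfd {
    my @words = @_;
    return sort { NFD($a) cmp NFD($b) } @words;
}

# Hypothetical helper: sort with the Unicode Collation Algorithm,
# tailored for Spanish (assumes the 'es' locale tailoring is installed).
sub sort_es {
    my @words = @_;
    my $collator = Unicode::Collate::Locale->new(locale => 'es');
    return $collator->sort(@words);
}

binmode STDOUT, ':encoding(UTF-8)';

# Sample data, invented for the example.
my @words = ('zorro', 'ábaco', 'ñandú', 'nube', 'azul');

print "NFD:      ", join(' ', sort_nfd(@words)), "\n";
print "Collator: ", join(' ', sort_es(@words)),  "\n";
```

With NFD alone, "azul" sorts before "ábaco", because the combining acute accent (U+0301) compares greater than any ASCII letter. The collator compares base letters first and only uses accents to break ties, so it puts "ábaco" first, at the cost of loading a heavier module.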
