scripts, shell

Accents everywhere (to be removed, obviously)

Accents have once again crept into my terminal (I still live in Spain and do some work for the Spanish Administration). This time it was a group of university professors who had to create a lot of files and directories for a historical catalog. Even though I remember telling the coordinator not to use accents, spaces or any other funny characters, they did. I should have known better…
The problem, then, was to remove all spaces and non-ASCII characters from directory and file names.
After thinking about it a bit, I came up with the solution below. There may be a simpler way (using mkdir -p), but then you would have to copy and then remove a lot of files, and be careful to check that the copies have succeeded before rm'ing anything… Too much of a mess for me.
find has a couple of options to restrict the depth of the search: -maxdepth and -mindepth. Depth 0 is the starting point itself (here, the pwd), so giving both options the same value selects exactly one level of the tree.

# Yes, they USE spaces in filenames...
# Notice that OS X has no seq command.
# I am sure they have not reached insanity YET, so 10 is a likely bound
# Going from shallow to deep means every parent is already renamed
# by the time its children are listed.
for depth in 1 2 3 4 5 6 7 8 9 10 ; do
    # -type d: only directories are renamed by this loop.
    # The depth options go first: GNU find warns if they follow -type.
    find . -mindepth $depth -maxdepth $depth -type d | while IFS= read -r i ; do
        # It is better to know the ALLOWED set of chars, not leaving
        # it to the shell's fancy. Any unallowed item becomes an underscore
        j=$(printf '%s\n' "$i" | sed -e 's/[^a-zA-Z_0-9./]/_/g')
        # only move if source != destination
        if [ "$i" != "$j" ] ; then
            # for logging purposes, one can never be too careful:
            echo moving "$i" TO "$j"
            # this will actually not happen but just in case...
            if [ -e "$j" ] ; then
                echo "COLLISION: last move not done"
            else
                mv "$i" "$j"
            fi
        fi
    done
done
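
Before touching the disk, the substitution can be previewed on a few sample names. The following is only a sketch: `sanitize` is a hypothetical helper name, and the `LC_ALL=C` setting is an assumption of mine (not part of the script above), added so that `a-z` matches plain ASCII letters regardless of the locale.

```shell
# Hypothetical helper: print the cleaned-up version of the path given as $1.
# LC_ALL=C (an assumption, not in the original script) makes sed work on
# bytes, so the ranges only match ASCII letters and digits.
sanitize() {
    printf '%s\n' "$1" | LC_ALL=C sed -e 's/[^a-zA-Z_0-9./]/_/g'
}

# Spaces and parentheses are outside the allowed set:
sanitize './old maps/vol (2)'    # prints ./old_maps/vol__2_
```

With `LC_ALL=C` every byte outside the allowed set becomes its own underscore, so a two-byte UTF-8 character turns into two underscores rather than one.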

Two remarks:

  • I prefer spelling out the whole set of allowed characters because I was going to have to repeat the job in Perl (I had to edit some HTML files pointing to those directories), and the [:alnum:] class etc. would complicate things because of the differences between Perl's and sed's regexes.
  • I was practically certain that there would not be any collisions. In different circumstances, I would have logged a lot more information and prevented collisions using a counter.
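
For the record, collision prevention with a counter could look roughly like this. It is a minimal sketch, not what I actually ran: `unique_name` is a hypothetical helper, and the `_1`, `_2`, ... suffix scheme is just one possible convention.

```shell
# Hypothetical helper: print a destination name that does not exist yet.
# If "$1" is taken, try "$1_1", "$1_2", ... until a free name is found.
unique_name() {
    candidate=$1
    n=0
    while [ -e "$candidate" ] ; do
        n=$((n + 1))
        candidate="${1}_${n}"
    done
    printf '%s\n' "$candidate"
}

# Possible use inside the renaming loop, in place of the plain mv:
#   j=$(unique_name "$j")
#   mv "$i" "$j"
```

There is still a small window between the check and the mv, so this is only good enough when nobody else is writing to the tree at the same time.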

Note that the above code originally had:

    for i in `find . -type d -mindepth $depth -maxdepth $depth` ; do

instead of the piped while you see above. The for version splits any name containing spaces into several words. Thanks to Pierre Gaston for his comment.
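
The difference between the two loops is easy to check in a scratch directory. This is a small sketch for a POSIX sh or bash; the directory name "old maps" and the variable names are made up for the demonstration.

```shell
# Scratch directory with one sub-directory whose name contains a space.
demo=$(mktemp -d)
mkdir "$demo/old maps"
cd "$demo" || exit 1

# Unquoted command substitution: the shell splits find's output on whitespace.
for_count=0
for i in $(find . -mindepth 1 -maxdepth 1 -type d) ; do
    for_count=$((for_count + 1))
done

# Piped while: read -r takes one whole line per iteration.
# The count is printed at the end of the pipeline and captured.
while_count=$(find . -mindepth 1 -maxdepth 1 -type d | {
    n=0
    while IFS= read -r i ; do
        n=$((n + 1))
    done
    echo "$n"
})

echo "for saw $for_count item(s), while saw $while_count item(s)"
# prints: for saw 2 item(s), while saw 1 item(s)
```

The piped while still breaks on names containing a newline, but nobody here had gone that far.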


  • On 03.20.09 Anonymous Coward said:

    You reinvent the wheel: apt-get install convmv.

  • On 03.20.09 Pedro Fortuny said:

    Hi, AC:

    reinvented up to a point: I tried convmv on my MBPro and the "Almost all POSIX filesystems do not care about how filenames are encoded" (man convmv) problem crept up… So I ended up writing the above.

    However, thanks for the pointer, I ought to have mentioned it.

