Jesper Noehr

Pythonista, RESTafarian, Binary Poet & Proud Bucketeer

Archive for March, 2009

{l,r}strip considered harmful


If you’re using lstrip() or rstrip() with an argument in your code, chances are you have a problem.

This is because those functions probably don’t do what you think they do.

So go ack --python '[lr]strip' your codebase now.

What you think it does

If you haven’t been bitten by this before, and you haven’t thoroughly read help(str.rstrip), you probably think rstrip will strip a sequence of bytes off the end of a string.

For example, it could be used to get rid of a file extension, like


>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")

Bzzzt. Wrong.

What it actually does

As per the docstring:

rstrip(…)
S.rstrip([chars]) -> string or unicode

Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping

Pay attention here: characterS. Plural. Not a substring. More like a list of individual characters.

Now, have a look at our previous example, removing the extension.


>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
>>> basefn
'fumbl'

Not what you expected, eh? Problem here is that it treats ‘.exe’ as a list of characters, so it’s basically this:


>>> remove_chars = ['.', 'e', 'x', 'e']
>>> end = len(filename)
>>> for char in reversed(filename):
...     if char in remove_chars:
...         end -= 1  # drop the char we're looking at
...     else:
...         break
...
>>> filename[:end]
'fumbl'

  1. Start at the end and go backwards, character by character.
  2. If the character we’re seeing is in the aforementioned list, remove it.
  3. If not, we’ve reached a stop point, so process no further.

lstrip works the same way, except that it starts at the beginning of the string and goes forwards.
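To see the same surprise from the left-hand side, here is a minimal sketch. The hostname is only an illustrative value; the point is that the argument is again treated as individual characters, so the scan runs past the intended prefix.

```python
# lstrip treats its argument as a collection of characters, just like
# rstrip, but scans from the start of the string instead of the end.
hostname = "www.wikipedia.org"  # illustrative value

stripped = hostname.lstrip("www.")

# 'w', 'w', 'w', '.' are stripped, and so is the 'w' of 'wikipedia';
# the scan stops at 'i', the first character not in the argument.
print(stripped)  # ikipedia.org
```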

What it is useful for

Once you get over the misleading behavior and come to terms with what it actually does, you can start discovering what it is useful for.

For example, it’s immensely useful for stripping leading or trailing whitespace. In fact, this is such a common use-case that this is what it does if you don’t specify any arguments.

Since it’s a list of characters, in cases where you need to remove both Unix-style line endings (\n) and Windows-style ones (\r\n), you can simply do:


block_of_text.rstrip("\r\n")

This will remove both. The characters don’t have to be listed in that order, and every trailing \r or \n is removed, not just a single line ending.
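A quick sketch of that behaviour, using made-up sample strings: the order of the characters in the argument makes no difference, and repeated trailing line endings all disappear.

```python
# Sample strings with different (and repeated) trailing line endings.
samples = ["line\n", "line\r\n", "line\n\r", "line\r\n\r\n"]

# Every trailing '\r' or '\n' is stripped, however many there are.
cleaned = [s.rstrip("\r\n") for s in samples]

# Writing the characters in the other order gives the same result.
same = [s.rstrip("\n\r") for s in samples]

print(cleaned)          # ['line', 'line', 'line', 'line']
print(cleaned == same)  # True
```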

What you probably wanted instead

OK, with that out of the way, what should you use to get rid of a file extension? replace(). replace() works for this because it takes an optional third argument that limits how many occurrences are replaced:

replace(…)
S.replace (old, new[, count]) -> string

Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.

So let’s try it again:


>>> filename = "fumble.exe"
>>> basefn = filename.replace(".exe", "", 1)
>>> basefn
'fumble'

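Much better, at least as long as ".exe" appears only once in the name: replace() removes the first match, not necessarily the one at the end.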
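For names where the extension might also show up mid-string, an approach anchored to the end of the string is safer. The following is a minimal sketch rather than the one true way: strip_suffix is a hypothetical helper written for this example, the tricky filename is made up, and os.path.splitext comes from the standard library.

```python
import os.path


def strip_suffix(text, suffix):
    """Remove suffix from the end of text, but only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


tricky = "setup.exe.backup.exe"  # illustrative filename

# replace() with count=1 removes the first ".exe", wherever it occurs.
print(tricky.replace(".exe", "", 1))       # setup.backup.exe

# The helper only ever touches the end of the string.
print(strip_suffix(tricky, ".exe"))        # setup.exe.backup
print(strip_suffix("fumble.exe", ".exe"))  # fumble

# os.path.splitext splits off the last extension, whatever it is.
print(os.path.splitext("fumble.exe")[0])   # fumble
```

If all you want is "the name without its extension", os.path.splitext is probably the simplest choice; the helper is more useful when the suffix to remove is not a file extension at all.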

Written by jespern

March 8th, 2009 at 12:57 pm

Posted in python


Mercurial powertip: Un-add a file


Something that isn’t entirely obvious when using Mercurial is how to un-add a file you accidentally added, before you commit.


$ hg add data/
adding data/index.txt
adding data/README
adding data/hugefile.db

$ hg status
A data/index.txt
A data/README
A data/hugefile.db

Oops. Didn’t want to add ‘hugefile.db’. How to undo that add?


$ hg revert data/hugefile.db

Did that do the right thing?


$ ls data/hugefile.db # still there?
data/hugefile.db

$ hg status
A data/index.txt
A data/README
? data/hugefile.db

Yep!

Written by jespern

March 2nd, 2009 at 10:04 am

Posted in hg
