Archive for March, 2009
{l,r}strip considered harmful
If you’re using lstrip() or rstrip() in your code, chances are you might have a problem.
This is because those functions probably don’t do what you think they do.
So go ack --python '[lr]strip' your codebase now.
What you think it does
If you haven’t been bitten by this before, and you haven’t thoroughly read help(str.rstrip), you probably think rstrip strips a given substring off the end of a string.
For example, it could be used to get rid of a file extension, like
>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
Bzzzt. Wrong.
What it actually does
As per the docstring:
rstrip(…)
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
Pay attention here: characterS. Plural. Not a substring. More like a set of characters.
Now, have a look at our previous example, removing the extension.
>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
>>> basefn
'fumbl'
Not what you expected, eh? The problem is that it treats ‘.exe’ as a set of characters, so it’s basically doing this:
>>> remove_chars = ".exe"  # treated as the characters '.', 'e' and 'x'
>>> end = len(filename)
>>> while end > 0 and filename[end - 1] in remove_chars:
...     end -= 1  # drop the trailing character we're looking at
...
>>> filename[:end]
'fumbl'
- Start at the end and go backwards, character by character.
- If the character we’re seeing is in the aforementioned set, remove it.
- If not, we’ve reached a stop point, so process no further.
lstrip does the same thing, of course, starting from the beginning of the string.
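To make the character-set behavior concrete, here is a small sketch. The second sample string is made up for illustration. It shows that the order of the characters in the argument makes no difference, and that lstrip follows the same rule from the front.

```python
filename = "fumble.exe"

# The argument is a set of characters, so order and duplicates are irrelevant.
a = filename.rstrip(".exe")
b = filename.rstrip("xe.")
print(a, b)

# lstrip applies the same rule from the start of the string:
# it eats 'e', 'x', 'e' and stops at 'r'.
c = "exercise.txt".lstrip("exe")
print(c)
```

Both rstrip calls give the same result, which is a good hint that the argument was never treated as a suffix in the first place.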
What it is useful for
Once you get over the misleading behavior and come to terms with what it actually does, you can start discovering what it is useful for.
For example, it’s immensely useful for stripping leading or trailing whitespace. In fact, this is such a common use-case that this is what it does if you don’t specify any arguments.
Since it’s a set of characters, in cases where you need to remove both Unix-style line endings (\n) and Windows ones (\r\n), you can simply do:
block_of_text.rstrip("\r\n")
This will remove both. The characters in the argument don’t have to be in that order.
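As a rough illustration of these legitimate uses, the sketch below runs rstrip over a few sample lines (the strings are hypothetical, not from any real file).

```python
# Hypothetical sample lines with Windows, Unix, and mixed endings.
lines = ["windows line\r\n", "unix line\n", "two blank lines after\n\r\n\n"]

# rstrip removes every trailing '\r' or '\n', in any order and any number.
cleaned = [line.rstrip("\r\n") for line in lines]
print(cleaned)

# With no argument, rstrip removes all trailing whitespace,
# spaces and tabs included.
bare = "trailing spaces \t\n".rstrip()
print(repr(bare))
```

Keep in mind that this strips every trailing line ending, so blank lines at the end of a block of text disappear as well.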
What you probably wanted instead
OK, with that out of the way, what should you use to get rid of a file extension? replace(). replace() works well for this, because it takes an optional third argument:
replace(…)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
So let’s try it again:
>>> filename = "fumble.exe"
>>> basefn = filename.replace(".exe", "", 1)
>>> basefn
'fumble'
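Much better. One caveat: replace() removes the first occurrence wherever it appears, not just at the end, so this is only safe when ‘.exe’ shows up once in the name.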
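For names where the extension text might also appear earlier in the string, checking the suffix explicitly is a safer sketch. The helper name strip_suffix below is made up for this example; os.path.splitext is the standard library function for splitting off a file extension.

```python
import os.path

def strip_suffix(name, suffix):
    """Return name without suffix if it ends with it, else name unchanged."""
    # Hypothetical helper, not part of the standard library.
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

base = strip_suffix("fumble.exe", ".exe")
tricky = strip_suffix("my.exe.backup.exe", ".exe")
print(base, tricky)

# For file names specifically, splitext splits off the last extension.
root, ext = os.path.splitext("fumble.exe")
print(root, ext)
```

With the tricky name, replace(".exe", "", 1) would remove the first ‘.exe’ and leave the real extension in place, while the suffix check only ever touches the end.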
Mercurial powertip: Un-add a file
Something that isn’t entirely clear from using Mercurial is how to un-add a file you accidentally added, before you commit.
$ hg add data/
adding data/index.txt
adding data/README
adding data/hugefile.db
$ hg status
A data/index.txt
A data/README
A data/hugefile.db
Oops. Didn’t want to add ‘hugefile.db’. How to undo that add?
$ hg revert data/hugefile.db
Did that do the right thing?
$ ls data/hugefile.db # still there?
data/hugefile.db
$ hg status
A data/index.txt
A data/README
? data/hugefile.db
Yep!
