Archive for the ‘python’ Category
YO PYGMENTS 1.0 I’M REALLY HAPPY FOR YOU, AND IMMA LET YOU FINISH, BUT PYGMENTS 1.1 HAS ONE OF THE BEST RELEASES OF ALL TIME
The past couple of weeks, if not months, we’ve been seeing some processes clog up on Bitbucket. Long story short, eventually, some apache2 worker processes would start using a lot of CPU as well as memory, and sit there for a long time. Hours, in fact. This effectively caused those workers to stall, not letting new ones spawn, and over time, that would make the site unreachable and/or slow.
Today, I added some “forensic logging”, which let us trace what URLs those processes were choking on. After we found those, and could reproduce the crazy resource usage, we traced it back to a Pygments highlight. Specifically for the Scala lexer. Apparently that lexer didn’t work all too well.
Upgrading to 1.1 fixed the issue. Things are running much smoother now.
Moral of the story? Keep your software upgraded. New releases don’t only bring new features, they bring bugfixes too.
HttpOnly with Django
Did you ever wonder what the dirtiest way to add HttpOnly cookies to Django was, without having to patch both Django and Python?
Well, through the amazing flexibility of the Python language, and the black art of monkeypatching, here’s how:
from Cookie import Morsel
from django.http import HttpResponse

def http_only_cookie(fn):
    def wrap(self, key, *args, **kwargs):
        fn(self, key, *args, **kwargs)
        self.cookies[key]['HTTPOnly'] = True
    return wrap

def exclude(fn, field):
    fields = fn()
    for idx, (k, v) in enumerate(fields):
        if field == k:
            fields.pop(idx)
    return fields

def append_httponly(fn):
    def wrap(self, *args, **kwargs):
        out = fn(self, *args, **kwargs)
        return out+'; HttpOnly'
    return wrap

def bootstrap_httponly():
    HttpResponse.set_cookie = http_only_cookie(HttpResponse.set_cookie)
    Morsel._reserved['httponly'] = 'httponly'
    Morsel.items = lambda self: exclude(super(Morsel, self).items, 'httponly')
    Morsel.OutputString = append_httponly(Morsel.OutputString)
Now just stick this in one of your __init__.py files:
from myapp.httponly import bootstrap_httponly
bootstrap_httponly()
And enjoy cookies like:
Set-Cookie: sessionid=weirdhash; Domain=.foo.org; expires=Fri, 18-Sep-2009 13:18:52 GMT; Max-Age=1209600; Path=/; HttpOnly
And that’s that.
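For what it’s worth, the Morsel half of this hack is only needed on old Pythons: the standard library’s Morsel has known about ‘httponly’ since Python 2.6. Here’s a minimal sketch in Python 3 syntax (the module is called http.cookies there, Cookie in Python 2), reusing the cookie name and value from the header above:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie['sessionid'] = 'weirdhash'
cookie['sessionid']['path'] = '/'
# 'httponly' is a reserved Morsel attribute here, so no monkeypatching needed.
cookie['sessionid']['httponly'] = True

header = cookie.output()
print(header)
```

Later Django versions also accept an httponly argument to set_cookie, so check what your versions support before reaching for the monkeypatch.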
Making Python’s string.Template useful
You know how Python has string.Template? It’s kinda useful, as it allows you to do stuff like:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
That’s neat. But more often than not, you may want to use nested dicts, so you can write something like ‘person.name’. string.Template won’t allow you to do this, but it’s pretty easy to get around:
import string

class TraversingDict(dict):
    def __getitem__(self, item):
        if '.' in item:
            source, path = item.split('.', 1)
            return TraversingDict(self[source])[path]
        return super(TraversingDict, self).__getitem__(item)

class InterpolTemplate(string.Template):
    idpattern = r'[_a-z][_a-z0-9\.]*'

    def render(self, dct):
        return self.safe_substitute(TraversingDict(dct))
How does it work? It uses a custom class, which subclasses ‘dict’. It’ll behave just like a normal built-in dictionary, but we’ve overridden __getitem__ to look for periods in the key name. If one is found, it splits the key at the first period, looks up the first part, and instantiates itself recursively on the result to resolve the rest. This essentially means that you can nest to any level, like ‘person.information.personal.name.first_name’.
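To see the recursion on its own, without any templates involved, here’s a small sketch in Python 3 syntax. The nested data is made up for illustration:

```python
class TraversingDict(dict):
    def __getitem__(self, item):
        if '.' in item:
            # Look up the part before the first dot, then recurse on the rest.
            source, path = item.split('.', 1)
            return TraversingDict(self[source])[path]
        return super().__getitem__(item)


# Made-up nested data, four levels deep.
data = TraversingDict({
    'person': {'information': {'personal': {'name': {'first_name': 'tim'}}}},
})

first_name = data['person.information.personal.name.first_name']
print(first_name)  # tim
```

Each level peels one segment off the front of the key, so the depth of the path doesn’t matter.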
The ‘render’ method on InterpolTemplate is not really needed, but it turns your dict into a TraversingDict, so you don’t need to mess with those at all.
Here’s the unittest I use:
def run_template_test():
    tmpl = "repository: ${repo.name}, owner: ${repo.owner}, size: ${size}"
    t = InterpolTemplate(tmpl)
    d = { 'repo': { 'name': 'foo', 'owner': 'bar' }, 'size': 42 }
    r = t.render(d)
    assert r == 'repository: foo, owner: bar, size: 42', r
Neat, eh? Makes for a nice simple substitute when you don’t want to rely on <insert template library here>.
{l,r}strip considered harmful
If you’re using lstrip() or rstrip() in your code, chances are you might have a problem.
This is because those functions probably don’t do what you think they do.
So go ack --python '[lr]strip' your codebase now.
What you think it does
If you haven’t been bitten by this before, and you haven’t thoroughly read help(str.rstrip), you probably think rstrip will strip a sequence of bytes off the end of a string.
For example, it could be used to get rid of a file extension, like
>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
Bzzzt. Wrong.
What it actually does
As per the docstring:
rstrip(...)
    S.rstrip([chars]) -> string or unicode

    Return a copy of the string S with trailing whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping
Pay attention here: characterS. Plural. Not a sequence. More like a list.
Now, have a look at our previous example, removing the extension.
>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
>>> basefn
'fumbl'
Not what you expected, eh? Problem here is that it treats ‘.exe’ as a list of characters, so it’s basically this:
>>> remove_chars = [ '.', 'e', 'x', 'e' ]
>>> result = filename
>>> while result and result[-1] in remove_chars:
...     # remove the char we're looking at
...     result = result[:-1]
...
>>> result
'fumbl'
- Start at the end and go backwards, byte by byte.
- If the character we’re seeing is in the aforementioned list, remove it.
- If not, we’ve reached a stop point, so process no further.
The opposite is of course true for lstrip.
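The lstrip flavour of the same trap, as a quick sketch. The hostname is just an example I picked because it shows the problem nicely:

```python
url = "www.wikipedia.org"

# lstrip treats "www." as the set of characters {'w', '.'}, so it also eats
# the leading 'w' of "wikipedia".
stripped = url.lstrip("www.")
print(stripped)  # ikipedia.org

# Checking for the prefix explicitly only removes the exact substring.
prefix = "www."
without_prefix = url[len(prefix):] if url.startswith(prefix) else url
print(without_prefix)  # wikipedia.org
```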
What it is useful for
Once you get over the misleading behavior and come to terms with what it actually does, you can start discovering what it is useful for.
For example, it’s immensely useful for stripping leading or trailing whitespace. In fact, this is such a common use-case that this is what it does if you don’t specify any arguments.
Since it’s a list of characters, in cases where you need to remove both Unix-style line endings ("\n") and win32 ones ("\r\n"), you can simply do:
block_of_text.rstrip("\r\n")
This will remove both. They don’t necessarily have to be in that order.
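A quick sketch to convince yourself (the sample strings are made up):

```python
unix_line = "some text\n"
windows_line = "some text\r\n"

# The argument is a set of characters, so the order doesn't matter.
assert unix_line.rstrip("\r\n") == "some text"
assert windows_line.rstrip("\n\r") == "some text"

# It also means that *every* trailing newline goes, not just the last one.
several = "some text\n\n\r\n"
cleaned = several.rstrip("\r\n")
print(repr(cleaned))  # 'some text'
```

Keep that last bit in mind if trailing blank lines mean something in your data.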
What you probably wanted instead
OK, so with that out of the way, what would you use to get rid of a file extension? replace(). replace() is perfect for this, because it takes an optional third argument:
replace(...)
    S.replace (old, new[, count]) -> string

    Return a copy of string S with all occurrences of substring
    old replaced by new. If the optional argument count is
    given, only the first count occurrences are replaced.
So let’s try it again:
>>> filename = "fumble.exe"
>>> basefn = filename.replace(".exe", "", 1)
>>> basefn
'fumble'
Much better.
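One caveat worth knowing: with a count of 1, replace() removes the first occurrence, wherever it sits, so it only does the right thing if the extension doesn’t also appear earlier in the name. If that can happen, the standard library has os.path.splitext, or you can slice after an endswith() check. A sketch, with a deliberately awkward made-up filename:

```python
import os.path

filename = "fumble.exe.backup.exe"

# replace() with count=1 removes the *first* ".exe", not the trailing one.
replaced = filename.replace(".exe", "", 1)
print(replaced)  # fumble.backup.exe

# os.path.splitext splits off the last extension, whatever it is.
base, ext = os.path.splitext(filename)
print(base, ext)  # fumble.exe.backup .exe

# Slicing after an endswith() check removes one specific suffix.
suffix = ".exe"
sliced = filename[:-len(suffix)] if filename.endswith(suffix) else filename
print(sliced)  # fumble.exe.backup
```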
Python tricks: functools.partial and wraps
Since Python 2.5, Python has had the ‘functools’ module, a small collection of higher-order functions: functions that take or return other functions.
For example:
from functools import partial

def adder(first, second):
    return first + second

adder10 = partial(adder, 10)
print adder10(32) # -> 42
Partial evaluation, eh? That’s kinda cool.
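It works with keyword arguments too. Here’s a small sketch in Python 3 syntax that freezes the ‘base’ argument of the built-in int(); the name parse_binary is my own:

```python
from functools import partial

# Freeze the keyword argument 'base' of the built-in int().
parse_binary = partial(int, base=2)
answer = parse_binary('101010')
print(answer)  # 42

# The partial object remembers what it wraps.
print(parse_binary.func is int)  # True
print(parse_binary.keywords)     # {'base': 2}
```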
On to ‘wraps’, which is the one I’ve found the most practical use for. I like decorators, and I use them where applicable. What I don’t like about decorators is that when you get a backtrace, the frame shows up under the name of the wrapper function, and not the function you decorated.
‘wraps’ to the rescue:
from functools import wraps

def some_decorator(f):
    def wrap(*args, **kwargs):
        return f(*args, **kwargs)
    return wraps(f)(wrap)

@some_decorator
def some_function():
    ...
Now the function name, docstring, module, etc. will be those of ‘f’, no longer ‘wrap’! Immensely useful.
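Here’s a side-by-side sketch of the difference, in Python 3 syntax. The decorator and function names are made up for the comparison:

```python
from functools import wraps


def plain_decorator(f):
    def wrap(*args, **kwargs):
        return f(*args, **kwargs)
    return wrap


def wrapping_decorator(f):
    @wraps(f)  # same effect as returning wraps(f)(wrap)
    def wrap(*args, **kwargs):
        return f(*args, **kwargs)
    return wrap


@plain_decorator
def first():
    """Docstring of first."""


@wrapping_decorator
def second():
    """Docstring of second."""


print(first.__name__, first.__doc__)    # wrap None
print(second.__name__, second.__doc__)  # second Docstring of second.
```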
Piston and Oberon
I just wanted to do a quick write-up on a couple of things, because:
- I wanted to announce two upcoming projects of mine, and
- I wanted to get a new post out there
Piston
Piston’s a django-app I’m writing for Bitbucket. It serves as sort of a “mini-framework” on top of Django for creating RESTful APIs. Well, actually it doesn’t tie you to be RESTful at all, as its url mapping facility hooks directly into Django.
A while back, jacobian wrote an article, “REST worst practices”, outlining some of the things a good implementation would need. I’m happy to say that Piston’s elegantly waltzing its way through the list, checking off his points one by one.
We don’t tie a resource to a model (although you easily can), we have pluggable authentication (with new handlers being a breeze to add), configurable output formats (in the form of “emitters”, a simple dict-to-x facility; it comes with emitters for JSON, YAML and XML), proper use of HTTP (status codes, headers) and CRUD semantics, and best of all, it ties right in to your Django application.
Anyway, I wrote it for Bitbucket, but it definitely merits an open source release and its own project. It’s behind closed doors right now, but nearing completion. Once we feel it’s good to release, we’ll do a release together with David Larlet, the author of Semantic Django (who else?)
Oberon
Oberon’s also something we use on Bitbucket. It’s a queue-based “application platform” based on Twisted. Vague, huh?
No, we use it for the service integration facility of Bitbucket. Oberon itself is just a daemon, serving as a message-passing facility between the client and what I call “brokers”. A broker is a piece of Python code that must satisfy two things:
- It must contain a class that subclasses “BaseBroker”, and
- That class must have a “handle” method receiving a single argument, “payload”
What this allows you to do is pretty nifty. You can load up a few of these brokers, and then using the client API, you can send messages to Oberon, and it’ll take it from there.
For example, we have a couple of brokers, like Twitter, which extracts the information it wants from the payload and uses a Twitter client library to post messages. There’s a Basecamp broker, and the most popular one thus far is the “Issue” broker, which parses commit messages and acts on them. Stuff like “great, all done, fixes #42” will close up issue 42, and “hm, needs more work, references #37” will add a comment to issue 37.
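To give a rough idea of what a broker could look like, here’s a sketch. Everything in it besides the names “BaseBroker”, “handle” and “payload” is an assumption made for illustration: the stand-in base class, the shape of the payload (a dict with a ‘message’ key), the regex, and the returned action tuples are not Oberon’s actual code.

```python
import re


class BaseBroker(object):
    """Stand-in for Oberon's base class; the real one is not shown here."""

    def handle(self, payload):
        raise NotImplementedError


class IssueBroker(BaseBroker):
    # Hypothetical pattern: a keyword followed by an issue number.
    COMMAND = re.compile(r'\b(fixes|references)\s+#(\d+)', re.IGNORECASE)

    def handle(self, payload):
        # Assumes the payload is a dict with a 'message' key.
        actions = []
        for verb, number in self.COMMAND.findall(payload['message']):
            action = 'close' if verb.lower() == 'fixes' else 'comment'
            actions.append((action, int(number)))
        return actions


broker = IssueBroker()
closed = broker.handle({'message': 'great, all done, fixes #42'})
commented = broker.handle({'message': 'hm, needs more work, references #37'})
print(closed)     # [('close', 42)]
print(commented)  # [('comment', 37)]
```

A real broker would act on the issue tracker instead of returning tuples; the sketch only shows the “subclass plus handle method” contract.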
Best of all, and my favorite feature is ‘oberonc’, the command line client. It’s pretty basic but it has useful commands like ‘stats’, ‘brokers’ and best of all: ‘reload’. Yep, that’s right, you can reload brokers on the fly without disrupting service. It works really well too, due to the way we’ve designed the application. It also means you can load up new brokers that have never been loaded before, so it makes it really interesting to upgrade running systems.
None of this stuff is tied into Bitbucket, so it has a vast variety of uses. It runs on top of ‘twistd’ as well so it should be pretty stable and scalable (it uses stuff like epoll.)
Anyway, Oberon’s also getting its own open source release, together with all the brokers we’ve written for the service integration we’re using on the live system. Those should serve as good examples.
I’ll post about both here, when they’re out.
Conditional middleware execution in Django
On Bitbucket, we need to handle streaming data through Django. This lowers the memory footprint of the application and makes execution faster.
The problem with this is that several stock middleware in Django “look” at the content before sending it. This is a problem for streaming content, since you’d generally use a generator, and you can’t consume it until the very last minute.
The middleware in Django that do this are ConditionalGetMiddleware, which attempts to create an ‘ETag’ header, and CommonMiddleware, which attempts to create a ‘Content-Length’ header.
Here’s an easy way of not executing certain middleware in such cases:
def wsgi_compat_middleware_factory(klass):
    class compatwrapper(klass):
        def process_response(self, req, resp):
            if not whatever_condition:
                return klass.process_response(self, req, resp)
            return resp
    return compatwrapper
This is a “factory”, returning a class that you can use instead of the normal middleware. On Bitbucket, the condition is ‘if not req.is_mercurial():’. Replace with whatever makes sense for you.
You use it by doing something like this:
from django.middleware.http import ConditionalGetMiddleware
from django.middleware.common import CommonMiddleware

StreamingConditionalGetMiddleware = wsgi_compat_middleware_factory(ConditionalGetMiddleware)
StreamingCommonMiddleware = wsgi_compat_middleware_factory(CommonMiddleware)
Now you have two new classes. Just install those in place of the stock middleware, and voilà.
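If you’d rather not hardcode the condition, one variation is to pass it in as a callable. The sketch below is self-contained so it runs without Django: FakeLengthMiddleware, is_streaming and the dict-based request/response are stand-ins I made up, and it mirrors the old-style process_response hook used above.

```python
def conditional_middleware_factory(klass, should_skip):
    """Return a subclass of klass that skips process_response whenever
    should_skip(req, resp) is true."""
    class ConditionalWrapper(klass):
        def process_response(self, req, resp):
            if should_skip(req, resp):
                return resp
            return klass.process_response(self, req, resp)
    ConditionalWrapper.__name__ = 'Conditional' + klass.__name__
    return ConditionalWrapper


# Stand-ins so the sketch runs without Django installed.
class FakeLengthMiddleware(object):
    def process_response(self, req, resp):
        resp['Content-Length'] = str(len(resp['body']))
        return resp


def is_streaming(req, resp):
    return req.get('streaming', False)


Middleware = conditional_middleware_factory(FakeLengthMiddleware, is_streaming)
mw = Middleware()

normal = mw.process_response({}, {'body': 'hello'})
streamed = mw.process_response({'streaming': True}, {'body': 'hello'})
print(normal)    # {'body': 'hello', 'Content-Length': '5'}
print(streamed)  # {'body': 'hello'}
```

Passing the predicate in keeps the factory reusable across middleware that need different conditions.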
