Jesper Noehr

Pythonista, RESTafarian, Binary Poet & Proud Bucketeer

YO PYGMENTS 1.0 I’M REALLY HAPPY FOR YOU, AND IMMA LET YOU FINISH, BUT PYGMENTS 1.1 HAS ONE OF THE BEST RELEASES OF ALL TIME

with 5 comments

The past couple of weeks, if not months, we’ve been seeing some processes clog up on Bitbucket. Long story short, eventually, some apache2 worker processes would start using a lot of CPU as well as memory, and sit there for a long time. Hours, in fact. This effectively caused those workers to stall, not letting new ones spawn, and over time, that would make the site unreachable and/or slow.

Today, I added some “forensic logging”, which let us trace what URLs those processes were choking on. After we found those, and could reproduce the crazy resource usage, we traced it back to a Pygments highlight. Specifically for the Scala lexer. Apparently that lexer didn’t work all too well.

Upgrading to 1.1 fixed the issue. Things are running much smoother now.

Moral of the story? Keep your software upgraded. Releases bring not only new features, but bugfixes too.

Written by jespern

October 1st, 2009 at 11:57 am

Posted in python

HttpOnly with Django

without comments

Did you ever wonder what the dirtiest way to add HttpOnly cookies to Django was, without having to patch both Django and Python?

Well, through the amazing flexibility of the Python language, and the black art of monkeypatching, here’s how:

from Cookie import Morsel
from django.http import HttpResponse

def http_only_cookie(fn):
    def wrap(self, key, *args, **kwargs):
        fn(self, key, *args, **kwargs)
        self.cookies[key]['HTTPOnly'] = True
    return wrap

def exclude(fn, field):
    fields = fn()

    # Build a new list instead of popping while iterating, which can skip items.
    return [(k, v) for (k, v) in fields if k != field]

def append_httponly(fn):
    def wrap(self, *args, **kwargs):
        out = fn(self, *args, **kwargs)
        return out+'; HttpOnly'
    return wrap

def bootstrap_httponly():
    HttpResponse.set_cookie = http_only_cookie(HttpResponse.set_cookie)
    Morsel._reserved['httponly'] = 'httponly'
    Morsel.items = lambda self: exclude(super(Morsel, self).items, 'httponly')
    Morsel.OutputString = append_httponly(Morsel.OutputString)

Now just stick this in one of your __init__.py files:

from myapp.httponly import bootstrap_httponly

bootstrap_httponly()

And enjoy cookies like:

Set-Cookie:  sessionid=weirdhash; Domain=.foo.org;
  expires=Fri, 18-Sep-2009 13:18:52 GMT;
  Max-Age=1209600; Path=/; HttpOnly

And that’s that.
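For what it’s worth, none of the monkeypatching is needed on Python 3, where the standard library’s cookie module knows about HttpOnly out of the box (later Django releases also accept an httponly argument to set_cookie). A minimal sketch using only the stdlib; the cookie name and value are just the ones from the example above:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionid"] = "weirdhash"
cookie["sessionid"]["path"] = "/"
# 'httponly' is a reserved Morsel key on Python 3, so no patching is required.
cookie["sessionid"]["httponly"] = True

header = cookie.output()
print(header)
```

The printed header carries the HttpOnly attribute alongside Path; the exact attribute order is up to the library.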

Written by jespern

September 4th, 2009 at 2:25 pm

Posted in django, python

Bitwise permissions in Python (and Django)

with 8 comments

So I’ve been wanting to write about this for a while. Finally getting around to it.

We’re going to be talking about how to construct and use a flexible permission scheme with Python, and how you can use it in your Django project.

If you’re already a binary ninja, you can skip this section.

The “trick” we’re going to be using is what’s called “bitwise operators”, namely &, | and <<.

It’s important you know what these do, so let’s do a quick tour. We’ll start with the “left shift” operator (<<).

So you know binary, right? Those 1’s and 0’s? OK, good. Let’s say that we have the number 2. That’s 10 in binary. Well, written out as a full byte (8 bits), it’s 00000010. The << operator, also called “left shift”, will *move* the bits to the left. So 00000010 << 1 is 00000100, or in decimal, 4. 00000010 << 2 is 00001000, or decimal 8. There’s also the “right shift” operator (>>), so 8>>1 is 4, but we won’t be using that.

There’s an interesting pattern here. All the numbers we get are powers of 2: 2<<x is 2*(2^x), where ^ means “to the power of” (Python spells that **). So 2<<4 is the same as 2*(2^4), which is decimal 32.
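You can check the pattern yourself in an interpreter. This is only a quick sketch in Python 3 syntax, and the name `shifted` is made up for illustration:

```python
# Shifting left by x multiplies by 2**x, so 2 << x equals 2 * (2 ** x).
shifted = [2 << x for x in range(5)]
print(shifted)                   # [2, 4, 8, 16, 32]

# format(n, "08b") shows the number as a zero-padded 8-bit string.
print(format(2 << 1, "08b"))     # 00000100

# Right shift goes the other way.
print(8 >> 1)                    # 4
```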

Now, let’s talk about “binary or” (|). This operator, at first sight, may seem like it’s the same as decimal plus (+), since if we say 2 | 4, we get 6. But, if we say 6 | 4, we get 6 again. Strange, eh? Not really. The trick here is that “or” will *set* the bit in the originating number, *if it’s not already set*. If it *is* set, it does nothing. Let’s drop down to binary again to demonstrate this.

So, using our example of 2 | 4, 2 is 00000010 and 4 is 00000100. 2 | 4 is basically saying “look at the bits in both, and switch all the ones you find on”. So we end up with 00000110, which is… 6. But wait! 6 is not a power of 2. You can’t do 2<<x and end up with 6. Thus, we can conclude that 6 is the *combination* of 2<<0 and 2<<1.

In fact, we can construct very large numbers using only “or” and numbers generated by 2<<x, and we can trace it back to the originating numbers. And your computer knows this. We’ll exploit that fact in just a minute.

Finally, let’s talk about “binary and” (&). This operator will look at two numbers, and in its result, only return the bits that are set in *both* numbers. This is the operator we’re going to use to “deconstruct” the numbers you or’d together earlier. For example, if we again have 2, which is 00000010, and 6, which is 00000110, then 6 & 2 will return 00000010, since only the 7th bit (counting from the left) is set in both. Since 6 is constructed from 2 | 4, 6 & 4 will likewise return 00000100, since 4 is also part of 6. For any other single-bit number, it will return 0.

So in summary, 6 & 2 is 2, 6 & 4 is 4, and 6 & 8 is 0.
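Here is the same tour as a few lines you can paste into an interpreter. It’s a sketch in Python 3 syntax; the `bits` helper is not a built-in, just something made up here to show the binary form:

```python
def bits(n):
    """Render n as an 8-bit binary string, e.g. 6 -> '00000110'."""
    return format(n, "08b")

combined = 2 | 4
print(bits(combined))      # 00000110

# Or-ing in a bit that is already set changes nothing.
print(combined | 4)        # 6

# And-ing pulls the individual flags back out.
print(combined & 2, combined & 4, combined & 8)   # 2 4 0
```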

Application

Let’s try it:

CAN_READ = 1<<2
CAN_WRITE = 1<<3
CAN_ADMIN = 1<<4

READER = CAN_READ
WRITER = READER | CAN_WRITE
ADMIN = WRITER | CAN_ADMIN

bob = ADMIN
alice = READER

print "Is Bob an admin?"

if bob & CAN_ADMIN:
    print "Yes!"
else:
    print "No."

print "is Bob a reader?"

if bob & CAN_READ:
    print "Yes!"
else:
    print "No."

print "Is Alice a writer?"

if alice & CAN_WRITE:
    print "Yes!"
else:
    print "No."

print "Is Alice a reader?"

if alice & CAN_READ:
    print "Yes!"
else:
    print "No."

alice |= ADMIN

print "Can Alice write now?"

if alice & CAN_WRITE:
    print "Yes!"
else:
    print "No."

And the output:

$ python bit.py
Is Bob an admin?
Yes!
is Bob a reader?
Yes!
Is Alice a writer?
No.
Is Alice a reader?
Yes!
Can Alice write now?
Yes!

So we’ve defined a very simple permission scheme here, reading, writing and administrating. We’ve defined 3 “flags”, indicating what you can do, and we’ve defined 3 “roles”, defining what each role has access to.

The way this works comes from what we discussed above. CAN_READ is 4, CAN_WRITE is 8, and CAN_ADMIN is 16. As we saw, we can piece these together using the “or” operator, to get a new number that has that flag “set.” READER is 4, WRITER is 12 (CAN_READ | CAN_WRITE), and ADMIN is 28 (CAN_READ | CAN_WRITE | CAN_ADMIN, or simply WRITER | CAN_ADMIN to add that flag).
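If you’re on Python 3, the standard library’s enum.IntFlag gives you the same scheme with nicer names when printing. This is just one possible way to spell it, not how the script above does it; the class name `Perm` is made up for the sketch:

```python
from enum import IntFlag

class Perm(IntFlag):
    CAN_READ = 1 << 2     # 4
    CAN_WRITE = 1 << 3    # 8
    CAN_ADMIN = 1 << 4    # 16

READER = Perm.CAN_READ
WRITER = READER | Perm.CAN_WRITE
ADMIN = WRITER | Perm.CAN_ADMIN

print(int(READER), int(WRITER), int(ADMIN))   # 4 12 28
print(Perm.CAN_WRITE in ADMIN)                # True
print(bool(READER & Perm.CAN_WRITE))          # False
```

The members are still plain integers underneath, so they can be stored in the same column and compared with & exactly as before.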

Now, with a 16-bit unsigned integer, we can go up to (2**16)-1 (65535, does that look familiar?), so we can actually fit quite a few more flags in there. How many in total? You guessed it: 16.

In Python, and most other “newer” languages, you don’t really have to worry about unsigned-ness and fixed-width integers, as the language just adjusts the internal representation when you go above the limit. This means that Python won’t really complain if you give it something like 2<<1 | 2<<100, it will just give you back 2535301200456458802993406410756L, indicating that you’re no longer dealing with integers, you’re now dealing with longs. Most database backends have a 64 bit type too: MySQL and PostgreSQL both give you BIGINT. An unsigned one (MySQL’s BIGINT UNSIGNED) will let you go up to 18446744073709551615, which is (2<<63)-1. PostgreSQL’s BIGINT is signed, so it tops out at 9223372036854775807, leaving you 63 bits to play with.

So now you have up to 64 flags you can mess around with, and you can define these on a *per row/object level basis in a single column/number*! So theoretically, you could eliminate 64 database columns in favor of one number, and you can even use SQL to SELECT on it, as SQL *also* supports bitwise operators.
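To make the SQL part concrete, here’s a small sketch using SQLite from the standard library (Python 3 syntax). The table and column names are invented for the example, and other databases may spell their bitwise operators slightly differently:

```python
import sqlite3

CAN_READ, CAN_WRITE, CAN_ADMIN = 1 << 2, 1 << 3, 1 << 4

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, perms INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("bob", CAN_READ | CAN_WRITE | CAN_ADMIN), ("alice", CAN_READ)],
)

# Select every account whose perms column has the CAN_WRITE bit set.
writers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM account WHERE (perms & ?) != 0 ORDER BY name",
        (CAN_WRITE,),
    )
]
print(writers)   # ['bob']
```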

Oh, right, how can you use this in your Django application? Well, we use it heavily on Bitbucket to define permissions to repositories, issues, whatever.

We have a lot of statements that look like this:

if repo.access_for(request.user) & RP.WRITER:
   # allow the write...

And of course, you can construct various other complex comparisons this way.

I hope this has helped you understand basic bitwise operators, and I urge you to dive in further. There’s cool stuff like “not” (~) and “xor” (^), which may be even more powerful than what we’ve already demonstrated.
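As a small taste of those two, here’s a sketch (Python 3 syntax) that reuses the flag values from earlier to take a permission away again and to toggle one:

```python
CAN_READ, CAN_WRITE, CAN_ADMIN = 1 << 2, 1 << 3, 1 << 4

perms = CAN_READ | CAN_WRITE | CAN_ADMIN   # 28

# "and" with the inverted flag clears that one bit and leaves the rest alone.
perms &= ~CAN_ADMIN
print(perms)    # 12

# "xor" flips a bit: on if it was off, off if it was on.
perms ^= CAN_WRITE
print(perms)    # 4
perms ^= CAN_WRITE
print(perms)    # 12
```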

Read more about Bitwise operators on Wikipedia, and have fun.

Written by jespern

August 27th, 2009 at 12:16 pm

Posted in Uncategorized

Dumpster kitten

with 2 comments


I found this little guy yesterday. Me and Katie were driving home from my Greek lesson and we could hear something screaming (we had the windows down.)

I pull over, and we walk towards the sound. This was at 10pm, by the way. Just by a dumpster, we see this tiny kitten, screaming his eyes out. Well, that’s a bad analogy, seeing as he’s just a couple of days old and his eyes aren’t even open yet. In fact, one of them was infected, badly.

We take him home and clean up his eye a bit. He doesn’t like that at all, but it needs to be done. We try to feed him a bit with some milk, but we have no way of getting it into his tiny mouth, and he keeps trying to suck on our fingers. Poor thing.

We drive to the vet, but he’s closed. His cell number is on the door though, so we call it. He tells us to go by the 24-hour pharmacy and pick up a syringe. Strangely, they give it to us for free. Needle and everything. Fun.

We head back home, fill it with milk and try to feed him again. Much better this time. This little guy must’ve gulped down about 25ml of milk, and in between meals he also decided to pee on me. Good sign, I think. We put some blankets in a bucket and put him in Katie’s office downstairs. It’s dark there, and he fell asleep instantly. Poor guy must’ve been terrified and exhausted. God knows how long he’s been in that dumpster.

This morning he was still sound asleep in his blankets, with the occasional “mmmm blankets comfy” moves. We haven’t woken him up yet, but we’ll go to the vet this morning. He seems to be doing a lot better.

So, to the people who put him where we found him, and to anyone ever having dumped a litter of kittens in the trash: You need to be arrested and put in jail. You’re inhuman. Animal cruelty is one of the lowest points you can reach in life. If you have a female cat and you don’t want kittens, a) don’t let her go outside and/or b) have her fixed. You need to be responsible. If you *do* end up with kittens, you’re responsible for them, too. What you did was inexcusable and I hope for your sake, that we never meet.

What’s gonna happen to this little guy? I don’t know. Someone we know might want him. If they don’t, we’ll probably adopt him. We already have a cat, but he’s currently back in The Netherlands, getting his vaccinations sorted out. Maybe the two will get along, we’ll see.

One thing’s for sure: We’ll take care of him and make sure he’s happy for as long as we need to.

Written by jespern

July 18th, 2009 at 8:48 am

Posted in Uncategorized

Making Python’s string.Template useful

with 3 comments

You know how Python has string.Template? It’s kinda useful, as it allows you to do stuff like:

from string import Template
s = Template('$who likes $what')
print s.substitute(who='tim', what='kung pao')
# prints: tim likes kung pao

That’s neat. But more often than not, you may want to use nested dicts, so you can write something like ‘person.name’. string.Template won’t allow you to do this, but it’s pretty easy to get around:

import string

class TraversingDict(dict):
    def __getitem__(self, item):
        if '.' in item:
            source, path = item.split('.', 1)
            return TraversingDict(self[source])[path]
        return super(TraversingDict, self).__getitem__(item)

class InterpolTemplate(string.Template):
    idpattern = r'[_a-z][_a-z0-9\.]*'

    def render(self, dct):
        return self.safe_substitute(TraversingDict(dct))

How does it work? It uses a custom class, which subclasses ‘dict’. It’ll behave just like a normal built-in dictionary, but we’ve overridden __getitem__ to look for periods in the key name. If one is found, it splits up the key, and instantiates itself recursively. This essentially means that you can nest to any level, like ‘person.information.personal.name.first_name’.

The ‘render’ method on InterpolTemplate is not really needed, but it turns your dict into a TraversingDict, so you don’t need to mess with those at all.

Here’s the unittest I use:

def run_template_test():
    tmpl = "repository: ${repo.name}, owner: ${repo.owner}, size: ${size}"
    t = InterpolTemplate(tmpl)
    d = { 'repo': { 'name': 'foo', 'owner': 'bar' }, 'size': 42 }
    r = t.render(d)

    assert r == 'repository: foo, owner: bar, size: 42', r

Neat, eh? Makes for a nice simple substitute when you don’t want to rely on <insert template library here>.
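One thing the test doesn’t show is what happens when a nested key is missing. Because ‘render’ goes through safe_substitute, the placeholder should simply be left in place. Here’s a sketch of that, with the two classes repeated in Python 3 syntax so it runs on its own; the sample data is made up:

```python
import string

class TraversingDict(dict):
    def __getitem__(self, item):
        # 'a.b.c' looks up 'a', then recurses into it with 'b.c'.
        if "." in item:
            source, path = item.split(".", 1)
            return TraversingDict(self[source])[path]
        return super().__getitem__(item)

class InterpolTemplate(string.Template):
    idpattern = r"[_a-z][_a-z0-9\.]*"

    def render(self, dct):
        return self.safe_substitute(TraversingDict(dct))

data = {"repo": {"name": "foo"}}
rendered = InterpolTemplate("name: ${repo.name}, owner: ${repo.owner}").render(data)
print(rendered)   # name: foo, owner: ${repo.owner}
```

The missing lookup raises KeyError inside TraversingDict, which safe_substitute swallows.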

Written by jespern

July 15th, 2009 at 9:50 am

Posted in python

Off to Eurodjangocon

without comments

Tomorrow morning (Sunday) I’ll be off to Eurodjangocon. I’ll be in Prague for one week, staying at the Iris congress hotel, so if anyone wants to meet up and discuss, let me know. My contact information can be found on the About page.

I’m bringing ~200 Bitbucket stickers as well, first come, first served.

If you’re going to the conference and want to grab a beer, that’s cool too.

Written by jespern

May 2nd, 2009 at 3:34 pm

Posted in django

Debugging Django cache

with one comment

Easy way to debug what’s going on with your cache:

from django.conf import settings
from django.core.cache import cache as django_cache

class debug_cache(object):
    ignore = [ 'to_slug', 'repodownloadsize' ] # ignore keys starting with these

    def __getattr__(self, attr):
        def wraps(f):
            def i(*args, **kwargs):
                if not any([ args[0].startswith(ig) for ig in debug_cache.ignore ]):
                    print "CACHE %s: args=%s, kwargs=%s" % (attr, args, kwargs)
                return f(*args, **kwargs)
            return i
        return wraps(getattr(django_cache, attr))

if getattr(settings, 'DEBUG_CACHE', False):
    cache = debug_cache()
else:
    cache = django_cache

Put a setting in your settings.py called DEBUG_CACHE = True, and you’ll see what’s going on.
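The trick itself (intercept attribute access with __getattr__, hand back a wrapped method) has nothing to do with Django, so here’s a standalone sketch you can run without it, in Python 3 syntax. `FakeCache` and `LoggingProxy` are invented names; the fake cache is only a stand-in for the real backend:

```python
class FakeCache:
    """Stand-in for a real cache backend; an assumption for this sketch."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value

class LoggingProxy:
    """Forward every method call to `target`, logging it unless ignored."""
    def __init__(self, target, ignore=()):
        self._target = target
        self._ignore = tuple(ignore)   # key prefixes to keep quiet about
        self.calls = []

    def __getattr__(self, attr):
        method = getattr(self._target, attr)

        def wrapper(*args, **kwargs):
            # Calls without positional arguments have no key to check.
            key = str(args[0]) if args else ""
            if not key.startswith(self._ignore):
                self.calls.append((attr, args, kwargs))
                print("CACHE %s: args=%s, kwargs=%s" % (attr, args, kwargs))
            return method(*args, **kwargs)

        return wrapper

cache = LoggingProxy(FakeCache(), ignore=["to_slug"])
cache.set("user:1", "bob")
cache.set("to_slug:x", "y")       # ignored by prefix, but still stored
value = cache.get("user:1")
```

Unlike the class above, this version guards against calls with no positional arguments, where looking at args[0] would raise an IndexError.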

Written by jespern

April 16th, 2009 at 9:46 am

Posted in Uncategorized

Mercurial powertip: Move changesets out of the way momentarily

with 2 comments

Sometimes you may be working in a repository, and want to momentarily move changesets out of the way. Mercurial Queues (MQ) can do this; it ships with Mercurial, but has to be enabled as an extension. From what I can gather, you can get the same results as you get with “git stash”, but MQ offers much more.

Say that you have been working on an experimental feature, but need to fix a bug. You don’t want to sit and be careful only to commit the files modified by the bugfix, especially if the bugfix touches files you’ve already modified.

Your log could look like this:

$ hg log
changeset:   2:41009a6aa783
tag:         tip
summary:     adding B

changeset:   1:419ab519b195
summary:     adding C

changeset:   0:8f276b14c116
summary:     adding A



Now, changesets 1 and 2 are the experimental changes. You need to get them out of the way before you can fix the bug.

It’s important that you have a “patch queue” repository inside your repository first; this is what “qinit” is for. Afterwards, we’ll import the changesets into the patch queue, using qimport:

$ hg qinit -c # tell hg to create a versioned patch queue in .hg/patches/
$ hg qimport -r 2:1


Now let’s take a look at the log:

$ hg log
changeset:   2:41009a6aa783
tag:         qtip
tag:         2.diff
tag:         tip
summary:     adding B

changeset:   1:419ab519b195
tag:         1.diff
tag:         qbase
summary:     adding C

changeset:   0:8f276b14c116
tag:         qparent
summary:     adding A



The changesets are still there, but they’re a little different: they’ve been tagged with a couple of things. First is the filename the changeset was saved as. In this case, your changes are in ‘.hg/patches/1.diff’ and ‘.hg/patches/2.diff’. Go on, have a look. There are also some new tags, namely ‘qbase’, ‘qtip’ and ‘qparent’. This is how MQ keeps track of the queue base, the queue tip and the parent of the queue.

But, you may notice that the changesets are still present. This is because they are “applied” to the repository. To get rid of them, we use qpop:

$ hg qpop -a # pop all patches from the stack
patch queue is now empty
$ hg log
changeset:   0:8f276b14c116
tag:         tip
summary:     adding A



Lovely. You can now see which patches are available via qseries:

$ hg qseries
1.diff
2.diff



To push them back onto the stack, you can use ‘qpush -a’. But first, we have a bug to fix:

$ echo 'D' > D
$ hg add D
$ hg ci -m "adding D (which is a bugfix)"



And the log:

$ hg log
changeset:   1:5d41625a80b5
tag:         tip
summary:     adding D (which is a bugfix)

changeset:   0:8f276b14c116
summary:     adding A



Now push that fix out, or whatever you want to. Time to get the experimental changesets back. We’ll use ‘qpush -a’ for that:

$ hg qpush -a
applying 1.diff
applying 2.diff
now at: 2.diff



You can run log to see what happened. Needless to say, your patches are there. Let’s turn them back into normal changesets:

$ hg qfinish 3:2 # they're not 2:1 anymore, we have another changeset
                 # in before them now, consult 'hg log' for details
$ hg log
changeset:   3:1a07541824d3
tag:         tip
summary:     adding B

changeset:   2:b4a1402f9b50
summary:     adding C

changeset:   1:5d41625a80b5
summary:     adding D (which is a bugfix)

changeset:   0:8f276b14c116
summary:     adding A



Et voilà.

There’s much more you can do with MQ. If you’re only importing a single changeset, you can name the patch via ‘qimport -n’. You can give your patches to other people, and you can even push your patch queue around. ‘qimport’ will even import patches from outside your repository. You can change the order of patches, you can use guards, and so on. MQ is really a wonderful addition to Mercurial.

Written by jespern

April 10th, 2009 at 12:04 pm

Posted in hg

{l,r}strip considered harmful

with one comment

If you’re using lstrip() or rstrip() in your code, chances are you might have a problem.

This is because those functions probably don’t do what you think they do.

So go ack --python '[lr]strip' your codebase now.

What you think it does

If you haven’t been bitten by this before, and you haven’t thoroughly read help(str.rstrip), you probably think rstrip will strip a sequence of bytes off the end of a string.

For example, it could be used to get rid of a file extension, like


>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")

Bzzzt. Wrong.

What it actually does

As per the docstring:

rstrip(…)
S.rstrip([chars]) -> string or unicode

Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping

Pay attention here: characterS. Plural. Not a sequence. More like a list.

Now, have a look at our previous example, removing the extension.


>>> filename = "fumble.exe"
>>> basefn = filename.rstrip(".exe")
>>> basefn
'fumbl'

Not what you expected, eh? Problem here is that it treats ‘.exe’ as a list of characters, so it’s basically this:


>>> remove_chars = [ '.', 'e', 'x', 'e' ]
>>> result = filename
>>> for char in reversed(filename):
...     if char in remove_chars:
...         result = result[:-1]  # remove the char we're looking at
...     else:
...         break
...
>>> result
'fumbl'

  1. Start at the end and go backwards, byte by byte.
  2. If the character we’re seeing is in the aforementioned list, remove it.
  3. If not, we’ve reached a stop point, so process no further.

The opposite is of course true for lstrip.
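To see the lstrip side of it, here’s a tiny sketch (Python 3 syntax, hostname made up) of trying to chop a “www.” prefix off and losing a bit more than intended:

```python
host = "www.wonderful.org"

# lstrip removes any leading run of the characters 'w' and '.',
# so the 'w' that starts "wonderful" goes too.
stripped = host.lstrip("www.")
print(stripped)    # onderful.org
```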

What it is useful for

Once you get over the misleading behavior and come to terms with what it actually does, you can start discovering what it is useful for.

For example, it’s immensely useful for stripping leading or trailing whitespace. In fact, this is such a common use-case that this is what it does if you don’t specify any arguments.

Since it’s a list of characters, in cases where you need to remove both Unix-style line endings (\n) and win32 ones (\r\n), you can simply do:


block_of_text.rstrip("\r\n")

This will remove both. They don’t necessarily have to be in that order.
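A quick sketch to convince yourself (Python 3 syntax, sample strings made up). Note that it strips *every* trailing \r and \n, not just one line ending:

```python
samples = ["unix\n", "windows\r\n", "several blank lines\n\n\n"]
stripped = [s.rstrip("\r\n") for s in samples]
print(stripped)    # ['unix', 'windows', 'several blank lines']

# The order of the characters in the argument makes no difference.
print("windows\r\n".rstrip("\n\r"))    # windows
```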

What you probably wanted instead

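OK, so having that out of the way, what would you want to use to get rid of a file extension? replace(). replace() works for this, because it takes a third optional argument. Just keep in mind that a count of 1 removes the *first* occurrence, so this assumes ‘.exe’ only shows up at the end: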

replace(…)
S.replace (old, new[, count]) -> string

Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.

So let’s try it again:


>>> filename = "fumble.exe"
>>> basefn = filename.replace(".exe", "", 1)
>>> basefn
'fumble'

Much better.
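One caveat: replace() with a count of 1 removes the *first* match, which is only the extension if ‘.exe’ doesn’t appear earlier in the name. If that can happen, something that only looks at the end is safer. A sketch in Python 3 syntax; `strip_suffix` is a made-up helper, while os.path.splitext comes from the standard library:

```python
import os.path

def strip_suffix(name, suffix):
    """Remove suffix from the end of name, only if it is actually there."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name

print(os.path.splitext("fumble.exe")[0])             # fumble
print(strip_suffix("fumble.exe", ".exe"))            # fumble

# Where replace() and the suffix-only approach disagree:
print("my.exe.backup.exe".replace(".exe", "", 1))    # my.backup.exe
print(strip_suffix("my.exe.backup.exe", ".exe"))     # my.exe.backup
```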

Written by jespern

March 8th, 2009 at 12:57 pm

Posted in python


Mercurial powertip: Un-add a file

without comments

Something that isn’t entirely clear from the use of Mercurial is how to un-add a file you accidentally added, before you commit.


$ hg add data/
adding data/index.txt
adding data/README
adding data/hugefile.db

$ hg status
A data/index.txt
A data/README
A data/hugefile.db

Oops. Didn’t want to add ‘hugefile.db’. How to undo that add?


$ hg revert data/hugefile.db

Did that do the right thing?


$ ls data/hugefile.db # still there?
data/hugefile.db

$ hg status
A data/index.txt
A data/README
? data/hugefile.db

Yep!

Written by jespern

March 2nd, 2009 at 10:04 am

Posted in hg
