Adaptive Replacement Cache in Python
Today we’re going to talk about ARCs, or Adaptive Replacement Caches.
So what is it? Well, lets talk about some other caching algorithms first.
- LRU: Last Recently Used. This means that you have a cache with a fixed size. For each “hit” on an entry, you increase the hit count (or the timestamp of access time) on the entry, and you keep the cache sorted sequentially. Once the cache “overflows”, you expunge the least used entry.
- MRU: Most Recently Used. This is used when your entries are mostly unpredictable. You keep the most recent items in the beginning of the list, and you expunge from the end once you hit the fixed size.
So what’s an ARC? Well, it’s basically a more sophisticated caching algorithm, which sort of, kind of, looks like a MRU/LRU. We’ll try to explain it in further detail here, and also provide a Python implementation.
If we look at the Wikipedia article for an ARC, it states that:
ARC improves the basic LRU strategy by splitting the cache directory into two lists, T1 and T2, for recently and frequently referenced entries. In turn, each of these is extended with a ghost list (B1 or B2), which is attached to the bottom of the two lists. These ghost lists act as scorecards by keeping track of the history of recently evicted cache entries, and the algorithm uses ghost hits to adapt to recent change in resource usage. The combined cache directory is organised in four LRU lists
…which looks like this:

There’s an overall strategy to this;
- When you first cache something, it will end up in T1. T1 will only contain entries that are hit only once.
- Once a cache hit occurs in T1, it will be moved to T2. T2 will hold anything that is hit more than once.
- When either of these heaps overflow (T1 & T2), the least hit entry will be evicted. This just means that it will be moved to the ghost-heap.
- Once you do a cache lookup, and you find the entry in the ghost, you will move the entry back into the main heap, and evict the least hit entry out into the ghost. Basically a swap.
- When a ghost overflows, the entry will be completely expunged, and will have to be calculated from scratch again.
Lets see what happens when something first gets cached:

OK, so ‘x’ gets cached, and since it’s never hit, it ends up in T1, with a hit count of 1. When we cache something else (call it ‘y’), that also ends up in T1. The second that something gets hit in the cache, we must move it to T2:

Say that T2 is filled to the brim, we must evict the least used entry to the ghost:

And then say that B2 gets too full:

‘x’ is now dead. Any requests for x will fail, and it will need to be processed again. That’s the basic premise of ARC caching. It’s not rocket science, but it’s a step up from LRU.
Anyway, I promised a Python implementation, so lets take a look at it.
DISCLAIMER: This was written on behalf of a client, who kindly allowed me to show it to the public. There’s no license on it, so lets just assume that it’s under BSD.
First of, grab a hold of the file. It includes a test in the main method, which does some basic Fibonacci calculations. That’s good, cause it will recursively requests a sequence of numbers until we reach 1 or 0, so we can cache the result of those calculations. Initializing and using the cache is easy;
c = ARC(1000) # Fixed size 1000, must be divisible by 4.
@c.cache # Decorator from `ARC'
def fib(x):
...
Lets do a run, and look at the statistics it prints after execution:
jespern@cantor:Work% python memoize-arc.py
[...some output truncated...]
ARC statistics:
-----------------
T1: 0
T2: 100
B1: 0
B2: 0
-----------------
<Heap: 0 key(s), 0 ghosted, max 250>
<Heap: 100 key(s), 0 ghosted, max 250>
Nothing remains in T1, since it was all hit more than once. T2 can hold all of the 100 fibonacci numbers we’ve requested, so nothing’s evicted to the ghosts either.
Go try it! Adjust the maxsize of the cache, and check out the statistics. See what happens. Have fun, comments welcome.
Edit, January 3rd 2008: As ‘bayareaguy’ kindly points out, ARC is patented by IBM. You might want to look at 2Q instead.
Thanks bayareaguy, I’ve updated the post with your suggestion.