Tag Archives: heapq

Sort python dictionaries by values

Perhaps you already encountered a problem like the following one yourself:
You have a large list of items (let’s say URIs for this example) and want to sum up how often they were viewed (or edited or… whatever). A small one-shot solution in python looks like the following and uses the often unknown operator.itemgetter:

import sys
import operator
uriViews = {}
for line in sys.stdin:
    uri, views = line.strip().split()
    views = int(views)
    uriViews[uri] = uriViews.get(uri, 0) - views
    # why minus? could do + and use reverse=True below?
    # right, but then items also get sorted in reverse, if views are
    # the same (so Z would come first, a last)
for uri, views in sorted(uriViews.items(),
                         key=operator.itemgetter(1,0)):
    print -views, uri

This approach can be a lot faster than self written lambda functions called for every comparison or a list comprehension which turns around all tuples and then sorts it. Also in contrast to many other solutions you can find online this one uses operator.itemgetter(1,0), which means that if two items have the same amount of views, their URIs are sorted alphabetically.

Remember that this approach sorts the whole list in RAM, so you might want to chose other approaches in case your lists are getting huge.
For further information you might want to read PEP-0265, which also includes a hint what to do when you’re only interested in the Top 1000 for example (will only sort these top1000):

import heapq
top1000 = heapq.nlargest(1000, uriViews.iteritems(), itemgetter(1,0))
for uri,views in top1000:
   print -views, uri

Min-Heap in Python

I recently wanted to implement a small event system where events can have different priorities. So for example the event with highest priority (lowest value) should be handled first.
Python comes with a heapq module which can transform a list into a heap in a way that it stays a list, but fulfills all heap constraints. Nevertheless you might want to wrap the heap like this, so you can do nice stuff:

import heapq
class Heap(object):
    """ A neat min-heap wrapper which allows storing items by priority
        and get the lowest item out first (pop()).
        Also implements the iterator-methods, so can be used in a for
        loop, which will loop through all items in increasing priority order.
        Remember that accessing the items like this will iteratively call
        pop(), and hence empties the heap! """

    def __init__(self):
        """ create a new min-heap. """
        self._heap = []

    def push(self, priority, item):
        """ Push an item with priority into the heap.
            Priority 0 is the highest, which means that such an item will
            be popped first."""
        assert priority >= 0
        heapq.heappush(self._heap, (priority, item))

    def pop(self):
        """ Returns the item with lowest priority. """
        item = heapq.heappop(self._heap)[1] # (prio, item)[1] == item
        return item

    def __len__(self):
        return len(self._heap)

    def __iter__(self):
        """ Get all elements ordered by asc. priority. """
        return self

    def next(self):
        """ Get all elements ordered by their priority (lowest first). """
        try:
            return self.pop()
        except IndexError:
            raise StopIteration

With this one could do the following:

h = Heap()
# add some nonsense:
h.push(10, "I'm a large one")
h.push(20, "Actually I'm larger")
h.push(5, "size is not everything")
h.push(0, "which size?")

# get it out in a cool way:
for item in h:
   print item

# prints:
# which size?
# size is not everything
# I'm a large one
# Actually I'm larger