Sort python dictionaries by values

Perhaps you already encountered a problem like the following one yourself:
You have a large list of items (let’s say URIs for this example) and want to sum up how often they were viewed (or edited or… whatever). A small one-shot solution in python looks like the following and uses the often unknown operator.itemgetter:

import sys
import operator
uriViews = {}
for line in sys.stdin:
    uri, views = line.strip().split()
    views = int(views)
    uriViews[uri] = uriViews.get(uri, 0) - views
    # why minus? could do + and use reverse=True below?
    # right, but then items also get sorted in reverse, if views are
    # the same (so Z would come first, a last)
for uri, views in sorted(uriViews.items(),
                         key=operator.itemgetter(1,0)):
    print -views, uri

This approach can be a lot faster than self written lambda functions called for every comparison or a list comprehension which turns around all tuples and then sorts it. Also in contrast to many other solutions you can find online this one uses operator.itemgetter(1,0), which means that if two items have the same amount of views, their URIs are sorted alphabetically.

Remember that this approach sorts the whole list in RAM, so you might want to chose other approaches in case your lists are getting huge.
For further information you might want to read PEP-0265, which also includes a hint what to do when you’re only interested in the Top 1000 for example (will only sort these top1000):

import heapq
top1000 = heapq.nlargest(1000, uriViews.iteritems(), itemgetter(1,0))
for uri,views in top1000:
   print -views, uri
This entry was posted in Coding and tagged , , , , , , , , . Bookmark the permalink.

2 Responses to Sort python dictionaries by values

  1. joern says:

    Hi Gecko,
    ja, aber erneut: itemgetter(1) holt nur jeweils das 2te item, ich würde itemgetter(1,0) benutzen, damit klar ist, wie’s bei Gleichheit vom 2ten item weitergeht.
    Viele Grüße,
    Jörn

  2. Gecko says:

    Hi Jörn.

    itemgetter ist auch super, um ein mehrdimensionales array nach einem beliebigen key zu sortieren:

    from operator import itemgetter
    someArray = [['o',3],['n',1],['e',2]]

    someArray.sort()
    print "Standard Sortierung:\n", someArray
    # [['e', 2], ['n', 1], ['o', 3]]

    someArray.sort(key=itemgetter(1))
    print "sortiert nach den Ziffern:\n", someArray
    # [['n', 1], ['e', 2], ['o', 3]]

    Gruß,
    Gecko

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>