Perhaps you already encountered a problem like the following one yourself:
You have a large list of items (let’s say URIs for this example) and want to sum up how often they were viewed (or edited or… whatever). A small one-shot solution in python looks like the following and uses the often unknown operator.itemgetter
:
import sys
import operator
uriViews = {}
for line in sys.stdin:
uri, views = line.strip().split()
views = int(views)
uriViews[uri] = uriViews.get(uri, 0) - views
# why minus? could do + and use reverse=True below?
# right, but then items also get sorted in reverse, if views are
# the same (so Z would come first, a last)
for uri, views in sorted(uriViews.items(),
key=operator.itemgetter(1,0)):
print -views, uri
This approach can be a lot faster than self written lambda functions called for every comparison or a list comprehension which turns around all tuples and then sorts it. Also in contrast to many other solutions you can find online this one uses operator.itemgetter(1,0)
, which means that if two items have the same amount of views, their URIs are sorted alphabetically.
Remember that this approach sorts the whole list in RAM, so you might want to chose other approaches in case your lists are getting huge.
For further information you might want to read PEP-0265, which also includes a hint what to do when you’re only interested in the Top 1000 for example (will only sort these top1000):
import heapq
top1000 = heapq.nlargest(1000, uriViews.iteritems(), itemgetter(1,0))
for uri,views in top1000:
print -views, uri