Ticket #760 (new defect)
traceback creation slows down exception propagation considerably
|Reported by:||scoder||Owned by:||somebody|
I did some callgrind profiling on Cython's generators and was surprised to find that AddTraceback() represents a serious performance penalty for short running generators.
I profiled a compiled Python implementation of itertools.groupby(), which yields (key, group) tuples where the group is an iterator again. I ran this code in Python for benchmarking:
L = sorted(range(1000)*5) all(list(g) for k,g in cython_groupby(L))
Groups tend to be rather short in real code, often just one or a couple of items, so unpacking the group iterator into a list will usually be a quick loop and then the generator raises StopIteration on termination and builds a traceback for it. According to callgrind, the iteration during the group unpacking takes about 30% of the overall runtime of the all() loop, and the AddTraceback() call at the end of each group traversal takes up to 25% (!) on my side. That means that more than 80% of the group unpacking time goes into raising StopIteration from the generators.
About half of the exception raising time is eaten by PyString_FromFormat() that builds the function-name + line-position string (which, I may note, is basically a convenience feature). This string is a constant for a generator's StopIteration exception, at least for each final return point in a generator, but here it is being recreated over and over again, for each exception that gets raised.
Even if we keep creating a new frame instance each time (which should be ok because CPython has a frame instance cache already and we'd only create one during the generator lifetime), the whole code object could actually be cached after the first creation, preferably bound to the lifetime of the generator creator function/method. Or, more generally, one code object per generator termination point, which will be a single point in the majority of cases. For the specific code above, that should shave off almost 20% of the overall runtime of the all() loop.