Optimising at the Python level, using a good understanding of how Python works internally, is almost always good enough in the cases where optimising is required.
One example is bringing commonly used names into local scope before a hot loop, since global, builtin and attribute lookups are somewhat more expensive than local ones.
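A minimal sketch of what that can look like. The function names `roots_global` and `roots_local` are made up for illustration, and how much the local binding actually buys you depends on the interpreter version, so treat it as something to measure rather than a guaranteed win.

```python
import math


def roots_global(values):
    # Looks up the global name `math`, then its attribute `sqrt`,
    # and the bound method `result.append`, on every iteration.
    result = []
    for v in values:
        result.append(math.sqrt(v))
    return result


def roots_local(values):
    # Bind the frequently used callables to local names once, up front,
    # so the loop body only does local-variable lookups.
    sqrt = math.sqrt
    result = []
    append = result.append
    for v in values:
        append(sqrt(v))
    return result
```

Both functions return the same list; only the number of name and attribute lookups inside the loop differs. The local-binding version is a little harder to read, so it is probably only worth it in loops that profiling has shown to be hot.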
Sometimes comparing different coding techniques is the only way to go. Working out "by eye" whether a list comprehension beats an unrolled manual loop of some description is inherently unreliable.
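A rough way to make that comparison is the standard library's `timeit` module. The sketch below times a manual loop against a list comprehension; the helper names (`squares_loop`, `squares_comprehension`, `compare`) and the default sizes are arbitrary choices for illustration, not anything prescribed.

```python
import timeit


def squares_loop(n):
    # Manual loop building the list one append at a time.
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    # Same result expressed as a list comprehension.
    return [i * i for i in range(n)]


def compare(n=1000, number=200, repeat=3):
    # Time each candidate `repeat` times, `number` calls per run, and keep
    # the fastest run, which is the one least disturbed by other activity.
    timings = {}
    for func in (squares_loop, squares_comprehension):
        runs = timeit.repeat(lambda: func(n), number=number, repeat=repeat)
        timings[func.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.6f} s")
```

The absolute numbers will differ between machines and Python versions, so only the relative ordering on your own setup, with your own data sizes, is worth drawing conclusions from.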
Since changing and rerunning code fragments is so simple in Python (no compile cycle), optimising by trial-and-error for the last phase is not a crazy idea.