{"id":5380,"date":"2018-10-03T21:11:56","date_gmt":"2018-10-03T21:11:56","guid":{"rendered":"https:\/\/really.zonky.org\/?p=5380"},"modified":"2018-10-06T09:54:11","modified_gmt":"2018-10-06T09:54:11","slug":"optimising-a-python-script","status":"publish","type":"post","link":"https:\/\/really.zonky.org\/?p=5380","title":{"rendered":"Optimising A Python Script"},"content":{"rendered":"\n<p>I have a Python script that, to over-simplify, reads very large log files and runs a whole bunch of regular expressions on each line. As it had started running inconveniently slowly, I had a look at improving the performance.<\/p>\n\n\n\n<p>The conventional wisdom is that if you are reading a file (or standard input), then the simplest method is almost always the fastest :-<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for line in logstream:\n    processline(line)<\/code><\/pre>\n\n\n\n<p>But being stubborn, I looked at possible improvements and came up with :-<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from itertools import islice\n\n# islicecount is the number of lines to read per chunk\nwhile True:\n    buffer = list(islice(logstream, islicecount))\n    if buffer:\n        for line in buffer:\n            processline(line)\n    else:\n        # An empty chunk means the stream is exhausted\n        break\n<\/code><\/pre>\n\n\n\n<p><em>This code has been updated twice because the first version added a splat to the output and the second version (which was far more elegant) didn&#8217;t work. The final version\u00a0<\/em><\/p>\n\n\n\n<p>I benchmarked this as being nearly 5% quicker &#8211; not bad, but nowhere near enough for my purposes.<\/p>\n\n\n\n<p>The next step was to improve the regular expressions &#8211; I read somewhere that\u00a0<strong><em>.*<\/em><\/strong> can be expensive and that <strong><em>[^\\s]*<\/em><\/strong> was far quicker and\u00a0<em>often<\/em> gave the same result. 
I replaced a number of\u00a0<strong>.*<\/strong>\u00a0occurrences in the &#8220;patterns&#8221; file and re-ran the benchmark, and found that (in a case with lots of regular expressions) the run time had dropped by nearly 25%.<\/p>\n\n\n\n<p>The last step was to install\u00a0<em>nuitka<\/em>\u00a0to compile the Python script into a binary executable. This showed a further 25% drop &#8211; a script that started the day taking 15 minutes for one particular run ended the day taking just under 8 minutes.<\/p>\n\n\n\n<p>The funny thing is that the optimisation that took the longest and had the biggest effect on the code showed the smallest improvement!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"683\" height=\"1024\" src=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-05-29-Four-Posts.jpg?resize=683%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-5015\" srcset=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-05-29-Four-Posts.jpg?w=683&amp;ssl=1 683w, https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-05-29-Four-Posts.jpg?resize=200%2C300&amp;ssl=1 200w\" sizes=\"auto, (max-width: 683px) 100vw, 683px\" \/><figcaption>Four Posts<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I have a Python script that, to over-simplify, reads very large log files and runs a whole bunch of regular expressions on each line. As it had started running inconveniently slowly, I had a look at improving the performance. 
The conventional wisdom is that if you are reading a file (or standard input), then the simplest <a href='https:\/\/really.zonky.org\/?p=5380' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_share_on_mastodon":"0"},"categories":[4,226],"tags":[407,1797,1798],"class_list":["post-5380","post","type-post","status-publish","format-standard","hentry","category-it","category-working-notes","tag-python","tag-regex","tag-regular-expression","category-4-id","category-226-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"share_on_mastodon":{"url":"","error":""},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1f2KI-1oM","_links":{"self":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/5380","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5380"}],"version-history":[{"count":4,"href":"https:\/\/really.zonky.org\
/index.php?rest_route=\/wp\/v2\/posts\/5380\/revisions"}],"predecessor-version":[{"id":5387,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/5380\/revisions\/5387"}],"wp:attachment":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}