Oct 032018
 

I have a Python script that over-simplifying, reads very large log files and runs a whole bunch of regular expressions on each line. As it had started running inconveniently slowly, I had a look at improving the performance.

The conventional wisdom is that if you are reading a file (or standard input), then the simplest method is probably almost always the fastest :-

for line in logstream:
    processline(line)

But being stubborn, I looked at possible improvements and came up with :-

from itertools import islice
    
while True:
    buffer = list(islice(logstream, islicecount))
    if buffer != []:
        for line in buffer:
             processline(line)
    else:
        break

This code has been updated twice because the first version added a splat to the output and the second version (which was far more elegant) didn’t work. The final version 

This I benchmarked as being nearly 5% quicker – not bad, but nowhere near enough for my purposes.

The next step was to improve the regular expressions – I read somewhere that .* can be expensive and that [^\s]* was far quicker and often gave the same result. I replaced a number of .* occurrences in the “patterns” file and re-ran the benchmark to find (in a case with lots of regular expressions) the time had dropped nearly 25%.

The last step was to install nuitka to compile the Python script into a binary executable. This showed a further 25% drop – a script that started the day taking 15 minutes to run through one particular run ended the day taking just under 8 minutes.

The funny thing is that the optimisation that took the longest and had the biggest effect on the code showed the smallest improvement!

Four Posts
Jan 292018
 

I recently dived into the rabbit hole of educational computers and came across a site which made a big song and dance about how Python is a great deal more complicated than BASIC. Well that is perhaps arguably correct, but the comparison they made was grossly unfair :-

#!/usr/bin/env python
#-*- coding: UTF-8 -*-

from random import randint
from time import sleep
import sys

string = "Hello World!"
while true:
  attr = str(randint(30,48))
  out = "\x1b[%sm%s\x1b[0m" % (attr, string)
  sys.stdout.write(out)
  sys.stdout.flush()

  sleep(1)

Now for the criticisms :-

  1. The first line (“#!/usr/bin/env …”) is nothing to do with Python; and in fact a BASIC program should also include this if it wants to run in the same way as a Python program under Linux. The “#!” is in fact a directive to the Linux kernel to tell it what script to pass the rest of the file through.
  2. The second line (“# -*-…”) also has nothing to do with Python; it is a directive to an editor to tell it to use the UTF-8 character set. Why doesn’t the basic equivalent also include this?
  3. Now onto the Python itself … first of all there are a whole bunch of imports which are done in the verbose way just so that you can call sleep rather than time.sleep; I generally prefer the later (which would result in the import time rather than from time import sleep). But yes, in Python you have to import lots of stuff to get anything done, and it would be helpful for quick and dirty scripts if you could just import lots to get a fair amount of ordinary stuff loaded.
  4. The rest of the code is … um … obviously designed to make Python look bad and glossing over the fact that Python runs in the Linux runtime environment whereas the BASIC equivalent does not – it has a BASIC runtime environment.

That last point is worth going into more detail on – the BASIC code was written for a BASIC runtime environment, and one method of sending output to the screen. Linux has many ways of writing to the screen, and the chosen method above is perhaps historically the worst (it only works for devices that understand the escape sequences; there is a curses library for doing this properly).

So is Python unsuited to a quick and easy learning environment? A quick hackers language? As it is, perhaps not, but that is not quite what Python is designed to be. And with a suitable set of modules, Python could be suitable :-

import lots

white True:
  screen.ink(random.choice(inkcolours))
  screen.paper(random.choice(papercolours))
  screen.print("Hello World!")

  time.sleep(1)

(That’s entirely hypothetical of course as there is no “screen” module)

I’m not qualified to judge whether BASIC or Python are better languages for beginners – I’ve been programming for around 35 years, and the BASIC I remember was very primitive. But at least when you compare the two languages, make the comparison a fair one.

Jan 062007
 

I have just released a new version of Popspeaker, a trivial little Python script to make announcement sounds when it spots new messages from selected people in your POP3 mailbox. The big change is that it now loads a configuration file rather than rely on global variables in the script itself; but some other minor improvements have been made to make this more like a product and less than a scrofulous script knocked up for one person’s use.

The advantage of running this script for me, is that I can be sitting down reading a book and my workstation will announce “You have mail from your parents” if that happens. I can see mails from interesting people quickly, and let all the spam and other cruft wait until I am in the mood to trawl through my mail.