Tuesday, 11 March 2014

Julia

At the moment my workflow for a lot of data processing uses Perl to massage and prepare the data, and then R to examine and report on it, provided I can get it into memory.
R Markdown and knitr produce reports good enough to email out with blat, all run from R, and that works well.
We've got a lot of MS SQL Server databases. For larger queries I collect the information into files by running Transact-SQL with SQLCMD from bat or Perl scripts.
After following Fortress to its demise, I've been tracking Julia, which seems to be getting traction by focusing on accessible performance.
An initial question was whether I could do the processing I do with Perl in Julia.

Here are the results for a simple processing run that turns a 25 GB text file into a 750 MB output file:
Perl
# 41374044 lines took:2839 wallclock secs (2786.82 usr + 16.72 sys = 2803.54 CPU)
Julia
elapsed time: 1022.455216751 seconds

The Julia file is pretty much a transliteration of the Perl one. Given that a lot of the time would go on reading and writing the files, a run roughly 2.8 times faster (1022 seconds against 2839) is a significant gain.
I'll try a simple comparison with Python, though I'm not sure why it would be much better than Perl.
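
For the Python comparison, something along these lines would do as a starting point. It is only a minimal sketch: the actual processing isn't shown in this post, so keep_line and transform are hypothetical stand-ins (skip blank lines, collapse whitespace to tabs), and process_file is just a name picked for illustration. What matters is the shape, which reads one line at a time so a 25 GB input never has to fit in memory, and times the whole run.

```python
import sys
import time
from typing import Tuple


def keep_line(line: str) -> bool:
    # Hypothetical filter: keep non-blank lines only.
    # The real selection rule is not given in the post.
    return bool(line.strip())


def transform(line: str) -> str:
    # Hypothetical reshaping: collapse runs of whitespace into single tabs.
    return "\t".join(line.split()) + "\n"


def process_file(in_path: str, out_path: str) -> Tuple[int, float]:
    """Stream in_path to out_path line by line; return (lines read, seconds)."""
    start = time.perf_counter()
    lines_read = 0
    with open(in_path, "r", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:  # iterating the file object reads one line at a time
            lines_read += 1
            if keep_line(line):
                dst.write(transform(line))
    return lines_read, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    count, seconds = process_file(sys.argv[1], sys.argv[2])
    print(f"# {count} lines took: {seconds:.1f} wallclock secs")
```

Run it with the input and output paths as the two arguments. Swapping in the real per-line logic for the two stand-in functions should give a figure that can be set against the Perl and Julia timings above.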

But it looks as though I've found a better way to do the report processing that I've been doing in Perl.
I've been using Perl for about twenty years to process text files, and it's still great at that. But the time saving, without it being a struggle to get running, is reason enough to use Julia for the report processing instead, never mind the other use cases it should be good at.