Wednesday, 17 July 2013

London R - c++ functions in R embarrassingly easy with Rcpp

I went to the LondonR meetup yesterday sponsored by Mango Solutions and also the workshop on using Rcpp - c++  functions in R.
A great introduction to Rcpp, thanks due to Hadley Wickham. Thanks also to Dirk Eddelbuettel and Romain Francois for the package. Romain was also there helping out.
The takeaway is that it is astonishingly easy to create and use c++ functions within R using Rstudio passing in and out vectors of strings or numbers, so much so that I felt driven to post. If performance in some bottleneck in R is a problem and you've vectorised your code etc. there's another chunk of improvement (4 to 20 times depending on some samples we saw) available for very little effort. It's impressive what Dirk and Romain have done.
The three talks after were worth listening to, Andrie de Vries on an overview of using survival analysis to pick up what influenced a marketing success, Richard Pugh on analytics applications including R and Hadley Wickham giving some thoughts where he's trying to take some of packages.
I was also interested to see the broad range of users there for the meetup, which reflects how people are finding R useful in so many walks of life.

Wednesday, 3 July 2013

Excel Addin performance

I was working on improving the performance of a spreadsheet and wondered what the performance of different ways of calling functions was.
----------
Findings
Note for anything critical and like all performance testing one would need to repeat the testing with the final configurations etc.. But there are some bits of guidance.

The quickest way to do a simple calculation is in a worksheet cell with worksheet functions.

Using a vanilla XLL would take around twice as long, a COM addin something like 18 times as long and a VBA UDF something like 13 times as long.
Using the ExcelDNA way of creating a dot Net XLL is astonishingly painless, I was really impressed how quick and easy it was. However on the calling it was 4 to 5 times slower than a simple native Delphi XLL - but still quicker than a vba udf or a delphi com automation add in. Unless the function is being called a large number of times, this is a great way of creating XLL's

But if you are going to do large numbers of calculations and where the time of calling is important, it's better to use an native xll rather than writing it in VBA or writing a com automation addin.

Arrays only were a little slower when not referring with a name and looped in VBA, otherwise it made no difference - given there are lots of articles saying to avoid them on performance, maybe that's something that's improved in Excel 2010

Calculating the sum of two columns in a table is slower than calculating outside the table, arguing for calculations to be kept out of a table.

using sum() rather than + was a little quicker

using a name for a range is more readable and made no noticeable difference to the calculation
--------------
Some summary timings
I was trying get some idea of the cost of calling a function, so the function was simply to add the value of one cell to another.
This was on 32 bit Excel 2010, 4 core older xeon desktop, 16Gb ram
I used the range calculation code from http://msdn.microsoft.com/en-us/library/ff700515(v=office.14).aspx
set up a spreadsheet in its own Excel instance and created 100,000 rows of two columns of random numbers, which I then copied and pasted as values.

I then timed the calculation for these as shown below, using different ways of calculating in the worksheet, a vba user defined function and then a com add in (delphi and add in express) and an xll add in (delphi also a version with Visual Studio 2010 with ExcelDNA for a comparison), both looping through the recalculation of 10,000 rows 100 times and also one off calculations of 100,000 rows.
Although I got some variation and, there was enough consistency to make some rough conclusions which I've noted above


summarised results below, timings in microseconds

Source Formula As Range loop100
us us
=B32+A32 0.8 1.2
=addingTest!$A32 + addingTest!$B32 0.7 1.0
=random1 + random2 0.8 1.1
array {=A32:A100031 +B32:B100031} 0.7 1.7
array {=random1 + random2} 0.8 1.0
vba udf =sumolevar(A32,B32) 12.9 16.2
vba udf =sumolevarDec(A32,B32) 12.7 16.1
=addingTest!$A32+addingTest!$B32 1.3
vba udf =sumolevar(A32,B32) 14.2
vba udf =sumolevarDec(A32,B32) 13.1
com Addin =AddCellToCell(A32,B32) 18.6 18.9
=SUM(addingTest!$A32:$B32) 0.9
=SUM(random1 + random2) 0.9
xll =AddAtoB($A32,$B32) 2.4 2.9
=SUM(A32,B32) 1.0 1.0
=SUM(A32+B32) 0.9
=SUM(A32:B32) 1.0
in Table =[@random1]+[@random2] 1.6
=A32 + B32 0.7
=A32:A100031 +B32:B100031 0.8
vba udf =sumolevar(Table1[@random1],Table1[@random2]) 15.3
vba udf =sumolevarDec(Table1[@random1],Table1[@random2]) 15.2
xll =AddAtoB(Table1[@random1],Table1[@random2]) 15.2
ExcelDNA xll =AddTwoCells(A32,B32) 11.6