Wednesday, 5 February 2014

Bulk Load into MS SQL Server with rsqlserver

Trying to input  a few million records into MS SQL Server when they exist as a data.frame in R
Bulkload from rsqlserver was 100 times quicker than RODBC sqlSave, though a lot more hassle to get up and running.
 ---
For one sample of about 1.3 million records
via RODBC and  sqlSave

> system.time(
+ sqlSave(myconn, alldata, tablename = 'GTSDECoeff1', append = FALSE, fast=TRUE)
+ )
   user  system elapsed 
  18.45   40.65 1518.71 
 
compared to 
via rsqlserver and bulkcopy
 
> system.time(dbBulkCopy(conn, "GTSDEtest1", value = "N:/Hedge_Model_Parameters/RawSourceData/bulkload/gtsde.csv"))
   user  system elapsed 
   9.19    0.27   15.82 
 
That's a factor of 100 improvement using bulkload 
 
Mind you installing rsqlserver was harder than the standard install.package from CRAN
 - I downloaded the latest dot NET and installed it, also the latest MS c++ distributable.
Then I installed rClr from the binary
and finally  
require(devtools) 
install_github('rsqlserver', 'agstudy')
 worked.
The data needs to be clean - i.e. NA's don't convert into anything much on MS SQL Server, and one needs the bulk load permission on the server, but that's a big difference and makes some approaches viable.