3 Nov 00:47 2011

## Re: SVM Features in MADLib

Sorry for the delay in responding, Shankha. I have been travelling.

> 1) How well does MADLib SVM implementation scale in PostgreSQL or
> Greenplum?

In short, very well. The SVM implementation in MADlib learns with an online stochastic gradient descent algorithm, which means examples are processed one at a time, allowing it to scale to massive datasets. In particular, we avoid problems such as computing the large kernel matrix that batch algorithms require.

Perhaps the main limitation on scalability is the size of the model. For a billion-row data set, even a compression factor of 0.01% would give you a fairly big model, which needs to be evaluated often.

> 2) Does MADLib SVM implementation utilize parallel processing?

Yes, you can learn an ensemble of SVMs. We don't use parallel processing when learning a single SVM, though.

> 3) If my training set were 150k instances and 3000 features, roughly
> how long would the training time take when using the MADLib SVM
> implementation?

Probably seconds. At most minutes.
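
To make the "one example at a time" point concrete, here is a rough sketch of an online kernel SVM trained by stochastic gradient descent on the hinge loss. It is an illustration only, not MADlib's actual code: the class `OnlineKernelSVM`, the helpers `make_blobs` and `estimated_support_vectors`, and the parameters `eta`, `lam` and `gamma` are all hypothetical names chosen for this example. It also assumes that "compression factor" means the fraction of training rows kept in the model as support vectors.

```python
import math
import random


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineKernelSVM:
    """Hinge-loss kernel classifier updated one example at a time."""

    def __init__(self, eta=0.5, lam=0.01, kernel=rbf_kernel):
        self.eta = eta        # learning rate (assumed value)
        self.lam = lam        # regularisation strength (assumed value)
        self.kernel = kernel
        self.support = []     # the model: (coefficient, stored example) pairs

    def decision(self, x):
        # Cost grows with the number of stored examples, which is why
        # model size is the scalability concern, not the row count.
        return sum(a * self.kernel(s, x) for a, s in self.support)

    def partial_fit(self, x, y):
        """One SGD step on a single labelled example, y in {-1, +1}."""
        margin = y * self.decision(x)
        # Regularisation shrinks every existing coefficient.
        decay = 1.0 - self.eta * self.lam
        self.support = [(a * decay, s) for a, s in self.support]
        # Only examples that violate the margin are added to the model.
        if margin < 1.0:
            self.support.append((self.eta * y, x))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


def make_blobs(n, rng):
    """Toy data: two well-separated 2-D Gaussian clusters."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = (rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5))
        data.append((x, y))
    return data


def estimated_support_vectors(n_rows, compression):
    """Rows retained in the model if a fraction `compression` is kept."""
    return round(n_rows * compression)


if __name__ == "__main__":
    rng = random.Random(0)
    model = OnlineKernelSVM()
    for x, y in make_blobs(200, rng):   # a single pass, no kernel matrix
        model.partial_fit(x, y)
    print("stored examples:", len(model.support))
    print("1e9 rows at 0.01%:", estimated_support_vectors(1_000_000_000, 0.0001))
```

Note that no n-by-n kernel matrix is ever built; each step touches only the examples already stored in the model. Under the assumption above, a billion rows at 0.01% leaves about 100,000 stored examples, and every prediction has to evaluate the kernel against all of them.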