15 Jun 2012 13:29
Working with large datasets, preprocessing and issues with categorical features
utku yabas <utkuyabas <at> gmail.com>
2012-06-15 11:29:44 GMT
Hi,
I have 3 questions:
1- I have a data set of around 1 GB (50,000 instances and 15,000 attributes).
Can you suggest some ways to work with a data set of this size? I cannot even load the data. I need to apply decision trees, random forests, logistic regression and so on.
2- How can I apply the same pre-processing steps to the test set as I applied to the training set?
These steps include normalization and replacing missing values, as well as data transformations such as PCA.
3- If there are nominal attributes with a huge number of distinct values, is there a preprocessing step that keeps the 10 or 20 most common values and groups all the others into a single additional value?
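To make question 3 concrete, here is a rough sketch of the grouping I have in mind. It is plain Java, not a Weka filter; the class name `TopKGrouper`, its two methods and the catch-all label are hypothetical names I made up for illustration. The most common values are learned from the training column only and then reused on any other column, which is also the behaviour I am after in question 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper in plain Java; not part of Weka.
class TopKGrouper {

    // Learn the k most frequent values from the training column only.
    static Set<String> fitTopK(List<String> trainValues, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : trainValues) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending count; break ties alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> keep = new HashSet<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            keep.add(entries.get(i).getKey());
        }
        return keep;
    }

    // Replace every value outside the kept set with a single catch-all label.
    static List<String> apply(List<String> values, Set<String> keep, String otherLabel) {
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            out.add(keep.contains(v) ? v : otherLabel);
        }
        return out;
    }
}
```

I would call `fitTopK` once on the training column with k = 10 or 20, then call `apply` with the returned set on both the training and the test column, so that a value seen only in the test set also ends up in the catch-all group.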
Thanks in advance
Utku
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html