Tuesday, July 30, 2013
Sunday, May 26, 2013
Text Mining Performance in RapidMiner
I did some load testing with RapidMiner 5.3 on my laptop (Core i3, 8 GB RAM, non-SSD hard drive). Here are the results.
I set Java's maximum memory to 6500 MB.
I used the Read Database operator to fetch the documents, each made up of 20 to 500 random Latin words.
The text processing was purposefully simple: tokenize the document and get the binary word vector.
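For readers who want to see what "tokenize and get the binary word vector" amounts to, here is a minimal Python sketch. It is not RapidMiner's implementation: the function names are made up for illustration, and I'm assuming tokens are split on any non-letter character and lowercased.

```python
import re

def tokenize(text):
    """Split text into lowercase tokens on any run of non-letter characters (an assumption)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def binary_word_vectors(documents):
    """Return (vocabulary, vectors); a vector entry is 1 if the word occurs in the document, else 0."""
    token_sets = [set(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*token_sets))
    vectors = [[1 if word in tokens else 0 for word in vocabulary]
               for tokens in token_sets]
    return vocabulary, vectors

# Two tiny made-up documents; repeated words still only produce a 1.
docs = ["lorem ipsum dolor", "ipsum ipsum amet"]
vocabulary, vectors = binary_word_vectors(docs)
print(vocabulary)  # ['amet', 'dolor', 'ipsum', 'lorem']
print(vectors)     # [[0, 1, 1, 1], [1, 0, 1, 0]]
```

Binary occurrences ignore word counts entirely, which keeps the processing step about as cheap as it can be.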
I then stored the results in the RapidMiner repository, which creates a binary file.
In a different process, I then read the stored results and applied a Naive Bayes model to them. I didn't time this step for every data set size, but there wasn't much difference between the ones I did. As the table shows, model application is quite fast.
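To give a feel for what applying a Naive Bayes model to binary word vectors involves, here is a rough Python sketch of a Bernoulli-style Naive Bayes with Laplace smoothing. This is my own illustration, not the RapidMiner operator's algorithm, and the toy vectors and the labels "a" and "b" are invented for the example.

```python
import math

def train_bernoulli_nb(vectors, labels):
    """Estimate class priors and per-word presence probabilities (Laplace smoothed)."""
    model = {}
    n_words = len(vectors[0])
    for label in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        prior = len(rows) / len(vectors)
        # P(word present | class), smoothed so it is never exactly 0 or 1
        present = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(n_words)]
        model[label] = (prior, present)
    return model

def apply_bernoulli_nb(model, vector):
    """Return the label with the highest log posterior for one binary vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, present) in model.items():
        score = math.log(prior)
        for x, p in zip(vector, present):
            score += math.log(p if x else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: each column is a word, labels are invented.
train_vectors = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
train_labels = ["a", "a", "b", "b"]
model = train_bernoulli_nb(train_vectors, train_labels)
predictions = [apply_bernoulli_nb(model, v) for v in [[1, 0, 0], [0, 0, 1]]]
print(predictions)  # ['a', 'b']
```

Applying the model is a single pass over each vector with one multiplication (here, one log addition) per word, which fits with the short apply times.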
The store operator was much faster than the Write Database operator.
| # Records | Time to process + store (s) | Peak memory (GB) | Stored results file size (MB) | Time to apply (s) |
|-----------|-----------------------------|------------------|-------------------------------|-------------------|
| 100       | 0                           | 0.400            | 0.223                         | 1                 |
| 1,000     | 1                           | 0.576            | 2.1                           | 0                 |
| 10,000    | 8                           | 1.3              | 21                            | 1                 |
| 20,000    | 15                          | 2.4              | 42                            |                   |
| 30,000    | 23                          | 2.6              | 63                            |                   |
| 40,000    | 30                          | 2.9              | 84                            |                   |
| 50,000    | 39                          | 3.8              | 105                           | 5                 |
| 60,000    | 48                          | 4.0              | 126                           | 5                 |
| 70,000    | 56                          | 4.1              | 148                           |                   |
| 80,000    | 66                          | 4.5              | 168                           |                   |
| 90,000    | 71                          | 4.7              | 190                           |                   |
| 100,000   | 88                          | 5.3              | 211                           |                   |
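The numbers above suggest that both processing time and file size grow roughly linearly with the number of records. A quick Python sketch of the arithmetic, using four rows copied from the table (the 100-record row is left out because its time rounds to 0 s) and assuming 1 MB = 1000 KB:

```python
# (records, seconds to process + store, stored file size in MB), copied from the table
rows = [
    (1_000, 1, 2.1),
    (10_000, 8, 21),
    (50_000, 39, 105),
    (100_000, 88, 211),
]

def summarize(rows):
    """Return (records, records per second, KB stored per record) for each row."""
    summary = []
    for records, seconds, size_mb in rows:
        rate = records / seconds
        kb_per_record = size_mb * 1000 / records  # assumes 1 MB = 1000 KB
        summary.append((records, round(rate), round(kb_per_record, 2)))
    return summary

for records, rate, kb in summarize(rows):
    print(f"{records:>7} records: {rate:>5} records/s, {kb} KB per record")
```

On these rows that works out to somewhere between about 1,000 and 1,300 records per second, and about 2.1 KB of stored results per record.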
Thursday, May 16, 2013
AWS Redshift: How Amazon Changed The Game
A good blog post on Amazon Redshift, their Postgres-based massive data warehouse, with some good analysis of performance and costs:
http://blog.aggregateknowledge.com/2013/05/16/aws-redshift-how-amazon-changed-the-game/
Thursday, April 18, 2013
Vancouver Training: Introduction to Data Mining and Predictive Analytics with RapidMiner - Save $500
I'll be teaching a RapidMiner course here in Vancouver next week:
Tuesday, April 23, 2013 at 8:30 AM - Wednesday, April 24, 2013 at 5:00 PM (PDT)
Details here:
http://rapid-i_us_20130423-eorg.eventbrite.com/
Save $500 with the coupon VAN_BLOG!
Tuesday, February 12, 2013
Google's Data Mining Research Papers
In case you missed it, here are Google's 104 data mining research papers:
http://research.google.com/pubs/DataMining.html