Vancouver Data Blog by Neil McGuigan: RapidMiner ETL - Sampling, Selecting Rows, Attributes

Thursday, August 25, 2011

RapidMiner ETL - Sampling, Selecting Rows, Attributes

In this video I show how to sample rows, including balancing class labels and bootstrap sampling. I also show how to filter rows by value and select a subset of attributes.

You can get the dataset here
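
For readers who prefer to see the ideas in code, here is a minimal sketch of the same four operations in plain Python. It is not how RapidMiner implements them, and the toy rows, column names, and function names below are all made up for illustration; the dataset from the post is not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical toy rows (not the dataset from the post).
rows = [
    {"id": 1, "age": 25, "income": 40, "label": "yes"},
    {"id": 2, "age": 32, "income": 55, "label": "no"},
    {"id": 3, "age": 47, "income": 80, "label": "no"},
    {"id": 4, "age": 51, "income": 62, "label": "no"},
    {"id": 5, "age": 38, "income": 48, "label": "yes"},
    {"id": 6, "age": 29, "income": 35, "label": "no"},
]


def bootstrap_sample(rows, n, rng):
    """Draw n rows with replacement, so a row may appear more than once."""
    return [rng.choice(rows) for _ in range(n)]


def balanced_sample(rows, label_key, per_class, rng):
    """Draw the same number of rows (without replacement) from each class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    sample = []
    for label, members in by_class.items():
        if len(members) < per_class:
            raise ValueError(f"class {label!r} has only {len(members)} rows")
        sample.extend(rng.sample(members, per_class))
    return sample


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_attributes(rows, names):
    """Keep only the named attributes (columns) of each row."""
    return [{name: row[name] for name in names} for row in rows]


# A seeded generator makes the random draws repeatable between runs.
rng = random.Random(42)

boot = bootstrap_sample(rows, n=10, rng=rng)
balanced = balanced_sample(rows, label_key="label", per_class=2, rng=rng)
over_30 = filter_rows(rows, lambda row: row["age"] > 30)
slim = select_attributes(rows, ["id", "label"])
```

Passing a seeded `random.Random` into each function is a deliberate choice here: it keeps the sampling reproducible, which matters if the same process is going to be saved and shared.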

2 comments:

No Names Necessary, August 26, 2011 at 11:01 AM
I tend to push all of this work down to the database level. However, I can run into memory problems if I'm just selecting out (so you need to do a table read). The greater question is: with RapidMiner, is it better practice to read from a database directly into RapidMiner, or to read the database into a flat file that is then read into RapidMiner? Which is more efficient, and which is "good form"?
Neil McGuigan, August 26, 2011 at 8:45 PM
I like using RM for ETL as it's easy to test and save the process. It's easy to do ETL in many programs, but how easy is it to reproduce and share?