Some RapidMiner, some JMP, some Google Docs
I tend to push all of this work down to the database level. However, I can run into memory problems if I'm just selecting rows out (so you need to do a full table read). The bigger question is this: with RapidMiner, is it better practice to read from the database directly into RapidMiner, or to read from the database into a flat file that is then read into RapidMiner? Which is more efficient, and which is "good form"?
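For the flat-file route, one way to keep memory bounded is to stream the table out in batches rather than fetching it all at once. Here is a minimal sketch of that idea, assuming Python's standard `sqlite3` and `csv` modules stand in for whatever database you actually use; the function name `export_table_to_csv`, the `batch_size` default, and the demo table `measurements` are all hypothetical, not anything from RapidMiner.

```python
import csv
import os
import sqlite3
import tempfile


def export_table_to_csv(conn, table, csv_path, batch_size=10_000):
    """Stream every row of `table` into a CSV file, `batch_size` rows at a time.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    rows_written = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row taken from the cursor's column metadata.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            # Only one batch of rows is held in memory at any time.
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written


if __name__ == "__main__":
    # Small in-memory demo table (assumed data, just to exercise the function).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [(i, i * 0.5) for i in range(25)],
    )
    out_path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
    print(export_table_to_csv(conn, "measurements", out_path, batch_size=10), out_path)
```

The resulting CSV can then be loaded into RapidMiner like any other flat file. Whether this beats a direct database read depends on your setup; the sketch only shows that the export step itself does not need the whole table in memory.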
I like using RM for ETL because it's easy to test and save the process. It's easy to do ETL in many programs, but how easy is it to reproduce and share?