Wednesday, November 10, 2010

Text Analytics With RapidMiner Part 3 of 6 - Association Rule Learning

Thanks for watching, and welcome Reddit!

This is part three of a six-part series on text mining in RapidMiner. This video describes how to find association rules in a collection of documents. For example, if a job posting includes "data" and "mining", then it is also likely to include "RapidMiner". When applied to grocery store purchases, this is known as market basket analysis :)
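
As a rough illustration of what such a rule means, the sketch below computes support and confidence for the rule {data, mining} -> {rapidminer} over four made-up job postings. The postings and the helper name `rule_stats` are assumptions for illustration only; they are not data or code from the video.

```python
# Toy job postings, each reduced to its set of words (made-up data).
postings = [
    {"data", "mining", "rapidminer", "sql"},
    {"data", "mining", "rapidminer"},
    {"data", "mining", "excel"},
    {"java", "developer"},
]

def rule_stats(premise, conclusion, transactions):
    """Return (support, confidence) for the rule premise -> conclusion."""
    n = len(transactions)
    # Postings that contain every word of the premise.
    with_premise = sum(1 for t in transactions if premise <= t)
    # Postings that contain the premise and the conclusion together.
    with_both = sum(1 for t in transactions if (premise | conclusion) <= t)
    support = with_both / n
    confidence = with_both / with_premise if with_premise else 0.0
    return support, confidence

support, confidence = rule_stats({"data", "mining"}, {"rapidminer"}, postings)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here two of the four postings contain all three words (support 0.5), and two of the three postings that contain "data" and "mining" also contain "rapidminer" (confidence of about 0.67).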

In this example, association rules are useful for finding phrases and concepts that matter to job recruiters. You can use these phrases and concepts in your cover letter and resume to increase the chances that they get read.

Topics covered:
  • reading documents from a database
  • processing the text
  • creating a word vector
  • finding frequent itemsets using the FP-Growth algorithm
  • finding association rules
  • visualizing association rules
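
For readers who want to see the moving parts outside of RapidMiner, here is a minimal Python sketch of the same steps: turn each document into a set of words, find frequent itemsets, then derive rules from them. Everything in it is an assumption for illustration: the documents are made up, the function names are hypothetical, and the itemset search is a plain brute-force count rather than the FP-Growth algorithm used in the video.

```python
from itertools import combinations

# Made-up documents standing in for rows read from a database.
documents = [
    "Data mining analyst with RapidMiner experience",
    "Data mining and RapidMiner consultant",
    "Data mining engineer, SQL required",
    "Java developer",
]

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, keep unique words."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute force (not FP-Growth): count every word combination up to max_size."""
    n = len(transactions)
    vocab = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            itemset = frozenset(items)
            support = sum(1 for t in transactions if itemset <= t) / n
            if support >= min_support:
                result[itemset] = support
    return result

def association_rules(itemsets, min_confidence):
    """Split each frequent itemset into premise -> conclusion pairs."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((set(premise), set(itemset - premise), support, confidence))
    return rules

transactions = [tokenize(d) for d in documents]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
for premise, conclusion, support, confidence in rules:
    print(sorted(premise), "->", sorted(conclusion),
          f"support={support:.2f} confidence={confidence:.2f}")
```

In this sketch, raising `min_support` or lowering `max_size` shrinks the number of itemsets that survive, which in turn shrinks the number of candidate rules.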

If you're not familiar with RapidMiner, see my other videos on my YouTube channel.

Up next, calculating the similarity between documents.


  1. Thank you very much for the video, pal.

  2. Thanks for your video.

  3. Great tutorial - it really helped me, but I could not use the example set produced by the ReadDatabase component the way you did.

    The problem is that the ProcessDocuments component did not work correctly. The reason is that the output type of the ReadDatabase component is "nominal", not "text". As you can see, this is also why there is an error message in there.

    Because of that, tokenization, etc. did not work at all in my case. I added the Nominal to Text component between ReadDatabase and ProcessDocuments to get it to work.

    Do you have any idea why this runs through in your case?

  4. Hi Martin,

    Ya, that's why I have the big warning on Part 1. I'm not sure why I was able to do that without the conversion; it may have something to do with SQL Server as a data source. Either way, use the Nominal to Text operator just in case.



  5. Hi Neil,
    Great videos and tutorials - I went through all of them and learned quite a bit.
    I have one problem with tutorial no. 3: RM is just not able to calculate this on my machine. It just won't move beyond the Growth/Create Association segment.
    Any suggestions on what values to pick to speed this up a bit?

  6. Great videos...I appreciate the effort it took you to create these videos. Any guidance on how to use the model created on unseen data? Thanks in advance.

  7. Great tutorials! Thanks so much.

    I'm importing an Excel list of email messages.
    In my association rules results I get 'show rules matching' instead of the 'conjunctiontype'. Anyone know why?


  8. I wanted to get some more information on how to "read" or analyze the data. I guess I could create a fake shopping list to understand it better, but I wanted to get your thoughts on how to define terms like "premise" and "conclusion". I know what support and confidence mean, but say we had the word pairs
    {milk, bread} -> {butter}. Is the pair (milk and bread) considered the premise here and butter the conclusion? I was looking at the following on Wikipedia and wanted some guidance