Vancouver Data Blog by Neil McGuigan: Text Analytics With Rapidminer Part 3 of 6

Wednesday, November 10, 2010

Text Analytics With Rapidminer Part 3 of 6 - Association Rule Learning

Thanks for watching, and welcome Reddit!

This is part three of a six-part series on text mining in RapidMiner. This video describes how to find association rules in a collection of documents. For example, if a job posting includes "data" and "mining", then it is also likely to include "RapidMiner". When applied to grocery store purchases, this is known as market basket analysis :)
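To make the idea concrete, here is a minimal Python sketch (not RapidMiner itself) that scores that one rule on a handful of made-up job postings. The postings and the helper names `support` and `confidence` are my own illustration; the "if" side of the rule is the premise and the "then" side is the conclusion.

```python
# Toy job postings, each reduced to the set of words it contains (made-up data).
postings = [
    {"data", "mining", "rapidminer", "sql"},
    {"data", "mining", "rapidminer"},
    {"data", "mining", "python"},
    {"data", "entry", "excel"},
    {"java", "developer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(premise, conclusion, transactions):
    """Support of premise and conclusion together, divided by support of the premise."""
    return support(premise | conclusion, transactions) / support(premise, transactions)

premise = {"data", "mining"}
conclusion = {"rapidminer"}

rule_support = support(premise | conclusion, postings)
rule_confidence = confidence(premise, conclusion, postings)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Support tells you how often the whole rule shows up across all postings; confidence tells you how often the conclusion holds among the postings that match the premise.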

In this example, association rules are useful for finding phrases and concepts that are important to job recruiters. You can use these phrases and concepts in your cover letter and resume to increase the chances that they get read.

Topics covered:

reading documents from a database
processing the text
creating a word vector
finding frequent itemsets using the FP-Growth algorithm
finding association rules
visualizing association rules
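If you want to see the same steps outside of RapidMiner, here is a rough Python sketch of the pipeline. Everything in it is an assumption for illustration: the documents are made up (standing in for rows read from a database), the stopword list is tiny, and the frequent itemsets are found by brute-force counting rather than the FP-Growth algorithm used in the video. It should give the same itemsets as FP-Growth on small data, just much more slowly.

```python
from itertools import combinations
import re

# Made-up documents standing in for rows read from a database.
documents = [
    "Data mining analyst with RapidMiner experience",
    "Data mining and RapidMiner consultant",
    "Data mining engineer, Python",
    "Data entry clerk",
]

STOPWORDS = {"with", "and", "a", "the", "of"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords; return the distinct words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

# Binary word vector: each document becomes the set of terms it contains.
transactions = [tokenize(d) for d in documents]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search (not FP-Growth): count every candidate itemset up to max_size."""
    vocabulary = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(vocabulary, size):
            items = frozenset(candidate)
            s = sum(items <= t for t in transactions) / n
            if s >= min_support:
                result[items] = s
    return result

def association_rules(itemsets, min_confidence):
    """Split each frequent itemset into premise -> conclusion and keep confident rules."""
    rules = []
    for items, s in itemsets.items():
        for k in range(1, len(items)):
            for premise in map(frozenset, combinations(sorted(items), k)):
                # Every subset of a frequent itemset is itself frequent, so the lookup is safe.
                conf = s / itemsets[premise]
                if conf >= min_confidence:
                    rules.append((premise, items - premise, s, conf))
    return rules

itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.6)
for premise, conclusion, s, conf in rules:
    print(sorted(premise), "->", sorted(conclusion), f"support={s:.2f} confidence={conf:.2f}")
```

Raising `min_support` is the quickest way to cut the number of candidate itemsets, which matters a lot once the vocabulary gets large.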

If you're not familiar with RapidMiner, see my other videos on my YouTube channel.

Up next, calculating the similarity between documents.

8 comments:

Anonymous, November 11, 2010 at 5:15 AM
Thank you very much for the video, pal.
Anonymous, November 15, 2010 at 7:07 PM
Thanks for your video.
Unknown, January 25, 2011 at 2:04 AM
Great tutorial, it really helped me, but I could not use the example set produced by the ReadDatabase component the way you did.

The thing is that the ProcessDocuments component did not work correctly. The reason was that the output type of the ReadDatabase component is "nominal" and not "text". As you can see, this is also why there is an error message in there.

Because of that, tokenization etc. did not work at all in my case. I just added the Nominal to Text component between ReadDatabase and ProcessDocuments to get it to work.

Do you have any idea why this runs through in your case?
Neil McGuigan, January 25, 2011 at 11:27 PM
Hi Martin,

Ya, that's why I have the big warning on Part 1. I'm not sure why I was able to do that without the conversion; it may have something to do with using SQL Server as the data source. Either way, use the Nominal to Text operator just in case.

Cheers

Neil
Anonymous, February 24, 2011 at 8:53 AM
Hi Neil,
Great videos and tutorials. I went through all of them and learned quite a bit.
I have one problem with tutorial no. 3: RM is just not able to calculate this on my machine. It just won't move beyond the FP-Growth/Create Association Rules segment.
Any suggestions on what values to pick to speed this up a bit?
Thanks!
Anonymous, April 11, 2011 at 10:14 PM
Great videos...I appreciate the effort it took you to create these videos. Any guidance on how to use the model created on unseen data? Thanks in advance.
CSBWebEditor Steve P, September 13, 2011 at 8:53 AM
Great tutorials! Thanks so much.

I'm importing an Excel list of email messages.
In my association rules results I get 'show rules matching' instead of the 'conjunctiontype'. Anyone know why?

Thanks!
Neil Patel, March 8, 2012 at 2:00 PM
I wanted to get some more information on how to "read" or analyze the data. I guess I could create a fake shopping list to really understand it better, but I wanted to get your thoughts on how to define terms like "premise" and "conclusion". I know what support and confidence mean, but say we had the rule
{milk, bread} -> {butter}. Is the pair (milk and bread) considered the premise here and butter the conclusion? I was looking at the following on Wikipedia and wanted some guidance:

http://en.wikipedia.org/wiki/Association_rule_learning#Definition
