
Tuesday, December 14, 2010

Custom stemming dictionary

You can create your own stemming dictionary in RapidMiner.

Add the Text Processing -> Stemming -> Stem (Dictionary) operator, and choose your dictionary file (plain text).

Each line of the file should follow this format:

stem:inflection
stem:inflection


For example:

fish:fished

will turn fished into fish.

You can also use wildcards:

fish:fish.*

will turn fished, fishes, fishing, or anything else beginning with fish into fish.
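
To see how broad such a wildcard is before you load the dictionary, you can try the pattern outside RapidMiner. The following is only a rough sketch in Python: it assumes the inflection is treated as a regular expression that must match the whole token, which is my reading of the behaviour described above and not something taken from RapidMiner itself. The word list is made up for illustration.

```python
import re

# Assumption: the inflection part of a rule behaves like a regular
# expression that has to match the entire token.
pattern = re.compile(r"fish.*")

# Hypothetical sample words, chosen to show what the wildcard catches.
words = ["fish", "fished", "fishes", "fishing", "fisherman", "catfish"]

# Replace a word with the stem "fish" when the pattern matches it.
stems = {w: ("fish" if pattern.fullmatch(w) else w) for w in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Under that assumption the wildcard also catches words such as fisherman, so it is worth checking a sample of your vocabulary before relying on a very short prefix.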

When two stems overlap, put the rule for the longer stem nearer the top of the file. For example, to stem these words correctly:

computer, computerise, computerize, computerized, computerised, computers, compute, computed, computes

You should use

computer:computer.*
compute:compute.*


and not

compute:compute.*
computer:computer.*


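assuming you want computer and compute treated as different stems. In the second ordering, compute.* also matches every word beginning with computer, so those words would all be stemmed to compute.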
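
The effect of rule order can be reproduced with a small simulation. This is a sketch under two assumptions that the post implies but that I have not checked against RapidMiner's implementation: rules are tried from the top of the file, and the first rule whose pattern matches the whole token wins. The helper names parse_rules and stem_token are hypothetical and exist only for this illustration.

```python
import re

def parse_rules(text):
    """Parse 'stem:inflection' lines into (stem, compiled pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        stem, pattern = line.split(":", 1)
        rules.append((stem.strip(), re.compile(pattern.strip())))
    return rules

def stem_token(token, rules):
    """Return the stem of the first matching rule (assumed first-match-wins)."""
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    return token  # no rule matched, keep the token unchanged

good = parse_rules("computer:computer.*\ncompute:compute.*")
bad = parse_rules("compute:compute.*\ncomputer:computer.*")

words = ["computer", "computerised", "computers",
         "compute", "computed", "computes"]

good_result = {w: stem_token(w, good) for w in words}
bad_result = {w: stem_token(w, bad) for w in words}

for w in words:
    print(f"{w}: longer first -> {good_result[w]}, "
          f"shorter first -> {bad_result[w]}")
```

With the longer stem first, computers maps to computer and computed maps to compute. With the shorter stem first, the compute rule swallows every word and the computer rule is never reached.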
