The Porter Stemmer algorithm was created by Martin Porter to reduce English words to their stems. For example, the word “forms” would reduce to “form” and the word “connections” would reduce to “connect”. The details of the algorithm can be found here
Typically, you need this functionality when building your own search engine: if you stem both the indexed content and the user’s search terms, different forms of the same word will match each other.
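To give a feel for the kind of rule involved, here is a minimal Python sketch of just the first step of the algorithm (Step 1a in Porter's paper), which handles plurals. It is not the full stemmer and it is not the code from the C# class; the function name `step_1a` is made up for this illustration, and it assumes the input is already lowercase.

```python
def step_1a(word: str) -> str:
    """Apply the plural rules of Porter's Step 1a to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ("forms", "caresses", "ponies", "caress", "connections"):
    print(w, "->", step_1a(w))
```

Note that this step alone only takes “connections” as far as “connection”; the full algorithm strips the “-ion” suffix in a later step to arrive at “connect”.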
I came across a well-written PHP class on Jon Abernathy’s site and decided to port it to C# for a project I’m working on.
I won’t get into the depths of the code, but the basic documentation is this…
Method stem
Description: stems a single string to its root stem
Parameters: string
Returns: string
Method stem_list
Description: takes a comma-, semicolon-, or space-separated string and returns an ArrayList of the stemmed words
Parameters: string
Returns: ArrayList
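As a rough illustration of what stem_list does, the Python sketch below splits a string on commas, semicolons, and whitespace and stems each piece. This is an assumption about the behaviour based on the description above, not a translation of the C# code; in particular, how the real class treats empty entries or mixed separators is not something I have checked. The `toy_stem` helper is a stand-in so the example runs on its own, and it is not the Porter algorithm.

```python
import re
from typing import Callable, List


def stem_list(text: str, stem: Callable[[str], str]) -> List[str]:
    """Split text on commas, semicolons, or whitespace and stem each word."""
    # A run of separators (e.g. ", " or two spaces) counts as one split point.
    tokens = re.split(r"[,;\s]+", text)
    # Leading or trailing separators produce empty strings; skip them.
    return [stem(token) for token in tokens if token]


def toy_stem(word: str) -> str:
    """Placeholder stemmer (hypothetical): lowercase, drop one trailing 's'."""
    word = word.lower()
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


print(stem_list("forms, Connections;cats  glass", toy_stem))
# prints ['form', 'connection', 'cat', 'glass']
```

Passing the stemmer in as a parameter keeps the splitting logic separate from the stemming logic, so a real Porter implementation can be dropped in without touching the rest.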
You can download the class here: Stemmer C# Class
Some parts of the original Porter stemmer algorithm are implemented incorrectly here. I have also tried the PHP implementation by Jon, and it seems to have the same error.
Try stemming the words “building”, “normalizing”, etc.