Data Science Stack Exchange is a question and answer site for data science professionals, machine learning specialists, and those interested in learning more about the field.

We have a potential machine learning application that fits fairly neatly into the traditional problem domain solved by classifiers: we have a set of attributes describing an item and a "bucket" that the item ends up in. However, rather than build probabilistic models as Naive Bayes or similar classifiers do, we want our output to be a set of roughly human-readable rules that can be reviewed and modified by an end user.

Association rule learning looks like the family of algorithms that solves this type of problem, but those algorithms seem to focus on identifying common combinations of features and don't include the concept of a final bucket that those features point to. For example, our data set looks something like this:

Item A { 4-door, small, steel } => { sedan }
Item B { 2-door, big,   steel } => { truck }
Item C { 2-door, small, steel } => { coupe }

I just want the rules that say "if it's big and a 2-door, it's a truck," not the rules that say "if it's a 4-door it's also small."

One workaround I can think of is to simply use association rule learning algorithms and ignore the rules that don't involve an end bucket, but that seems a bit hacky. Have I missed some family of algorithms out there? Or perhaps I'm approaching the problem incorrectly to begin with?
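For illustration, the workaround described above can be sketched in plain Python: enumerate candidate antecedents over the attributes and keep only the rules whose consequent is a class bucket with 100% confidence. This is a hypothetical toy sketch on the question's three-item data set, not a real association rule miner:

```python
from itertools import combinations

# Toy data set from the question: attribute sets and their class "bucket".
items = [
    ({"4-door", "small", "steel"}, "sedan"),
    ({"2-door", "big", "steel"}, "truck"),
    ({"2-door", "small", "steel"}, "coupe"),
]

attributes = sorted(set().union(*(attrs for attrs, _ in items)))

rules = []
for size in (1, 2):
    for antecedent in combinations(attributes, size):
        ant = set(antecedent)
        covered = [label for attrs, label in items if ant <= attrs]
        # Keep only antecedents that cover at least one item and always
        # lead to the same bucket (the "class on the right-hand side" filter).
        if covered and len(set(covered)) == 1:
            rules.append((ant, covered[0]))

for ant, label in rules:
    print(sorted(ant), "=>", label)
```

Running this yields rules such as `['2-door', 'big'] => truck`, while feature-only co-occurrences like "steel" alone (which covers all three buckets) are filtered out.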


3 Answers

Accepted answer (6 votes)

C4.5, developed by Quinlan, is able to produce rules for prediction. Check this Wikipedia page. In Weka its implementation is called J48. I have no idea which implementations exist in R or Python, but either way, from this kind of decision tree you should be able to infer rules for prediction.

Later edit

You might also be interested in algorithms that directly induce rules for classification. RIPPER is one; in Weka it again goes by a different name, JRip. See the original RIPPER paper: "Fast Effective Rule Induction", W. W. Cohen, 1995.
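The core idea behind RIPPER-style learners is separate-and-conquer (sequential covering): grow a rule, emit it, remove the examples it covers, and repeat. A minimal sketch of that covering loop, restricted to single-condition rules on the question's toy data (this is an illustration of the idea, not Cohen's actual algorithm):

```python
# Toy data set from the question.
data = [
    ({"4-door", "small", "steel"}, "sedan"),
    ({"2-door", "big", "steel"}, "truck"),
    ({"2-door", "small", "steel"}, "coupe"),
]

def induce_rules(examples):
    """Sequential covering: repeatedly pick the most accurate single
    attribute test, emit it as a rule, and drop the examples it covers."""
    rules = []
    remaining = list(examples)
    while remaining:
        best = None  # (accuracy, coverage, attribute, label)
        for attr in set().union(*(attrs for attrs, _ in remaining)):
            covered = [lbl for attrs, lbl in remaining if attr in attrs]
            for label in set(covered):
                cand = (covered.count(label) / len(covered), len(covered),
                        attr, label)
                if best is None or cand > best:
                    best = cand
        acc, _, attr, label = best
        if acc < 1.0:  # no perfect single-condition rule left; stop
            break
        rules.append(({attr}, label))
        remaining = [(attrs, lbl) for attrs, lbl in remaining
                     if attr not in attrs]
    return rules

for ant, label in induce_rules(data):
    print(sorted(ant), "=>", label)
```

On this data the loop emits one rule per bucket (e.g. `['big'] => truck`); real RIPPER additionally grows multi-condition rules, prunes them, and optimizes the rule set.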

I had experimented with C4.5/J48 in a previous project; I did not realize there were rules I could retrieve from it. I'll also check out RIPPER. Thanks! –  Smashd May 27 at 13:53
    
Also check out the C50 package in R. –  nfmcclure Jun 6 at 14:56

It's actually even simpler than that, from what you describe: you're just looking for a basic classification tree algorithm (no need for slightly more complex variants like C4.5, which are optimized for prediction accuracy). The canonical text is:

http://www.amazon.com/Classification-Regression-Wadsworth-Statistics-Probability/dp/0412048418

This is readily implemented in R:

http://cran.r-project.org/web/packages/tree/tree.pdf

and Python:

http://scikit-learn.org/stable/modules/tree.html
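As a sketch of the scikit-learn route on the question's toy data (not code from the answer; it assumes `DecisionTreeClassifier` and `export_text` from `sklearn.tree`, with the set-valued attributes one-hot encoded via `MultiLabelBinarizer`):

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data set from the question, one-hot encoded so each attribute
# value becomes a binary feature the tree can split on.
X_raw = [["4-door", "small", "steel"],
         ["2-door", "big", "steel"],
         ["2-door", "small", "steel"]]
y = ["sedan", "truck", "coupe"]

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(X_raw)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text prints the tree as nested if/else splits; each
# root-to-leaf path reads off as a human-reviewable rule.
print(export_text(clf, feature_names=list(mlb.classes_)))
```

Each printed path (e.g. a path ending in `class: truck`) is exactly the kind of rule the question asks for, and an end user can review the tree structure directly.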


You could take a look at the CN2 rule learner in Orange: http://orange.biolab.si/docs/latest/widgets/rst/classify/cn2/

