I have a dataset that includes variables about customer income levels. The income was collected in binned form (Which range describes your income? 0-25k, 25k-50k, ...). My question is how best to use this for modeling with the glmnet and gbm packages in R.
I have looked at the grouped package in R, but it seems to do everything (coarsening and regression) for you. Is there a package that converts binned data back to continuous data for use with other algorithms?
EDIT: The current method I'm using is to convert each bin to the midpoint of its range (0-25k -> 12500), then use ifelse() statements to code a few variables that convey the relationship between the levels.
incOver25k <- ifelse(df1$income >= 25000,1,0)
incOver50k <- ifelse(df1$income >= 50000,1,0)
I then use these flags instead of using model.matrix().
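To make the approach above concrete, here is a minimal sketch of the whole pipeline from bin labels to a numeric predictor matrix. The column name income_bin, the bin_edges lookup table, the example rows, and the 50k-75k bin are assumptions made up for illustration; they are not from my real data.

```r
# Hypothetical example data: income recorded as bin labels.
df1 <- data.frame(
  income_bin = c("0-25k", "25k-50k", "50k-75k", "25k-50k", "0-25k"),
  stringsAsFactors = FALSE
)

# Assumed bin edges for each label; adjust these to the real survey ranges.
bin_edges <- data.frame(
  income_bin = c("0-25k", "25k-50k", "50k-75k"),
  lower = c(0, 25000, 50000),
  upper = c(25000, 50000, 75000),
  stringsAsFactors = FALSE
)

# Replace each bin with the midpoint of its range.
idx <- match(df1$income_bin, bin_edges$income_bin)
df1$income <- (bin_edges$lower[idx] + bin_edges$upper[idx]) / 2

# Cumulative flags: one 0/1 indicator per threshold above the lowest bin.
thresholds <- c(25000, 50000)
flags <- sapply(thresholds, function(t) as.integer(df1$income >= t))
colnames(flags) <- paste0("incOver", thresholds / 1000, "k")

# Numeric matrix combining the midpoint and the flags
# (glmnet takes a numeric matrix as its x argument).
x <- cbind(income = df1$income, flags)
```

Because each flag is 1 for every bin at or above its threshold, a coefficient on a flag represents the step from one bin to the next rather than a separate effect per category, which is what plain dummy coding from model.matrix() would give.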
I'm curious whether there are any better ideas.