Creating dummy variables in R

Randy Zwitch has a blog entry on creation of dummy variables from factor levels.

example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"

#For every unique value in the string column, create a new 1/0 column
#This is what Factors do "under-the-hood" automatically when passed to function requiring numeric data
for(level in unique(example$strcol)){
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
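
The same indicator columns can also be produced without a loop by model.matrix, which modelling functions such as lm call internally to expand factors. The following is only a sketch, assuming the column is first converted to a factor; the name `dummies` is illustrative and not from the original post.

```r
example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

# Convert to a factor so model.matrix() can expand it into indicator columns
example$strcol <- factor(example$strcol)

# "~ strcol - 1" drops the intercept, giving one 1/0 column per level
dummies <- model.matrix(~ strcol - 1, data = example)

# Column names are the variable name followed by the level, e.g. "strcolA"
colnames(dummies)
head(dummies, 3)
```

Unlike the loop, this returns a numeric matrix rather than adding columns to the data frame, so use cbind(example, dummies) if you want both together.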
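Values often contain special characters. In that case you can use gsub with a regular expression to strip them from the generated column names. Be aware that distinct levels can collapse to the same name once characters are removed.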
example <- as.data.frame(c("AÆ", "AÆ", "B", "FÅ", "C", "G", "C", "D", "E", "FÅ"))
names(example) <- "strcol"

#For every unique value in the string column, create a new 1/0 column
#This is what Factors do "under-the-hood" automatically when passed to function requiring numeric data
for(level in unique(example$strcol)){
  example[gsub("[^a-zA-Z0-9_]", "", paste("dummy", level, sep = "_"), fixed = FALSE)] <- ifelse(example$strcol == level, 1, 0)
}
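You may also use levels instead of unique, combined with subsetting, e.g. levels(example$strcol)[-1], to create dummy variables for every level except the first. The omitted reference level is then absorbed into the baseline/intercept of your regression model. This requires strcol to be a factor; since R 4.0.0, as.data.frame no longer converts strings to factors by default, so wrap the column in factor() first.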
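
A minimal sketch of the reference-level variant, assuming strcol is created as a factor explicitly (the default level order is sorted, so "A" becomes the reference here):

```r
example <- data.frame(strcol = factor(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F")))

# levels() is sorted, so "A" is the first (reference) level; [-1] skips it
for (level in levels(example$strcol)[-1]) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}

# Rows where strcol == "A" have 0 in every dummy column
names(example)
```

To choose a different baseline, relevel(example$strcol, ref = "C") can be applied before the loop.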
Model formula strings can be generated with the paste function:
paste("somevar ~", paste(names(dataframe), collapse = "+"))
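
As a sketch of how such a string might be used in practice: the data below is hypothetical and only there to make the example runnable, and excluding the response with setdiff is an assumption about the intent, since names(dataframe) would otherwise put somevar on both sides of the formula.

```r
# Hypothetical data for illustration only
dataframe <- data.frame(somevar = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
                        dummy_B = c(0, 0, 1, 0, 1, 0),
                        dummy_C = c(0, 1, 0, 0, 0, 1))

# Exclude the response so it does not appear on both sides of the formula
predictors <- setdiff(names(dataframe), "somevar")

# Build the formula string, then convert it to a formula object
f <- as.formula(paste("somevar ~", paste(predictors, collapse = " + ")))

# The formula object can be passed straight to a modelling function
fit <- lm(f, data = dataframe)
```

Base R also provides reformulate(predictors, response = "somevar"), which builds the same formula without manual string pasting.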
