Randy Zwitch has a blog entry on creating dummy variables from factor levels in R.
Model formulas can be generated from the column names with the paste function:
paste("somevar ~", paste(names(dataframe), collapse = "+"))
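As a rough illustration of the paste approach above, the sketch below builds the formula string from a small made-up data frame. The names dataframe, somevar, x1 and x2 are placeholders for illustration, not taken from the original post. It assumes you want to drop the response from the predictor list so it does not appear on both sides of the formula, and it converts the string to a formula object with as.formula. Base R's reformulate is shown as an alternative that avoids string pasting.

```r
# Hypothetical data frame; all column names are placeholders for illustration.
dataframe <- data.frame(somevar = c(1.2, 2.3, 3.1, 4.8),
                        x1 = c(0, 1, 0, 1),
                        x2 = c(1, 1, 0, 0))

# Exclude the response so it does not appear on both sides of the formula.
predictors <- setdiff(names(dataframe), "somevar")

# Build the formula as a string, then convert it to a formula object.
formula_string <- paste("somevar ~", paste(predictors, collapse = " + "))
model_formula <- as.formula(formula_string)

# reformulate() (base R, stats package) builds the same formula directly.
model_formula_alt <- reformulate(predictors, response = "somevar")

# The resulting formula can be passed to a modelling function such as lm().
fit <- lm(model_formula, data = dataframe)
```

The string version is handy when you want to print or log the formula. reformulate is less error-prone when column names contain unusual characters.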
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"
# For every unique value in the string column, create a new 1/0 column.
# This is what factors do "under the hood" automatically when passed to a function requiring numeric data.
for (level in unique(example$strcol)) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
Often the values contain special characters, in which case you can use gsub with a regular expression to strip them from the new column names:
example <- as.data.frame(c("AÆ", "AÆ", "B", "FÅ", "C", "G", "C", "D", "E", "FÅ"))
names(example) <- "strcol"
# Same loop as before, but remove every character other than letters, digits and underscores from the column name.
for (level in unique(example$strcol)) {
  colname <- gsub("[^a-zA-Z0-9_]", "", paste("dummy", level, sep = "_"), fixed = FALSE)
  example[colname] <- ifelse(example$strcol == level, 1, 0)
}
You may also use levels instead of unique in conjunction with subsetting, e.g. levels(example$strcol)[-1], to create dummy variables for every level except the first. The omitted reference level is then absorbed into the baseline/intercept of your regression model. Note that levels only works on a factor: since R 4.0.0, as.data.frame no longer converts strings to factors by default, so wrap the column in factor() first.
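A minimal sketch of the levels approach, assuming strcol is first converted to a factor with the default alphabetical level order, so that "A" becomes the reference level and gets no dummy column. The comparison with model.matrix at the end is an addition for illustration. It relies on R's default treatment contrasts and is not part of the original post.

```r
example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

# levels() needs a factor; levels are sorted alphabetically by default,
# so "A" becomes the reference level.
example$strcol <- factor(example$strcol)

# Skip the first (reference) level: it is represented by all dummies being 0.
for (level in levels(example$strcol)[-1]) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}

# For comparison, model.matrix() builds the same 0/1 columns (plus an intercept)
# under R's default treatment contrasts.
mm <- model.matrix(~ strcol, data = example)
```

Rows where strcol is "A" have zeros in every dummy column, which is how the reference level ends up in the intercept.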