In R, a categorical variable is represented as a factor. For example, suppose we have a variable for occupation like:

occup = {"doctor", "engineer", "software programmer"}. The variable occup has three categories, and R treats it as a factor. When occup is used in a regression model, R creates the dummy variables automatically, one for each level except the first, which serves as the reference category. By default the levels are sorted alphabetically, so in this instance the reference is "doctor".

Sometimes, however, a categorical variable is coded with numbers. For example, a student's grades might be recorded as 1, 2, 3, and so on, giving grades = {1, 2, 3, 4}. Here grades is a categorical variable with levels 1, 2, 3, and 4, but unless it is explicitly converted with factor(grades), R will treat it as a continuous numeric variable in a regression equation. This distinction between factors and regressors (continuous variables that enter a linear regression equation directly) deserves emphasis. Also, because R creates the dummy variables for a factor automatically, you do not need to specify them in the equation.
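A minimal R sketch of both cases, using base R's factor(), model.matrix(), relevel() and lm(). The outcome values y are made-up toy numbers, included only so that lm() has something to fit; the coefficient names, not the estimates, are the point.

```r
# Occupation: factor() sorts the levels alphabetically by default,
# so "doctor" is the first level and becomes the reference category.
occup <- factor(c("doctor", "engineer", "software programmer"))
levels(occup)

# model.matrix() shows the columns R builds from a formula term:
# an intercept plus one dummy column per non-reference level.
colnames(model.matrix(~ occup))
# "(Intercept)" "occupengineer" "occupsoftware programmer"

# relevel() moves a chosen level to the front, making it the reference.
levels(relevel(occup, ref = "engineer"))
# "engineer" "doctor" "software programmer"

# Grades coded as numbers (toy data for illustration only).
grades <- c(1, 2, 3, 4, 2, 3)
y <- c(60, 65, 72, 80, 66, 70)  # hypothetical outcome values

# Entered as-is, grades is one continuous regressor: a single slope.
fit_numeric <- lm(y ~ grades)
names(coef(fit_numeric))
# "(Intercept)" "grades"

# Wrapped in factor(), grades becomes categorical: R creates dummies
# for levels 2, 3 and 4, with level 1 as the reference.
fit_factor <- lm(y ~ factor(grades))
names(coef(fit_factor))
# "(Intercept)" "factor(grades)2" "factor(grades)3" "factor(grades)4"
```

The numeric fit estimates one slope for grades, while the factor fit estimates a separate shift for each grade relative to grade 1, which is exactly the factor-versus-regressor distinction described above.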