We are going to explore some more features of datasets, namely the factor data type. Factors can be very problematic, so we need to understand them well. Many functions may or may not want you to enter variables as factors. Let's use this homework as an opportunity to explore this in more detail. You can also look at section 6.1 of Matloff. The goal here is to help you deal better with problems when you encounter them.

R comes with a bunch of datasets pre-loaded, and many packages come with their own pre-loaded datasets as well. Type data() to see what is loaded. Spend a minute here thinking about how you could use these packages to help yourself learn R. Then load the car package and load the Greene dataset.
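
As a rough sketch of the browsing and loading step: the code below lists the datasets in the base datasets package and then tries to load Greene. It assumes that recent versions of car keep their datasets in the companion carData package; if your installation differs, loading car and calling data(Greene) may be what you need instead. The name `available` is only for illustration.

```r
# List the datasets that ship with the base 'datasets' package.
# Calling data() with no arguments lists datasets from all attached packages.
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Assumption: recent versions of car store their datasets in the companion
# carData package, so Greene is looked up there. Skipped if not installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Greene", package = "carData")
  str(Greene)  # compact overview of variables and their types
}
```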

  1. Use the commands you know to understand the data. Without knowing the dataset, give me as much information as possible. What are the factors doing? What does the data type do? Why might this be an advantage or disadvantage from a programming perspective?
  2. Create two new variables called continent and continentF: continent should be of class character and continentF should be a factor. Come up with a strategy to recode nation into the two new continent variables. Be explicit about the levels of the factor and why you chose them. How would these levels make a difference if you added further data?
  3. Let's say you made an error and the first observation is actually from the United States. Try to change the values of this variable. Explain what happens. If you are getting error messages, come up with a strategy to resolve the problem.
  4. Currently decision is a factor with two levels. Convert this into a dummy variable that is numeric with 0s and 1s. Why might you want to do this?
  5. Contingency tables are often important to look at. Using the with and table functions, explore several contingency tables. Write up your results. If you want a fancier version of cross-tabulation, explore the CrossTable function in the descr package.
  6. (week 2 problem) Combine your skills to write, in one line of code, a command that creates a table for all character data in the dataset. That is, one line of code should yield six tables, one for each of the non-numeric variables. Hint: think about the apply family of functions.
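
Before working on Greene, it may help to try the factor behaviors these problems probe on a small made-up data frame. The sketch below is one possible illustration, not a model answer: `toy` and all of its column names and values are hypothetical and unrelated to the Greene data.

```r
# Hypothetical toy data, unrelated to the Greene dataset.
toy <- data.frame(
  country = factor(c("Canada", "Mexico", "Canada", "Mexico")),
  outcome = factor(c("no", "yes", "yes", "no")),
  score   = c(1.5, 2.0, 3.5, 4.0)
)

# Which columns are factors?
sapply(toy, is.factor)

# Assigning a value that is not an existing level produces NA and a warning.
# suppressWarnings() only keeps this script quiet; drop it to see the warning.
broken <- toy$country
suppressWarnings(broken[1] <- "Brazil")
is.na(broken[1])

# One fix: extend the levels first, then assign.
fixed <- toy$country
levels(fixed) <- c(levels(fixed), "Brazil")
fixed[1] <- "Brazil"

# Another fix: edit a character copy and rebuild the factor.
# Note that the rebuilt factor gets freshly sorted levels.
as_char <- as.character(toy$country)
as_char[1] <- "Brazil"
rebuilt <- factor(as_char)

# as.numeric() on a factor returns the internal level codes, not the labels.
digits <- factor(c("10", "20", "10"))
codes  <- as.numeric(digits)                # level codes
values <- as.numeric(as.character(digits))  # the labels as numbers

# A 0/1 dummy: compare against the level of interest, then convert.
outcome_num <- as.numeric(toy$outcome == "yes")
```

Comparing levels(fixed) with levels(rebuilt) shows why the order and completeness of levels matter once new data arrive.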