We are going to do some cleanup this week and learn a bunch of commands that you are going to come across later in your classes and that you might find useful.

What is the difference between pmin(), min(), and which.min()? First, work with a friend to figure this out without using the help files. Then go to the help files. This is a useful exercise for when a help file is unclear. You can use the data matrix below to try to figure out what the differences are.

set.seed(20150508)
minMatrix <- matrix(data = sample(seq(1, 6)),  # a random permutation of 1 through 6
                    nrow = 3, 
                    ncol = 2
                    )
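
To get started, you might compare calls along these lines (a quick sketch; the point is the shape of each answer, not the particular numbers):

min(minMatrix)                        # returns a single value
which.min(minMatrix)                  # also a single value, but of a different kind
pmin(minMatrix[, 1], minMatrix[, 2])  # returns a vector as long as each column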

What is the difference between cumsum() and sum(), and between prod() and cumprod()? Add four columns to your minMatrix and figure this out. Which ones would matter if the data are order specific?
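
One way to start, as a sketch (here just on the first column of minMatrix rather than adding all four columns at once):

cbind(value   = minMatrix[, 1],
      cumsum  = cumsum(minMatrix[, 1]),   # one entry per value
      sum     = sum(minMatrix[, 1]),      # a single number, recycled down the column
      cumprod = cumprod(minMatrix[, 1]),
      prod    = prod(minMatrix[, 1]))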

Speaking of order, what is the difference between order() and sort()? When would you want to use one versus the other? Hint: this relates to the difference between min() and which.min(). You could use the code below to help you sort that out (terrible pun).

order(minMatrix[1, ])
## [1] 2 1
minMatrix2 <- minMatrix[order(minMatrix[1, ]), ]
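
For instance, putting the two side by side on the first row makes the distinction visible (a sketch; the exact numbers depend on the seeded matrix above):

sort(minMatrix[1, ])                 # the values themselves, smallest to largest
order(minMatrix[1, ])                # the positions that would put them in that order
minMatrix[1, order(minMatrix[1, ])]  # indexing with order() reproduces sort()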

R, like Wolfram Alpha, can also do closed-form, analytical calculus for us. More importantly, though, it can do numerical approximations, which we will use all of the time in maximum likelihood analysis. To prepare you for those classes, let’s look a little at how this works.

Let’s start with a very easy example and build our way up: \(y = x + x^2 + x^3\). First, let’s visualize this!

polynomial3 <- function(x){
  x + x^2 + x^3
  }

curve(expr = polynomial3) # curve() takes either a function or an expression written in terms of x
curve(expr = x + x^2 + x^3, from = -100, to = 100) 

#We can add another curve to the existing plot with add = TRUE
curve(expr = -x^2, from = -100, to = 100, add = TRUE, col = "blue") #why does this look flat?
curve(expr = -x^2, from = -10, to = 10, add = FALSE, col = "blue")

We can also do analytical differentiation using symbols. We don’t have enough time here to go into the details of how this works in R, but the quote() command tells R to treat what follows as an unevaluated expression, which D() then differentiates symbolically.

D(quote(x^2), "x")
## 2 * x
D(quote(x^2), "y")
## [1] 0
D(quote(x + x^2 + x^3), "x")
## 1 + 2 * x + 3 * x^2

What is the derivative when we differentiate with respect to “y”? That answer suggests we can take partial derivatives.

D(quote(x^2 + y^3), "y")
## 3 * y^2
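
The object that D() returns is itself an R expression, so we can plug numbers into it with eval(). A minimal sketch connecting the symbolic derivative back to an actual number:

dPoly3 <- D(quote(x + x^2 + x^3), "x")  # the expression 1 + 2 * x + 3 * x^2
eval(dPoly3, list(x = 2))               # evaluated at x = 2: 1 + 4 + 12 = 17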

But generally when we are using R, we don’t want analytical solutions; we want numerical solutions. That is how most of R works. If you want to do analytical integration, look at the ryacas package.

integrate(polynomial3, -100, 100)
## 666666.7 with absolute error < 5.6e-07
integrate(polynomial3, -1000000, 100000)
## -2.499747e+23 with absolute error < 2.8e+09
#integrate(polynomial3, -Inf, Inf) # left commented out: this integral diverges
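
As a sanity check on the first result, the antiderivative of \(x + x^2 + x^3\) is \(x^2/2 + x^3/3 + x^4/4\); on the symmetric interval from -100 to 100 the even-powered pieces cancel, leaving \(2 \cdot 100^3/3 \approx 666667\), which matches what integrate() reported. A sketch of that check (antiDeriv is just a helper defined here):

antiDeriv <- function(x) x^2 / 2 + x^3 / 3 + x^4 / 4  # antiderivative of polynomial3
antiDeriv(100) - antiDeriv(-100)                      # analytical value, about 666666.7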

Generally in statistics you are trying to find a minimum or a maximum of a likelihood function. With a single parameter, optimize() will do this numerically over an interval.

optimize(f = function(x)-(x^2),interval = c(-100, 100), maximum = TRUE)
## $maximum
## [1] -3.552714e-15
## 
## $objective
## [1] -1.262177e-29
optimize(f = function(x)-(x^2),interval = c(-100, 100), maximum = FALSE)
## $minimum
## [1] -99.99993
## 
## $objective
## [1] -9999.987
optimize(f = function(x)-(x^2) + 5,interval = c(-100, 100), maximum = TRUE)
## $maximum
## [1] -3.552714e-15
## 
## $objective
## [1] 5
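
The maximum that optimize() reports is essentially zero (the -3.55e-15 is floating-point noise), which is exactly where the derivative of \(-x^2\) vanishes. A quick symbolic check, just as a sketch:

D(quote(-(x^2)), "x")  # -(2 * x), which equals zero at x = 0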

Now, let’s try to minimize the sum of squared errors (least squares). First, let’s generate some data with known properties: an intercept of 5, a slope of 2, and normal noise with a standard deviation of 2.

x <- seq(0, 100, .5)
y <- (2 * x + rnorm(length(x), mean = 0, sd  = 2)) + 5

Just for fun, let’s visualize what happens as the standard deviation of the noise around our line increases.

for(dev in seq(1, 40, 5)) {
  yf <- (2 * x + rnorm(length(x), mean = 0, sd = dev)) + 5
  plot(x, yf, cex = .5)
  cat("Press [enter] to continue")
  line <- readline()
}
plot(x, y, cex = .5)
plot(x, y, xlim = c(0, 5), ylim = c(0, 10))
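
If you would rather see all of the noise levels at once instead of pressing enter between plots, one option (a sketch; the 2-by-4 layout simply matches the eight values of dev) is:

op <- par(mfrow = c(2, 4))  # one panel per value of dev
for(dev in seq(1, 40, 5)) {
  yf <- (2 * x + rnorm(length(x), mean = 0, sd = dev)) + 5
  plot(x, yf, cex = .5, main = paste("sd =", dev))
}
par(op)  # restore the single-panel layout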

Now, let’s just verify with lm() that this works, for fun!

lmWorks <- lm(y ~ x)
library(texreg)
screenreg(lmWorks)
## 
## =======================
##              Model 1   
## -----------------------
## (Intercept)    5.31 ***
##               (0.28)   
## x              1.99 ***
##               (0.00)   
## -----------------------
## R^2            1.00    
## Adj. R^2       1.00    
## Num. obs.    201       
## =======================
## *** p < 0.001, ** p < 0.01, * p < 0.05
#abline(lmWorks) # uncomment to add the fitted regression line to the most recent scatterplot

Okay, but let’s say we don’t know how to minimize the sum of the squared errors. Let’s write a function that we can optimize to do that. We now have more than one parameter, so we are going to use the optim() function, which does multivariate optimization.

So, we have \(y = \beta_0 + \beta_1X_1\), and we want to minimize the sum of the squared errors over two parameters: an intercept and a slope coefficient. I have added more parentheses to make this clear. \[\sum \left( (\beta_0 + \beta_1X_1) - y \right)^2\]

The function MinimizeSS puts that equation into R speak.

MinimizeSS <- function(betas, x, y) {
  # betas[1] is the intercept, betas[2] is the slope coefficient
  sum( ( (betas[1] + betas[2] * x) - y)^2 )
  }
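
Before handing MinimizeSS to an optimizer, it can help to confirm that it behaves the way we expect: values near the truth (intercept 5, slope 2) should give a much smaller sum of squares than a bad guess. A quick sketch:

MinimizeSS(betas = c(5, 2), x = x, y = y)  # near the true values: small
MinimizeSS(betas = c(0, 0), x = x, y = y)  # far from the true values: much larger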

Now, we need to use the function optim() to do the optimization, since we have more than one parameter. We need to give optim() starting values, along with anything else that MinimizeSS needs. The way optim() is designed, its first argument (par) gets passed in as the first argument of MinimizeSS; these values are the starting guesses for the optimization algorithm.

optim(par = c(0, 0), fn = MinimizeSS, x = x, y = y)
## $par
## [1] 5.324249 1.993485
## 
## $value
## [1] 809.1988
## 
## $counts
## function gradient 
##       93       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
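
These estimates should line up closely with what lm() gave us above (Nelder-Mead stops at a numerical tolerance, so the match will not be exact to every decimal). A quick sketch of the comparison:

coef(lmWorks)  # the closed-form least squares estimates, roughly 5.3 and 2.0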

We can also see how robust optim() is to the choice of starting values: even starting very far from the truth, it converges to roughly the same answer.

optim(par = c(-2000, -2000), fn = MinimizeSS, x = x, y = y)
## $par
## [1] 5.75759 2.01179
## 
## $value
## [1] 1233.779
## 
## $counts
## function gradient 
##       77       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Alright!