## Tuesday, November 15, 2016

### Using R to work through Sokal and Rohlf's Biometry: Chapter 5 (Descriptive Statistics), section 5.3

Previously in this series: Chapter 5 (Introduction to Probability Distributions: Binomial and Poisson), sections 5.1-5.2.

### Section 5.3: the Poisson distribution

?Distributions #Let's look in help again to find the Poisson distribution.
?dpois #We can see that R does similar calculations for Poisson in addition to the binomial distribution.

#To get column 3 (expected absolute frequencies) in table 5.5,
#there is the recursion formula given in equations 5.11 and 5.12.
#5.12 is for calculations with sample means to get the relative expected frequency,
#and in the text below it notes what term to add to get the absolute ones given in column 3 of table 5.5

abs.expected.freq.pois<-400*dpois(c(0:9),
lambda=1.8,
log=FALSE)

#I also want to do equation 5.12 manually for relative expected frequencies.
#This is not normally needed because R does it so nicely with the base stats function dpois().

#http://stackoverflow.com/questions/5273857/are-recursive-functions-used-in-r
#this one has a simple one for factorials that makes it clearest to me:
#http://www.programiz.com/r-programming/recursion

rel.exp.freq.pois<-function(samplemean, i) {
if (i==0) return (exp(-samplemean))
else      return (rel.exp.freq.pois(samplemean,
i-1)*(samplemean/i))
}

#To get absolute frequencies, multiply by 400.
#For one example:
400*rel.exp.freq.pois(1.8, 1)

#To get all the frequencies at once, use lapply.
rel.freq.pois<-unlist(lapply(c(0:9),
rel.exp.freq.pois,
samplemean=1.8))
(abs.freq.pois<-400*rel.freq.pois)

#On page 82, they show how to calculate the coefficient of dispersion.  Here is a function that will do it.
coefficient.of.dispersion<-function(data) {
var(data)/mean(data)
}
#You can input any set of data here and get the coefficient of dispersion.

#Figure 5.3 shows Poisson distributions with different means.
#We can make this with the dpois() function generating y data.
#Oddly, to add the extra lines, it is easiest to use points() with
#type="l" (for lines).
plot(dpois(c(0:18),
lambda=0.1,
log=FALSE),
type="l",
ylab="Relative expected frequency",
xlab="Number of rare events per sample")
points(dpois(c(0:18),
lambda=1,
log=FALSE),
type="l")
points(dpois(c(0:18),
lambda=2,
log=FALSE),
type="l")
points(dpois(c(0:18),
lambda=3,
log=FALSE),
type="l")
points(dpois(c(0:18),
lambda=10,
log=FALSE),
type="l")

#The remaining tables and examples in this section do not add anything new to code.