Chapter 6 moves on from discrete distributions to the normal distribution,
#a continuous distribution that will be the basis of most statistics in the book.
Section 6.1 Frequency Distributions of Continuous VariablesA brief discussion of the section header's topic, nothing to code here.
Section 6.2 Properties of the Normal Distribution
#Let's check where it is.
#We can draw the normal distribution in a custom function as a plot using curve()
#and equation 6.1 on page 95.
-4, 4, #go from -4 to +4 standard deviations.
add = FALSE,
type = "l")
#You can get this automatically in R with dnorm.
curve(dnorm(x, mean=0, sd=1),
#I added a white dashed line over the original black line so you can see that they follow the same path.
#You can draw any of the distributions in Figure 6.2 using these functions
#(either the R one or the one we just made).
#Figure 6.3 shows the cumulative normal distribution function.
#Let's do that and then plot the normal probability density function on it.
#Cumulative first since that plot will go from 0 to 1.
curve(pnorm(x, mean=0, sd=1),
#Then add the normal probability density function (like the one we made earlier, just with the default line color).
curve(dnorm(x, mean=0, sd=1),
#To get out some of the values shown in page 96, we use pnorm().
pnorm(0, mean=0, sd=1)
#Just adding one value gives you a point estimate.
#50% will be found up to zero (the mean).
#The other values shown in Figure 6.3
#0 and 1 are the default values for mean and standard deviation, so you can leave them out if you want in this case.
#If you add more than one, it will tell you the values.
#Use diff as suggested here to get the area under the curve.
#This works because it uses the cumulative function to calculate it.
#If you go from -1 to 1 you can add them up to get the 68.27% shown under Figure 6.2 and on page 96.
#To go the other way, i.e. to see where 50% of the items fall, you have to use qnorm.
#Let's see what qnorm looks like.
#It is essentially tipping Figure 6.3 on its side.
#The frequency is now x on the x-axis, and the standard deviations are the y axis.
#You can get the point values in Figure 6.3 this way:
#The last two are close with rounding since those values in Figure 6.3 were rounded.
#How to get the values on page 96, of where 50%, 95%, etc of items are found?
#The values that you input are not the percent values given on page 96.
#You put them in proportions (the quantile you want to get) PLUS
#half of the remaining value.
#If you put in 0.95, it will go from 0 to 0.95. That isn't centered at the mean of 0,
#where the 50% quantile is. There is 0.05 left at the end. So we divide it by two
#to put it on either end.
#You would get the negative number for the standard deviation if you put in the opposite value.
#This also explains it nicely down where it starts talking about qnorm.
#On pg. 98, they show calculation of standard deviates.
#This is described in such a way that they seem to be z scores although they are never named as such in the book.
#You can easily calculate this manually.
#You can also use the scale() function.
#help notes that scale=TRUE divides by the standard deviation when
#center=TRUE, and center=TRUE substracts each number by the column mean.
#This is the same thing we did in the z-score function.
#Table 6.1 has expected frequencies for the normal distribution in column 2 for a sample of 1000 individuals.
#We can generate this with pnorm() and thinking about what the class marks mean.
#Because the class marks are separated by 0.5, we need to go 0.5 around each class mark and start at -5.25.
#this takes the last entry off because we only need the lower bound for each class,
#and adds 0.25 to get the class mark.
#Then you can have a look at the table and see that it gives the same results as column 2 in Table 6.1.