Monday, January 26, 2015

Beginning R: operators, columns, rows, using help, and installing packages

I'm co-teaching quantitative methods with Dr. Nicola Koper this semester at the University of Manitoba. Occasionally I'll post sample code from the labs here, since the instructions may be useful to others outside our class. This lesson shows some basic operations you can perform in R. (The package installation instructions assume you are running R from inside RStudio.)

#Some basic arithmetic operators.
1+1
1/2
2*4
8-4
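#A few more arithmetic operators, for reference.
2^3   #exponent: 8
7%%2  #remainder (modulo): 1
7%/%2 #integer division: 3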

#Save the values as objects with whatever name you like.
(object<-1+1)
(another.object<-object+3)
#Anything after a pound sign on a line is a comment and will not run.
#Wrapping an assignment in parentheses prints the result as well as saving it.
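#A small sketch of other ways to print an object, in addition to the parentheses above.
#(demo.value is a made-up name used only for this illustration.)
demo.value<-1+1
demo.value        #typing the name alone prints the value
print(demo.value) #print() does the same thing explicitly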

#Create vectors (simple lists of items) with c().
simple.list<-c(1,5,2,4,3)
word.list<-c("one","five","two","four","three")
short.list<-c(1,2,3,4)

#Combine the first two into a data frame (short.list is used later).
simple.data<-data.frame(simple.list,word.list)
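#A quick sketch of ways to check what a data frame looks like, using base R functions.
#(demo.frame is a made-up name for this illustration only.)
demo.frame<-data.frame(numbers=c(1,5,2),words=c("one","five","two"))
demo.frame         #print the whole data frame
nrow(demo.frame)   #number of rows (observations)
ncol(demo.frame)   #number of columns (variables)
head(demo.frame,2) #just the first two rows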

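#View a column (variable) in three ways: by name with $, or by number with square brackets.
#(The single-bracket form [1] returns a one-column data frame; the other two return a plain vector.)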
simple.data$simple.list
simple.data[1]
simple.data[,1]
#View a row (observation) by number.
simple.data[1,]
#View rows by their values.
simple.data[simple.list>=2,] #select all rows where simple.list is equal to or greater than 2.
simple.data[simple.list==2,] #all rows where simple.list equals 2.
simple.data[simple.list!=2,] #all rows where simple.list is not 2.
#How do we find more operations?
??operators
#This brings up R help.  Let's try the various operators.  It turns out we might want relational
#and logical operators.
simple.data[simple.list==2|simple.list!=3,] #all rows where simple.list equals two OR does not equal three.
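#A sketch of two related approaches, using a made-up data frame (demo.rows) like the one above.
#Referring to the column inside the data frame with $ is safer than using the separate vector,
#because it still works if the data frame has been sorted or changed since it was built.
demo.rows<-data.frame(numbers=c(1,5,2,4,3),words=c("one","five","two","four","three"))
demo.rows[demo.rows$numbers>=2&demo.rows$numbers<=4,] #AND: rows where numbers is between 2 and 4
subset(demo.rows,numbers>=2&numbers<=4)               #subset() selects the same rows with less typing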

#Create a new, empty column.
simple.data$new.column<-NA

str(simple.data)
#What is str exactly?
?str

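#simple.list is num (numeric).  word.list is a Factor (categories) in versions of R before 4.0.0;
#from R 4.0.0 onward, data.frame() keeps text as chr (character) unless you set stringsAsFactors=TRUE.
#new.column is logi (logical) with NA.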
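#A sketch of how to control the type of a text column yourself, using the stringsAsFactors
#argument of data.frame().  (demo.chr and demo.fac are made-up names for this illustration.)
demo.chr<-data.frame(words=c("one","five"),stringsAsFactors=FALSE)
demo.fac<-data.frame(words=c("one","five"),stringsAsFactors=TRUE)
str(demo.chr)          #words is chr (character)
str(demo.fac)          #words is a Factor with 2 levels
levels(demo.fac$words) #the categories, sorted alphabetically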

#View the names of the variables.
names(simple.data) #names of variables

simple.data$new.column<-short.list
#Get an error: Error in `$<-.data.frame`(`*tmp*`, "new.column", value = c(1, 2, 3, 4)) : replacement has 4 rows, data has 5
#The lengths must match.
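#One possible fix, sketched here with made-up names (demo.pad, padded.list): pad the shorter
#vector with NA so that the lengths match before assigning it.
demo.pad<-data.frame(numbers=c(1,5,2,4,3))
padded.list<-c(c(1,2,3,4),NA) #now five items; the last one is missing
demo.pad$new.column<-padded.list
demo.pad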

length(simple.data$new.column) #has five items
length(short.list) #has four items
length(simple.data) #counts its variables (three)

#We added a column.  How do we delete a column?
#First, we could just create a new data frame and leave the old one there.
simple.data.new<-data.frame(simple.data$simple.list,
                            simple.data$word.list)
#Hmm, that works, but gives us ugly variable names that include the old data frame's name too.
simple.data.new<-data.frame("simple.list"=simple.data$simple.list,
                            "word.list"=simple.data$word.list)
#Much better!
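#Another option, sketched with a made-up data frame (demo.cols): select the columns you want
#by name with square brackets, which keeps the original variable names.
demo.cols<-data.frame(numbers=c(1,5,2),words=c("one","five","two"),extra=NA)
demo.cols.new<-demo.cols[,c("numbers","words")]
names(demo.cols.new) #"numbers" "words"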

#Or, just delete it if you're really sure.
#I didn't remember how to do this at first, so I googled it (or use the search engine of your choice):
#"how to delete column in R".
#The first results are several Stack Overflow questions to this effect.
#Things to look for in answers: potential new search terms (you'll get an idea of
#what else you can look for if these answers don't help you.)
#For example, here you'll note that deleting a column is also phrased as
#dropping (as in "how do I drop a column?")
#Another website lists everything from making a new frame to the subset function
#and using "NULL".  It also has info on excluding rows (observations), which we discussed
#a bit with operators above.
#http://www.statmethods.net/management/subset.html
#I list this as an example of how to find new and more efficient ways to do these things,
#not that this one website has all the answers.
#Let's try the NULL method.  We'll just delete the column because we never actually put
#any data there and it's annoying to have it in the data frame.
simple.data$new.column<-NULL
str(simple.data) #back down to two variables!
#str() works for other values/objects too.
str(another.object)

#(You may recall we used NA earlier.  It's slightly different from NULL.
#If you are interested in more details about the distinction you can check out this link:
#http://stackoverflow.com/questions/15496361/what-is-the-difference-between-nan-and-inf-and-null-and-na-in-r )
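#A brief sketch of the difference, using made-up names (with.na, with.null).
#NA is a placeholder for a missing value and still takes up a position;
#NULL means there is nothing there at all.
with.na<-c(1,NA,3)
with.null<-c(1,NULL,3)
length(with.na)   #3: the NA is counted
length(with.null) #2: the NULL is dropped
is.na(with.na)    #FALSE TRUE FALSE
is.null(NULL)     #TRUE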

#How do we cite using R in a publication or paper?
??citation

#The last entry in the search is utils::CITATION.
#Apparently we need
citation()
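#A small sketch of two related uses.  toBibtex() (also in the utils package) converts the
#citation to a BibTeX entry, and citation() accepts a package name once that package is
#installed, for example citation("lme4").
r.citation<-citation()
toBibtex(r.citation)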

#How to install a new package.
#One we'll use later is lme4.
#Go to the packages tab.  Click "Install packages".  Install from the CRAN repository.
#Type in the name of the package you want.
#Install to the default location so R can always find the package.
#Make sure "Install dependencies" is checked so the package can work correctly.
#Once the package is installed, you still need to load it for each session to use its functions.
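#The same installation can also be done from the console instead of the menus.
#A sketch, assuming an internet connection and a CRAN mirror that R can reach:
if(!requireNamespace("lme4",quietly=TRUE)){ #only install if it is not already there
  install.packages("lme4",dependencies=TRUE)
}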

library(lme4)
#You can also tick the package's checkbox in the packages list.  Various red text appears in the R console
#about the packages that it loads (lme4 depends on these) and, for lme4, a warning that it was built under
#a different version of R than the one you are running.  (We'll ignore that in this case.)
