An afternoon thunderstorm rolling in near Brooks, Newell County, Alberta. 19 June 2014.
Wednesday, January 28, 2015
Monday, January 26, 2015
Beginning R: operators, columns, rows, using help, and installing packages
I'm co-teaching quantitative methods with Dr. Nicola Koper this semester at the University of Manitoba. Occasionally I'll post some of my sample code here from the labs, as the instructions may be of use to others outside our class as well. This lesson just shows some basic operations you can perform in R (the package installation instructions assume you have RStudio as well and are running R from inside that program).
#Some basic arithmetic operators.
1+1
1/2
2*4
8-4
#Save the values as objects with whatever name you like.
(object<-1+1)
(another.object<-object+3)
#Any line preceded by a pound sign is a comment and will not run.
#Use parentheses to print your object's result.
#Create vectors of items with c() (the names below say "list", but R calls these vectors).
simple.list<-c(1,5,2,4,3)
word.list<-c("one","five","two","four","three")
short.list<-c(1,2,3,4)
#Combine these into a data frame.
simple.data<-data.frame(simple.list,word.list)
#View a column (variable) by name or by number in three ways.
simple.data$simple.list
simple.data[1]
simple.data[,1]
#View a row (observation) by number.
simple.data[1,]
#View a row by its values.
simple.data[simple.list>=2,] #select all rows where simple.list is equal to or greater than 2.
simple.data[simple.list==2,] #all rows where simple.list equals 2.
simple.data[simple.list!=2,] #all rows where simple.list is not 2.
#How do we find more operations?
??operators
#This brings up R help. Let's try the various operators. It turns out we might want relational
#and logical operators.
simple.data[simple.list==2|simple.list!=3,] #all rows where simple.list equals two OR does not equal three.
#Create a new, empty column.
simple.data$new.column<-NA
str(simple.data)
#What is str exactly?
?str
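#word.list is a factor (categories), while simple.list is num (numeric).
#(In R 4.0 and later, word.list shows as chr (character) unless you add stringsAsFactors=TRUE to data.frame().)
#new.column is logi (logical) with NA.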
#View the names of the variables.
names(simple.data) #names of variables
simple.data$new.column<-short.list
#Get an error: Error in `$<-.data.frame`(`*tmp*`, "new.column", value = c(1, 2, 3, 4)) : replacement has 4 rows, data has 5
#The lengths must match.
length(simple.data$new.column) #has five items
length(short.list) #has four items
length(simple.data) #counts its variables (three)
#We added a column. How do we delete a column?
#First, we could just create a new data frame and leave the old one there.
simple.data.new<-data.frame(simple.data$simple.list,
simple.data$word.list)
#Hmm, that works, but gives us ugly variable names that combine the previous dataframe name too.
simple.data.new<-data.frame("simple.list"=simple.data$simple.list,
"word.list"=simple.data$word.list)
#Much better!
#Or, just delete it if you're really sure.
#I didn't remember how to do this at first, so I googled it (or use the search engine of your choice):
#"how to delete column in R"
#The first results are several Stack Overflow questions to this effect.
#Things to look for in answers: potential new search terms (you'll get an idea of
#what else you can look for if these answers don't help you.)
#For example, here you'll note that deleting a column is also phrased as
#dropping (as in "how do I drop a column?")
#Another website lists everything from making a new frame to the subset function
#and using "NULL". It also has info on excluding rows (observations), which we discussed
#a bit with operators above.
#http://www.statmethods.net/management/subset.html
#I list this as an example of how to find new and more efficient ways to do these things,
#not that this one website has all the answers.
#Let's try the null method. We'll just delete the column because we never actually put
#any data there and it's annoying to have it in the data frame.
simple.data$new.column<-NULL
str(simple.data) #back down to two variables!
#str() works for other values/objects too.
str(another.object)
#(You may recall we used NA earlier. It's slightly different from NULL.
#If you are interested in more details about the distinction you can check out this link:
#http://stackoverflow.com/questions/15496361/what-is-the-difference-between-nan-and-inf-and-null-and-na-in-r )
#How do we cite using R in a publication or paper?
??citation
#The last entry in the search is utils::CITATION.
#Apparently we need
citation()
#How to install a new package.
#One we'll use later is lme4.
#Go to the packages tab. Click "Install packages". Install from the CRAN repository.
#Type in the name of the package you want.
#Install to the default location so R can always find the package.
#Make sure "Install dependencies" is checked so the package can work correctly.
#Once the package is installed, you still need to load it for each session to use its functions.
library(lme4)
#You can also click the package in the packages list. Various red text appears in the R console
#about the packages that it loads (it uses these in its running) and for lme4, that it was built under
#a previous version of R. (We'll ignore that in this case.)
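If you would rather skip the clicking, the install-then-load steps above can be sketched from the console. This is only a minimal sketch and not part of the lab: load.or.install is a hypothetical helper name, and it assumes a CRAN mirror is already chosen (RStudio normally sets one for you).

```r
# Hypothetical helper (not from the lab): install a package only if it is
# missing, then load it for the current session.
load.or.install <- function(pkg) {
  # requireNamespace() returns FALSE if the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies=TRUE mirrors the "Install dependencies" checkbox.
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only=TRUE lets library() take the name as a string.
  library(pkg, character.only = TRUE)
}

# Example call for the package used in this lab (left commented out so
# nothing is downloaded until you want it):
# load.or.install("lme4")
```

The check matters because install.packages() downloads the package again every time it runs, while library() still has to be called once per session.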
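The row selection and column deletion steps in this lesson can be taken a little further. The sketch below rebuilds the same small data frame and tries a few variations: & (AND), %in%, and subset(). The object names both, picked, and no.words are made up for illustration; treat this as one possible way to do it rather than the only one.

```r
# Rebuild the small data frame from the lesson.
simple.list <- c(1, 5, 2, 4, 3)
word.list <- c("one", "five", "two", "four", "three")
simple.data <- data.frame(simple.list, word.list)

# AND: rows where simple.list is at least 2 and also is not 4.
both <- simple.data[simple.data$simple.list >= 2 & simple.data$simple.list != 4, ]

# %in%: rows whose value matches any item in a set.
picked <- simple.data[simple.data$simple.list %in% c(1, 3), ]

# subset() can filter rows and drop a column in one call.
no.words <- subset(simple.data, simple.list > 2, select = -word.list)

# NA fills a column with missing values; NULL removes the column entirely.
simple.data$new.column <- NA
simple.data$new.column <- NULL
str(simple.data)
```

Writing simple.data$simple.list inside the brackets, instead of the bare simple.list, keeps the selection tied to the data frame even if the separate vector is later changed or removed.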
Friday, January 23, 2015
A cold day at home
Not as cold as Winnipeg! Note the fine green and brown colors, the lack of snow, and the post oak leaves.
The dog is still for scale. Lyndon B. Johnson National Grasslands, Wise County, Texas. 28 December 2014.
Wednesday, January 21, 2015
A dragonfly on a cold day
Not right now in Winnipeg, of course. This is a "summer" odonate from Brooks, Alberta. It was very cloudy and chilly that day and the critter was clinging to the wall outside the apartment. I believe it is a Paddle-tailed Darner (Aeshna palmata) but I wouldn't swear to it, and welcome corrections.
Paddle-tailed Darner, Brooks, Alberta. 29 June 2014.
Paddle-tailed Darner, Brooks, Alberta. 29 June 2014.
Monday, January 19, 2015
The mysteries of postscript files
Last week I talked about saving figures from R. I mentioned .eps files, a type of vector image that is sometimes requested as a format for figures in submitting manuscripts for publication. Even without having ghostscript installed on a Windows machine, R will correctly save the image (at least it did on mine). However, you can't view it. Additionally, if you make an svg file and try to save it as a .eps file from Inkscape, it doesn't save the file correctly. (My test file left off the axis labels, for example.) So, here's how to view, and save (or print to) .eps files on a Windows 7 computer (I assume similar steps will work for other Windows computers).
Viewing .eps files (works well, do this!)
First, I downloaded GPL Ghostscript. Choose the appropriate type (32-bit or 64-bit). You can find out which you have by right-clicking on "Computer" in the start menu and selecting "Properties". This detail is under the "System" section as "System type". I installed the program and then got GSView and installed it. I tried opening a .eps file. It worked!
Inkscape and .eps files (sometimes it works, sometimes it doesn't, so skip this step if you want)
Now, Inkscape. I've always had mixed success saving files from there to .eps files. Theoretically, this should work. Sometimes it cuts off bits of the graph though. If you want to try it, here's an example. Just be warned I've never gotten it to completely work! Open a .svg file in Inkscape (perhaps the one created last week). I saved again, and it still cuts off the axis labels and axis numbering. According to the Inkscape forum, I need to link up the two programs. In the Start Menu, click on "Computer". In that window, click "System Properties". Go to the "Advanced" tab and click on the "Environment Variables" button. There are two lists of variables. The top one is "User variables for [yourusername]". Click "New" and name the new variable "GS_PROG" with value: "C:\Program Files\gs\gs9.15\bin\gswin32.exe" (or gswin64.exe if you have 64-bit Windows); navigate there to confirm it's where your file lives. Under "System variables" edit "PATH". There are already a bunch of path values listed in there; DO NOT DELETE THEM. At the end of the current list, place a semicolon, and add: "C:\Program Files\gs\gs9.15\bin;C:\Program Files\gs\gs9.15\lib", similar to how it's described in this tutorial. Then click "OK" on both windows. Now, start Inkscape. It now recognizes .eps files as a type it can import. However, I've never been able to get that to work (Inkscape just locks up). So, close the import window and just open one of your .svg files. You can then save it as a .pdf, open the .pdf in your usual pdf viewer, and then print it to a .eps file as described in the next section.
Printing as .eps (works well!)
You can print to .eps files using the regular print features in whatever program. I followed these instructions (where to find "Add Printer" varies a bit depending on which version of Windows you have, but it's pretty easy to find; if you are having trouble, just type "Add Printer" into the Start Menu search box). Anyway, it worked beautifully. I used an existing port ("FILE: print to file") as they said to. With this .eps printer set up, I could take my .svg to .pdf in Inkscape, then print to .eps from the pdf viewer (I use Foxit Reader but this print to .eps should work in any program that prints). I can't recall if you have to already have Ghostscript installed to make this work, but you'll need to view it to confirm, so you might as well get Ghostscript and GSView installed.
Summary of how I get to .eps files
Do you need to add anything to your file after using R to create your figure? No? Then use R and view them in GSView. Easy!
If you need to add anything to your R plot, then you have two options.
(1) If you have Adobe Illustrator, save it as .eps, then open in Illustrator and edit it to your liking.
(2) Don't have Illustrator (like me)? Save as a .svg from R, edit in Inkscape, then save as .eps (sometimes this works, sometimes it cuts off pieces of the figure and I haven't figured out why). If the .eps from Inkscape is not good, save your .svg file instead as a .pdf. Then open the .pdf in Adobe Acrobat Pro (just the reader won't work) and save as .eps. Or, if you don't have Adobe Acrobat Pro, print as .eps from your pdf reader. A few more steps, but still doable with all free programs (Ghostscript, GSView, Inkscape, and your pdf reader of choice) and pretty easy.
Friday, January 16, 2015
Seasonally appropriate photography
Wednesday, January 14, 2015
Globemallows
I seem to have a liking for globemallows. I realized I have photos from at least three different regions now.
Here is a bud from Alberta. I'm guessing from the rangemap that it's scarlet globemallow.
Scarlet globemallow, near Brooks, Alberta. 16 July 2014.
Then something from Arches National Park:
Globemallow sp, near Arches National Park, Utah. 21 June 2013.
And Arizona's Painted Desert region:
Globemallow sp, Petrified Forest National Park, Arizona. 08 August 2012.
Monday, January 12, 2015
Making repeatable figures in R
Of course, when you code out a figure in R, you can run the code again and get the figure again. However, I've just been using plot() or whatever variation and then sizing the picture when I export it from RStudio. This method isn't very repeatable with regard to the size of the plot area. You can specify pixel sizes and the file name in RStudio's export window as well, which is fine just once, but annoying after finding tiny problems with the figure a dozen times in a row. Plus, hypothetically, sometimes one's paper gets rejected and one needs to re-do just one figure, but then it won't match the sizes of the remaining figures without much searching about for dimensions. Just hypothetically, of course. Anyway, with all this in mind, I put some time into learning how to use the print-to-device functions, where width and height are specified in the code.
data<-data.frame(
"x"=c(rnorm(n=10, mean=1, sd=0.5)),
"y"=c(rnorm(n=10, mean=2, sd=1))
)
plot(data)
#The files are written to your working directory, so set that:
setwd("C:\\Users\\YourUserName\\YourFolders\\")
#Replace this with your working folder path.
#First, we'll do a vector format called eps.
my.width<-8 #postscript() and svg() (another vector format, code below) use inches.
(my.height<-my.width*(174/129))
#I want my plot's height/width ratio to match these dimensions
#which were given in pixels by a journal's instructions to authors.
setEPS()
postscript(file="testing.eps",width=my.width,height=my.height)
plot(data)
dev.off()
#If you don't have Adobe Illustrator, install first GPL Ghostscript and then GSView to view this file.
#R will make the file correctly regardless (I tested this on a computer without Ghostscript,
#then viewed it on another computer with Ghostscript and GSView, and it wrote the file correctly).
#I will post more information on working with .eps next week.
#I've also used another vector format called svg.
#You can use a free program called Inkscape to edit and add in additional graphics.
svg(file="testing.svg", width=my.width, height=my.height)
plot(data)
dev.off()
#png() makes a raster file, by default with measurements in pixels instead of inches.
png(file="testing.png", width=129, height=174)
plot(data)
graphics.off() #I'm not sure why but I needed to use this here instead of dev.off()
#When I used just dev.off() I got a sharing violation error when trying to open the png.
?png
#In help you can see that other raster formats are available as well: bmp(), jpeg(), and tiff().
#The help file also says that we can use units= to change the units to inches ("in"),
#millimeters ("mm"), or centimeters ("cm") instead of the default pixels ("px").
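The units= option mentioned in the help file can be tried out as below. This is a minimal sketch under a few assumptions: the file name and the 300 ppi resolution are made up for illustration, the file goes to a temporary folder rather than your working directory, and the plotted points are throwaway values.

```r
# Same inch-based dimensions used for the vector formats above.
my.width <- 8
my.height <- my.width * (174 / 129)

# Write to a temporary folder so nothing in the working directory is touched.
out.file <- file.path(tempdir(), "testing-inches.png")

# With units other than "px", png() needs res= (pixels per inch) so it can
# work out how many pixels the file should contain.
png(filename = out.file, width = my.width, height = my.height,
    units = "in", res = 300)
plot(x = c(1, 2, 3), y = c(2, 4, 3))
dev.off()

file.exists(out.file)
```

Using inches with a fixed res= keeps the raster file at the same proportions as the .eps and .svg versions, so all three can be regenerated from the same two numbers.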
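If several figures need to share one size, the device calls can be wrapped in a small function. This is a sketch only: save.eps is a hypothetical helper name, the default height reuses the 174/129 ratio from above, and the example writes to a temporary folder.

```r
# Hypothetical helper: write one figure to .eps at a fixed size.
# draw is a function that does the plotting.
save.eps <- function(draw, file, width = 8, height = width * (174 / 129)) {
  setEPS()
  postscript(file = file, width = width, height = height)
  on.exit(dev.off())  # close the device even if draw() stops with an error
  draw()
  invisible(file)
}

# Example use with throwaway points.
eps.file <- save.eps(function() plot(c(1, 2, 3), c(2, 4, 3)),
                     file = file.path(tempdir(), "figure1.eps"))
file.exists(eps.file)
```

With the size stored in one place, re-doing a single figure later gives a file that matches the rest without searching for the old dimensions.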
Friday, January 09, 2015
Same song, different verse
We have tons of Common Wood-Nymphs (Cercyonis pegala) at home in Texas and Oklahoma, but that didn't stop me from wondering what a bunch of fluttery, dark brown butterflies were in Alberta this summer. They look pretty different. They also were very flighty so it took me a while to get close enough for pictures. It turned out they were the same species!
I present an example of a wood-nymph from Texas:
A Common Wood-Nymph on lotebush flowers, Wilbarger County, Texas, 23 June 2005
And an Alberta wood-nymph:
On a thistle, near Brooks, Alberta. 28 July 2014.
Wednesday, January 07, 2015
Changing the order of x-axis factor labels using levels
I had some problems with my factors showing up in alphabetical order instead of the order I needed.
I googled this problem and found that I should be using levels. The linked post is more about different ways this works, but I found it helpful in understanding the principles.
So, here's a set of invented data with a factor category that we'll examine.
ordinals<-c("first", "second", "fourth", "third", "fourth", "third", "second", "first")
numerals<-c(1,2,4,3,4,3,2,1)
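#In R 4.0 and later, strings stay as characters by default, so ask for factors explicitly.
data<-data.frame(ordinals,numerals,stringsAsFactors=TRUE)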
plot(x=data$ordinals, y=data$numerals, #your x and y data
xlab='ordinals', ylab='numerals') #always label your axes
#This plot orders the factors alphabetically.
#However, it makes no sense logically to have fourth come right after first.
#I've also used this when I want to order sites
#by geography instead of alphabetically.
#Let's look at the data structure.
str(data)
#You can see that there are four levels of the factor data$ordinals.
#'data.frame': 8 obs. of 2 variables:
#$ ordinals: Factor w/ 4 levels "first","fourth",..: 1 3 2 4 2 4 3 1
#$ numerals: num 1 2 4 3 4 3 2 1
#data$ordinals is ordered alphabetically (you can see fourth comes after first).
#To remedy this oddity, make a vector ordering your levels as desired.
levels.we.want<-c("first", "second", "third", "fourth")
#Create a new column from the ordinals column,
#using the new level ordering as levels.
data$ordinals.ordered<-factor(data$ordinals, levels=levels.we.want)
str(data)
#The results now show the new column has correctly ordered levels.
#'data.frame': 8 obs. of 3 variables:
#$ ordinals : Factor w/ 4 levels "first","fourth",..: 1 3 2 4 2 4 3 1
#$ numerals : num 1 2 4 3 4 3 2 1
#$ ordinals.ordered: Factor w/ 4 levels "first","second",..: 1 2 4 3 4 3 2 1
plot(x=data$ordinals.ordered, y=data$numerals, #your x and y data
xlab='ordinals in order', ylab='numerals')#axis labels
#All better!
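The same approach works for the site ordering I mentioned. Here is a minimal sketch with invented site names and counts (none of these are real data), plus one thing to watch for: any value that is not in your levels vector quietly becomes NA.

```r
# Invented example data: three sites that should plot north to south.
sites <- c("south", "north", "middle", "north", "south")
counts <- c(3, 7, 5, 6, 2)

# The order we want on the x-axis, instead of alphabetical.
site.order <- c("north", "middle", "south")
site.factor <- factor(sites, levels = site.order)

levels(site.factor)      # "north" "middle" "south"
as.integer(site.factor)  # 3 1 2 1 3

# A value missing from the levels vector turns into NA, so check spelling.
is.na(factor("east", levels = site.order))  # TRUE

# Plot with the sites in geographic order:
# plot(x = site.factor, y = counts, xlab = 'site', ylab = 'count')
```

Checking levels() before plotting is a quick way to confirm the order, and looking for NA values catches typos in the levels vector.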
Monday, January 05, 2015
I'm running out of critters on signs, so I continue with signs about critters
Toothy warning sign, La Milpa, Belize. 09 January 2013.
Unlike the previous post about the jaguar crossing sign, I actually got to see the referenced crocodile. I recall it was pointed out to me as a Morelet's Crocodile. The American Crocodile also occurs in Belize.
Morelet's Crocodile, La Milpa, Belize. 09 January 2013.
Friday, January 02, 2015
It's a sign
By which I mean a link to a kingfisher perching on a sign, of course! The picture is from a collection of interesting and scenic photos of signs. These signs reminded me of this very artistic jaguar crossing sign in Belize.
Sadly, we saw no jaguars. Cockscomb Wildlife Sanctuary, Belize. 09 January 2013.