So, we had a very busy day. We learned to navigate our working directory, read a .csv file, make a few plots, write a function, write a for loop, create and use data frames, and use logical statements. Way to go team!
Below, I briefly summarize these concepts with example usage. I also point out a few things that I left out of class, and present the script we came up with (with a few tweaks). Note that I also introduce how to generate random numbers below.
Setting and knowing your working directory
You should always know where you are. Staying in a working directory is an easy (but not always necessary) way to make sure that you are loading the right data, and to know where the files that you save will go.
setwd()
#example
setwd("/Users/yanivbrandvain/Desktop/intRo")
To check which working directory we’re in, type getwd().
#example
getwd()
[1] "/Users/yanivbrandvain/Desktop/intRo"
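Because setwd() changes where R looks for files for the rest of your session, it can help to remember where you started and go back there when you’re done. Here is a minimal sketch of that habit. It uses tempdir() as a stand-in for your own project folder (an assumption, just so the example runs on any computer), and list.files() to see what R can find from the current directory.

```r
# Remember where we started
start.dir=getwd()

# Move to a temporary folder (a stand-in for your own project folder)
setwd(tempdir())
print(getwd())

# List the files R can see from this working directory
print(list.files())

# Go back to where we started
setwd(start.dir)
print(getwd())
```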
Files come in many formats. Here I focus on .csv files. For information on reading other file types, try help(read.table).
read.csv()
#example
MS=read.csv("Trans_MS.csv",header=TRUE,as.is=TRUE)
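If you don’t have Trans_MS.csv handy, you can still practice by writing a small .csv file yourself and reading it back in. In this sketch the values are made up, and I only borrow the column names MS and Trans for illustration. header=TRUE tells R that the first row holds the column names, and as.is=TRUE keeps text columns as plain character vectors instead of converting them to factors.

```r
# A tiny made-up data set (hypothetical values, for illustration only)
toy=data.frame(MS=c("low","low","high","high"),
               Trans=c(1.2,0.8,2.5,3.1))

# Write it to a temporary .csv file, leaving out the row names
csv.file=tempfile(fileext=".csv")
write.csv(toy,csv.file,row.names=FALSE)

# Read it back in; the first row becomes the column names,
# and text stays as character rather than factor
MS=read.csv(csv.file,header=TRUE,as.is=TRUE)
str(MS)
```

str() is a quick way to check that each column came in as the type you expected.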
There are many things you can do elegantly in one or two lines of R code, but often we want to do something more complicated. As a general rule, it’s useful to write a function whenever you’re dealing with more than two or three lines of code. Functions break your work into well-defined problems, each of which is easy to describe and check. Functions are also nice because they don’t clog up your memory: variables created inside a function are local to it and are not left behind in your workspace once it finishes.
get.y=function(x){
  new=x #start from the input, or make some new vector or data frame
  #do some fancy stuff to make new what you want
  return(new)
}
x=1:10 #some example input
y=get.y(x)
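Here is a concrete version of the template above. The function name get.se and the numbers in x are hypothetical, chosen just for illustration: the function returns the standard error of the mean. It also shows the "doesn’t clog up your memory" point, because the variable se.value only exists while the function is running.

```r
# Hypothetical example: the standard error of the mean of a vector x
get.se=function(x){
  se.value=sd(x)/sqrt(length(x))  # se.value only exists inside the function
  return(se.value)
}

x=c(2,4,4,4,5,5,7,9)
y=get.se(x)
print(y)

# se.value was local to get.se, so it is not left in our workspace
print(exists("se.value"))
```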
There are a number of useful loops (e.g. for(){}, while(){}, etc...). Here we focus on a for loop because of its simplicity and utility. Note that often we can replace loops with apply() statements. If you feel very comfortable with for(), try using apply().
for (i in 1:10){
print(i)
}
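To see how a loop and the apply() family line up, here is a small sketch that computes the squares of 1 to 10 both ways. sapply() is the member of the apply() family that works element by element on a vector. apply() itself works on the rows (MARGIN=1) or columns (MARGIN=2) of a matrix, shown here on a made-up 2 x 3 matrix.

```r
# Squares of 1 to 10 with a for loop: fill in a pre-made vector
squares=numeric(10)
for(i in 1:10){
  squares[i]=i^2
}

# The same thing with sapply(), a member of the apply() family
squares.apply=sapply(1:10,function(i){i^2})
print(squares.apply)

# apply() works on the rows (MARGIN=1) or columns (MARGIN=2) of a matrix
m=matrix(1:6,nrow=2)
row.sums=apply(m,1,sum)
print(row.sums)
```

Making the empty vector first (numeric(10)) and then filling it in is generally faster than growing a vector inside the loop.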
Data frames
Making a data frame
this.frame=data.frame(matrix(nrow=100,ncol=2))
colnames(this.frame)=c("obs1","obs2")
There are two easy ways to navigate your data frame. We can refer to a specific column either by writing the name of the data frame followed by a $ and the column name (e.g. this.frame$obs1), or by using the column number in brackets (e.g. this.frame[,1]). We can get a specific entry from our data frame similarly (e.g. this.frame$obs1[1] or this.frame[1,1]).
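A quick way to convince yourself that these are equivalent is to try them on a tiny data frame. In this sketch the values are made up. It also shows [["obs1"]], a third way to grab a column by name, and that rows go before the comma and columns after it.

```r
# A small data frame to practice on (made-up values)
this.frame=data.frame(obs1=c(10,20,30),obs2=c(1,2,3))

# Three ways to pull out the first column as a vector
col.a=this.frame$obs1
col.b=this.frame[,1]
col.c=this.frame[["obs1"]]

# Rows come before the comma, columns after it; names work too
entry=this.frame[1,"obs1"]
first.row=this.frame[1,]
print(first.row)
```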
Let’s make (somewhat) interesting entries in our data frame. Here, I’m introducing a generally useful thing: creating random numbers from a distribution.
this.frame$obs1=rnorm(nrow(this.frame),0,1) #rnorm(n, mean, sd)
this.frame$obs2=this.frame$obs1+rnorm(nrow(this.frame),0,1)
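Every time you run rnorm() you get different numbers. If you want your results to be repeatable (for example, so a classmate gets the same plot), you can set the random number seed first. The seed value 42 below is arbitrary. runif() and rbinom() are shown as two other built-in distributions that follow the same pattern.

```r
# rnorm(n, mean, sd) draws n random numbers from a normal distribution
set.seed(42)
draws.a=rnorm(5,0,1)

# Resetting the same seed gives exactly the same "random" numbers
set.seed(42)
draws.b=rnorm(5,0,1)
print(identical(draws.a,draws.b))

# Other distributions follow the same pattern
u=runif(5,min=0,max=1)                 # uniform between 0 and 1
coin.flips=rbinom(5,size=1,prob=0.5)   # 0/1 coin flips
```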
plot(y~x) #or plot(x,y)
#example
plot(this.frame$obs2~this.frame$obs1)
We can spruce this up by putting a purple regression line through it.
abline(lm(this.frame$obs2~this.frame$obs1),col="purple")
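The purple line is just the intercept and slope from lm(), and you can look at those numbers directly with coef(). In this sketch I regenerate the data with a seed and 1000 rows (rather than 100) so the estimates are stable. Because obs2 was built as obs1 plus noise, the slope should come out close to 1 and the intercept close to 0. Note that the formula in lm() and the one in plot() need to have the same y~x order, or the line won’t match the points.

```r
# Simulate data as above, with a seed so the result is repeatable
set.seed(1)
this.frame=data.frame(obs1=rnorm(1000,0,1))
this.frame$obs2=this.frame$obs1+rnorm(1000,0,1)

# lm(y~x) fits a straight line; coef() returns its intercept and slope,
# which are the numbers abline() uses to draw the line
fit=lm(obs2~obs1,data=this.frame)
print(coef(fit))
```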
We’ll spend more time making fancy plots soon!
Let’s say we want to grab only the rows for which obs1 > obs2. We can get a logical vector of TRUE/FALSE values by typing (this.frame$obs1-this.frame$obs2)>0. We can get a vector showing which rows satisfy this criterion by typing which((this.frame$obs1-this.frame$obs2)>0). Finally, we can get the values in all columns for the rows where obs1 > obs2:
this.frame[(this.frame$obs1-this.frame$obs2)>0,]
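Here is the same idea written a little more directly, as a sketch on freshly simulated data (the seed is arbitrary). Comparing obs1>obs2 gives the same TRUE/FALSE vector as subtracting and checking >0. sum() of a logical vector counts the TRUEs, and subset() is a built-in shortcut that does the same row selection.

```r
set.seed(2)
this.frame=data.frame(obs1=rnorm(100,0,1))
this.frame$obs2=this.frame$obs1+rnorm(100,0,1)

# A logical vector: TRUE wherever obs1 is bigger than obs2
is.bigger=this.frame$obs1>this.frame$obs2

# which() turns TRUE/FALSE into row numbers; sum() counts the TRUEs
rows=which(is.bigger)
n.bigger=sum(is.bigger)

# Keep only those rows (both lines select the same rows)
bigger.a=this.frame[is.bigger,]
bigger.b=subset(this.frame,obs1>obs2)
```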
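#get.means: for each unique value of MS, find the mean of Trans
get.means=function(our.data){
  MS=unique(our.data$MS) #each distinct value of MS
  new=data.frame(matrix(nrow=length(MS),ncol=2)) #empty frame, one row per MS value
  colnames(new)=c("MS","Trans")
  new$MS=MS
  for(i in 1:nrow(new)){
    #mean of Trans over the rows whose MS matches this row's MS
    new$Trans[i]=mean(our.data$Trans[our.data$MS==new$MS[i]])
  }
  return(new)
}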
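Once you’re comfortable with the loop version above, base R has built-in functions that compute group means in one line. This sketch uses made-up data with the same column names (MS and Trans). One difference to watch for: tapply() and aggregate() sort the groups alphabetically, while a loop over unique() keeps the groups in the order they first appear. The toy data here happens to be in sorted order already.

```r
# Made-up data with the same column names get.means() expects
our.data=data.frame(MS=c("a","a","b","b","c"),
                    Trans=c(1,3,2,4,10))

# tapply() splits Trans by MS and takes the mean of each group
means.tapply=tapply(our.data$Trans,our.data$MS,mean)
print(means.tapply)

# aggregate() returns the same means as a data frame with columns MS and Trans
means.aggregate=aggregate(Trans~MS,data=our.data,FUN=mean)
print(means.aggregate)
```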
