Monday, 12 January 2015

Manipulating data and data frames...

Some nice commands today:

To remove an object (such as a data frame) from your workspace:

rm(df.data2) 
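A minimal sketch of what rm() does, assuming a toy data frame standing in for df.data2 (the values are made up). It removes the object from your R workspace only; any file on disk is left alone.

```r
# Toy object standing in for df.data2 (hypothetical values)
df.data2 <- data.frame(x = 1:3, y = c("a", "b", "c"))

before <- exists("df.data2")  # is the object in the workspace?
rm(df.data2)                  # remove it from the workspace
after <- exists("df.data2")   # check again

print(c(before = before, after = after))
```

You can type ls() at any time to list the objects that are still in your workspace.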


To transpose a data frame (i.e. to swap columns and rows):

data2 <- as.data.frame(t(data)) #transform the columns into rows.

info from here:
http://stackoverflow.com/questions/6778908/r-transposing-a-data-frame
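Here is a small sketch of the transpose on a toy data frame (the names and values are hypothetical), just to show how the dimensions and the row and column names swap over:

```r
# Toy data frame (hypothetical values): 2 rows, 3 columns
data <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6),
                   row.names = c("s1", "s2"))

# t() turns the data frame into a matrix and flips it;
# as.data.frame() turns the result back into a data frame
data2 <- as.data.frame(t(data))

dim(data)        # rows, then columns, of the original
dim(data2)       # the two numbers are swapped
colnames(data2)  # the old row names
rownames(data2)  # the old column names
```

One thing to watch: because t() goes through a matrix, a data frame that mixes numbers and text will come back with every column as text.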

Getting started... advice for Jo Part 1....

OK Jo,

Please download R. I am guessing you have already done that, but if not, it's here.
Please download R-studio from the R-studio home page:

STARTING R-STUDIO
Please open R-studio....
This should give you a window like this:


Choose "New Directory"

You get this:


Choose "Empty Project"


You need to give a name to your project and choose a location to save the files into. 
Try to choose a name that is meaningful. 

The name I chose is "JosAnal20150112"

This now provides this kind of window:


This area lets you use R a bit more easily, and more interactively, than the bare R prompt does.

ORIENTING YOU IN R-STUDIO
The window on the left that makes up most of this screen is the Console. 
Please press the top right hand corner of this window:


This changes the window to this:


This is the same window with some explanations:


Don't worry if this doesn't make much sense yet.
To make R do stuff you will write some "scripts". You will run them to import your data and to make graphs, and then you will get the output.

OK so far?

GETTING YOUR DATA INTO R...
To get your data into R, I have saved your .csv file into the folder we created above entitled: JosAnal20150112.
To make it a bit simpler, I have used Excel to delete the first 19 columns and the metadata from the rows, except for the "TargetFullName".
Then I saved this as a file named: "Column Variance HybMedNorm_Jo_Cut1.csv"

To import this into R-studio, I wrote the following text in the top box:

data<-read.csv("Column Variance HybMedNorm_Jo_Cut1.csv", header=TRUE)

Then you need to select the text with your mouse and press "Run" in the top right hand corner of this box.



See annotations:



This will run the little script by placing it inside the "Console" window below. You get an entry in your data box on the right hand side which you can now open by double clicking.

It describes your data as being 8 observations of 1129 variables.
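If you want to check an import from the script box rather than by double clicking, something like the following sketch works. It writes a tiny stand-in .csv to a temporary file so that it runs on its own; the file contents and column names are made up, and I am only assuming your real headers contain spaces in the same way.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("SampleId,Complement C4b,Protein B",
             "S1,1200.5,310.2",
             "S2,1350.1,298.7"), tmp)

data <- read.csv(tmp, header = TRUE)

dim(data)    # number of observations (rows), then variables (columns)
names(data)  # read.csv replaces spaces in column names with dots
str(data)    # the type of each column
head(data)   # the first few rows
```

The dots matter later: a column headed "Complement C4b" in the file has to be written as data$Complement.C4b in R.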

You can get a summary of your data and draw your first simple graph by writing the following:

summary(data)

plot(data$Complement.C4b)

in the "script" box, selecting them and pressing run.
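To see what these two commands do without your real file, here is a sketch on a made-up column of 8 values (the numbers are hypothetical, not your data). When plot() is given a single column of numbers, it draws each value against its row number.

```r
# Made-up stand-in for one column of the real data (hypothetical values)
data <- data.frame(Complement.C4b = c(1200, 1350, 980, 1420,
                                      1100, 1275, 1050, 1310))

# Minimum, quartiles, median, mean and maximum of the column
summary(data$Complement.C4b)

# One point per row: value on the y axis, row number on the x axis
plot(data$Complement.C4b,
     xlab = "Row number",
     ylab = "Complement.C4b")
```

The xlab and ylab arguments are optional; they just give the axes readable labels.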

This should give you output like this:



See the little graph in the bottom right hand corner?

See if you can make that work. That's enough for right now....