Wednesday 18 June 2014

Download MySQL....

Useful video here:

https://www.youtube.com/watch?v=IOicYSxHlPc

Need to know if I have a 32-bit or 64-bit kernel.
Suliman's computer is 32bit.

Don't know about the others.
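
A quick way to check a machine, sketched in Python (an assumption: Python is installed on the computer in question; on a Mac, typing uname -m in the Terminal gives similar information). Note that the pointer size describes the Python build itself, which can be 32-bit even on a 64-bit kernel, so treat it as a hint rather than a definitive answer.

```python
import platform
import struct


def pointer_bits() -> int:
    """Width in bits of a pointer in the running Python interpreter."""
    return struct.calcsize("P") * 8


# machine() reports the architecture string the OS gives, e.g. "x86_64" or "i386".
print("machine:", platform.machine())
print("interpreter pointer size:", pointer_bits(), "bit")
```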
Where do we want to download this?

Talks about Condor and Navicat - I don't need this!

Another useful video:
https://www.youtube.com/watch?v=U1yrajrypJI

Describes how to import data into a MySQL database

Apache...

A tutorial about Apache and PHP:
http://machiine.com/2013/how-to-install-apache-and-php-on-a-mac-with-osx-10-8-mamp-part-1/

Part 2 covers MySQL
http://machiine.com/2013/how-to-setup-mysql-on-a-mac-with-osx-10-8-mamp-part-2/


Monday 16 June 2014

Adding tool tips to Electives World (v0.2)

Using code and advice from http://www.d3noob.org/2013/01/adding-tooltips-to-d3js-graph.html, I have just added tool tips with the location to Electives World.
It proved to be much easier than I had expected. The whole thing took about 30 minutes.
It's located here:
http://www.science2therapy.com/d3visualisations/electivesworld_0.2.html
It's not perfect but it's improved.

Next step: use more of the data.
Maybe colour by speciality
Or add speciality to tool tip.

I need to review the data to see what's feasible.


Friday 13 June 2014

webhosting with godaddy.com

I have set it up.

Files have to go into the public folder to be viewed.

There is cPanel software for managing it.

To check it out go here:
http://www.science2therapy.com/d3visualisations/electivesworld.html


Not working through dropbox...

For some reason, my html page doesn't want to work through Dropbox except when shared as a file.
I need to get some web hosting....


Some geomapping success...

The map is here: https://dl.dropboxusercontent.com/u/7729166/electivesworld.html

It's not brilliant and most of the code is reused from various other sources but it is a success.

Not wanting to get ahead of myself, I am pretty happy because I have had to solve a lot of little issues.

Some of the issues:

I have sent the link off to Sam for comment. 
Happy for now....

Next step is to add some information when I put the mouse over the dots, I think. 

Some good advice about that here: www.d3noob.org/2013/01/adding-tooltips-to-d3js-graph.html


Find longitude and latitude in a batch...

This place seems to do it nicely.

http://www.findlatitudeandlongitude.com/batch-geocode/#.U5sIXJRdU1c

Success drawing a map....

Baby steps, baby steps....

So I have successfully cloned Mike Bostock's map.

I had to get the data from github and put it on Dropbox.
I put it in the public folder.

I replaced the link in the file with a link to my Dropbox file.

Now fire up Chrome, load the local file and off we go.

It works!!

Very happy about that.


Github for Mac...

So I have opened up Rosaria's computer and I am downloading Github for mac.

It might make life easier.

At least from a forking point of view....

Let's see.....

Well it does make cloning etc easy but I am not sure it really helps.

Still, what I did find is a way to download files from git.

There is a download zip button on the website which really works!

This is very good!!!




node and npm....

I am clearly messing about with things I don't understand.
Usually, if I am using the dialog box, this is true but hey, I don't seem to have any choice if I want to make a good quality d3 map.

I found an Introduction to npm. So npm is the Node package manager.
It helps to install various Node packages.

It includes a warning about using sudo (the superuser workaround) all the time.
Helpfully, it gives a solution too.

$ sudo chown -R $USER /usr/local

Ok, so it seems that using npm requires Xcode and I am pretty sure I don't have that installed.
Basically that's because some C/C++ compiling is required.
Answer here: http://stackoverflow.com/questions/22996677/why-is-npm-install-failing-because-of-xcode

Going to install Xcode 3.2
I'm working with earlier versions of things but hopefully it will work.

So, it does seem as if these have now worked.

I have installed Xcode 3.2 into a Developer folder in my local user area.

I have installed Contextify
$ npm install contextify

I have installed topojson
$ npm install -g topojson

$ npm ls

gives this:

/Users/paulbrennan
└─┬ contextify@0.1.8
  ├── bindings@1.2.0
  └── nan@1.0.0


Thursday 12 June 2014

Maps and Homebrew.... arghh

I am trying to make a map on which to put the places that the Medical Students have gone on their elective.

I want to use the example of Extradition Treaties as a starting point for at least a map of the world.

I did the easy stuff and copy and pasted the script.

However, it now requires a JSON file called world-50m.json

However, I don't really know how to get this.

This file seems to be a special type of JSON called topojson. I find this a little confusing as surely JSON is JSON but hey....
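
To untangle the "JSON is JSON" confusion, here is a small Python sketch. It assumes a tiny hand-written TopoJSON document (toy data, not taken from world-50m.json) and shows that the file parses with an ordinary JSON parser; what is special is the layout inside it: shapes refer to shared "arcs", and the arc coordinates are stored as integer steps that have to be accumulated and rescaled. The function name decode_arc is my own, not part of any library.

```python
import json

# Toy TopoJSON document: one line made from one arc (made-up numbers).
TOPOLOGY_TEXT = """
{
  "type": "Topology",
  "transform": {"scale": [0.5, 0.5], "translate": [10.0, 20.0]},
  "objects": {
    "example": {"type": "LineString", "arcs": [0]}
  },
  "arcs": [
    [[0, 0], [2, 0], [0, 2]]
  ]
}
"""


def decode_arc(arc, transform):
    """Turn a delta-encoded, quantized arc into absolute coordinates."""
    sx, sy = transform["scale"]
    tx, ty = transform["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        # Each pair is a step from the previous point, not a position.
        x += dx
        y += dy
        points.append((x * sx + tx, y * sy + ty))
    return points


topology = json.loads(TOPOLOGY_TEXT)  # an ordinary JSON parser is enough
line = topology["objects"]["example"]
coords = decode_arc(topology["arcs"][line["arcs"][0]], topology["transform"])
print(topology["type"], coords)
```

So the format is plain JSON on disk; the topojson tools are only needed to build or convert such files.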

To use topojson, it seems that I need to install Node.js - a software platform for building scalable server-side and networking applications.

Again why I need to do this is confusing.

Still I have done that.
I opened my terminal window.
I typed brew install node
This created a pile of error messages:

"m095:~ paulbrennan$ brew install node
Warning: Xcode is not installed
Most formulae need Xcode to build.
It can be installed from https://developer.apple.com/downloads/
==> Installing dependencies for node: pkg-config, readline, sqlite, gdbm,
==> Installing node dependency: pkg-config
==> Downloading http://pkgconfig.freedesktop.org/releases/pkg-config-0.28.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/pkg-config/0.28 --disable-host-tool -
checking for gcc... clang
checking whether the C compiler works... no
configure: error: in `/private/tmp/pkg-config-5K8f/pkg-config-0.28':
configure: error: C compiler cannot create executables
See `config.log' for more details
Error: Homebrew doesn't know what compiler versions ship with your version
of Xcode (dunno). Please `brew update` and if that doesn't help, file
an issue with the output of `brew --config`:
  https://github.com/Homebrew/homebrew/issues

Note that we only track stable, released versions of Xcode.

Thanks!

READ THIS: https://github.com/Homebrew/homebrew/wiki/troubleshooting

These open issues may also help:
document pkg-config configuration  (https://github.com/Homebrew/homebrew/issues/21129)
m095:~ paulbrennan$"

ADVICE
brew doctor

so I did that....

Again lots of WARNING messages:

Warning: Some directories in /usr/local/share/man aren't writable.
This can happen if you "sudo make install" software that isn't managed
by Homebrew. If a brew tries to add locale information to one of these
directories, then the install will fail during the link step.
You should probably `chown` them:

    /usr/local/share/man/mann

Warning: Xcode is not installed. Most formulae need Xcode to build.
It can be installed from
  https://developer.apple.com/downloads

Warning: Unbrewed dylibs were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected dylibs:
    /usr/local/lib/libtcl8.6.dylib
    /usr/local/lib/libtk8.6.dylib

Warning: Unbrewed .pc files were found in /usr/local/lib/pkgconfig.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected .pc files:
    /usr/local/lib/pkgconfig/tcl.pc
    /usr/local/lib/pkgconfig/tk.pc

Warning: Unbrewed static libraries were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected static libraries:
    /usr/local/lib/libtclstub8.6.a
    /usr/local/lib/libtkstub8.6.a

Warning: You have uncommitted modifications to Homebrew
If this a surprise to you, then you should stash these modifications.
Stashing returns Homebrew to a pristine state but can be undone
should you later need to do so for some reason.
    cd /usr/local/Library && git stash && git clean -d -f

Warning: Your Xcode is configured with an invalid path.
You should change it to the correct path:
  sudo xcode-select -switch /Developer
m095:~ paulbrennan$

Wednesday 11 June 2014

Removing data...

>rm(filename)


Getting protein size

The BioMart for Uniprot does not want to give any real protein features.

That's a bit disappointing.

Back to Ensembl:

According to this PDF:
http://www.stat.berkeley.edu/~sandrine/Teaching/PH292.F07/Docs/Durinck.pdf

I should be able to getSequence in Ensembl and get back the protein sequence.

This is possible.

For example, for BRCA1, using Ensembl:

> seq = getSequence(id="BRCA1", type="hgnc_symbol", seqType="peptide", mart=mart)

Challenge: it returns 29 different sequences!!!!
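
One way to deal with many sequences per gene is to tabulate their sizes and then pick one (the longest, say). A rough Python sketch of that idea follows. Everything in it is an assumption for illustration: the isoform names and the short sequences are toy placeholders, not BRCA1 data, and the 110 Da per residue figure is only the usual rule of thumb for an average amino acid.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, not an exact value


def summarise(sequences):
    """Return (name, length, approx. size in kDa) rows, longest first."""
    rows = []
    for name, seq in sequences.items():
        seq = seq.rstrip("*")  # drop a trailing stop marker if one is present
        length = len(seq)
        rows.append((name, length, length * AVERAGE_RESIDUE_MASS_DA / 1000.0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy placeholder peptides (hypothetical names and sequences).
toy_sequences = {
    "isoform_a": "MDLSALRVEE*",
    "isoform_b": "MDLSA",
    "isoform_c": "MDLSALR*",
}

rows = summarise(toy_sequences)
for name, length, kda in rows:
    print(f"{name}: {length} residues, about {kda:.2f} kDa")
```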


Using biomaRt...

This is a powerful R package for downloading data from mart enabled databases.

First download and enable the package:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
library("biomaRt", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")


Then
> listMarts()
This gives a list of the Marts (databases) that can be queried. Examples include ensembl, vega, unimart.

Uniprot database is unimart

>ensembl = useMart("ensembl")

Then we have Datasets
> listDatasets(ensembl)

> ensembl = useDataset("hsapiens_gene_ensembl", mart=ensembl)

Gives a subset of the data.

> filters = listFilters(ensembl)
gives a list of filters

Key command: getBM

  • attributes - vector of attributes one wants to get from the database
  • filters - vector of filters one will use as input for the query (e.g. affy_hg_u133_plus_2) 
  • values - values for the filters (e.g. Affy IDs)
  • mart - the database to be queried (e.g. ensembl)
e.g.:
> affyids=c("202763_at","209310_s_at","207500_at")
> getBM(attributes=c('affy_hg_u133_plus_2', 'entrezgene'), filters='affy_hg_u133_plus_2', values=affyids, mart=ensembl)

Results:

  affy_hg_u133_plus_2 entrezgene
1         209310_s_at        837
2           207500_at        838
3           202763_at        836

This has the potential to be very useful. 



Starting with Bioconductor...

according to http://www.bioconductor.org/install/index.html

get the latest version of Bioconductor with these commands

source("http://bioconductor.org/biocLite.R")
biocLite()


I have just done this.


Monday 9 June 2014

Tutorial for using R and Bioconductor for Data Analysis for Genomics

http://genomicsclass.github.io/book/

Interesting link about bio-statistics

http://www.biogazelle.com/seven-tips-bio-statistical-analysis-gene-expression-data

Comparison of Pepper and Huttmann data...

So the two datasets are on similar platforms and so the same IDs are present in both.

However, the signals vary by a factor of 10:

Pepper Ave Signal: 184
Huttmann Ave Signal: 1725

That makes averaging the data as presented impossible.

Could do some kind of standardization to try to average or could do separate averages for the two data sets.
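
A minimal Python sketch of one possible standardization: divide every signal by its own dataset's mean, so both datasets sit on a "relative to average" scale, and only then average the shared IDs. This is just one option, not necessarily the right one for array data. The probe names and values are toy numbers, chosen only so that the two means match the 184 and 1725 quoted above.

```python
from statistics import mean


def scale_to_unit_mean(signals):
    """Divide each signal by the dataset mean, so the dataset mean becomes 1."""
    dataset_mean = mean(signals.values())
    return {probe: value / dataset_mean for probe, value in signals.items()}


def average_datasets(first, second):
    """Average two datasets probe by probe after putting both on a unit-mean scale."""
    first_scaled = scale_to_unit_mean(first)
    second_scaled = scale_to_unit_mean(second)
    shared = sorted(set(first_scaled) & set(second_scaled))
    return {probe: (first_scaled[probe] + second_scaled[probe]) / 2 for probe in shared}


# Toy values (hypothetical probes); means are 184 and 1725 respectively.
pepper = {"probe_1": 92.0, "probe_2": 184.0, "probe_3": 276.0}
huttmann = {"probe_1": 862.5, "probe_2": 1725.0, "probe_3": 2587.5}

combined = average_datasets(pepper, huttmann)
print(combined)
```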

Could try calibrating with some known proteins....

Ideas here:
      some high expressing proteins (actin, et al)
      some prognostic markers CD38
      Zap-70

Some important functional molecules
      p50
      mcl-1

others....



Getting good quality Affy data...

Available to download:


Different data options.
DataSet full SOFT file seems the easiest to deal with at the moment.

Can also get the original CEL files....



so it's a bit different for the
 2004 Oct 1;22(19):3937-49.

Microarray gene expression profiling of B-cell chronic lymphocytic leukemia subgroups defined by genomic aberrations and VH mutation status.


This is data from 111 patients but is from different gene chips and is a bit more challenging. 

However, it would be excellent to get this, wouldn't it!

How many pieces in the CLL jigsaw?

According to Suliman’s thesis:

Average amount of protein in 10^7 CLL cells = 382 microg

Round this up to 400 microg of protein

Average protein size = 40 kDa

1 kDa = 1.66 x 10^-21 g

40 kDa protein = 6.6 x 10^-20 g
Therefore, 1 g of an average protein = 1.5 x 10^19 molecules (1 / 6.6 x 10^-20)

400 microg (4 x 10^-4 g) reduces this to about 6 x 10^15 molecules in 10^7 CLL cells.

Therefore, one CLL cell has about 6 x 10^8 protein molecules.

So each CLL cell has about 600 million protein molecules.

First answer: 600 million pieces in the CLL protein jigsaw.

Also DNA, RNA, lipids, carbohydrates, other macromolecules, ions and water but hey….

If we propose that approx 5,000 different proteins are expressed in a CLL cell:
Then there is an average of about 1.2 x 10^5 molecules of each protein.

However, there is a range of sizes and a range of expression levels.
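
The estimate is easy to get wrong by a power of ten, so here is the same arithmetic as a short Python sketch. The inputs are the ones from this note (400 microg of protein per 10^7 cells, 40 kDa average size, 1 kDa = 1.66 x 10^-21 g, roughly 5,000 expressed proteins); the function and variable names are my own. It is worth comparing what it prints against the hand calculation.

```python
KDA_IN_GRAMS = 1.66e-21  # mass of 1 kDa in grams, as used in the note


def molecules_per_cell(total_protein_g, n_cells, mean_size_kda):
    """Estimate protein molecules per cell from bulk protein mass."""
    mass_per_molecule_g = mean_size_kda * KDA_IN_GRAMS
    total_molecules = total_protein_g / mass_per_molecule_g
    return total_molecules / n_cells


per_cell = molecules_per_cell(total_protein_g=400e-6, n_cells=1e7, mean_size_kda=40)
per_protein = per_cell / 5000  # assumes about 5,000 different proteins per cell

print(f"{per_cell:.2e} protein molecules per cell")
print(f"{per_protein:.2e} copies of each protein on average")
```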


Add to this:
How many transcripts detected by Affymetrix chip?
How many proteins confirmed by proteomics?


How many cell surface proteins?
How many nuclear proteins?
How many mitochondria?



Monday 2 June 2014

What's in a boxplot...

I had to work out what the box plot represents.

Here's what I wrote:
"   In the plots, the dark line shows the median; the box shows the limits of the first and third quartile of data; the whiskers indicate the last data point within 1.5 times the inter-quartile range; the circles indicate data points that are outside 1.5 times the inter-quartile range.   "
 
Based on information from http://msenux.redwoods.edu/math/R/boxplot.php
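
To make the description concrete, here is a Python sketch that computes the same pieces (median, quartiles, whisker ends, outlying points) for a made-up set of numbers. It is an approximation of what R draws, not a copy of it: R's boxplot() builds the box from "hinges", which can differ slightly from these quartiles for some sample sizes. The function name and the data are my own.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Median, quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the last data points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])  # toy data
print(summary)
```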

Publication quality figures from R....

I have been preparing figures for Suliman's revised manuscript.
This has been a bit of an R learning curve as I have had to prepare the figures in a publication quality format.

However, I have done it and it's progress.

Firstly a list of resources that give some good tips and info:


So the key bit of code is as follows:


>ppi=600    #pixels per inch
>png("boxplot_20140602.png", width=6*ppi, height=5*ppi, res=ppi)  
              #this creates the output file and adds information about size and resolution  
>boxplot(P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, 
               frame=FALSE,   #suppresses the frame
               las =2,                 #turns the labels sideways
               names = c("Patient 1","Patient 2","Patient 3","Patient 4","Patient 5","Patient 6","Patient 7","Patient 8","Patient 9","Patient 10","Patient 11","Patient 12"), 
               yaxt="n",             #suppresses the y axis
               ylab = expression(bold("Relative iTRAQ signal")))
>axis(2, 
         0:4,
         labels=formatC(seq(0,4, by=1)), 
        las=1, 
        xpd=NA)  # allows the axis to go outside of the graph area and plot "0" and "4"
   #  This creates a new axis. 
   # R won't usually let the axis go outside the 'graph area'.

>dev.off()
    #very important command that tells R that you are finished plotting; otherwise your graph will not show up.

Some key concepts:
    This generates PNG files but similar commands are also available for PDFs, SVGs and others. VERY POWERFUL. 
    Increasing the resolution makes the lines thicker and the text more distinct. 

I used PNG to put into Powerpoint. 
Need to learn more about colour.