Friday, 31 January 2014
My first R generated word cloud
This prompts a few questions.
Firstly, what do the colours and sizes mean from a quantitative and statistical point of view?
Secondly, is it wise to eliminate some words like "will"?
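On the first question: as far as I understand the wordcloud package, both size and colour come from word frequency. The sketch below mirrors that idea in base R. It assumes that sizes are rescaled linearly against the most frequent word, using the two ends of wordcloud's `scale` argument, and that colours are picked by how close a word's count is to the maximum. The word counts and the palette are made up for illustration, so check the package source for the exact formula.

```r
# Hypothetical word counts (e.g. taken from a term-document matrix).
freq <- c(community = 12, health = 8, data = 5, wales = 3, include = 1)

# Size: assumed linear rescaling, so the most frequent word gets scale[1]
# and rarer words shrink towards scale[2].
scale <- c(4, 0.5)
size  <- scale[2] + (scale[1] - scale[2]) * freq / max(freq)

# Colour: assumed to be chosen by relative frequency, so the first palette
# entry goes to the rarest words and the last entry to the most frequent.
palette <- c("grey60", "steelblue", "darkred")   # hypothetical palette
colour  <- palette[ceiling(length(palette) * freq / max(freq))]

print(data.frame(word = names(freq), freq = freq,
                 size = round(size, 2), colour = colour))
```

Under this reading, size and colour simply encode raw counts; nothing in the picture tests for statistical significance.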
Some words have had "e" or "es" taken off the end, for example "includ" and "communiti". This is a bit odd.
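The clipped words look like the output of a stemming step (tm offers `stemDocument()` for this), which cuts words back to a common root so that variants are counted together. Here is a deliberately crude base-R stand-in to show the idea. The suffix list is my own assumption, and this is not the Snowball/Porter algorithm a real stemmer uses.

```r
# Hypothetical words of the kind that ended up in the word cloud.
words <- c("include", "includes", "included", "community", "communities")

# Crude suffix stripper: drop one trailing "ing", "ed", "es", "e" or "s".
# NOT a real stemming algorithm, only an illustration of the idea.
crude_stem <- function(w) sub("(ing|ed|es|e|s)$", "", w)

stems <- crude_stem(words)
print(stems)

# Counting stems instead of words merges the variants together.
print(table(stems))
```

A real stemmer is smarter. The Snowball English stemmer, for example, also turns a final "y" into "i", so "community" and "communities" can both end up as "communiti".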
I probably need to understand more of the nuts and bolts of the code and what it really represents.
Interesting, though.
Text mining and making a wordcloud...
As my first experiment with R in terms of doing anything useful, I want to make a word cloud.
I found a page which helps - HERE.
I need to install various packages including 'tm' and 'wordcloud'.
I should probably read this reference: Introduction to the tm (text mining) Package.
I had to work out an error message:
- "input string 33 is invalid UTF-8"
I did this by going back to Word and saving the file as UTF-8.
This is important - text must be in UTF-8 format!
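Re-saving in Word works. An alternative is to find and fix the bad strings inside R. A minimal sketch, assuming the offending text is Latin-1 (a guess; it could be another encoding), and using `validUTF8()`, which needs R 3.3.0 or later:

```r
# Hypothetical text: the second element holds the byte 0xE9, which is "é" in
# Latin-1 but is not valid UTF-8 on its own.
x <- c("plain ascii text", "caf\xe9")

# Flag the elements that are not valid UTF-8.
bad <- !validUTF8(x)
print(which(bad))

# Re-encode only the offending elements, ASSUMING they are Latin-1.
x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
print(validUTF8(x))
```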
It does seem to have worked, but it's quite slow, and the word "and" came up a lot, which doesn't seem quite right.
It also seems to have kind of crashed which is annoying!
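The usual fix for "and" is stop-word removal, which drops very common words before counting. In tm this is typically `tm_map(corpus, removeWords, stopwords("english"))`, and extra words such as "will" can be appended to that vector with `c()`. The sketch below does the same thing in plain R on a made-up sentence with a small hand-picked stop-word list (an assumption, much shorter than tm's), so it runs without any packages:

```r
# Hypothetical sample text standing in for the Word document.
text <- "Health and care in Wales will improve, and health data will help health care."

# Lower-case, then split on runs of non-letters (this also strips punctuation).
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[nzchar(words)]

# Counts before removal: "and" and "will" are near the top.
before <- sort(table(words), decreasing = TRUE)
print(before)

# Small hand-made stop-word list (assumption).
stop_words <- c("and", "in", "will", "the", "a", "of")
kept <- words[!words %in% stop_words]

# Counts after removal, most frequent first.
freq <- sort(table(kept), decreasing = TRUE)
print(freq)
```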
Wednesday, 29 January 2014
Create a vector in R
| The easiest way to create a vector is with the `c()` function, which stands for "concatenate" or "combine". To create a vector containing
| the numbers 1.1, 9, and 3.14, type `c(1.1, 9, 3.14)`. Try it now and
| store the result in a variable called `z`.
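Following the swirl prompt, the assignment looks like this (`<-` is R's usual assignment operator):

```r
# Combine three numbers into one numeric vector and store it in z.
z <- c(1.1, 9, 3.14)
print(z)   # [1] 1.10 9.00 3.14
```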
RStudio, swirl and a few more commands...
So it looks as if RStudio does help a bit.
Swirl is a program to help people learn statistics and R at the same time.
To install a package like Swirl:
> install.packages("swirl")
To load a package like Swirl:
> library("swirl")
To start Swirl once it is loaded:
> swirl()
Tuesday, 28 January 2014
Simple R commands...
getwd() = gets the working directory
setwd("/Users/paulbrennan/Documents") = sets Documents in paulbrennan as the working directory.
read.csv("filename") = reads a CSV file into a data frame.
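A small sketch that exercises these three commands without touching a real folder. It uses R's temporary directory in place of "/Users/paulbrennan/Documents", writes a made-up file called "example.csv" first so that `read.csv()` has something to read, and restores the original working directory at the end.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Use R's temporary directory in place of a real folder (example only).
setwd(tempdir())
print(getwd())

# Write a small made-up CSV file (hypothetical name "example.csv") ...
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")),
          "example.csv", row.names = FALSE)

# ... and read it back: read.csv() returns a data frame.
df <- read.csv("example.csv")
print(df)

# Go back to where we started.
setwd(old_wd)
```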
An interesting R blog: http://onertipaday.blogspot.co.uk/
Getting good quality data that is worth visualising...
I need to learn how to scrape data.
Scraping from PDFs would seem like a good start.
A good search found this tutorial on scraping data from PDFs.
My first attempt at scraping PDFs using ScraperWiki didn't work as well as simply cutting and pasting into Word.
Interesting free book on D3.js
The website about D3.js is here.
There are lots of online tutorials.
A free book entitled "Interactive Data Visualization for the Web".
Tools for generating high quality visualisations
So there are:
- Excel
- Google Documents
- R
- Protovis
- D3.js
- Illustrator for polishing
- Flash
The question is what should I use?
Monday, 27 January 2014
Finding interesting data...
The Health in Wales website has a statistics and data section.
This mostly points to wales.gov.uk, which also has a lot of data about Wales. One interesting section is this one on Health-Statistics.
What I am really looking for is mortality statistics, which are proving a bit of a challenge to find.
However, there is plenty of data in this Health Statistics section.