Tuesday 27 October 2015

Converting PDF to text...

I want to try scraping some data from PDF reports....
The recommendation seems to be to use pdftotext.

This page tells me how to install it:

  • http://superuser.com/questions/286961/pdf-to-text-convertor
It's UNIX command line stuff....

brew install xpdf

or I could have tried:

brew install poppler

I don't know yet if there is a difference... (Poppler began as a fork of the xpdf code, and both packages provide a pdftotext command.)

I went with the first option and ran the command:

pdftotext your_pdf_file.pdf
It writes the text to your_pdf_file.txt in the same folder, and it was fast on the document I used....

It works, but for a small number of documents (say 10), it's almost easier just to download each one and copy and paste....
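If the number of reports grows, a loop over a folder takes the copy-and-paste out of it. This is a rough sketch, assuming pdftotext (from either xpdf or poppler) is on the PATH and the PDFs sit in one folder. The function name convert_pdfs and the folder name reports are made up for illustration. The -layout option asks pdftotext to keep the page's physical layout as far as it can, which tends to help with tables in reports.

```shell
# Hypothetical helper: convert every PDF in a folder to a .txt file next to it.
# Assumes pdftotext (xpdf or poppler) is installed.
convert_pdfs() {
  dir="${1:-.}"                       # default to the current folder
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue           # no PDFs found: the glob stays unexpanded, skip it
    # -layout tries to preserve the original column layout
    pdftotext -layout "$f" "${f%.pdf}.txt"
  done
}

convert_pdfs reports                  # "reports" is a hypothetical folder name
```

Passing the output name explicitly is optional (pdftotext defaults to the same name with .txt), but it makes the loop's intent obvious.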




Tuesday 20 October 2015

Manipulating CSV files...

A colleague used Unix tools to manipulate a CSV file.
It was very large, so importing it into R or opening it in Excel would have caused problems (Excel, for example, can't show more than 1,048,576 rows on a worksheet).
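This is not the colleague's command, just a rough sketch of why the Unix route works: tools like head, wc, cut and awk read a file line by line, so the whole file never has to be loaded at once. The file name big.csv and its columns are made up, and a tiny sample is written first so the snippet runs on its own. One caveat: cut splits on every comma, so it will mis-handle quoted fields that contain commas.

```shell
# Hypothetical sample file standing in for a very large CSV
printf 'id,name,score\n1,alice,90\n2,bob,75\n3,carol,82\n' > big.csv

head -n 2 big.csv                          # peek at the header and first row
wc -l < big.csv                            # count lines (header included)
cut -d, -f1,3 big.csv > ids_scores.csv     # keep only columns 1 and 3
# keep the header row plus rows whose 3rd column is at least 80
awk -F, 'NR == 1 || $3 >= 80' big.csv > high_scores.csv
```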

The command used was:


Perl was also used directly from the Unix prompt to remove everything starting with a hash symbol (#).
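A one-liner along those lines might look like this. It assumes "everything starting with a hash" means whole lines that begin with # (comment lines); the file names data.csv and data_clean.csv are made up, and a small sample file is created so it runs on its own.

```shell
# Hypothetical sample file with comment lines mixed in
printf '# comment\nid,value\n# another comment\n1,10\n2,20\n' > data.csv

# -n loops over every line without printing; print only lines not starting with #
perl -ne 'print unless /^#/' data.csv > data_clean.csv

# In-place variant that keeps a backup copy as data.csv.bak:
# perl -i.bak -ne 'print unless /^#/' data.csv
```

If the aim was instead to strip trailing comments, `perl -pe 's/#.*//'` deletes from the first # to the end of each line, though it will also clobber any # that appears inside a data field.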

Useful addition to the data munging world!