Wednesday, September 30, 2015

R

  1. 2015.09.03
    1. read a file from a remote server in R (on Windows)
      1. install OpenSSH
        1. download the latest version from http://www.mls-software.com/opensshd.html
        2. install it into a path that contains no spaces
          1. e.g. D:\OpenSSH
      2. create a key pair and register the public key on the server
        1. run 'cmd'
        2. d:
        3. cd OpenSSH
        4. cd bin
        5. ssh-keygen -t rsa
          1. it prints where the keys were saved
          2. e.g. Your identification has been saved in /home/admin/.ssh/id_rsa
          3. the actual Windows path is something like C:\Users\admin\.ssh
        6. scp /home/admin/.ssh/id_rsa.pub <user name>@<host name>:
        7. ssh <user name>@<host name>
          1. this login still asks for the password; steps 8 to 11 run on the server
        8. if there is no .ssh directory in your home directory
          1. mkdir .ssh
          2. chmod 700 .ssh
        9. cat id_rsa.pub >> .ssh/authorized_keys
        10. chmod 644 .ssh/authorized_keys
        11. exit
        12. ssh <user name>@<host name>
          1. it should not ask you to enter the password this time
        13. if OpenSSH warns "UNPROTECTED PRIVATE KEY FILE!" / "Permissions 0xxx for 'path/to/id_rsa' are too open."
          1. chmod 600 path/to/id_rsa
          2. try again
      3. run R/RStudio
        1. install.packages("RCurl", dependencies = TRUE)
        2. require(RCurl)
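        3. x <- scp("<host name>", "/path/to/the/file", key = c("C:/Users/admin/.ssh/id_rsa.pub", "C:/Users/admin/.ssh/id_rsa"), user = "<user name>", binary = FALSE)
          1. x holds the file content as one string; d <- read.table(text = x) turns it into a data frame
        4. or, without RCurl, stream the file through the ssh client (its bin directory, e.g. D:\OpenSSH\bin, must be on the PATH)
          1. d <- read.table(pipe('ssh -l <user name> <host name> "cat /path/to/the/file"'))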
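A minimal shell sketch of the same workflow, assuming a Linux server with a POSIX shell and GNU coreutils. The helper names install_pubkey and fetch_remote are hypothetical, not part of OpenSSH or RCurl. install_pubkey performs steps 8 to 10 on the server; fetch_remote is the shell side of the pipe() call and passes ssh's BatchMode option, so a broken key setup fails with an error instead of stopping at a password prompt that R cannot answer.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers mirroring the manual steps above.
# Assumes a POSIX shell with GNU coreutils on the server and an
# OpenSSH client on the local machine.

# Run on the SERVER after `scp id_rsa.pub <user name>@<host name>:`
# (steps 8 to 10). Assumes the .pub file holds a single key line.
install_pubkey() {
  local pubkey="$1"                 # uploaded public key, e.g. ~/id_rsa.pub
  local home="${2:-$HOME}"          # target home directory, defaults to $HOME
  local auth="$home/.ssh/authorized_keys"

  mkdir -p "$home/.ssh"
  chmod 700 "$home/.ssh"            # only the owner may use .ssh
  touch "$auth"
  # -x whole-line, -F fixed-string match: append the key only once
  if ! grep -qxF -- "$(cat "$pubkey")" "$auth"; then
    cat "$pubkey" >> "$auth"
  fi
  chmod 644 "$auth"                 # sshd (StrictModes) refuses group/world-writable key files
}

# Run LOCALLY: print a remote file to stdout, the same data path as the
# read.table(pipe(...)) call above. BatchMode=yes makes ssh exit with an
# error instead of prompting for a password when key login is not working.
# File paths containing single quotes are not handled.
fetch_remote() {
  local user="$1" host="$2" file="$3"
  ssh -o BatchMode=yes -l "$user" "$host" "cat -- '$file'"
}
```

For example, `fetch_remote admin myserver /path/to/the/file > file.txt` (admin and myserver are placeholders) copies the file so R can read it locally; running install_pubkey twice leaves a single copy of the key in authorized_keys.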

Tika

  1. 2015.09.02
    1. building the sources
      1. wget http://apache.tt.co.kr/tika/tika-1.10-src.zip
      2. md5sum tika-1.10-src.zip
        1. the printed digest should be 092d8bbc51756b180a8d65bbd4620801
      3. unzip tika-1.10-src.zip
      4. cd tika-1.10-src
      5. mvn install
  2. 2015.08.31
    1. introduction
      1. The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
    2. supported formats
      1. http://tika.apache.org/1.10/formats.html
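The 2015.09.02 build steps can be wrapped in a small shell script so that a checksum mismatch stops the build before Maven runs. This is a sketch assuming GNU md5sum, unzip, Maven and a JDK on the PATH; verify_md5, build_tika and extract_text are hypothetical names. The location of the command-line jar (tika-app/target/tika-app-1.10.jar) and its --text option are assumed from the Tika 1.x layout, and the mirror URL is the one from the notes, which may no longer host 1.10.

```shell
#!/bin/sh
# Sketch only: the build steps from the notes, wrapped in functions.
# Assumes GNU md5sum, unzip, Maven and a JDK on the PATH.

# Return 0 if FILE's MD5 digest equals EXPECTED, non-zero otherwise.
verify_md5() {
  local file="$1" expected="$2"
  # md5sum -c reads "DIGEST  FILENAME" lines; --status prints nothing
  printf '%s  %s\n' "$expected" "$file" | md5sum -c --status -
}

# Download, verify, unpack and build; stops at the first failing step.
build_tika() {
  local zip=tika-1.10-src.zip
  wget "http://apache.tt.co.kr/tika/$zip" || return 1
  verify_md5 "$zip" 092d8bbc51756b180a8d65bbd4620801 || { echo "MD5 mismatch" >&2; return 1; }
  unzip -q "$zip" || return 1
  (cd tika-1.10-src && mvn install)   # subshell keeps the caller's directory
}

# Print the plain text Tika extracts from one file (assumed jar path).
extract_text() {
  java -jar tika-1.10-src/tika-app/target/tika-app-1.10.jar --text "$1"
}
```

Usage: `build_tika && extract_text report.pdf`, where report.pdf stands for any file in one of the supported formats.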