Wednesday, September 30, 2015

Tika

  1. 2015.09.02
    1. building the sources
      1. wget http://apache.tt.co.kr/tika/tika-1.10-src.zip
      2. md5sum tika-1.10-src.zip
        1. 092d8bbc51756b180a8d65bbd4620801
      3. unzip tika-1.10-src.zip
      4. cd tika-1.10-src
      5. mvn install
  2. 2015.08.31
    1. introduction
      1. The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
    2. supported formats
      1. http://tika.apache.org/1.10/formats.html
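
The checksum step in the 2015.09.02 build notes is compared by eye. A minimal sketch of automating that comparison follows, assuming GNU `md5sum` and `awk` are on the PATH. `verify_md5` is a hypothetical helper name chosen for illustration and is not part of Tika or its build.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED
# Hypothetical helper: succeeds (exit 0) only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    # If the file is missing, the digest is empty and the comparison fails.
    actual=$(md5sum "$file" 2>/dev/null | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage with the archive name and digest recorded in the notes (run after wget):
#   verify_md5 tika-1.10-src.zip 092d8bbc51756b180a8d65bbd4620801 \
#       && unzip tika-1.10-src.zip
```

Chaining `unzip` with `&&` means the archive is only unpacked when the digest matches, so a corrupted or tampered download stops the build before `mvn install` runs.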
