Tika
- 2015.09.02
  - building the sources
    - wget http://apache.tt.co.kr/tika/tika-1.10-src.zip
    - md5sum tika-1.10-src.zip
      - 092d8bbc51756b180a8d65bbd4620801
    - unzip tika-1.10-src.zip
    - cd tika-1.10-src
    - mvn install
- 2015.08.31
  - introduction
    - The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
  - supported formats
    - http://tika.apache.org/1.10/formats.html
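
In the build notes above, the md5sum output is compared with the expected digest by eye. A minimal sketch of doing that comparison in a script follows. `verify_md5` is a hypothetical helper name, not part of Tika or Maven, and the sketch assumes GNU `md5sum` and `awk` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): succeeds only if the MD5 digest
# of the file given as $1 equals the expected hex string given as $2.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | awk '{print $1}')
    # Exit status 0 on match, non-zero on mismatch or unreadable file.
    [ "$actual" = "$expected" ]
}

# Usage with the values from the notes above (assumes the archive has
# already been downloaded into the current directory):
#   verify_md5 tika-1.10-src.zip 092d8bbc51756b180a8d65bbd4620801 \
#       && unzip tika-1.10-src.zip && cd tika-1.10-src && mvn install
```

Chaining with `&&` means the unzip and build steps only run when the digest matches, so a truncated or corrupted download stops the script early.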