Set up Apache Tika server (optional)
Apache Tika is a content analysis toolkit that detects and extracts metadata and text from many different file types. It can be used both as a service and as a command-line utility.
1 Install
In the console, create a jars directory in your home directory and
change into it:
mkdir ~/jars
cd ~/jars
Tika Server is distributed as a standalone runnable jar. Download the
appropriate version into the jars directory created above; releases
are listed at http://tika.apache.org/download.html.
For example, to download version 1.24.1, run:
wget https://archive.apache.org/dist/tika/tika-server-1.24.1.jar
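If a download link points at a mirror-selection page instead of the file itself, wget saves an HTML page under the jar's name and `java -jar` then fails. The POSIX shell function below is a quick sanity check; its name `looks_like_jar` is hypothetical. It relies on the fact that a jar is a ZIP archive, whose first two bytes are `PK`, and it does not replace verifying the signature or checksum that Apache publishes for each release.

```shell
# Hypothetical helper: succeeds only if the file exists and starts with the
# ZIP signature "PK", which every jar has. An HTML page starts with "<".
looks_like_jar() {
  [ -f "$1" ] || return 1
  [ "$(head -c 2 "$1")" = "PK" ]
}

# Usage (from ~/jars):
# looks_like_jar tika-server-1.24.1.jar && echo "OK" || echo "Not a jar, download again"
```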
2 Start
Start the Tika server by executing on the command line:
java -jar ~/jars/tika-server-1.24.1.jar
The server runs in the foreground; stop it with Control-C when you no
longer need it.
By default, the server is available on 127.0.0.1, port 9998. To find
out about other available options, run:
java -jar ~/jars/tika-server-1.24.1.jar --help
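When the server is started from a script, it can take a few seconds before it answers requests. The sketch below polls the server's root page, the same page used for the browser check in the Test step, until it responds. The function name `wait_for_tika`, the one-second interval, and the default of 30 attempts are illustrative choices, and the sketch assumes `curl` is installed.

```shell
# Hypothetical helper: poll a URL once per second until it answers with a
# successful HTTP status or the attempts run out. Both arguments are optional:
#   $1  URL to poll        (default: http://localhost:9998/)
#   $2  maximum attempts   (default: 30)
wait_for_tika() {
  url="${1:-http://localhost:9998/}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: start the server in the background, then wait for it.
# java -jar ~/jars/tika-server-1.24.1.jar &
# wait_for_tika && echo "Tika server is up"
```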
3 Test
To check that the Tika server is running, open http://localhost:9998 in a
browser. It should display a web page describing Tika’s REST API endpoints.
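Beyond the browser check, a document can be sent to the server for text extraction. The sketch below, assuming the server runs at the default address, uploads a file to the `/tika` endpoint with an HTTP PUT (`curl -T` uploads a file using PUT) and requests plain text through the `Accept: text/plain` header, similar to the `curl` examples in the Tika server documentation. The function name `tika_text` and the `TIKA_URL` variable are hypothetical.

```shell
# Hypothetical helper: PUT a file to the /tika endpoint and print the
# extracted plain text. TIKA_URL is an assumed variable; it defaults to the
# server address used in this section.
tika_text() {
  # -f: fail on HTTP errors, -sS: silent but still show errors
  curl -fsS -T "$1" -H "Accept: text/plain" "${TIKA_URL:-http://localhost:9998}/tika"
}

# Usage (document.pdf is a placeholder file name):
# tika_text document.pdf > document.txt
```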