Set up Apache Tika server (optional)¶
Apache Tika is a content analysis toolkit used to detect and extract metadata and text from different file types. It can be used both as a service and a command line utility.
1 Install¶
In the console, create jars
directory in your home directory and
position into it:
mkdir ~/jars
cd ~/jars
Tika Server is a standalone runnable jar binary. Download the
appropriate version to the created jars
directory from
http://tika.apache.org/download.html.
Execute on the command line:
wget https://www.apache.org/dyn/closer.cgi/tika/tika-server-1.24.1.jar
2 Start¶
Start the Tika server by executing on the command line:
java -jar jars/tika-server-1.24.1.jar
The server will run in the foreground, and you can stop it when needed
with Control-C
.
The server will be available on 127.0.0.1
on port 9998
. To find
about other available options, execute on the command line:
java -jar jars/tika-server-1.24.1.jar --help
3 Test¶
To test if Tika server is running, open http://localhost:9998.
This should open a web page describing Tika’s REST API endpoints.