Thursday, 12 September 2013

Content extraction of PDF files in Solr using Apache Tika

I am trying to index PDF files in Solr by following this tutorial:
http://wiki.apache.org/solr/ExtractingRequestHandler
But every time I run the command
java -jar post.jar *.pdf
it fails with the error
org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3
Kindly help me index the PDFs into the Solr server. Is there any other
integration besides Tika that could help me?
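
One possible cause, not confirmed anywhere in this post, is that post.jar by default sends each file to the plain /update handler as XML, so the binary PDF bytes get parsed as UTF-8 text and rejected. The sketch below instead sends the file to /update/extract as multipart form data with curl, in the style of the tutorial linked above. The host, port, handler path, and the helper names solr_extract_url and index_pdf are assumptions for illustration, as is using the file name as the document ID; file names containing spaces or other special characters would need URL-encoding first.

```shell
# Hypothetical helper: build the extract-handler URL for one document ID.
# Assumes a default Solr at localhost:8983 with /update/extract enabled.
solr_extract_url() {
  printf 'http://localhost:8983/solr/update/extract?literal.id=%s&commit=true' "$1"
}

# Hypothetical helper: post one PDF as multipart form data,
# using its file name as the document ID.
index_pdf() {
  curl "$(solr_extract_url "$(basename "$1")")" -F "myfile=@$1"
}

# Usage (needs a running Solr server):
#   for f in *.pdf; do index_pdf "$f"; done
```

If this works where the plain post.jar call fails, the problem is likely the target handler or content type rather than Tika itself.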
