PHP Classes

Converter issue

Subject: Converter issue
Summary: Not yet converting PDFs correctly
Messages: 2
Author: Rolf Kellner
Date: 2016-05-13 17:18:34
 

1. Converter issue
Rolf Kellner - 2016-05-13 17:18:34
Hello,

Thank you very much for offering your PDF to text converter. I have already tested some others without success, and there are several other approaches here at PHPClasses as well. Yours is the best! But (sorry) I still noticed an issue when using your v.1.0.1. Please convert
sphider-plus.eu/test/dummy1.pdf
This is a PDF created with LibreOffice Writer version 5.0.5.2, containing text extracts with several non-ASCII characters. Currently it is not converted quite correctly. Of course, all of the content is UTF-8 text; it just shows examples of characters like those defined in windows-1256, gb2312, etc.
Thanks again for your efforts.

Tec
tec ( a t ) sphider-plus.eu

2. Re: Converter issue
Christian Vigh - 2016-05-13 21:31:31 - In reply to message 1 from Rolf Kellner
Hello Rolf,

Thank you for your greetings and your feedback.

My first reaction after converting your sample was: wow! It does not work so badly, and I can even see characters in various languages such as Arabic, Russian, etc. Even Acrobat Reader generates garbage results when exporting your sample file to a .txt file.

My surprise is justified: the original need for this class was to extract text from documents written in French, which were really unlikely to contain wide-range Unicode characters. I added support for Unicode, but without really being able to test it thoroughly (because there was no need to do so at that time).

So your sample is a good starting point to allow me to further test Unicode support.

I will have a look at it this weekend and will get back to you.

Christian.
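
A note on checking the Unicode handling discussed in this thread: one rough way to judge converted output is to confirm that it is valid UTF-8 and then count its non-ASCII characters by script. The sketch below is only an illustration under stated assumptions. It is written in Python rather than PHP, it does not call the converter class at all, and the function names `is_valid_utf8` and `summarize_non_ascii` and the sample string are hypothetical. Scripts are identified by the first word of each character's Unicode name, which is a simple heuristic and not a full script classification.

```python
import unicodedata
from collections import Counter


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def summarize_non_ascii(text: str) -> Counter:
    """Count non-ASCII characters in `text`, grouped by the first word of
    their Unicode name (for example ARABIC, CYRILLIC, CJK, LATIN).

    U+FFFD is named REPLACEMENT CHARACTER, so a failed decode upstream
    shows up under the REPLACEMENT key.
    """
    counts = Counter()
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII is not interesting here
        name = unicodedata.name(ch, "")
        counts[name.split(" ")[0] if name else "UNNAMED"] += 1
    return counts


# Hypothetical converter output; a real check would load the converted text instead.
sample = "Français été, Русский, العربية, 中文"
print(is_valid_utf8(sample.encode("utf-8")))
print(summarize_non_ascii(sample))
```

Comparing these counts against the same summary computed from a known-good copy of the text would show which scripts a converter drops or garbles.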