Hi, I always get this error:

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in C:\pdf_to_text\PdfToText.phpclass on line 402
Parse error: syntax error, unexpected '[' in C:\pdf_to_text\PdfToText.phpclass on line 710

Paul
Hello Paul,
I suspect that the PHP version you are using is a little outdated. You should be using a version >= 5.5, which supports namespaces and the short array notation.

The error at line 402 comes from the declaration of the following class:

    class PdfToTextException extends \Exception { ... }

which says that the PdfToTextException class inherits from the builtin PHP Exception class. The "\" before the class name "Exception" is there to resolve namespace issues. Suppose, for example, that you want to customize my source and put it into a namespace by adding the following line of code at the top of the file:

    namespace my\very\personal\namespace ;

All the classes defined afterwards will belong to "my\very\personal\namespace", including PdfToTextException. In such a context, if I simply declare:

    class PdfToTextException extends Exception

without a leading backslash before the class name "Exception", then PHP will search for a class named "Exception" within the namespace "my\very\personal\namespace" instead of searching for it in the root namespace, where this builtin class is declared. Since your namespace (my\very\personal\namespace) will not contain any class named "Exception", the result will be a fatal error.

The second error, at line 710:

    $object_ids = [] ;

uses the short array notation. This notation was introduced in PHP 5.4 and is the equivalent of:

    $object_ids = array() ;

I have set the class prerequisites to PHP >= 5.5, so maybe you should consider upgrading your own version if you can?
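To make the namespace point above concrete, here is a minimal sketch. The namespace My\Demo and the class DemoException are hypothetical names chosen for illustration; they do not come from the PdfToText source. It needs PHP >= 5.4 (namespaces plus the short array notation).

```php
<?php
// Hypothetical namespace, for illustration only.
namespace My\Demo;

// The leading backslash makes PHP resolve "Exception" in the root namespace.
// Without it, PHP would look for My\Demo\Exception, which does not exist.
class DemoException extends \Exception {}

// Short array notation (PHP >= 5.4) and the long form build the same value.
$short = [];
$long  = array();

// The parent class is the builtin one from the root namespace.
$parent = get_parent_class(new DemoException('demo'));
```

On a PHP version older than 5.3, both the namespace line and the backslash are parse errors, which matches the first message reported at the top of this thread.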
Hi
I’m not sure if I’m beating a dead horse, but I tried to adapt your class for an older PHP version (< 5.4). I’ve changed all the array definitions, preg_match patterns, etc. However, I did not understand how to solve the backslash issue in \Exception, \ArrayAccess and \Countable. Simply removing the backslashes did not solve it.

With the examples I now get:

Fatal error: Can't inherit abstract function ArrayAccess::offsetExists() (previously declared abstract in PdfTexterCharacterMap) in …/pdf-to-text/PdfToText.phpclass on line 3772

Do you have a suggestion on how to solve this, or is it not possible to get this class working on older PHP versions with reasonable effort?

Cheers
Hi,
Normally, this class should work with older PHP versions after a few corrections. Of course, if you're targeting PHP 4, it will be a little harder!

Well, I'm not sure it's a backslash issue (however, you can safely remove the backslashes if you're not integrating this class within a namespace). I suspect it comes from the PdfTexterCharacterMap class, which contains the following definitions:

    class PdfTexterCharacterMap implements ArrayAccess etc...
       {
        ...
        abstract function count ( ) ;
        abstract function offsetExists ( $offset ) ;
        abstract function offsetGet ( $offset ) ;
       }
    ...
    class PdfTexterUnicodeMap extends PdfTexterCharacterMap { ... }

Apparently it comes from the version of PHP you're using, which does not cope well with re-declaring as "abstract" functions that come from implemented interfaces (such as offsetExists). You can safely turn the above functions into the following do-nothing functions in class PdfTexterCharacterMap:

    function count ( ) {}
    function offsetExists ( $offset ) {}
    function offsetGet ( $offset ) {}

Please let me know if it helps,

Christian.
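The workaround above can be sketched as follows. DemoCharacterMap and DemoUnicodeMap are hypothetical, much reduced stand-ins for PdfTexterCharacterMap and PdfTexterUnicodeMap, and the map contents are made up. The #[\ReturnTypeWillChange] lines are an addition for this sketch only: they are plain comments before PHP 8, and from PHP 8.1 on they silence return-type deprecation notices.

```php
<?php
// Hypothetical stand-in for the abstract base class.
abstract class DemoCharacterMap implements ArrayAccess, Countable
{
    protected $map = array();

    // Do-nothing stubs instead of "abstract function ..." re-declarations.
    #[\ReturnTypeWillChange]
    public function count() {}
    #[\ReturnTypeWillChange]
    public function offsetExists($offset) {}
    #[\ReturnTypeWillChange]
    public function offsetGet($offset) {}

    // The map is read-only in this sketch, so writes are ignored.
    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value) {}
    #[\ReturnTypeWillChange]
    public function offsetUnset($offset) {}
}

// Hypothetical concrete map overriding the stubs with real behavior.
class DemoUnicodeMap extends DemoCharacterMap
{
    public function __construct()
    {
        $this->map = array(65 => 'A', 66 => 'B');
    }

    #[\ReturnTypeWillChange]
    public function count() { return count($this->map); }
    #[\ReturnTypeWillChange]
    public function offsetExists($offset) { return isset($this->map[$offset]); }
    #[\ReturnTypeWillChange]
    public function offsetGet($offset) { return $this->map[$offset]; }
}

$map = new DemoUnicodeMap();
```

Another option, not discussed in the thread, would be to delete the abstract re-declarations altogether: an abstract class that implements an interface may leave its methods undefined, and concrete subclasses still have to provide them.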
Thank you for your quick followup!
This solved the error I got; however, I have a new issue now:

Parse error: syntax error, unexpected T_FUNCTION in …/pdf-to-text/PdfToText.phpclass on line 3983

which I think has to do with the anonymous function ( $a, $b ) here:

    // Sort the ranges by their starting offsets
    $this -> RangeCount = count ( $this -> RangeMap ) ;
    if ( $this -> RangeCount > 1 )
       {
        usort ( $this -> RangeMap, function ( $a, $b ) { return ( $a [0] - $b [0] ) ; } ) ;
       }
    }
    }
Well, I think you are right: you are using a version of PHP that does not support closures, so you could rewrite it this way:

    if ( $this -> RangeCount > 1 )
       {
        $callback = function ( $a, $b ) { return ( $a [0] - $b [0] ) ; } ;
        usort ( $this -> RangeMap, $callback ) ;
       }

Don't forget the last semicolon at the end of the $callback declaration (after all, this is the declaration of a variable named $callback, so you need a trailing semicolon even though it contains the "{}" construct).

Keep going: you've arrived at line 3983, so you are near the end of the file!
Thanks! It seems that I still have an issue with this syntax:

Parse error: syntax error, unexpected T_FUNCTION in …/pdf-to-text/PdfToText.phpclass on line 3980

Line 3980 is:

    $callback = function ($a, $b) { return ($a[0] - $b[0]); } ;

in:

    // Sort the ranges by their starting offsets
    $this -> RangeCount = count ( $this -> RangeMap ) ;
    if ( $this -> RangeCount > 1 )
       {
        $callback = function ($a, $b) { return ($a[0] - $b[0]); } ;
        usort ( $this -> RangeMap, $callback ) ;
       }
Wow! Which PHP version are you trying to run?

Anyway, I'm afraid that the only remaining solution is "the good old one". Either declare a function outside the class:

    function __sort_ranges ( $a, $b ) { return ($a[0] - $b[0]); }

then do:

    usort ( $this -> RangeMap, '__sort_ranges' ) ;

OR declare it inside the class:

    public function __sort_ranges ( $a, $b ) { return ($a[0] - $b[0]); }

then:

    usort ( $this -> RangeMap, array ( $this, '__sort_ranges' ) ) ;

This callback form has existed since PHP 4.x, so it should work in your case.
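Both closure-free variants described above can be sketched like this. The names demo_sort_ranges, DemoRangeHolder, sortRanges and compareRanges are hypothetical, and the sample ranges are made up; only the usort() calls mirror the thread.

```php
<?php
// Variant 1: a stand-alone comparison function, passed to usort() by name.
function demo_sort_ranges($a, $b)
{
    return $a[0] - $b[0];
}

// Variant 2: a comparison method, passed as array(object, 'method').
// Hypothetical container class; only the sorting step mirrors the thread.
class DemoRangeHolder
{
    public $RangeMap = array();

    public function sortRanges()
    {
        if (count($this->RangeMap) > 1) {
            usort($this->RangeMap, array($this, 'compareRanges'));
        }
    }

    public function compareRanges($a, $b)
    {
        return $a[0] - $b[0];
    }
}

// Made-up [start, end] ranges, deliberately out of order.
$ranges = array(array(30, 39), array(10, 19), array(20, 29));
usort($ranges, 'demo_sort_ranges');

$holder = new DemoRangeHolder();
$holder->RangeMap = array(array(30, 39), array(10, 19), array(20, 29));
$holder->sortRanges();
```

The sketch avoids names starting with a double underscore because the PHP manual reserves that prefix for magic methods; the __sort_ranges name from the post works all the same.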
Merci! It sort of works now :-)
I’m sorry for not mentioning this before, but I’m testing it on PHP version 5.2.4.

Now the only problem is that high-ASCII (and other) characters are not translated correctly. Here are the first lines.

Original file contents:

v01 – Bruce Demaugé-Bost – http://bdemauge.free.fr
Les hiboux
Charles Baudelaire
Cycle 3 * POÉSIE
Sous les ifs noirs qui les abritent
Les hiboux se tiennent rangés
Ainsi que des dieux étrangers
Dardant leur oeil rouge. Ils méditent.
Sans remuer ils se tiendront
etc

-----------------------------------------------------------

Extracted file contents:

v01 ò°‘ Bruce Demaugḻ-Bost ò°‘ http://bdemauge.free.fr
Les hiboux
Charles Baudelaire
Cycle 3 * POḧSIE
Sous les ifs noirs qui les abritent
ᱻᱡᱩᰴᱤᱥᱞᲒᱫᱮᰴᱩᱡᰴᱪᱥᱡᲑᲑᱡᲑᱪᰴᱨá±á²‘ᱣᱦᱩ
᱉ᱥᲑᱩᱥᰴᱧᱫᱡᰴᱠᱡᱩᰴᱠᱥᱡᱫᱮᰴᱦᱪᱨá±á²‘ᱣᱡᱨᱩ
᱌á±á±¨á± á±á²‘ᱪᰴá²á±¡á±«á±¨á°´ð‘§™á±¥á²á°´á±¨á²’ᱫᱣᱡᱩᰴ᱑á²á±©á°´á²á±¦á± ᱥᱪᱡᲑᱪᱩ
Sans remuer ils se tiendront
etc.

I also tested another PDF and saw similar errors, for example:

ö becomes Ṃ
ä becomes Ḷ
Ä becomes Ḣ

I think this may have to do with the MacRomanCharacterMap; at least, if I edit the characters involved I see changes, but still not the right characters.
It seems that you are running a relatively old version of the PdfToText class. The current version is 1.2.50.
I just tested the sample you mentioned (which contains poems by French authors) with the current version, and everything is almost fine. I say "almost" because, here and there, a few accented characters are replaced by plain letters, for example "é" by "p" and "è" by "q". I have met this issue in a few other PDF samples that were submitted to me; I know where it comes from, and I also know that fixing it will require a significant amount of work.

At the end of this message you will find the text extracted from the sample you quoted, using the current PdfToText version.

All of that being said, I recall that somewhere between 1.2.2x and 1.2.3x (maybe), I mistakenly introduced a regression that caused symptoms similar to the ones you describe. I suggest that you use the latest PdfToText version; I'm aware that you will have to do the same work again, but I cannot see a better solution.

However, if it still gives the same results (garbage like "ᱻᱡᱩᰴᱤᱥá"), please send me your modified PdfToText source code at the following address: [email protected]

I will try to get hold of PHP 5.2.4 and see what happens.

Christian.

PS: below is the output of the current version of the PdfToText class using the sample you cited...

--------------------------------------------------- Cut here

v01 – Bruce Demaugé-Bost – http://bdemauge.free.fr
Les hiboux
Charles Baudelaire
Cycle 3 * POÉSIE
Sous les ifs noirs qui les abritent
Les hiboux se tiennent rangps
Ainsi que des dieux ptrangers
Dardant leur œil rouge. Ils mpditent.
Sans remuer ils se tiendront
Jusqu'à l'heure mplancolique
Où, poussant le soleil oblique,
Les tpnqbres s'ptabliront.
Leur attitude au sage enseigne
Qu'il faut en ce monde qu'il craigne
Le tumulte et le mouvement ;
L'homme ivre d'une ombre qui passe
Porte toujours le châtiment
D'avoir voulu changer de place.
Les Fleurs du Mal 1857
Charles Pierre Baudelaire (1821 – 1867) est un poète français.
