PHP Classes

error

Subject:error
Summary:error in sample
Messages:41
Author:Pawel Lancucki
Date:2016-06-13 20:05:10
 

  1. error
Pawel Lancucki - 2016-06-13 20:05:10
Hi, I always get these errors:

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in C:\pdf_to_text\PdfToText.phpclass on line 402

Parse error: syntax error, unexpected '[' in C:\pdf_to_text\PdfToText.phpclass on line 710

Paul

  2. Re: error
Christian Vigh - 2016-06-13 21:00:13 - In reply to message 1 from Pawel Lancucki
Hello Paul,

I suspect that the PHP version you are using is somewhat outdated. You should be using a version >= 5.5 ; the class relies on namespaces (available since PHP 5.3) and on the short array notation (available since PHP 5.4).

The error at line 402 comes from the declaration of the following class :

class PdfToTextException extends \Exception
{
...
}

saying that the PdfToTextException class inherits from the builtin PHP Exception class.

The "\" before the class name, "Exception", is here to resolve namespace issues.

Suppose for example that you want to customize my source and put it into a namespace ; for example, suppose you add the following line of code at the top of the file :

namespace my\very\personal\namespace ;

All the classes defined afterwards will belong to the "my\very\personal\namespace" namespace, including PdfToTextException.

In such a context, if I simply declare :

class PdfToTextException extends Exception

without putting a leading backslash before the class name "Exception", then PHP will search for a class named "Exception" within the namespace "my\very\personal\namespace", instead of searching for it in the root namespace, where this builtin class is declared. Since your namespace (my\very\personal\namespace) does not contain any class named "Exception", this results in a fatal error.
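The resolution rule described above can be checked with a short sketch. The namespace My\Demo and the class PdfToTextExceptionSketch are hypothetical names chosen for this illustration and are not part of the PdfToText source. It needs PHP 5.3 or later, since older versions cannot parse namespaces at all.

```php
<?php
// Hypothetical namespace, for illustration only.
namespace My\Demo;

// The leading backslash makes PHP look up Exception in the root namespace.
class PdfToTextExceptionSketch extends \Exception
{
}

// Without the backslash, "Exception" would be resolved as My\Demo\Exception,
// and no such class exists in this namespace.
$unqualifiedExists = class_exists('My\Demo\Exception');

// The sketch class really descends from the builtin, root-namespace Exception.
$extendsBuiltin = is_subclass_of('My\Demo\PdfToTextExceptionSketch', 'Exception');

var_dump($unqualifiedExists); // bool(false)
var_dump($extendsBuiltin);    // bool(true)
```

If the file has no namespace declaration at all, "Exception" and "\Exception" name the same class, which is why the backslash only matters once the source is moved into a namespace.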

The second error at line 710 :

$object_ids = [] ;

uses the short array notation ; this notation was introduced in PHP 5.4 and is the equivalent of :

$object_ids = array() ;

I have set the class prerequisites to be used with PHP >= 5.5, so maybe you should consider upgrading your own version if you can ?
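Because a parse error aborts the whole file, a version check placed inside PdfToText.phpclass itself would never run on an old interpreter. One possible approach, sketched below, is to test the version in a separate bootstrap script written in old-style syntax before including the class. The helper name pdf_to_text_supported() is hypothetical ; the 5.5.0 minimum is the prerequisite stated above.

```php
<?php
// Hypothetical helper: true when $current meets the stated prerequisite.
function pdf_to_text_supported($current, $minimum = '5.5.0')
{
    return version_compare($current, $minimum, '>=');
}

if (pdf_to_text_supported(PHP_VERSION)) {
    // Safe to load the class at this point, for example:
    // require 'PdfToText.phpclass';
    echo "PHP " . PHP_VERSION . " meets the prerequisite.\n";
} else {
    echo "PHP " . PHP_VERSION . " is too old: version 5.5 or later is required.\n";
}
```

This script deliberately avoids namespaces, short arrays and closures so that it still parses on the old versions it is meant to detect.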

  3. Re: error
Aryan Schmitz - 2016-10-19 13:45:29 - In reply to message 2 from Christian Vigh
Hi

I’m not sure if I’m beating a dead horse, but I tried to adapt your class for an older PHP version (< 5.4). I’ve changed all the array definitions, preg_match patterns, etc.

However, I did not understand how to solve the backslash issue in \Exception, \ArrayAccess, \Countable. Simply removing the backslashes did not solve it.

With the examples I now get:

Fatal error: Can't inherit abstract function ArrayAccess::offsetExists() (previously declared abstract in PdfTexterCharacterMap) in …/pdf-to-text/PdfToText.phpclass on line 3772

Do you have a suggestion on how to solve this or is it not possible to get this class working on older PHP versions within reasonable effort?

Cheers

  4. Re: error
Christian Vigh - 2016-10-19 15:29:41 - In reply to message 3 from Aryan Schmitz
Hi,

Normally, this class should work with older PHP versions with more or less corrections. Of course, if you're targeting PHP4, it will be a little bit harder !

Well, I'm not sure it's a backslash issue (you can, however, safely remove the backslashes if you are not placing this class within a namespace).

I suspect it comes from the PdfTexterCharacterMap class which contains the following definitions :

class PdfTexterCharacterMap implements ArrayAccess etc...
{
...
abstract function count ( ) ;
abstract function offsetExists ( $offset ) ;
abstract function offsetGet ( $offset ) ;
}

...

class PdfTexterUnicodeMap extends PdfTexterCharacterMap
{
...
}

Apparently it comes from the version of PHP you're using, which does not cope well with an abstract class re-declaring as "abstract" the methods of an interface it implements (such as offsetExists).

You can safely transform the above functions into the following do-nothing functions in class PdfTexterCharacterMap :

function count ( ) {}
function offsetExists ( $offset ) {}
function offsetGet ( $offset ) {}

Please let me know if it helps,
Christian.
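An alternative to the do-nothing functions, offered here only as a sketch and not as the change made in PdfToText, is to leave the interface methods out of the abstract class entirely : an abstract class that implements an interface does not have to implement or re-declare its methods, and the concrete subclasses supply them. The class names CharacterMapSketch and UnicodeMapSketch are hypothetical. The #[\ReturnTypeWillChange] lines are attributes on PHP 8 and plain comments on older versions ; they only silence the return-type deprecation notices of PHP 8.1 and later.

```php
<?php
// Hypothetical stand-in for an abstract character map.
abstract class CharacterMapSketch implements ArrayAccess, Countable
{
    // count(), offsetExists() and offsetGet() are deliberately NOT re-declared
    // as abstract here: the interfaces already require them of subclasses.

    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value)
    {
        throw new Exception('This map is read-only.');
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)
    {
        throw new Exception('This map is read-only.');
    }
}

// Hypothetical concrete map backed by a plain array.
class UnicodeMapSketch extends CharacterMapSketch
{
    private $map;

    public function __construct(array $map)
    {
        $this->map = $map;
    }

    #[\ReturnTypeWillChange]
    public function count()
    {
        return count($this->map);
    }

    #[\ReturnTypeWillChange]
    public function offsetExists($offset)
    {
        return isset($this->map[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return isset($this->map[$offset]) ? $this->map[$offset] : null;
    }
}

$map = new UnicodeMapSketch(array(0x8E => 'e-acute'));
echo count($map), "\n";      // 1
echo $map[0x8E], "\n";       // e-acute
var_dump(isset($map[0x41])); // bool(false)
```

Unlike empty method bodies, this keeps PHP itself enforcing that every concrete subclass provides the three methods.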

  5. Re: error
Aryan Schmitz - 2016-10-19 17:06:02 - In reply to message 4 from Christian Vigh
Thank you for your quick followup!

This solved the error I got ; however, I now get a new one:

Parse error: syntax error, unexpected T_FUNCTION in …/pdf-to-text/PdfToText.phpclass on line 3983

which I think has to do with the anonymous function ( $a, $b ) here:

// Sort the ranges by their starting offsets
$this -> RangeCount = count ( $this -> RangeMap ) ;

if ( $this -> RangeCount > 1 )
{
    usort
    (
        $this -> RangeMap,
        function ( $a, $b )
        { return ( $a [0] - $b [0] ) ; }
    ) ;
}
}
}


  6. Re: error
Christian Vigh - 2016-10-19 17:16:34 - In reply to message 5 from Aryan Schmitz
Well, I think you are right : you are using a version of PHP that does not support closures, so you could try rewriting it this way :

if ( $this -> RangeCount > 1 )
{
    $callback = function ( $a, $b ) { return ( $a [0] - $b [0] ) ; } ;

    usort ( $this -> RangeMap, $callback ) ;
}

Don't forget the semicolon at the end of the $callback declaration : it is an assignment to a variable named $callback, so it needs a trailing semicolon even though it ends with a "{}" construct.

Keep going, you've arrived at line 3983, you are near the end of file !

  7. Re: error
Aryan Schmitz - 2016-10-19 18:13:52 - In reply to message 6 from Christian Vigh
Thanks! It seems that I still have an issue with this syntax:

Parse error: syntax error, unexpected T_FUNCTION in …/pdf-to-text/PdfToText.phpclass on line 3980

Line 3980 is: $callback = function ($a, $b) { return ($a[0] - $b[0]); } ; in

// Sort the ranges by their starting offsets
$this -> RangeCount = count ( $this -> RangeMap ) ;

if ( $this -> RangeCount > 1 )
{
    $callback = function ($a, $b) { return ($a[0] - $b[0]); } ;
    usort ( $this -> RangeMap, $callback ) ;
}

  8. Re: error
Christian Vigh - 2016-10-19 19:42:07 - In reply to message 7 from Aryan Schmitz
Wow ! Which PHP version are you trying to run ?

Anyway, I'm afraid that the only remaining solution is "the good old one" : either declare a function outside the class :

function __sort_ranges ( $a, $b )
{ return ($a[0] - $b[0]); }

then do :

usort ( $this -> RangeMap,'__sort_ranges' ) ;

OR declare it inside the class :

public function __sort_ranges ( $a, $b )
{ return ($a[0] - $b[0]); }

then :

usort ( $this -> RangeMap, array ( $this, '__sort_ranges' ) ) ;

This callback form has existed since PHP 4.x, so it should work in your case.
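Both variants can be tried in isolation with the sketch below. The names sort_ranges_by_start(), RangeMapSketch and compareRanges() are hypothetical stand-ins for the code in PdfToText, and the sample ranges are made up for the illustration. Neither variant uses an anonymous function, so both should also parse on PHP 5.2.

```php
<?php
// Variant 1: a plain function, passed to usort() by name.
function sort_ranges_by_start($a, $b)
{
    return $a[0] - $b[0];
}

// Variant 2: a method, passed as an array(object, method name) callback.
class RangeMapSketch
{
    public $RangeMap;

    public function __construct($ranges)
    {
        $this->RangeMap = $ranges;
    }

    public function compareRanges($a, $b)
    {
        return $a[0] - $b[0];
    }

    public function sortRanges()
    {
        if (count($this->RangeMap) > 1) {
            usort($this->RangeMap, array($this, 'compareRanges'));
        }
    }
}

// Made-up ranges: array(start, end).
$ranges = array(array(30, 40), array(10, 20), array(20, 25));

$byFunction = $ranges;
usort($byFunction, 'sort_ranges_by_start');

$holder = new RangeMapSketch($ranges);
$holder->sortRanges();

// Starting offsets after sorting with the named function.
echo $byFunction[0][0], ' ', $byFunction[1][0], ' ', $byFunction[2][0], "\n"; // 10 20 30
```

The method variant keeps the comparison next to the data it sorts and avoids adding a function to the global scope, at the cost of making the comparison method public.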

  9. Re: error
Aryan Schmitz - 2016-10-19 21:00:36 - In reply to message 8 from Christian Vigh
Merci! It sort of works now :-)

I’m sorry for not mentioning this before, but I’m testing it on PHP version 5.2.4.

Now the only problem is that high ASCII (and other) characters are not correctly translated. Here are the first lines:

Original file contents :
v01 – Bruce Demaugé-Bost – http://bdemauge.free.fr

Les hiboux

Charles Baudelaire

Cycle 3

*

POÉSIE

Sous les ifs noirs qui les abritent

Les hiboux se tiennent rangés

Ainsi que des dieux étrangers

Dardant leur oeil rouge. Ils méditent.



Sans remuer ils se tiendront

etc

-----------------------------------------------------------
Extracted file contents :

v01 ò°‘ Bruce Demaugḻ-Bost ò°‘ http://bdemauge.free.fr

Les hiboux
Charles Baudelaire Cycle 3
* POḧSIE
Sous les ifs noirs qui les abritent
ᱻᱡᱩᰴᱤᱥᱞᲒᱫᱮᰴᱩᱡᰴᱪᱥᱡᲑᲑᱡᲑᱪᰴᱨá±á²‘ᱣᱦᱩ ᱉ᱥᲑᱩᱥᰴᱧᱫᱡᰴᱠᱡᱩᰴᱠᱥᱡᱫᱮᰴᱦᱪᱨá±á²‘ᱣᱡᱨᱩ
᱌á±á±¨á± á±á²‘ᱪᰴá²á±¡á±«á±¨á°´ð‘§™á±¥á²á°´á±¨á²’ᱫᱣᱡᱩᰴ᱑á²á±©á°´á²á±¦á± ᱥᱪᱡᲑᱪᱩ

Sans remuer ils se tiendront
etc.

I also tested another PDF and saw similar errors, for example:
ö becomes Ṃ
ä becomes Ḷ
Ä becomes Ḣ

I think this may have to do with the MacRomanCharacterMap: if I edit the characters involved I see changes, but still not the right characters.
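For readers unfamiliar with what a character map such as MacRomanCharacterMap is for, the sketch below shows the general idea: each single-byte code above 0x7F is looked up and replaced by the UTF-8 encoding of the corresponding character. It is not taken from PdfToText and does not diagnose the problem reported here. The function name and the five-entry table are assumptions made for the illustration; the byte values are the standard Mac OS Roman codes for those letters.

```php
<?php
// Illustrative excerpt of a Mac OS Roman to UTF-8 table (not the library's map).
$macRomanSketch = array(
    0x80 => "\xC3\x84", // Ä
    0x83 => "\xC3\x89", // É
    0x8A => "\xC3\xA4", // ä
    0x8E => "\xC3\xA9", // é
    0x9A => "\xC3\xB6", // ö
);

// Hypothetical helper: translate a byte string through the table above.
function mac_roman_to_utf8_sketch($bytes, $map)
{
    $out    = '';
    $length = strlen($bytes);

    for ($i = 0; $i < $length; $i++) {
        $code = ord($bytes[$i]);

        if ($code < 0x80) {
            $out .= $bytes[$i];   // plain ASCII is copied unchanged
        } elseif (isset($map[$code])) {
            $out .= $map[$code];  // mapped high byte
        } else {
            $out .= '?';          // code missing from this partial table
        }
    }

    return $out;
}

echo mac_roman_to_utf8_sketch("PO\x83SIE", $macRomanSketch), "\n"; // POÉSIE
```

If the wrong table is applied, or the codes are shifted before the lookup, every accented letter comes out as some other character, which is the kind of symptom shown in the extracted text above.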


  10. Re: error
Christian Vigh - 2016-10-19 21:34:03 - In reply to message 9 from Aryan Schmitz
It seems that you are running a relatively old version of the PdfToText class. The current version is 1.2.50.

I just tested the sample you mentioned (which contains poems by French authors) with the current version, and everything is almost fine. I say "almost" because, here and there, a few accented characters are replaced by plain letters, for example "é" by "p" and "è" by "q". I have met this issue in a few other PDF samples that were submitted to me ; I know where it comes from, and I also know that fixing it will require a significant amount of work. At the end of this message you will find the text extracted from the sample you quoted, using the current PdfToText version.

That being said, I recall that somewhere between versions 1.2.2x and 1.2.3x (maybe), I mistakenly introduced a regression that caused symptoms similar to the ones you describe.

I suggest that you use the latest PdfToText version ; I am aware that it means doing the same work again, but I cannot see a better solution.

However, if it still gives the same output results (garbage like "ᱻᱡᱩᰴᱤᱥá"), please send me your modified PdfToText source code at the following address :

[email protected]

I will try to get hold of PHP 5.2.4 and see what happens.

Christian.

PS : below is the output of the current version of the PdfToText class using the sample you cited...

--------------------------------------------------- Cut here

v01 – Bruce Demaugé-Bost – http://bdemauge.free.fr

Les hiboux
Charles Baudelaire Cycle 3
* POÉSIE
Sous les ifs noirs qui les abritent
Les hiboux se tiennent rangps Ainsi que des dieux ptrangers
Dardant leur œil rouge. Ils mpditent.

Sans remuer ils se tiendront
Jusqu'à l'heure mplancolique
Où, poussant le soleil oblique,
Les tpnqbres s'ptabliront.

Leur attitude au sage enseigne
Qu'il faut en ce monde qu'il craigne
Le tumulte et le mouvement ;

L'homme ivre d'une ombre qui passe
Porte toujours le châtiment
D'avoir voulu changer de place.

Les Fleurs du Mal
1857
Charles Pierre Baudelaire (1821 – 1867) est un poète français.

 