PHP Classes

Extracting Text

Subject:Extracting Text
Summary:pdf
Messages:4
Author:Tom Perro
Date:2016-05-24 13:00:37
 

  1. Extracting Text
Tom Perro - 2016-05-24 13:00:37
Is there a way to extract certain lines of text from a PDF file?

  2. Re: Extracting Text
Christian Vigh - 2016-06-10 07:30:00 - In reply to message 1 from Tom Perro
Sorry for the late reply.

What exactly do you mean by "retrieve CERTAIN lines of text"? Do you want to retrieve lines by their line number, or lines that contain a certain text? Or do you have another way of selecting lines in mind?

Anyway, although the PdfToText class does not currently implement such a feature, I think this is a good idea and I will add it to a future version sometime in June.

Could you give me more details on your exact needs so that I can address them better?
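
To illustrate the two options mentioned above (lines by line number, or lines containing a certain text), here is a minimal sketch that filters text which has already been extracted. It does not rely on any PdfToText feature: the function names `splitLines`, `linesByNumber` and `linesContaining` are made up for this example, and the sample string is only a stand-in for real extracted text.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of the PdfToText class.

/** Split text into lines, accepting \r\n, \r and \n line endings. */
function splitLines(string $text): array
{
    return preg_split('/\r\n|\r|\n/', $text);
}

/** Return the lines at the given 1-based line numbers, keyed by line number. */
function linesByNumber(string $text, array $numbers): array
{
    $lines  = splitLines($text);
    $result = [];
    foreach ($numbers as $n) {
        // Silently skip line numbers that fall outside the text.
        if ($n >= 1 && $n <= count($lines)) {
            $result[$n] = $lines[$n - 1];
        }
    }
    return $result;
}

/** Return the lines containing $needle (case-insensitive), keyed by line number. */
function linesContaining(string $text, string $needle): array
{
    $result = [];
    foreach (splitLines($text) as $i => $line) {
        if (stripos($line, $needle) !== false) {
            $result[$i + 1] = $line;
        }
    }
    return $result;
}

// Stand-in for text that was already extracted from a PDF file.
$text = "Invoice 42\nDate: 2016-05-24\nTotal: 100 EUR";

print_r(linesByNumber($text, [1, 3]));    // first and third lines
print_r(linesContaining($text, 'total')); // lines mentioning "total"
```

Note that this only narrows down the output; the whole document still has to be extracted first, so it does not make the extraction itself any faster.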

  3. Re: Extracting Text
Ron Boyle - 2017-02-01 20:38:32 - In reply to message 2 from Christian Vigh
I have a similar need: extracting text from just the first page. Extracting the entire text of a large PDF file can take a good amount of time when all I need is the first page.

Thank you so much for this tool, it is great!!!

  4. Re: Extracting Text
Christian Vigh - 2017-02-01 22:01:46 - In reply to message 3 from Ron Boyle
Thank you for your kind words, and thank you for using this tool!

You are right, extracting the whole text contents of a PDF file can take time, because you need to interpret PostScript-like drawing instructions and figure out what to do with them.

If you only need to extract the contents of the first page, I would be happy if you could send me a BIG PDF file, so that I can measure the performance of extracting only the first page versus extracting the contents of the whole file.

You can send it to me directly at the following address:

[email protected]

With kind regards,
Christian.