PHP Classes

Decode PDF, ODT, Word, DOC, DOCX, RTF: Need a decoder for multiple formats


Decode PDF, ODT, Word, DOC, DOCX, RTF


by herman lapre - 8 years ago (2016-02-20)

Need a decoder for multiple formats


+7

The PHP file reader must be able to read PDF, ODT, DOC, DOCX and RTF documents.

  • 4 Clarification requests
  • 10. Nitin Shukla - 7 years ago (2017-02-16)

    I want to convert .doc, .rtf and .docx files into HTML pages without losing any content or style (bullets, tables, text formatting, etc.). Can anyone please point me to a library or script that can handle all of these requirements?

    A script for each individual format would also work for me.

    Thanks, Nitin

    • 9. Backiaraj - 7 years ago (2016-12-02)

      i like

      • 2. Christian Vigh - 8 years ago (2016-02-26)

        Please clarify your request: which data do you expect from the PDF/ODT/DOC/DOCX and RTF document reader? Do you want to manipulate document elements after decoding? Do you want to be able to make modifications after decoding? Or do you simply want to display the document contents on a web page?

        • 3. Manuel Lemos - 8 years ago (2016-02-27) in reply to comment 2 by Christian Vigh

          According to the request tags he wants a file viewer for those formats. So I suppose something that converts those formats to images will be helpful.

          It seems that OpenOffice/LibreOffice can be used for that purpose. The soffice program has options that make it open a given file, convert it to some other format, such as Web pages with pictures, and then exit without opening the GUI.

          So it can run from the console using the options --headless and --convert-to.
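The console invocation described above can be sketched as follows. This is only a minimal sketch in Python, assuming `soffice` is installed and on the PATH. The helper names `build_convert_command` and `convert_document` are hypothetical, and `--outdir` is an extra soffice option that was not mentioned in the thread. The same argument list could be passed to a process call from a PHP script.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_format, out_dir):
    """Return the argument list for a one-off headless soffice conversion."""
    return [
        "soffice",
        "--headless",                    # do not open the GUI
        "--convert-to", target_format,   # e.g. "html" or "pdf"
        "--outdir", str(out_dir),        # assumption: write the result here
        str(source),
    ]


def convert_document(source, target_format="html", out_dir="."):
    """Run soffice once and return the path where the output is expected.

    Assumes target_format is a plain extension such as "pdf" or "html".
    """
    command = build_convert_command(source, target_format, out_dir)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(command, check=True, timeout=120)
    return Path(out_dir) / (Path(source).stem + "." + target_format)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.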

        • 4. Christian Vigh - 8 years ago (2016-02-27) in reply to comment 3 by Manuel Lemos

          I have had some experience with OpenOffice/LibreOffice for converting .DOC/.DOCX to .PDF documents. I have encountered some formatting issues, especially with tables but in general it works well.

          In addition, the unoconv script provides a command-line interface for doing the conversion.

          However, as far as I can remember, it requires the OpenOffice daemon to be up and running.

          I don't know whether this would address Herman's needs.

        • 5. Manuel Lemos - 8 years ago (2016-02-27) in reply to comment 4 by Christian Vigh

          You do not need to have the OpenOffice daemon running. You can just start OpenOffice on demand to do the format conversion, using the soffice command with the options mentioned above. So you do not need the unoconv script either.

          Starting OpenOffice as a daemon has the advantage of keeping OpenOffice running in memory, just in case you need to convert many documents without delay. In that case you would use a script like unoconv to communicate with the daemon.
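For the daemon case, a batch conversion through unoconv might look like the sketch below. It assumes unoconv is installed and that a listener was started beforehand (for example with `unoconv --listener`). The helper names `build_unoconv_command` and `convert_many` are hypothetical, and only the `-f` (format) and `-o` (output) options are used.

```python
import subprocess


def build_unoconv_command(source, target_format="pdf", output=None):
    """Return the argument list for converting one file with unoconv."""
    command = ["unoconv", "-f", target_format]
    if output is not None:
        command += ["-o", str(output)]   # optional output location
    command.append(str(source))
    return command


def convert_many(sources, target_format="pdf"):
    """Convert each file in turn, one unoconv call per document."""
    for source in sources:
        subprocess.run(build_unoconv_command(source, target_format), check=True)
```

Whether this is faster than starting soffice on demand depends on how many documents are converted and how long the office process takes to start on the server.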

        • 6. satya teja - 7 years ago (2016-07-11) in reply to comment 2 by Christian Vigh

          Hi, I have the same question. I simply want to display the document contents in their respective fields. For example, if I upload a resume, the data must be displayed in fields like first name, last name, etc.

        • 7. Muhammad Khalid Chaudhary - 7 years ago (2016-11-02) in reply to comment 4 by Christian Vigh

          Can you explain how to convert .DOC/.DOCX documents to .PDF using PHP and OpenOffice/LibreOffice?

        • 8. Manuel Lemos - 7 years ago (2016-11-02) in reply to comment 7 by Muhammad Khalid Chaudhary

          I do not remember exactly. You need to check the documentation but I think it is something pretty easy. What may be hard is to have OpenOffice installed on the server. In any case maybe somebody can publish a class that can do that for you.

      • 1. Manuel Lemos - 8 years ago (2016-02-26)

        There are packages that can render some of those formats as images that you can display on a Web page.

        There are no packages that cover all those formats, but support for some of them could be added later by using external programs to render the files as images.

        That could be an innovative solution.


        1 Recommendation

        ApiLayer API Encapsulation: Send requests to APILayer REST APIs


        +4

        by Christian Vigh (package author, reputation 395) - 8 years ago (2016-02-26)

        As Manuel said, there is currently no universal solution for that. The package referenced here can capture HTML content and generate either an image or a PDF document, using a third-party web service.

        • 3 Comments
        • 1. Dave Smith - 8 years ago (2016-02-29)

          While I support ApiLayer, I do not think this is what the requester is looking for. They do NOT want to convert HTML; they want to view RTF, Office DOC and DOCX, and OpenOffice ODT files.

        • 2. Dave Smith - 8 years ago (2016-02-29) in reply to comment 1 by Dave Smith

          I forgot to mention that they also want to read Adobe PDF files, not create them.

        • 3. herman lapre - 8 years ago (2016-03-04) in reply to comment 1 by Dave Smith

          That is exactly what I need. I have researched TET, Tika, several PDF decoders, etc., but they all offer only partial solutions. Migrating to, e.g., an Elasticsearch solution seems like overkill to me. I need a reader that can extract the plain text from PDF, ODT, DOCX, DOC, RTF and similar documents.
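For two of the requested formats, plain text can be read without any external program, because DOCX and ODT files are ZIP archives that contain XML. The sketch below is a partial illustration in Python using only the standard library. The function names are hypothetical, and PDF, DOC and RTF are deliberately not covered, since they need a dedicated parser or an external converter. Text in nested structures such as text boxes may appear more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML (DOCX) and OpenDocument text (ODT).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
T_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def docx_text(path):
    """Return the paragraphs of a DOCX file, one per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def odt_text(path):
    """Return the headings and paragraphs of an ODT file, one per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    lines = []
    for node in root.iter():
        if node.tag in (T_NS + "p", T_NS + "h"):
            lines.append("".join(node.itertext()))
    return "\n".join(lines)


def plain_text(path):
    """Dispatch on the file extension; other formats are not handled here."""
    lowered = str(path).lower()
    if lowered.endswith(".docx"):
        return docx_text(path)
    if lowered.endswith(".odt"):
        return odt_text(path)
    raise ValueError("unsupported format: " + str(path))
```

For the remaining formats, one option consistent with the thread is to convert the file to a format that is easier to read with a headless office run first, and then extract the text from the result.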

