Converting PDF to Word - best tool?
Thread poster: John Fossey
John Fossey
John Fossey  Identity Verified
Canada
Local time: 14:24
Member (2008)
French to English
+ ...
Nov 18, 2016

Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?

I have been using ABBYY PDF Transformer 2.0 for years. It allows you to specify which part of the page is text, table or image and you can tell it to put text on top of an image. But it has been unavailable for new installs for some years.

ABBYY suggested I upgrade to 3.0 but I found it not very good.

Microsoft Word 2016 claims to convert PDF to Word, but I found the results completely useless.

Surely there must be some new and improved technology?


 
Bernhard Sulzer
Bernhard Sulzer  Identity Verified
United States
Local time: 14:24
English to German
+ ...
Thoughts Nov 18, 2016

John Fossey wrote:

Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?

I have been using ABBYY PDF Transformer 2.0 for years. It allows you to specify which part of the page is text, table or image and you can tell it to put text on top of an image. But it has been unavailable for new installs for some years.

ABBYY suggested I upgrade to 3.0 but I found it not very good.

Microsoft Word 2016 claims to convert PDF to Word, but I found the results completely useless.

Surely there must be some new and improved technology?


This is not really a direct answer, but from experience, also involving CAT tools, if I feel I need a Word version, I ask the client to provide it (when they are an agency). That's as a security when things go awry later when it comes to replacing text or using that Word file in a CAT tool. I have had quite some bad experiences with converted PDF files.


 
Tom in London
Tom in London
United Kingdom
Local time: 18:24
Member (2008)
Italian to English
I agree with Bernhard Nov 18, 2016

Bernhard Sulzer wrote:

This is not really a direct answer, but from experience, also involving CAT tools, if I feel I need a Word version, I ask the client to provide it (when they are an agency). That's as a security when things go awry later when it comes to replacing text or using that Word file in a CAT tool. I have had quite some bad experiences with converted PDF files.


My experience is the same as Bernhard's. The best conversion tool is......the client!

I became convinced of this after one particularly nasty conversion job that threw the pagination of a long illustrated document into complete chaos. My translation was very good (as I think it always is) but I just could not fix the pagination and wasted a lot of time trying to do that.

There ensued an almighty row with that client and no more work from them for many months. Eventually they came back to me because I always do a good job on the translations but now I always insist that the client provide me with a Word conversion and -very importantly- I check through the conversion before accepting the job.


 
wotswot
wotswot  Identity Verified
France
Local time: 19:24
Member (2011)
French to English
Try Nuance or Solid PDF Tools Nov 18, 2016

Both these do quite a decent job for what I call "clean" PDFs, i.e. PDFs created by Adobe software or by Word 2013/2016 (Save as, PDF).
But in my experience none of these tools do an acceptable job for "dirty" PDFs, i.e. scans, photocopies, etc.


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 14:24
Member (2008)
French to English
+ ...
TOPIC STARTER
Client's conversion usually just as bad Nov 18, 2016

Well, the problem I frequently have is that the client's conversion is just as bad, so I end up reverting to PDF Transformer 2.0 to get a decent conversion. I was hoping there was progress in newer tools.

It seems that modern conversion tools don't let you set the job up, but depend on automatically formatting the output, which the tool inevitably gets wrong.


 
Tony M
Tony M
France
Local time: 19:24
Member
French to English
+ ...
SITE LOCALIZER
My own experience Nov 18, 2016

I often receive PDF > DOC conversions, and most agencies are delighted with the facsimile formatting but don't appreciate all the nightmarish 'fudges' used to achieve it, at least until I send them back my translation and the bill!

I recently had a need to do this myself, so I thought I'd experiment.

First of all, I tried two of the free online services. The first yielded results that were worse than useless: the additional work needed would have been more than re-creating the original document from scratch.
The second couldn't handle my large document (51 pages) and simply got stuck. I left it running for about an hour, but to no avail.
Another online service offered a free trial, but limited to a file size too small for my immediate needs.

I then downloaded a trial version of the Nitro software (fully functional, for 14 days), and so far the results seem promising. I opted for the 'middle road': partially formatted text supplied in a column, but without attempting to create a facsimile of the original layout. My 2 large documents took some time to process on my ancient slow PC (I think it was built by Brunel!), but it did get there in the end. I didn't find it terribly intuitive to use; that said, I did manage to do what I wanted without needing to read the instructions!
The results were not bad at all: most of the OCR was spot on, even on pages where the text was seriously askew; the partial formatting was not a lot of help, but also not too much of a hindrance. The worst thing for me was that it used multiple spaces to simulate justification, which plays havoc later with CAT tools. It DOES enable you to get rid of line returns, and I was able to do a few global search-and-replace passes to put the most obvious things right, as well as globally changing the font, spacing, etc.

Overall, I'm very pleased with it, and may well end up buying this one.
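
For anyone who would rather script those global search-and-replace passes than repeat them by hand, here is a minimal sketch in Python. It is only an illustration and not something Nitro or any CAT tool provides: the function name `clean_converted_text` is made up, and it assumes the converted document has first been saved as plain text. It collapses the runs of spaces used to fake justification and joins the hard line returns inside a paragraph, while leaving blank lines between paragraphs alone.

```python
import re


def clean_converted_text(text: str) -> str:
    """Tidy plain text exported from a PDF conversion (hypothetical helper)."""
    # Normalise Windows/old-Mac line endings so the patterns below only see "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of two or more spaces/tabs (fake justification) into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Remove spaces or tabs left at the end or start of a line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n[ \t]+", "\n", text)
    # Join single line returns inside a paragraph; blank lines are kept as breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text.strip()


sample = "This  line   was\njustified  with spaces.\n\nSecond   paragraph\r\nhere."
print(clean_converted_text(sample))
```

Words hyphenated across a line end are not rejoined by this sketch, so the result would still need a read-through before it goes into a CAT tool.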


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:24
Member (2009)
Dutch to English
+ ...
ABBYY FineReader 12 Nov 18, 2016

The absolute best is: ABBYY FineReader 12. I've tried and tested them all.

I don't really think asking the client is a good solution, because every time I tried that in the past, they supplied me with a worse job than I could have done myself.

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 14:24
Member (2008)
French to English
+ ...
TOPIC STARTER
Thanks for the info Nov 18, 2016

Michael Joseph Wdowiak Beijer wrote:

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)


That's what I was hoping to hear - the manual selection of layout is what most of the other programs miss.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 19:24
Member (2006)
English to Afrikaans
+ ...
Hmm... Nov 18, 2016

John Fossey wrote:
Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?


If you want OCR, then... no, can't help. But if the PDF is editable, then try Trados 2015 or Wordfast Pro 4. I was pleasantly surprised recently to discover that their conversions to Word/RTF were very good. In both cases, simply load the PDF as a translatable, and the CAT tool will generate a DOC/X somewhere.

I also recall earlier versions of OCR software allowed me to select boxes myself, but the latest versions all auto-select, and although I can then adjust the boxes, I can no longer specify the sequence in which the boxes are read/saved.


 
Bernhard Sulzer
Bernhard Sulzer  Identity Verified
United States
Local time: 14:24
English to German
+ ...
More thoughts Nov 18, 2016

Michael Joseph Wdowiak Beijer wrote:

The absolute best is: ABBYY FineReader 12. I've tried and tested them all.

I don't really think asking the client is a good solution, because every time I tried that in the past, they supplied me with a worse job than I could have done myself.

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)


Asking the client for the Word file has one reason: not to get blamed if I screw up because I used my own file.
Have I converted files myself? Yes, I have.
Did I have a lot of great experiences? Not really.
It will depend on the structure and formatting of the PDF file and the actual software used to create the original file from which the PDF file was created (1st conversion) before converting that one again to Word (2nd conversion). There are certainly files that can be converted more easily, but there are many that are hardly manageable in a CAT tool after having been converted from a PDF file, and that will hardly allow you to create another PDF file for the client that looks like the one you received.

I personally don't really depend on a conversion tool - and if I did, I would charge the client for using it.


 
Artem Vakhitov
Artem Vakhitov  Identity Verified
Kyrgyzstan
English to Russian
+ ...
ABBYY PDF Transformer 2.0 or new-ish FineReader Nov 18, 2016

FineReader is good if you need, for example, to correct a skewed original or set more complex image recognition options. I have FR 11 Pro and it produces good results, but sometimes, sadly, worse than PDF Transformer 2.0. So I personally need them both. In addition, I like PDF Transformer's GUI better because it's uncluttered. Some things are best handled by copying and pasting directly from the PDF file if it's not a scanned one.

 
Sergei Leshchinsky
Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 20:24
Member (2008)
English to Russian
+ ...
Only FR + your head will work well Nov 18, 2016

I have been using ABBYY PDF Transformer 2.0 for years.

This tool is just the "Scan&Read" button of the bigger product, FineReader. It is fully automatic and it is not always smart, of course.

Get the full version of FineReader. Manual segmentation is much better, especially with tables and rich formatting: you can change options and methods and see the result on the fly.

It's like with kids — worth doing yourself.

[Edited at 2016-11-18 20:39 GMT]


 
Robert Rietvelt
Robert Rietvelt  Identity Verified
Local time: 19:24
Member (2006)
Spanish to Dutch
+ ...
What do you need it for? Nov 18, 2016

I haven't got the solution, but when I receive a PDF file I can't really use it in Studio: you can import it, but the results are horrible. What I do (possible with most PDFs I receive) is copy the text (and sometimes the pictures) and paste it into Word. The results are reasonable and, above all, workable! Works for me.

Hence my question: what do you need it for?


 
Tony M
Tony M
France
Local time: 19:24
Member
French to English
+ ...
SITE LOCALIZER
Impossible with scanned 'image' PDF files Nov 18, 2016

Robert Rietvelt wrote:

What I do is copying the text and paste it in Word.


That's fine when it is a PDF created directly from a native-format file.

The problem we are discussing here is when the PDF originates from a scanned document — i.e. is in the form of an image — where the only solution is to process it using OCR; or else to re-create the entire document manually from scratch!

Many people — myself certainly included! — feel more comfortable working with an editable document in the source language, into which we can enter our target translated text, deleting as we go.

Or of course we might want to process it using a CAT tool, which naturally requires access to the source text in order to be able to function.


 
Robert Rietvelt
Robert Rietvelt  Identity Verified
Local time: 19:24
Member (2006)
Spanish to Dutch
+ ...
That is why I said ..... Nov 18, 2016

Tony M wrote:

Robert Rietvelt wrote:

What I do is copying the text and paste it in Word.


That's fine when it is a PDF created directly from a native-format file.

The problem we are discussing here is when the PDF originates from a scanned document — i.e. is in the form of an image — where the only solution is to process it using OCR; or else to re-create the entire document manually from scratch!

Many people — myself certainly included! — feel more comfortable working with an editable document in the source language, into which we can enter our target translated text, deleting as we go.

Or of course we might want to process it using a CAT tool, which naturally requires access to the source text in order to be able to function.


.... most PDFs I receive.

Beyond that, I have never come across a tool that converts a PDF one-to-one correctly.

[Edited at 2016-11-18 22:09 GMT]


 