Dealing with PDF files Thread poster: David Jacques
These are increasingly the format clients send and there is no way of converting them so that any CAT software can handle them without messing up format and graphics/illustrations. Able2ExtractProfessional8 works if the pdf file is fairly basic but not if it's complex. Then pictures and graphics get either deleted completely or moved and the same happens to text because it's of different length in the target language to the length of the source language. This is true regardless of how the conversion's done. Worst of all is dragging and dropping in some OCR programme like Abby.
What's needed is a CAT programme able to handle PDF regardless of how complex it is. Is there any such thing in the pipeline? Or does anyone have a 100% guaranteed reliable way of dealing with this format?
Let's ban PDFs altogether!!!!:-) | | | Tom in London United Kingdom Local time: 00:00 Member (2008) Italian -> English
David Jacques wrote:
These are increasingly the format clients send and there is no way of converting them so that any CAT software can handle them without messing up format and graphics/illustrations. Able2ExtractProfessional8 works if the pdf file is fairly basic but not if it's complex. Then pictures and graphics get either deleted completely or moved and the same happens to text because it's of different length in the target language to the length of the source language. This is true regardless of how the conversion's done. Worst of all is dragging and dropping in some OCR programme like Abby.
What's needed is a CAT programme able to handle PDF regardless of how complex it is. Is there any such thing in the pipeline? Or does anyone have a 100% guaranteed reliable way of dealing with this format?
Let's ban PDFs altogether!!!!:-)
None of the tasks you describe have got anything to do with translating and we should not be doing them. | | | Jean Lachaud United States Local time: 19:00 English -> French + ... A real problem | Sep 16, 2014 |
It is a real problem, indeed.
I've tried several applications, including scanning and OCRing images. Nothing works reliably, especially when there are several columns, a very common occurrence. | | | Rolf Keller Germany Local time: 01:00 English -> German There is no real solution | Sep 16, 2014 |
David Jacques wrote:
What's needed is a CAT programme able to handle PDF regardless of how complex it is. Is there any such thing in the pipeline?
This is impossible because PDF files don't contain all the necessary information. Creating a PDF file is a one-way street: from a well-editable text to a well-printable text with limited editability.
The recent version of Adobe Acrobat includes better editing features but "better" doesn't mean "full". And if your client uses an older version ... | | | dasmi Local time: 01:00 English -> Italian + ... Iceni Infix + OmegaT | Sep 16, 2014 |
I translate PDFs successfully using Iceni Infix + OmegaT.
Iceni Infix is a PDF editor with an interesting Translate option: with this option you can export an XML (or tagged text) file containing all the text elements of the PDF file.
You can then translate the XML file with OmegaT (which has a dedicated filter for Infix).
After translation you can import the translated XML into Infix and the software produces a translated PDF.
If there are layout problems you can fix them directly in Infix.
If the PDF file your customer sent you is the result of a scan you must OCR it (I use PDF Xchange Editor, but you can do it with Adobe Reader as well) so that you can obtain a text PDF. In this case, of course, the text you translate will contain several typos. | | | iperbole10 Italy Local time: 01:00 French -> Italian + ... can be a solution? | Sep 16, 2014 |
Have you tried opening the file with OpenOffice Draw and translating it there? | | | Tatiana Grehan United States Local time: 19:00 English -> Russian + ...
"If the PDF files your customer sent you is the result of a scan you must OCR it (I use PDF Xchange Editor, but you can do it with Adobe Reader as well) so that you can obtain a text PDF. In this case, of course the text you will translate will have several typos."
Would you mind sharing how you OCR scanned PDFs using Adobe Reader? I have Adobe Reader XI, but never knew that it can be used to OCR "dead" PDF files.
[Edited at 2014-09-16 17:44 GMT] | | | ABBYY FineReader | Sep 16, 2014 |
I go with ABBYY FineReader all the time. Then I just have to review/correct the exported Word file for any typos before I begin translating with Wordfast.
It's by far the best bet around. | | | Rolf Keller Germany Local time: 01:00 English -> German Adobe Reader for OCR? | Sep 17, 2014 |
Tatiana Grehan wrote:
Would you mind sharing how you OCR scanned PDFs using Adobe Reader?
Adobe Reader is not able to perform any OCR. OCR is a feature of Adobe Acrobat. | | | Alexander Somin Germany Local time: 01:00 English -> Russian + ... SITE LOCALIZER
There is a paid online OCR service from ABBYY. They also have an online CAT tool for PDFs.
[Edited at 2014-09-17 07:13 GMT] | | | Hannah Doyle France Local time: 01:00 French -> English + ...
Tom in London wrote:
David Jacques wrote:
These are increasingly the format clients send and there is no way of converting them so that any CAT software can handle them without messing up format and graphics/illustrations. Able2ExtractProfessional8 works if the pdf file is fairly basic but not if it's complex. Then pictures and graphics get either deleted completely or moved and the same happens to text because it's of different length in the target language to the length of the source language. This is true regardless of how the conversion's done. Worst of all is dragging and dropping in some OCR programme like Abby.
What's needed is a CAT programme able to handle PDF regardless of how complex it is. Is there any such thing in the pipeline? Or does anyone have a 100% guaranteed reliable way of dealing with this format?
Let's ban PDFs altogether!!!!:-)
None of the tasks you describe have got anything to do with translating and we should not be doing them.
Deep down I think you're right, yet when the issue arises I find it very difficult to point it out. It is frustrating though, particularly when the file in question is sent by an agency.
Considering this is such a common occurrence and considering the price of CAT tools, really we should have a CAT tool with sophisticated conversion features by now! | | | Platary (X) Local time: 01:00 German -> French + ... The best and unique way | Sep 17, 2014 |
David Jacques wrote:
Or does anyone have a 100% guaranteed reliable way of dealing with this format?
The way: not to deal with such files at all. | | | dasmi Local time: 01:00 English -> Italian + ... Adobe Reader no OCR | Sep 17, 2014 |
Yes, Tatiana is right, actually with Adobe Reader you can not do the OCR (maybe it is possible with Acrobat). Anyway, as I said, you can use the OCR function of PDF Xchange Editor (my preferred PDF reader, actually). | | | Tom in London United Kingdom Local time: 00:00 Member (2008) Italian -> English Adobe Acrobat Professional | Sep 17, 2014 |
dasmi wrote:
Yes, Tatiana is right, actually with Adobe Reader you can not do the OCR (maybe it is possible with Acrobat). Anyway, as I said, you can use the OCR function of PDF Xchange Editor (my preferred PDF reader, actually).
I have a rather old version of Adobe Acrobat Professional that I was given for free when I was teaching at University.
This does have an OCR facility but I've always found it pretty useless at correctly extracting all the text from a PDF without making mistakes.
The other options I've tried, such as the one where you upload a document to a website and wait for the OCR text to come back, have been equally disappointing. The alternative software applications, such as ABBYY, of which I purchased a copy, don't give any better results.
If the PDF is just a straight written text, translating it doesn't present too many problems, even if it's a handwritten document. Using my dictation software, I just read the PDF into a Word file and it is automatically typed out in English, which I can then correct at my leisure.
The problem comes with formatted PDFs; I've never yet found an OCR application that was able to reproduce the formatting correctly.
After all, isn't it the point of PDFs that they are very difficult to mess with? Isn't that why, if we're wise, we've uploaded our CVs to this website as PDFs?
After years of trying to keep clients happy by not complaining about PDFs, I now tend to refuse them because it takes so much time just to get them ready for translation.
And I'm a translator – not a converter of PDFs!
[Edited at 2014-09-17 16:49 GMT] | | | samehme United States Local time: 16:00 English -> Arabic + ... The Application used to make the PDF.. | Sep 21, 2014 |
I sometimes use Adobe Acrobat Professional XI to convert (Save As) the English text in a PDF into MS Word, for example. There are other Save As file options too.
Yet complex typeset PDFs (e.g. with tables, illustrations, images, etc.) require working on the file in the native application used to create the PDF, which means doing Desktop Publishing (DTP) work on the native file.
Nowadays, many translation projects require a DTP step after the text has been translated. The translation itself is done with a CAT tool, say Trados.
A real-life scenario, and not an easy one, would be:
- A complex PDF created (typeset) with InDesign.
- So, the customer sends the PDF to the translator.
- The translator asks for the IDML file, along with the images.
- IDML files can be analyzed, then translated using Trados.
- Then, and after the translation is completed, the target language IDML file can be Saved from Trados.
- This target language IDML can be opened with InDesign, and worked on to make sure all the text is displayed / formatted correctly, as per the English file.
- For text that is embedded (outlined) in images, there are 2 ways to handle it:
1. Best option: receive the images in their native format, so the text can be translated and inserted in the native application, replacing the English text.
2. Otherwise: translate the text in a Word file, then adjust the target text so that it can be put in place of the English text.
- Finally, the target language PDF can be created from InDesign, and sent to the client for approval.
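As a rough illustration of why an IDML file is so much easier for a CAT tool to handle than a PDF: an IDML file is a ZIP package of XML files, and the story text sits in plain `<Content>` elements. The Python sketch below is only that, a sketch. It assumes the usual layout with story files under a `Stories/` folder, the function name `extract_idml_text` is made up for this example, and the demo package is a tiny stand-in rather than a real InDesign export. It has nothing to do with how Trados actually processes IDML.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_idml_text(idml_file):
    """Return the text of every non-empty <Content> element in the story files.

    idml_file may be a path or a file-like object (IDML is a ZIP package).
    The function name is hypothetical, chosen for this illustration.
    """
    segments = []
    with zipfile.ZipFile(idml_file) as package:
        # Assumption: story XML lives under Stories/ (the usual IDML layout).
        story_names = sorted(
            name for name in package.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )
        for name in story_names:
            root = ET.fromstring(package.read(name))
            # Translatable text is held in <Content> elements.
            for element in root.iter("Content"):
                if element.text and element.text.strip():
                    segments.append(element.text)
    return segments


# Demo: a tiny in-memory package standing in for a real .idml file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr(
        "Stories/Story_u1.xml",
        "<Story><ParagraphStyleRange><CharacterStyleRange>"
        "<Content>Hello world</Content>"
        "</CharacterStyleRange></ParagraphStyleRange></Story>",
    )
    package.writestr("designmap.xml", "<Document/>")
demo.seek(0)
print(extract_idml_text(demo))
```

With a real file you would pass the path of the `.idml` instead of the in-memory demo. A PDF offers no comparable structure, which is why asking the client for the IDML is worth the effort.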
One final note: if the customer's InDesign version supports the target language and their designer can work with that language, then after the translation is done the IDML can be sent to the customer for them to handle the DTP work. Otherwise, the services of a typesetter (or DTP specialist) who can handle the target language can be used for this part.