Abbyy PDF Transformer: breaking tags around non-English chars?!
Thread poster: Jan Sundström
Jan Sundström Sweden Local time: 10:56 English to Swedish + ...
Sep 9, 2008
Hi all,
I came across an annoying bug in Abbyy PDF Transformer, and I wonder if anyone else has encountered it.
I want to OCR a layered PDF with Swedish text. In Abbyy, I select "Remove all formatting", and convert it to DOC.
When I open the resulting DOC file in Word, everything looks fine. I'm able to translate it with Trados TWB using the Word interface. Even with "show hidden characters" turned on, nothing unusual shows up.
BUT if I choose to open it with TagEditor instead, it reveals that the file has segment breaks (that were previously invisible) between all Swedish characters (å,ä,ö).
For instance "räksmörgås" is displayed in TE as räksmörgås
My guess is that it must be a problem with the character encoding, assigning the wrong code page to the output file.
But there is no setting that lets me choose the character encoding, in either PDF Transformer 1.0 or 2.0.
I don't want any advice about switching to another OCR program. I have access to most of the other programs on the market, including FineReader. But I'd like to know if there is a solution to this bug?
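To make the symptom above concrete, here is a minimal sketch in Python of how a word ends up fragmented if every non-ASCII character is placed in its own run. This is only an illustration of my guess, not how Abbyy or TagEditor actually work internally. The function names and the "|" marker are made up for the example, and it assumes the Swedish letters are stored as single precomposed characters.

```python
from itertools import groupby


def split_into_runs(text):
    """Group consecutive characters into runs of ASCII / non-ASCII text.

    Hypothetical model: each switch between ASCII and non-ASCII starts a
    new run, the way a converter might if it tagged those characters with
    a different font or code page.
    """
    return ["".join(chars) for _, chars in groupby(text, key=lambda ch: ord(ch) < 128)]


def show_with_breaks(text, marker="|"):
    """Render the text with a visible marker at every run boundary."""
    return marker.join(split_into_runs(text))


if __name__ == "__main__":
    word = "räksmörgås"
    print(split_into_runs(word))
    print(show_with_breaks(word))
```

Joining the runs back together gives the original word unchanged, which would explain why the text looks normal in Word while a tag-aware editor shows a break at every boundary.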
Ahmed Maher Local time: 11:56 English to Arabic + ...
Copy & Paste
Jan 11, 2009
Hello,
I remember running into the same problem. I opened the Word file, used Select All to select the entire contents, and pasted it into a new Word file. Then I opened the new file in TagEditor, and everything worked well.
Regards, Ahmed Maher