CafeTran importing segments in wrong order
Thread poster: Elizabeth Morris
Elizabeth Morris  Identity Verified
United States
Local time: 05:22
Russian to English
Dec 14, 2021

Hi all. I imported a Word document into CafeTran (version 10.8.3), but I guess because of the layout (book layout with two book pages per one landscape-oriented page in Word, lots of text boxes and pictures) CafeTran has not only divided up a lot of the sentences into multiple segments (fine) but imported them in the wrong order (not so fine). So I'll have half a sentence as a segment and then the next segment is half of a different sentence from the facing page in the book; the end of the first sentence will be several segments down the list. I know how to join and split segments that are right next to each other, but is there a way to rearrange segment order altogether so that the right ones can be combined? Is there another work-around? Should I just give up and bug my client for a different format of this document and hope that he has one? I couldn't find anyone mentioning having this problem...I'm guessing this isn't something the tool can do (maybe it would mess up the export to re-order the segments?), but thought I'd ask just in case.


 
Jean Dimitriadis  Identity Verified
English to French
Open a support ticket/Try a round trip solution Dec 14, 2021

Hi Elizabeth,

1. I suggest you open a support ticket and attach the Word document so that the developer can have a look. This might be an instance where the Word filter can be improved.

2. In the meantime, I suggest you try importing the file into another CAT tool. If the segments appear in the correct order, you can then import that tool's bilingual file into CafeTran and, when done, finalize/export the file in the other tool (this is what we call a "round trip").

Here are some tools that you might want to try:

Smartcat supports exporting and importing back translated XLIFF files, handles various additional file types and offers filter options. This makes it excellent for round trip scenarios. See also their best practices document.

XLIFF Manager is a cross-platform open source graphical user interface for OpenXLIFF Filters (an open source set of filters for creating, merging and validating XLIFF 1.2 and 2.0 files) written in JavaScript.

Memsource: The free version supports up to two documents at a time (which should not be an issue here).

Wordfast Pro: the demo version is sufficient for a round trip scenario.

MateCat: it offers a tool to finalize a downloaded XLIFF that has then been translated in another CAT tool.

These are discussed in more detail in: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/4-File-formats (see section 2 "External projects" for a particular external project or section 4 "Solutions" for a particular round trip solution).

If no other CAT tool produces segments in the correct order, I think you should indeed bring this up with the client.


[Edited at 2021-12-14 14:08 GMT]


 
Tom in London
United Kingdom
Local time: 10:22
Member (2008)
Italian to English
Exported from InDesign Dec 14, 2021

I had something similar to this a while ago. It turned out that the Word file was an export from InDesign. I asked for a better quality conversion and they sent me the whole thing as a new Word file, not formatted. Maybe you could try asking again. If so, the task would be easier for them and easier for you.

[Edited at 2021-12-14 09:41 GMT]


 
Elizabeth Morris  Identity Verified
United States
Local time: 05:22
Russian to English
TOPIC STARTER
Thanks so much for your helpful response Dec 14, 2021

Jean Dimitriadis wrote:

Hi Elizabeth,

1. I suggest you open a support ticket and attach the Word document so that the developer can have a look. This might be an instance where the Word filter can be improved.

2. In the meantime, I suggest you try importing the file into another CAT tool. If the segments appear in the correct order, you can then import that tool's bilingual file into CafeTran and, when done, finalize/export the file in the other tool (this is what we call a "round trip").

Here are some tools that you might want to try:



[Edited at 2021-12-14 14:08 GMT]


Lots of things here that I can and will try.
Unfortunately re: #1 I'm contract-bound to not share the document with anyone (not that I think the developer would do anything fishy with it, especially if they don't read Russian!). But that's still a good step to know about for future issues with any other documents. Thanks for writing all of this up for me!


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
2 on 1 Dec 14, 2021

Do you know how the 2 on 1 layout was created? Columns? Most likely not. A specific kind of page layout in MS Word?

My initial idea was to print to PDF and then convert it to MS Word again.

But the experiment wasn’t promising.


 
Elizabeth Morris  Identity Verified
United States
Local time: 05:22
Russian to English
TOPIC STARTER
Not columns Dec 14, 2021

German Dutch Engineering Translation wrote:

Do you know how the 2 on 1 layout was created? Columns? Most likely not. A specific kind of page layout in MS Word?

My initial idea was to print to PDF and then convert it to MS Word again.

But the experiment wasn’t promising.


I think Tom's speculation that it was made in another program and converted to Word is probably correct.
I've converted a PDF to Word before (in a previous situation where a client first sent me a PDF instead of the Word file and I thought I might just deal with it myself) and that also resulted in some segment mixups and occasionally a dropped word - and that was on a much simpler document layout.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Complex stuff Dec 14, 2021

I had another look.

Document 1: the original order of text boxes
Document 2: a rearrangement of the text boxes
Document 3: ran a macro on document 2

Screen Shot 2021-12-14 at 21.00.34

https://www.dropbox.com/s/9toniileamwt02o/textboxes.png?dl=0

Import in CafeTran Espresso:

Order in the Grid: doc 2, 3, 1

As you can see: completely different order:

Screen Shot 2021-12-14 at 20.58.22

Macro:

Screen Shot 2021-12-14 at 21.00.50

Probably you've decided to use another solution, but here's my idea:

Ask an experienced macro developer (e.g. https://www.proz.com/profile/133369) to create a solution for this. He's the developer of TransTools.



[Edited at 2021-12-14 20:09 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Other approach Dec 14, 2021

Manually insert consecutive numbers at the beginning of every text box and every normal paragraph (not in a text box).

Instruct CafeTran to segment on paragraphs and import the document. Sort the segments via the Filter menu.
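
For anyone who would rather script the numbering than type it, here is a minimal sketch in Python of the number-then-sort idea. It works on plain strings, not on a Word file or a CafeTran project. The `{n}` marker format, the zero-padding and the function names `number_blocks` and `sort_by_marker` are assumptions made for illustration only.

```python
import re

# A leading marker such as {7} or {007}, followed by optional whitespace
MARKER = re.compile(r"^\{(\d+)\}\s*")


def number_blocks(blocks):
    """Prefix each block (given in reading order) with a zero-padded marker."""
    width = len(str(len(blocks)))
    return [f"{{{i:0{width}d}}} {text}" for i, text in enumerate(blocks, start=1)]


def sort_by_marker(segments):
    """Sort imported segments by the integer inside their leading marker."""
    def key(segment):
        match = MARKER.match(segment)
        # Segments without a marker sort last and keep their relative order
        return int(match.group(1)) if match else float("inf")

    return sorted(segments, key=key)
```

Zero-padding is only a precaution: it keeps the order correct even where a sort compares the markers as text rather than as numbers.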


[Edited at 2021-12-14 22:05 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Like this Dec 14, 2021

Tagged document:

Screen Shot 2021-12-14 at 22.16.05

(Temporarily replaced ^p with ¶.)

Sorted view in CafeTran Espresso:

Screen Shot 2021-12-14 at 22.14.42

With a fairly simple macro you can insert consecutive numbers very fast.

If you want segmentation per sentence (and why wouldn't you?):

  • Create a bilingual export table.
  • Replace ¶ with ^p.
  • Import the table with normal segmentation.
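
The placeholder trick can be sketched in a few lines of Python. This is only an illustration on plain strings: `flatten` and `restore` are hypothetical helper names, and "\n" stands in for Word's paragraph mark (^p).

```python
PLACEHOLDER = "¶"  # the same character used in the tagged document above


def flatten(box_text: str) -> str:
    """Replace paragraph breaks with the placeholder so the text stays on one line."""
    if PLACEHOLDER in box_text:
        # Restoring would otherwise create breaks that were never in the source
        raise ValueError("placeholder already occurs in the text; pick another one")
    return box_text.replace("\n", PLACEHOLDER)


def restore(segment: str) -> str:
    """Put the paragraph breaks back, for example after the segments are sorted."""
    return segment.replace(PLACEHOLDER, "\n")
```

The check in `flatten` matters in practice: if the source already contains a ¶ character, a different placeholder is needed or the round trip will not be lossless.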


 
Stanislav Okhvat
Local time: 13:22
English to Russian
Re: CafeTran importing segments in wrong order Dec 15, 2021

Hello Elizabeth,

Suggestions from Tom and Hans (German Dutch Engineering Translation) are correct. I perform PDF conversion regularly and it has been my experience that, if the resulting Word document contains textboxes, the textboxes may be in an incorrect order, especially if those textboxes are used to render multi-column text which must flow from column 1 into column 2. All CAT tools read textboxes in the order of appearance within the document structure, and the order of appearance depends on the anchor points of each textbox (normally, textboxes are anchored to paragraphs and an X / Y offset determines where the textbox is located on the paragraph's page in relation to the paragraph; sometimes textboxes have absolute positioning relative to a page corner). However, a poor PDF converter can anchor textboxes incorrectly (say, a textbox will be anchored to paragraph 10 but appear next to paragraph 1 due to a negative vertical offset) or apply an incorrect order if the textboxes have absolute positioning. When a Word document is prepared manually, the textbox order will be correct in most cases.
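
To see the stored order for yourself, a .docx file can be opened as a ZIP archive and its main part, word/document.xml, inspected. The following Python sketch lists text box contents in the order they occur in that XML. It assumes the boxes are stored as `w:txbxContent` elements and that the legacy copies inside `mc:Fallback` should be skipped to avoid duplicates. It ignores headers, footers and other parts, and it does not claim to reproduce how any particular CAT tool filter works. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def textbox_texts(document_xml: bytes) -> list[str]:
    """Return the text of each text box in the order it is stored in the XML."""
    root = ET.fromstring(document_xml)
    found = []

    def walk(elem):
        if elem.tag == f"{{{MC}}}Fallback":
            return  # legacy copy of the same box; skip it
        if elem.tag == f"{{{W}}}txbxContent":
            paragraphs = []
            for p in elem.iter(f"{{{W}}}p"):
                # Join all runs of one paragraph into a single string
                paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
            found.append("\n".join(paragraphs))
            return
        for child in elem:
            walk(child)

    walk(root)
    return found


def textbox_texts_from_docx(path: str) -> list[str]:
    """Read word/document.xml from a .docx file and list its text boxes."""
    with zipfile.ZipFile(path) as archive:
        return textbox_texts(archive.read("word/document.xml"))
```

If the list printed by `textbox_texts_from_docx("book.docx")` (a placeholder file name) is already scrambled, the problem is in the document itself and a different CAT tool is unlikely to help.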

I would suggest asking for the source PDF file and converting it to a Word document yourself. Using ABBYY FineReader and the approach described here, you will ensure that the number of textboxes is minimized as much as possible. Proper PDF conversion is the best way to go in this case.

If the above is not possible, you will need to redo the document layout manually. This will include getting rid of textboxes that are used for text layout, as much as possible. For example, if you see two text columns and these are produced by using textboxes, you will need to move the text out of the textboxes into the document body, then apply 2 columns to the text under the Page Layout tab. If the original PDF was a brochure or magazine, you will not be able to get rid of all textboxes, though, since Word does not have all the layout features of tools like InDesign, but you should try to minimize textboxes. If textboxes are used for labelling, e.g. next to images, keep them as is.

You can also use Hans' approach, placing numbering within textboxes and sorting the segments inside CafeTran before translation. Since you mentioned that you see parts of sentences inside CafeTran, the textboxes may contain partial sentences, so this approach will not work unless you merge those textboxes somehow. In that case, I would go with the above approaches (PDF conversion or manual layout) instead.

Best regards,
Stanislav Okhvat
TransTools – Useful tools for every translator


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Or OCR Dec 15, 2021

Tom in London wrote:

I had something similar to this a while ago. It turned out that the Word file was an export from InDesign.


Or FineReader, or SolidPDF, or ...

Especially when an option 'Create exact layout' has been chosen.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Thanks Dec 15, 2021

Stanislav Okhvat wrote:

I would suggest asking for the source PDF file and converting it to a Word document yourself. Using ABBYY FineReader and the approach described here, you will ensure that the number of textboxes is minimized as much as possible. Proper PDF conversion is the best way to go in this case.


Thanks for the thorough reply. Much appreciated. I was hoping for another smart TransTools solution, but obviously things are much more complicated.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Nice webinar Dec 15, 2021

Stanislav Okhvat wrote:

I would suggest asking for the source PDF file and converting it to a Word document yourself. Using ABBYY FineReader and the approach described here, you will ensure that the number of textboxes is minimized as much as possible. Proper PDF conversion is the best way to go in this case.


Very informative. Thank you.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Example Dec 15, 2021

Document:

[Screenshot 1]

Replace paragraph marks with a placeholder:

[Screenshot 2]

Import in CafeTran Espresso:

[Screenshot 3]

Replace placeholders with line breaks:

[Screenshot 4]

Result:

[Screenshot 5]

Sort the segments:

[Screenshot 6]

You can now:

  • Remove the {1} numbers via Find/Replace.
  • Export to a 2-column MS Word document.
  • Or translate as-is.
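
Removing the markers comes down to one regular expression. A minimal Python sketch, assuming each segment starts with a marker of the form {1} followed by optional whitespace (the helper name `strip_marker` is hypothetical). The same pattern, `^\{\d+\}\s*`, should carry over to most regex-capable Find/Replace dialogs, though the exact syntax accepted by CafeTran is not confirmed here.

```python
import re

# Matches only a marker at the very start of the segment
LEADING_MARKER = re.compile(r"^\{\d+\}\s*")


def strip_marker(segment: str) -> str:
    """Remove a leading {n} marker; braces elsewhere in the text are left alone."""
    return LEADING_MARKER.sub("", segment, count=1)
```

Anchoring the pattern to the start of the segment avoids deleting braces that belong to the source text itself.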


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Info Jan 12, 2022

Elizabeth Morris wrote:

Hi all. I imported a Word document into CafeTran (version 10.8.3), but I guess because of the layout (book layout with two book pages per one landscape-oriented page in Word, lots of text boxes and pictures) CafeTran has not only divided up a lot of the sentences into multiple segments (fine) but imported them in the wrong order (not so fine). So I'll have half a sentence as a segment and then the next segment is half of a different sentence from the facing page in the book; the end of the first sentence will be several segments down the list. I know how to join and split segments that are right next to each other, but is there a way to rearrange segment order altogether so that the right ones can be combined? Is there another work-around? Should I just give up and bug my client for a different format of this document and hope that he has one? I couldn't find anyone mentioning having this problem...I'm guessing this isn't something the tool can do (maybe it would mess up the export to re-order the segments?), but thought I'd ask just in case.


As it turned out, this was an InDesign document converted to PDF and then converted to MS Word.

That is important info.

Moreover, the title of the post "CafeTran importing segments in wrong order" is not correct. This was not a CafeTran Espresso issue ...



[Edited at 2022-01-13 10:50 GMT]


 

