Formatting information present in the source file usually needs to be reproduced in the target file. The inline formatting information made possible by the supported formats (at present DocBook, HTML, XHTML, OpenDocument, and OpenOffice.org) is presented as tags in OmegaT. Normally, tags are ignored when the similarity between texts is assessed for matching purposes. Tags reproduced in the translated segment will be present in the translated document.
Tag naming: The tags consist of one to three characters and a number. The unique number makes it possible to group the tags that correspond to each other, and to tell apart tags that share the same shortcut characters but are in fact different. The characters may or may not reflect the formatting the tag represents (e.g. bold, italics, etc.).
Tag numbering: Tags are numbered incrementally by tag group. What we call "tag groups" here is either a single tag on its own (like <br1>) or two tags forming a pair (like <i0> and </i0>). Within a segment, the first group (pair or singleton) gets the number 0, the second the number 1, etc. The first example below has 3 tag groups (a pair, a singleton, and then another pair); the second example has only one group (a pair).
Pairs and singletons: Tags always come either in singletons or in pairs. Single tags indicate formatting information that does not affect the surrounding text (an extra space or a line break, for example).
<segment 2132><b0><Ctr+N></b0>, <br1><b2><Enter></b2><end segment>
<br1> is a single tag and does not affect any surrounding text. Paired tags usually indicate style information that applies to the text between the opening tag and the closing tag of a pair. Whatever happens to a pair, the opening tag should always come before the closing tag:
<segment 3167>Log file (<b0>log.txt</b0>) for tracking operations and errors.<end segment>
<b0> and </b0> are paired and affect the text log.txt.
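The naming and numbering rules above can be illustrated with a short Python sketch. It is not part of OmegaT; the regular expression and the function name tag_groups are assumptions made for illustration, based only on the description that a tag is one to three characters followed by a number.

```python
import re

# Assumed tag shape, taken from the description above: one to three letters
# followed by a number, with a leading "/" marking a closing tag.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3})(\d+)>")

def tag_groups(segment):
    """Return (name, number, kind) per tag group; kind is 'pair' or 'single'."""
    tags = [(m.group(1) == "/", m.group(2), int(m.group(3)))
            for m in TAG_RE.finditer(segment)]
    # A tag that also occurs in closing form belongs to a pair.
    closing = {(name, num) for is_close, name, num in tags if is_close}
    groups = []
    for is_close, name, num in tags:
        if is_close:
            continue  # the group was already recorded at its opening tag
        kind = "pair" if (name, num) in closing else "single"
        groups.append((name, num, kind))
    return groups

example = "<b0><Ctr+N></b0>, <br1><b2><Enter></b2>"
print(tag_groups(example))
```

Text such as <Ctr+N> or <Enter> is not treated as a tag here because it carries no number, which matches how the first example segment is meant to be read.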
OmegaT creates its tags before sentence segmentation is applied. Depending on the segmentation rules, it may appear as if tags do not respect the above rules of numbering and grouping. Consider the following text: "Before: \. After: \s". One would expect segmentation as follows (<b0> and </b0> stand for the start and end of italics):
<segment ....> <b0>Before: \. After: \s</b0><end segment>
However, when the default segmentation rules are applied to this text, we end up with the following result:
<segment 1990> <b0>Before: \. <end segment>
<segment 1991> After: \s</b0><end segment>
Applying the rule "segment after a period followed by a space" splits the original text into two segments, even though the two tags <b0> and </b0> should be kept together in the same segment. In some cases this may even cause problems in the translation, when tags must be placed differently in the target language to reflect the word order of that language (see Tag operations below).
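The effect of segmenting after the tags have been created can be reproduced with a small Python sketch. The regular expression below is only a hypothetical stand-in for a "period followed by a space" rule; OmegaT's actual segmentation rules are configurable and are not modeled here.

```python
import re

# Hypothetical stand-in for a "break after a period followed by a space" rule.
BREAK_AFTER_PERIOD = re.compile(r"(?<=\.)\s+")

def split_segments(text):
    """Split text wherever a period is followed by whitespace."""
    return [part for part in BREAK_AFTER_PERIOD.split(text) if part]

# The tags already exist in the text when the split happens.
tagged = r"<b0>Before: \. After: \s</b0>"
segments = split_segments(tagged)
for number, segment in enumerate(segments):
    print(number, segment)
```

The opening tag ends up in one segment and the closing tag in the other, which is the situation described above.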
Care must be exercised with tags. If they are accidentally changed, the formatting of the final file may be corrupted. The sound approach is "Don't fix what's not broken". However, it is still good to know what is possible and how to do it.
Tag group duplication: To duplicate tag groups, just copy them to the position of your choice. Keep in mind that in a pair group the opening tag must come before the closing tag. The formatting represented by the group you duplicated will be applied to the section where you duplicated it.
Example:
<segment 0001><b0>This formatting</b0> is going to be duplicated here.<end segment>
After duplication:
<segment 0001><b0>This formatting</b0> has been <b0>duplicated here</b0>.<end segment>
Tag group deletion: To delete tag groups, just remove them from the segment. Keep in mind that a pair group must have both its opening and its closing tag deleted to ensure that all traces of the formatting are properly erased; otherwise the translated file might get corrupted. By deleting a tag group you remove the related formatting from the translated file.
Example:
<segment 0001><b0>This formatting</b0> is going to be deleted.<end segment>
After deletion:
<segment 0001>This formatting has been deleted.<end segment>
Modifying the order of tag groups: To change the order of a tag group to reflect a different language structure in the translation, simply put the tag group where it should be in the translation. The formatting will follow the part it is applied to.
Example:
<segment 0001><b0>Formatting zero</b0> and <b1>formatting one</b1> are going to be inverted around.<end segment>
After order modification:
<segment 0001><b1>Formatting one</b1> and <b0>formatting zero</b0> have been inverted.<end segment>
Modifying a tag group order may result in the nesting of a tag group within another tag group. This is fine as long as the enclosing group totally encloses the enclosed group. In other words, when moving paired tags, make sure that both the opening and the closing tag have been moved in the target; otherwise the translated file might be corrupted and may not open. Both formats will then apply to the nested part.
Example:
<segment 0001><b0>Formatting</b0> <b1>one</b1> is going to be nested inside formatting zero.<end segment>
After nesting:
<segment 0001><b0>Formatting <b1>one</b1></b0> has been nested inside formatting zero.<end segment>
Overlapping is the result of bad manipulation of tag pairs and will certainly result in formatting corruption, and sometimes in the translated file not opening at all. Example:
<segment 0001><b0>Formatting</b0> <b1>one</b1> is going to be messed up.<end segment>
After bad manipulation:
<segment 0001><b0>Formatting <b1>one</b0> </b1>is very messed up now.<end segment>
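The difference between nesting and overlapping can be checked mechanically with a stack. The following Python sketch is an illustration only, not OmegaT's own check; the function name is_well_nested and the tag pattern are assumptions.

```python
import re

# Assumed tag shape: one to three letters plus a number, "/" for closing tags.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3})(\d+)>")

def is_well_nested(segment):
    """True if every closing tag closes the most recently opened pair."""
    matches = list(TAG_RE.finditer(segment))
    # Tags that occur in closing form are pairs; all others are singletons.
    paired = {m.group(2) + m.group(3) for m in matches if m.group(1)}
    stack = []
    for m in matches:
        tag = m.group(2) + m.group(3)
        if m.group(1):                       # closing tag
            if not stack or stack.pop() != tag:
                return False                 # closes the wrong pair, or nothing
        elif tag in paired:                  # opening tag of a pair
            stack.append(tag)
    return not stack                         # every opened pair was closed

nested = "<b0>Formatting <b1>one</b1></b0> has been nested inside formatting zero."
overlapping = "<b0>Formatting <b1>one</b0> </b1>is very messed up now."
print(is_well_nested(nested), is_well_nested(overlapping))
```

One limitation of this sketch: a pair whose closing tag has been deleted looks like a singleton when the translation is examined alone, so a comparison against the source segment is still needed to catch that case.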
The validate tags function detects changed tags (whether changed deliberately or by accident) and shows the affected segments. Starting this function by pressing Ctrl+T opens a window listing all segments in the file whose translations contain suspected broken or bad tags. Tag errors in the translated text are often a problem in OpenDocument or OpenOffice.org files, as these may fail to open because of tag problems created during translation. Fixing the tags and recreating the target documents is easy with the validate tags function. The window that opens on pressing Ctrl+T features a three-column table with a link to the segment, the original segment, and the target segment:
1 | A different display font can be selected via the <b0>Display Font</b0> dialog. Open it via the <i1>Settings</i1> > <i2>Display Font...</i2> menu item. The font type and size can be changed from the dialog. | 'n Mens kan 'n ander vertoonfont kies met die <b0>Vertoonfont</b0>-dialoogkassie. Kies <i1>Opstelling</i1> > Vertoonfont... op die kieslys. Die lettertipe én die lettergrootte kan met dié dialoogkassie verander word. |
The tags are highlighted in bold blue for easy comparison between the original and the translated contents. Click on the link to activate the segment in the Editor. Correct the error if necessary and press Ctrl+T to return to the tag validation window to correct other errors. Tag errors are tag manipulations in the translation that do not reproduce the same tag order and number as in the original segment. Some tag manipulations are necessary and benign; others will cause problems when the translated document is created.
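The idea of comparing tag order and number between the original and the translation can be sketched in Python as follows. This is a simplified illustration under assumed names (tag_report, TAG_RE), not the logic OmegaT itself uses; the sample strings are shortened from the table row above.

```python
import re
from collections import Counter

# Assumed tag shape: one to three letters plus a number, "/" for closing tags.
TAG_RE = re.compile(r"</?[a-zA-Z]{1,3}\d+>")

def tag_report(source, target):
    """Compare the tags of a source segment with those of its translation."""
    src = TAG_RE.findall(source)
    tgt = TAG_RE.findall(target)
    return {
        # Tags present in the source but absent (or fewer) in the target.
        "missing": list((Counter(src) - Counter(tgt)).elements()),
        # Tags present in the target but absent (or fewer) in the source.
        "extra": list((Counter(tgt) - Counter(src)).elements()),
        # True only if the same tags appear in the same order.
        "same_order": src == tgt,
    }

source = "Open it via the <i1>Settings</i1> > <i2>Display Font...</i2> menu item."
target = "Kies <i1>Opstelling</i1> > Vertoonfont... op die kieslys."
print(tag_report(source, target))
```

A report like this only flags differences. As noted above, some of them are deliberate and harmless, so the decision about whether a flagged segment needs fixing stays with the translator.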
Tags generally represent some kind of formatting of the original text. Simplifying the original text formatting greatly contributes to reducing the number of tags. Unifying the fonts, font sizes, colors, etc. used in the source should be considered if possible, as it could simplify the translation and reduce the possibility of tag errors. Take a look at the Tag operations section to see what can be done with tags. Remember that if tags bother you and formatting is not extremely relevant for the current translation, removing tags may be the easiest way out of problems.
If you need to see tags in OmegaT but do not need to retain most of the formatting in the translated document, you are free not to include tags in the translation. In this case, pay extra attention to tag pairs, since deleting one side of the pair but forgetting to delete the other will certainly corrupt your document's formatting. Since tags are included in the text itself, it is possible to use segmentation rules to create segments with fewer tags. This is an advanced feature, and some experience is required if you want to apply it properly.
Important: OmegaT is not yet able to detect mistakes in formatting fully automatically, so it will not prompt you if you make an error or change formatting to fit your target language better. Sometimes, however, your translated file may look strange, and in the case of OpenDocument / OpenOffice.org files it may even refuse to open.
In some programming languages (e.g. PHP, C) special tags are used as placeholders in strings that are used in combination with the printf-function. E.g.:
$var = _("cool"); printf(_("OmegaT is very %s"), $var);
If the text strings are extracted from this source code (e.g. via the PO-filter), OmegaT does not replace these variables with tags, because it cannot know for certain whether e.g. %s is actually a placeholder or just a part of the text. You can, however, enable validation of these placeholders. Select Options → Tag Validation... from the menu. You can choose between simple and full validation.
The official syntax for printf-variables is
"%" [ARGUMENTSWAPSPECIFIER] [SIGNSPECIFIER] [PADDINGSPECIFIER] [ALIGNMENTSPECIFIER] [WIDTHSPECIFIER] [PRECISIONSPECIFIER] TYPESPECIFIER
Full validation fully supports this format, except for the WIDTHSPECIFIER.
In simple validation the following syntax is checked:
"%" [ARGUMENTSWAPSPECIFIER] [PRECISIONSPECIFIER] TYPESPECIFIER
You can change the order of the placeholders, but then you have to add the ARGUMENTSWAPSPECIFIER if it is not there yet. This means adding a sequence number and a dollar sign after the %. E.g.
"%s is a %s application"
equals
"%1$s is a %2$s application"
which you can translate with
"a %2$s application is %1$s".
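A check in the spirit of the simple validation syntax can be sketched in Python. The regular expression is an assumed reading of that syntax (the precision as a dot followed by digits, the type as a single letter), and the function names are hypothetical; OmegaT's own validation may differ in detail.

```python
import re

# Assumed reading of the simple syntax: "%", an optional "N$" argument swap,
# an optional ".N" precision, and a one-letter type specifier.
PLACEHOLDER_RE = re.compile(r"%(?:(\d+)\$)?(\.\d+)?([a-zA-Z])")

def placeholders(text):
    """Return a sorted list of (argument number, precision, type)."""
    found = []
    for index, m in enumerate(PLACEHOLDER_RE.finditer(text), start=1):
        # Without an explicit "N$", the argument number is the position
        # of the placeholder in the string.
        number = int(m.group(1)) if m.group(1) else index
        found.append((number, m.group(2) or "", m.group(3)))
    return sorted(found)

def same_placeholders(source, translation):
    """True if both strings refer to the same arguments with the same types."""
    return placeholders(source) == placeholders(translation)

print(same_placeholders("%s is a %s application", "a %2$s application is %1$s"))
print(same_placeholders("%s is a %s application", "%1$s is an application"))
```

Such a check cannot tell whether two unnumbered placeholders of the same type were swapped by the translator, which is why the sequence number has to be added explicitly when the order changes.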