2013-02-07

Free online CAT weighting tool available


Language services providers spend valuable time using CAT analyses to determine project prices with different weighting factors.

LSP.net now provides a free online tool that makes price determination much easier. Users upload the analysis file in their browser and enter the price per word. The tool immediately determines the total price for the translation project, in addition to the price per file as required. When weighting factors are entered, the tool displays the weighted prices.

CAT Weighting Tool

The price per word and weighting factors can be adjusted to reflect changes. Automatic conversion into standard lines is also an option for the German-speaking region.
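
The arithmetic behind such a weighting can be sketched in a few lines of Python. This is only an illustration: the match bands, word counts, weighting factors, price and the 55-character standard line below are assumptions made for the example, not values taken from the LSP.net tool or from a real analysis file.

```python
from decimal import Decimal

# Hypothetical word counts per match band, as a CAT analysis might report them.
analysis = {
    "repetitions": 400,
    "100%": 250,
    "95-99%": 100,
    "85-94%": 150,
    "no match": 1100,
}

# Hypothetical weighting factors: the share of the full word price charged per band.
weights = {
    "repetitions": Decimal("0.25"),
    "100%": Decimal("0.25"),
    "95-99%": Decimal("0.50"),
    "85-94%": Decimal("0.75"),
    "no match": Decimal("1.00"),
}


def weighted_words(analysis, weights):
    """Sum the word counts after multiplying each band by its weighting factor."""
    return sum(Decimal(count) * weights[band] for band, count in analysis.items())


def project_price(analysis, weights, price_per_word):
    """Weighted word count times the price per word, rounded to two decimals."""
    return (weighted_words(analysis, weights) * price_per_word).quantize(Decimal("0.01"))


def standard_lines(characters, chars_per_line=55):
    """Convert a character count to standard lines (55 characters per line is assumed)."""
    return Decimal(characters) / Decimal(chars_per_line)


price_per_word = Decimal("0.12")  # assumed price
print(project_price(analysis, weights, price_per_word))  # 171.00
```

Changing the price per word or any single factor only requires calling `project_price` again, which mirrors how the settings can be adjusted in the browser.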

Users can save their personal settings for future visits to the website in a browser cookie. They can easily print their weighting tool results.

Unrestricted use, free of charge. Currently, the tool can only process SDL Trados analysis files (XML) from Studio 2009 and 2011 – but it will gradually be expanded to include analysis files from additional CAT environments. Visitors to LSP.net are invited to forward their wishes and suggestions for improvement to the developers using a convenient online form.

CAT Weighting Tool –> www.LSP.net/cat

Data privacy protection: LSP.net does not save user data or files. The analysis files are deleted as soon as the LSP.net server has read the data.

SDL TRADOS® is a registered trademark of SDL

2012-09-04

Adding macros to the MS Word Normal template

Macros can be used in many ways in Microsoft Office applications, and quite a few people have published macros that are useful in preparing texts for translation. However, with the changes in Microsoft Office versions and the confusion of templates, toolbars, buttons and clickable text to run macros, a great number of very smart people are simply afraid to use tools that could save them many hours of work.

One fairly easy way to install and run macros is to put them in a global template, such as the Normal template, which is used every time Microsoft Word runs. It is a convenient place to put macros you might want to use in many different documents.

Start Microsoft Word. In MS Word 2003 and earlier versions, the macro list is found under Tools > Macros > View Macros. In MS Word 2007/2010 the same functionality is accessed on the View ribbon with the Macros icon or Alt+F8. Open the list of macros and select the Normal template from the dropdown list:




Type the name (1) of the new macro and click Create (2):




The editor for Visual Basic macros will open, and the beginning and end of your new macro will be created automatically. You can change the name if you like or paste over the generated text. Type or copy the code for your new macro, save it and close the editor.




To run the new macro, open the macro list, select the macro (1) and click Run (2):





2012-08-01

Cleaning up messy tags in Microsoft Word documents

Some years ago there was considerable frustration among users of translation environment tools who encountered increasing numbers of irrelevant markers (also known as tags or codes) in the documents they translated. These might appear{1}around{2} words or even in the mid{3}dle of them, causing terms not to be recognized and matches in translation memories to fail or be downgraded. This problem is particularly acute with OCR documents, but it can occur in perfectly "normal" RTF, DOC or DOCX files as well.
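
Why a stray tag defeats term recognition can be shown with a small Python sketch. It assumes the tags appear as numbered placeholders such as {3}, as in the example sentence above, and the sample segment is made up for the illustration. It does not reproduce how any particular translation tool or cleanup macro works.

```python
import re

# Numbered placeholder tags such as {1}, {2}, {3} (assumed notation).
TAG = re.compile(r"\{\d+\}")


def strip_tags(segment: str) -> str:
    """Remove numbered placeholder tags from a segment."""
    return TAG.sub("", segment)


# Hypothetical segment with a tag in the middle of a word.
segment = "A tag may sit in the mid{3}dle of a word."
term = "middle"

print(term in segment)              # the tag splits the word, so the lookup fails
print(term in strip_tags(segment))  # the term is found once the tag is removed
```

A simple substring test stands in here for terminology lookup. Fuzzy matching in a translation memory suffers in the same way, because the tagged and untagged segments no longer compare as equal.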

Many complained about the problem, and providers of translation tools made excuses and avoided dealing with the matter for the most part until one gifted translator with formidable programming skills for macros in Microsoft Word came to the rescue. Dave Turner's CodeZapper collection has probably been one of the most useful support tools for handling RTF, DOC and DOCX files in CAT tools that the market has seen in many years. It has literally saved me hundreds of hours of trouble since I started using the macros.

If you work often with Microsoft Word documents in Trados, Wordfast, memoQ and other environments, it is very much worth your while to learn how to use CodeZapper. It is so useful in fact that Atril integrated it in the release of its latest working environment, Déjà Vu X2, as an import option. I hope that others will eventually follow suit.

The collection also contains other useful macros (for tidying up PDF converted files, temporarily moving bulky pictures out and back into files to speed up import, etc.). No installation as such is required. The template file can be copied to the Startup folder of Microsoft Word or loaded from the templates and add-ins folder.

Detailed information on CodeZapper and how to get it can be found here:
http://www.asap-traduction.com/CodeZapper

2012-07-27

A plus for translator productivity!

The PlusToyz by German/English to Ukrainian/Russian translator Arkady Vyosotsky are named only half right. They are definitely a plus, but they are not toys.


On a single page in a Microsoft Word document, Arkady has given us thirteen great macros to improve translator productivity. Each macro is activated by double-clicking the relevant blue text and selecting the file to process in the dialog that appears. Personally, I find the format conversion macros most useful.

The first two enable uncleaned bilingual files in the classic Trados format (also Wordfast Classic and Anaphraseus) to be reformatted as a table for easier proofreading, then switched back. The second macro - the one for converting tables in Microsoft Word to the classic Trados uncleaned bilingual format - can also be used as a basis for moving tabular data of many kinds into other translation tools for editing or translation or feeding to a translation memory.
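
The idea behind the two conversions can be sketched in Python on plain text. The sketch assumes the commonly seen segment layout `{0>source<}match{>target<0}` and a made-up sample. Real uncleaned files keep the delimiters as hidden, styled text inside a Word document, which this sketch ignores, and it is not a description of how Arkady's macros are written.

```python
import re

# One bilingual segment: {0>source<}match{>target<0} (assumed plain-text layout).
SEGMENT = re.compile(
    r"\{0>(?P<source>.*?)<\}(?P<match>\d+)\{>(?P<target>.*?)<0\}",
    re.DOTALL,
)


def to_rows(text):
    """Turn uncleaned bilingual text into (source, target, match) table rows."""
    return [(m["source"], m["target"], int(m["match"])) for m in SEGMENT.finditer(text)]


def to_bilingual(rows):
    """Turn table rows back into uncleaned bilingual text, one space between segments."""
    return " ".join("{0>%s<}%d{>%s<0}" % (src, match, tgt) for src, tgt, match in rows)


# Hypothetical sample with two segments.
sample = "{0>Guten Morgen.<}0{>Good morning.<0} {0>Vielen Dank.<}100{>Thank you very much.<0}"

for row in to_rows(sample):
    print(row)
```

Once the segments are rows, they can be proofread side by side or written out in whatever tabular form another tool expects.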

Users of leading edge translation environment tools might not need the macros for handling Star Transit projects or Trados TTX (though users of simpler systems often do), but nonetheless gaps in the functionality of high-end tools like SDL Trados Studio or memoQ still make some of these macros very, very useful in certain situations.

So much value for so little money... well, no money to be more precise. The macros are free! Usually I say you get what you pay for, but in this case, there's a lot more to be had. Thank you, Arkady!

The macros can be downloaded here in a zipped Microsoft Word document.

2012-03-22

Improving scanned PDFs for translation reference

It's quite common these days to receive scanned documents from faxes or other sources as PDFs. These can be easy or rather devilish to convert to editable text using a variety of tools, but in some cases, they are simply wanted for reference. How do you search a large, scanned PDF document for a particular bit of text?

Mostly you don't.

Unless, of course, you are clever and convert the PDF to one of the various "text-on-image" PDF formats. If you are scanning hardcopy documents, it is also possible with many scanning applications to convert the input directly to such a format.

I use ABBYY FineReader 11 to make my scanned reference PDFs searchable. This is a quick and easy process that can be performed two ways.

The first and quickest method is to use the context menu by right-clicking on a PDF or image file in Windows Explorer.

This creates a temporary, searchable PDF which can be saved under whatever name you like. I do this for documents which serve purely as references, where I have no interest in extracting text for translation. It has the disadvantage with FineReader of working with whatever defaults are in place for the last language used.

The second method involves importing the image document into the OCR program, then saving as a searchable document after OCR. This may be useful for documents that have more than one language, where you may apply different OCR settings (for languages) on various pages.

If automatic conversion is used (usually not recommended if you plan to extract text for translation), the process can be rather quick as well, though it is a bit more cumbersome than the context menu method. For example, a 114-page scanned German insurance policy from which I had to translate excerpts was imported from the original PDF, read (processed by optical character recognition) and saved as a PDF/A (searchable text-on-image PDF, which is the current ISO standard for long-term archiving) in slightly less than 4 minutes.

Here's a screenshot of the text search in the PDF/A document using the Adobe Reader. Without this conversion process, it would be impossible to find any text in the document using search functions, because the entire content would be bitmapped images.

Even if you do very careful OCR to extract text for translation, defining zones and optimizing as I do, there are still significant advantages to making a searchable PDF as a reference. First of all, it is often very useful to see text in its proper layout context. Secondly, doing this also helps to identify and correct OCR errors during translation work. I recently translated a scientific article with horrible resolution in the faxed and scanned source document. It was definitely a borderline case for OCR, and when I imported it into a CAT tool for translation, I had to look up a number of places in the original document to see what the text really said. Copying the errors from the source of the OCR text and pasting them in the Search box of Adobe Reader made identifying the correct text a faster, easier process.

There are a number of tools available to convert PDF files to enable them for text search. This makes such resources "translator-friendlier" and may help us find the information we need to do a better job faster. Project managers and clients who are scanning documents for translation can just as easily prepare the PDF files in this format and help their service providers.

2011-12-27

Best practice interoperability for outsourcers with memoQ

Given the growing popularity of Kilgray's memoQ as a staging platform for translation management in projects involving end customers and translators working with a variety of tools, it is increasingly important for translators using other tools to understand the best types of memoQ "bilingual" files to work with for their tools and the best procedures to apply. Here is a summary with links and advice relevant to various common tools and the reasons to adopt a particular approach where possible:
  • Trados Workbench macros in Microsoft Word - While the bilingual DOC format of memoQ seems natural for this tool, it is often a very bad idea. Experience has shown that these are very prone to "break" when segmentation is changed or the content is copied into another file which does not contain the properties information needed for memoQ to recognize and re-import the bilingual DOC, updating the file to translate. Thus it is recommended to use the bilingual RTF tables, preferably with the mqInternal style set for the tags when the RTF file is generated in memoQ. The color difference makes it easier to check the tags when proofreading. The file should be cleaned before returning it to the outsourcer, so the target column contains only the translation.
  • Trados TagEditor - The cleanest, most robust method involves using the source cells from a memoQ bilingual RTF file created with the mqInternal style specified for the tags. This source content is copied into a DOC or DOCX file, the dark red tag text hidden, and the prepared file is then translated in TagEditor. When the cleaned target file is saved, its content is pasted into the target column of the original memoQ bilingual RTF. If a comments column is provided in the RTF file, notes about terms to check or other matters can be added, and these will be available to the outsourcer after re-importing the bilingual file to memoQ. The procedure is described in detail in the lower part of the article here.
  • SDL Trados Studio (2009 & 2011) - Because using the memoQ RTF tables enables certain formatting, such as bold, italic or underlined text, to be seen in SDL Trados Studio, this is recommended over the use of XLIFF files for exchange. This also avoids the current bug in SDL Trados Studio which makes it difficult to import XLIFF files if the sublanguages are not specified. A robust procedure offering tag protection is described here.
  • Wordfast Pro - The procedure to work with memoQ content and protect the tags is essentially the same as the recommendation for TagEditor, except that Wordfast Pro can work directly with RTF files, so it is not necessary to move the content to a Microsoft Word file. The method is described here.
  • Wordfast Classic - While the bilingual DOC format of memoQ is "inviting" for this tool just like with the Trados TWB macros in Microsoft Word, experience has shown that translators are very prone to "break" the DOC files by changing segmentation or copying the content into another file which does not contain the properties information needed for memoQ to recognize and re-import the bilingual DOC, updating the file to translate. Thus, as with the Trados Workbench approach, it is recommended to use the bilingual RTF tables, preferably with the mqInternal style set for the tags when the RTF file is generated in memoQ. The color difference makes it easier to check the tags when proofreading. The file should be cleaned before returning it to the outsourcer, so the target column contains only the translation.
  • OmegaT - Although OmegaT handles XLIFF nicely in general, there may be problems with the current build of memoQ. Here there is a recommended procedure for working with the bilingual RTF tables by copying the source content into an ODT (Open Office) or DOCX file; after translation, the cells are copied into the target column of the bilingual RTF and any comments necessary are added (if the column for them is provided). The article also contains tips on the terminology data format for OmegaT to facilitate the export of terminology from memoQ.
This article will be updated as other workflows are tested and verified or as other methods are developed which offer better working conditions or better results.

2011-12-25

Translating content from memoQ using Trados TagEditor

The growing popularity of Kilgray's memoQ among translation agencies and corporate clients has sometimes posed challenges for users of other tools. One of the great advantages of memoQ is its ability to provide data which is compatible with many other tools, but it is still necessary to know the best way to do so to avoid trouble.

If you use an older version of Trados with TagEditor, one way to work with your client using memoQ is to request the content to translate as a bilingual XLIFF (*.xlf) file where the entire source text has been copied to the target segments. SDL Trados 2007 includes a default INI for XLIFF which will then allow you to read those "target" segments as the source in TagEditor. However, the default INI file for XLIFF in TagEditor requires optimization; among other things, it does not protect sensitive header information in XLIFF files from memoQ and SDL Trados Studio. (The German consultancy Loctimize has written some instructions on updating the INI; although these are focused on SDLXLIFF files, some of the information is relevant to XLIFF from memoQ and probably other sources.)
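
What "source text copied to the target segments" means can be illustrated on a minimal XLIFF 1.2 document. In practice the client prepares this when exporting from memoQ. The Python sketch below only shows the resulting structure, and the sample file content is hypothetical and far simpler than a real memoQ export.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # keep the default namespace on output
NS = {"x": XLIFF_NS}


def copy_source_to_target(xliff_text: str) -> str:
    """Give every trans-unit a target that duplicates its source, inline tags included."""
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        if source is None:
            continue
        old_target = unit.find("x:target", NS)
        if old_target is not None:
            unit.remove(old_target)
        target = copy.deepcopy(source)
        target.tag = "{%s}target" % XLIFF_NS
        target.attrib.clear()  # do not carry source attributes over to the target
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal XLIFF 1.2 sample.
sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.docx" source-language="de" target-language="en" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Guten Morgen.</source><target></target></trans-unit>
      <trans-unit id="2"><source>Vielen <g id="1">Dank</g>.</source></trans-unit>
    </body>
  </file>
</xliff>"""

result = copy_source_to_target(sample)
print(result)
```

With the targets filled this way, a tool that presents the "target" segments as its working text shows the translator the full source content.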

Translation memory content, if available, should be provided to you in TMX format, which can be read into your TWB translation memory. memoQ can also export terminology content as CSV for opening in Excel or as MultiTerm XML to import into SDL Trados MultiTerm if you use that tool. Thus your client is also able to provide you with any translation memory or terminology resources which are available.

After you have completed your translation, clean the TTX file from TagEditor to create a target XLIFF file (or just use the File > Save Target As... menu option in TagEditor). This finished XLIFF is all you need to return to the client, not your "uncleaned" TTX. When the XLIFF file is re-imported to memoQ it will include your complete translation. In case there are problems with the tags, your client will also be able to determine this and make corrections using memoQ's QA tools, though you should of course perform a careful tag check using the functions in TagEditor before you deliver.

Another popular method of data exchange for clients working with memoQ is to use the "bilingual RTF tables" in memoQ. If the files are properly prepared with a special workflow involving hiding the tags and converting the RTF to Microsoft Word format (which is described here), this is currently the best method for translating content from memoQ with TagEditor. If the RTF content is imported unmodified into TagEditor, the memoQ tags will not be protected and must be checked by the client very carefully in your delivered file. (The bilingual RTF file from memoQ must also be saved as a Microsoft Word file, because TagEditor will not read RTF properly - after translation, the file needs to be saved as RTF again.) If the client uses this method, ensure that the entire content of the source text column is copied to the target column and that the text property of all the text in the file except the target column content to translate is set to "hidden". TagEditor will then ignore the hidden text and allow you to translate the rest. After you have finished the translation, create a target file and set all the text in it to visible again. If you do work with memoQ content in this format, it is convenient if your client includes a Comments column in the file, because when you proofread your work, you can note any uncertain terms or source text problems (or other matters) in that column. When the bilingual RTF table is re-imported into the client's memoQ project, the commented content can be filtered quickly and any issues identified and addressed.