When you must translate a PDF file, there are different options for converting it to an editable format, depending on its structure.

PDF files are one of the translator's worst enemies. A PDF file must be converted into an editable format before it can be analysed or translated with a CAT program. Depending on the PDF type, the conversion can be more or less difficult, and sometimes impossible. Quickly identifying the PDF type is the prerequisite for choosing the most appropriate conversion process, selecting the best tools and saving time.

PDF stands for Portable Document Format. PDF was developed by Adobe in 1993 to represent two-dimensional documents in a manner independent of the application software, hardware, and operating system. Basically, a PDF is always displayed in the same way, no matter what computer is used. Thanks to these characteristics, PDF has become one of the preferred formats for sharing documents. For many people, creating a PDF version of a document is like making a "virtual photocopy". While this method is very handy, it has several disadvantages when the document needs to be modified or translated.

A PDF document is composed of different elements. Some of them are independent of the visible content, such as the document properties (author, title, etc.). Others are part of the visible content and generally include text, bitmap images (pictures) and vector graphics (lines, diagrams, etc.).

It is important to determine whether the text we see in the document is actual text, in which case it can be selected. To find out, you only need to open the document with Adobe Reader (or any other PDF viewer) and click on the Select Text icon in the toolbar. Alternatively, you can zoom in on the document. If at some point the text appears out of focus or badly printed, the document is a scan. If, on the other hand, the text can be selected or its resolution does not deteriorate when zooming, the PDF was generated by an application.
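When many files have to be triaged, the select-and-zoom check can be roughly approximated in code. The following is a minimal sketch, not a definitive method: it assumes the text of each page has already been extracted with a PDF library of your choice, and the function name `classify_pdf` and the `min_chars` threshold are hypothetical choices made for illustration.

```python
from typing import List


def classify_pdf(page_texts: List[str], min_chars: int = 20) -> str:
    """Guess whether a PDF was generated by an application or scanned.

    page_texts holds the text extracted from each page by any PDF library.
    A page counts as textual when it yields at least min_chars
    non-whitespace characters (an arbitrary, assumed threshold).
    """
    if not page_texts:
        return "unknown"
    textual = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars
    )
    if textual == len(page_texts):
        return "generated"   # every page has selectable text
    if textual == 0:
        return "scanned"     # no page has selectable text: OCR needed
    return "mixed"           # some pages are images, some are text


# Example: a two-page file where only the first page contains real text.
print(classify_pdf(["Terms and conditions of the agreement.", ""]))
```

Note that a scan which already carries a hidden OCR text layer would be reported as "generated" by this heuristic, so a visual check of a few pages is still advisable.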
To identify which application was used to create the PDF, you can press CTRL+D (or select File / Document / Properties) and read the file description tab. Under Application you should see the name of the program used to generate the PDF.

Ideally, at this point you should ask your client for the editable file, as you have just confirmed its existence. A good way to persuade the client is to apply an extra charge for converting the PDF file. Obviously, the approach depends on the relationship with the client and/or the specific project. In fact, the client may not have the editable file, especially if it is a multinational organisation: DTP is often managed at headquarters, and local branches only receive PDF files for printing. The need for translation may only arise at this stage, and in that case finding the original document can be very difficult.

If, despite all efforts, it is not possible to obtain the original file, there are various options for exporting the text. Be aware that none of the export options will produce a file identical to the original (including fonts), in particular when it contains bitmap images and certain types of formatting. The choice of export method, as well as the degree of accuracy, will also depend on the intended use of the text. There are two possible situations:
If you do not have Adobe Acrobat:
When the file must not only keep its formatting but also be fully reproduced (and the source file is not available), there are two possibilities:
Infix Professional (about $160) has a useful feature for exporting the text content of a PDF in XML format. The resulting XML file can be processed and translated with a CAT tool (e.g. OmegaT, which since version 2.3.0 includes a filter that opens this file type directly, as explained in an in-depth tutorial on the OmegaT website). Infix Professional can then reimport the translated file into the original PDF. The Infix website shows the whole procedure in a self-explanatory video.

Those who do not want to purchase an OCR program, or only need it occasionally, can use one of the many online converters, such as Zamzar (http://www.zamzar.com).

As already stated, what we have explained so far only applies to PDF files generated by an application. When the PDF text is an image (typically the case with a scanned fax), the only way to export it to an editable format is to use an OCR program.

Any document protection settings are an additional complication. Two protection levels can be activated, using a "user password" and an "owner password". The "user password" prevents the document from being opened. The "owner password" restricts access to one or more functions such as printing, copying, modifying, adding notes, etc. If the PDF author has restricted access to functions with a password, the methods described above cannot be used. You must contact the client and ask for the password. If this is impossible, you should be aware that there are several tools to decipher "owner passwords". You only need to search for "PDF crack" on Google. You can also use online services such as http://www.ensode.net/pdf-crack.jsf. The situation is more complicated when the "user password" prevents the PDF from opening; in this case the only option is to use intrusive software that may take hours or even days to recover the password. Please note that using this software may infringe property rights. Qabiria does not promote its use in any way.
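Before asking the client for a password, it can help to know exactly which functions the "owner password" restricts. In the standard PDF security scheme these restrictions are stored as an integer permission field (`/P`) in the file's encryption dictionary. The sketch below decodes that integer into the four functions mentioned above; it assumes the `/P` value has already been read with a PDF library, and the names `PERMISSION_BITS` and `decode_permissions` are hypothetical.

```python
from typing import Dict

# Bit values of the /P permission field in the standard security handler.
# Bit positions in the PDF specification are counted from 1.
PERMISSION_BITS = {
    "print": 1 << 2,      # bit 3: print the document
    "modify": 1 << 3,     # bit 4: modify the contents
    "copy": 1 << 4,       # bit 5: copy or extract text and graphics
    "add_notes": 1 << 5,  # bit 6: add or modify annotations
}


def decode_permissions(p: int) -> Dict[str, bool]:
    """Return which operations a /P value allows (True means allowed)."""
    return {name: bool(p & bit) for name, bit in PERMISSION_BITS.items()}


# Example with an assumed /P value of -44.
print(decode_permissions(-44))
# {'print': True, 'modify': False, 'copy': True, 'add_notes': False}
```

For translation work the "copy" flag is the one that matters most, since text cannot be exported from the file while it is switched off.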
Comments (8)
Hi, I liked it a lot, even if it is a bit on the long side... the external links are very interesting anyway.
,
February 13, 2009
...
A clearly written and very thorough article. Congratulations. I do think, however, that we translators should offer this conversion as an added service. There is a very interesting article on this on the website of some Australian translators (I'll see if I can find the link). Since these conversions involve a lot of work, we should make the client choose between receiving the text without formatting and paying for the conversion. How: on receiving the job, convert 1 or 2 pages and send them to the client, saying that for "x euros" more you can deliver the translation formatted almost identically to the original. When the client sees the "x euros more", they can't find the source file fast enough. If they really want the conversion and don't have the original file, let them pay.
To my own shame, I haven't practised what I preach, and I still convert documents without charging my clients for it.
,
January 24, 2010
...
Thanks, Michael. Indeed, very often simply mentioning a "conversion surcharge" makes the source files that generated the PDF appear out of nowhere...
,
January 24, 2010
...
For free you can use gDoc Creator to convert pdf files to word. One of the convert to Word options in the software is to retain text flow so that it is easily editable. It may be of use to you and I would be interested in your comments about it. Here's a link to the product page: http://bit.ly/5SFT2h
,
February 04, 2010
...
Thanks a lot for sharing the information, Graeme. Actually, there are dozens of programs that claim to easily convert from PDF to Word. However, the scope of this article is just the opposite. We weren't looking for a "quick and dirty" solution, but for the better way of producing a Word document while keeping in control of the format during the conversion. From our experience, the only way to achieve this is using the advanced features of plain OCR software, not out-of-the-box solutions.
,
February 04, 2010
...
Has anyone tried Infix for searchable PDFs? http://www.iceni.com/infix-Translate.htm
Just wondering...
,
April 19, 2011
Truly comprehensive and clear - thank you!
