A roundtrip with po2csv
Quickstart
1) pofilter --fuzzy --review -t untranslated -i po-dir -o po-filtered-dir (this step is
optional)
1a) divide into sections
2) po2csv -i po-dir|po-filtered-dir -o csv-out
3) edit in Excel
3a) iconv -f windows-1250 -t utf-8 < xx.csv > xx2.csv (needed if the encoding
got messed up)
4) csv2po -i csv-in -o po-in -t templates (you must work against a template
directory)
5) progress - run basic checks and sort out encoding issues
6) pomerge --mergeblank=no -i po-in -o po-dir -t po-dir
# remove fuzzy entries
egrep -v "^#, fuzzy" < po-dir/file.po > po-dir/file.po2
mv po-dir/file.po2 po-dir/file.po
7) cvs diff -u > x.diff - check the changes
8) cvs ci
Introduction
po2csv allows you to send CSV files, which can be edited in any spreadsheet, to
a translator. This document outlines the process to follow from the raw po
files -> CSV files -> back to PO. We also look at a case where you may have
submitted a subset of the PO files for translation and you need to integrate
these.
Creating a subset
This step is optional.
To send a translator only those messages that are untranslated, fuzzy or need
review, run:
pofilter --fuzzy --review -t untranslated -i po-dir -o po-filtered-dir
Divide into sections
You might want to divide the work into sections if you are apportioning it to
different translators. In that case create new directories:
e.g. po-filtered-dir-1 po-filtered-dir-2
or po-filtered-dir-bob po-filtered-dir-mary
Copy files from po-filtered-dir to po-filtered-dir-N in a way that balances the
work or apportions the amount you want for each translator. Try to keep
sections together and not break them up too much, e.g. give one translator all
the OpenOffice.org Calc work, don't split it between two people - this is just a
simple measure to ensure consistency.
Now continue as normal: convert to CSV and perform wordcounts for each
separate directory.
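As a minimal sketch of the copying step, assuming each section lives in its own
subdirectory of po-filtered-dir: the helper name assign_section and the section
name "sc" are hypothetical and only illustrate keeping a whole section with one
translator.

```shell
# assign_section SRC SECTION TRANSLATOR
# Copy one whole section (a subdirectory of SRC) into SRC-TRANSLATOR,
# so that a section is never split between two people.
# The function name and directory layout are assumptions for illustration.
assign_section() {
    src=$1
    section=$2
    translator=$3
    mkdir -p "$src-$translator"
    cp -R "$src/$section" "$src-$translator/"
}

# Example (hypothetical section name):
#   assign_section po-filtered-dir sc mary
```

Calling it once per section keeps the decision of who gets what explicit and
easy to review.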
Creating the CSV files
po2csv -i po-dir|po-filtered-dir -o csv-out
This will create a set of CSV files in csv-out, which you can compress using zip
(we use zip because most people are Windows users).
Creating a wordcount
Professional translators work on source word counts, so we create a wordcount
to go with the files:
pocount `find po-dir|po-filtered-dir -name "*.po"`
We work on source words regardless of whether the string is fuzzy or not. You
might want to get a lower rate for work on fuzzy strings.
Place the wordcount file in both the PO and CSV directories to avoid the problem
of finding it later. Check the numbers to make sure you haven't inadvertently
included something that you didn't want in.
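Saving the count and copying it into both directories could be sketched as
follows. The helper name save_wordcount and the file name wordcount.txt are
assumptions, not something the toolkit prescribes, and the unquoted find
expansion assumes file names without spaces.

```shell
# save_wordcount PO_DIR CSV_DIR
# Run pocount over every PO file in PO_DIR, store the result in
# PO_DIR/wordcount.txt and copy it next to the CSV files.
# wordcount.txt is a hypothetical file name chosen for this sketch.
save_wordcount() {
    po_dir=$1
    csv_dir=$2
    # Word splitting of the find output is intentional (one argument per file).
    pocount $(find "$po_dir" -name '*.po') > "$po_dir/wordcount.txt"
    cp "$po_dir/wordcount.txt" "$csv_dir/wordcount.txt"
}

# Example:
#   save_wordcount po-filtered-dir csv-out
```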
Package the CSV files
zip -r9 work.zip csv-out/
Translating
Translators can use most spreadsheets. Excel works well. However, there are a
few problems with spreadsheets:
1) Encoding - you can sort that out later
2) Strings that start with ' - most spreadsheets treat cells starting with ' as
text and gobble up the '. A workaround is to escape those like this: \'.
FIXME: confirm whether po2csv does that for msgid strings.
3) Autocorrect - Excel changes ... to a single character and does other odd
things. pofilter will help catch these later.
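A quick early check for the autocorrect problem, before pofilter is run, might
look like the sketch below. It assumes the returned CSV files are already UTF-8
and that the single character in question is the horizontal ellipsis (U+2026);
the helper name find_ellipsis is hypothetical.

```shell
# find_ellipsis DIR
# List the CSV files under DIR that contain the UTF-8 horizontal ellipsis
# character (U+2026), which autocorrect may have substituted for "...".
# Assumes the files are UTF-8 encoded.
find_ellipsis() {
    ellipsis=$(printf '\342\200\246')   # U+2026 as UTF-8 bytes
    find "$1" -name '*.csv' -exec grep -l "$ellipsis" {} +
}

# Example:
#   find_ellipsis csv-in
```

This only reports file names; it does not change anything, so the decision to
revert the character stays with you.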
Converting CSV back to PO
Extract the CSV files; here we assume they are in 'csv-in'.
csv2po -i csv-in -o po-in -t templates
This will create new PO files in 'po-in' based on the data in the
'csv-in' CSV files merged into the 'templates' template files. You cannot run
the csv2po command without templates.
Note (1): running csv2po using the input PO files as templates gives spurious
results. It should probably be made to work, but currently doesn't.
Note (2): you might have encoding problems with the returned files. Use iconv
to convert between encodings. Usually Windows users will be using something
like WINDOWS-1250:
iconv -f windows-1250 -t utf-8 < xx.csv > xx2.csv
Check the file after conversion to see that the characters are in fact correct;
if not, try another encoding.
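To avoid converting a file that is already fine, one possible sketch is to let
iconv itself test whether the file is valid UTF-8 and convert only when that
test fails. The helper name to_utf8 and the windows-1250 default are
assumptions; the source encoding still has to be guessed and verified by eye.

```shell
# to_utf8 FILE [ENCODING]
# Re-encode FILE in place as UTF-8 unless it is already valid UTF-8.
# ENCODING defaults to windows-1250, which is only a guess at what the
# translator's spreadsheet produced.
to_utf8() {
    file=$1
    from=${2:-windows-1250}
    if iconv -f utf-8 -t utf-8 "$file" > /dev/null 2>&1; then
        echo "$file: already valid UTF-8, left unchanged"
    else
        iconv -f "$from" -t utf-8 "$file" > "$file.utf8" &&
            mv "$file.utf8" "$file"
    fi
}

# Example:
#   to_utf8 csv-in/xx.csv windows-1250
```

A pure ASCII file counts as valid UTF-8 and is left alone, which is the
desired outcome.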
Checking the new PO files
We run the progress script against the files, as this allows the gettext tools
to pick up encoding and other errors.
Manually edit the files to correct these or use iconv to convert between
character sets.
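If the progress script is not at hand, a rough substitute is to call the
gettext checker directly. The sketch below is not the progress script itself;
it simply runs msgfmt -c on every PO file and reports the ones that fail. The
helper name check_po_dir is hypothetical.

```shell
# check_po_dir DIR
# Run gettext's msgfmt in check mode on every PO file under DIR and print
# the files that fail. The compiled output is discarded.
check_po_dir() {
    find "$1" -name '*.po' | while IFS= read -r po; do
        msgfmt -c -o /dev/null "$po" || echo "FAILED: $po"
    done
}

# Example:
#   check_po_dir po-in
```

The error messages from msgfmt go to standard error, so they appear just
before each FAILED line.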
Merging PO files into the main PO files
This step would not be necessary if the CSV contained the complete PO file. It
is only needed when the translator has been editing a subset of the whole PO
file.
pomerge --mergeblank=no -i po-in -o po-dir -t po-dir
This will take the PO files from po-in and merge them with those in po-dir,
using po-dir as the template, i.e. overwriting files in po-dir. It will also
ignore entries that have blank msgstrs, i.e. it will not merge untranslated
items. The default behaviour of pomerge is to take all changes from po-in and
apply them to the output; by overriding this we can ignore all untranslated
items.
There should be an option to override the status of the destination PO files
with that of the input PO. This works for setting fuzzy status, but you cannot
remove fuzzy status.
Therefore all entries that were fuzzy in the destination will still be fuzzy
even though the input was corrected. If you are confident that all your input
is correct, then remove the fuzzy markers:
egrep -v "^#, fuzzy" < po-dir/file.po > po-dir/file.po2
mv po-dir/file.po2 po-dir/file.po
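Note that the egrep command drops the whole flag line, so an entry marked
"#, fuzzy, c-format" also loses its c-format flag. A more careful variant,
sketched here with sed, removes only the word fuzzy and applies the change to
every PO file in a directory. The helper name unfuzzy_dir is hypothetical.

```shell
# unfuzzy_dir DIR
# Remove the fuzzy flag from every PO file under DIR while keeping any
# other flags (such as c-format) that share the same "#," line.
unfuzzy_dir() {
    find "$1" -name '*.po' | while IFS= read -r po; do
        sed -e '/^#, fuzzy$/d' \
            -e 's/^#, fuzzy, /#, /' \
            -e '/^#,/s/, fuzzy//' "$po" > "$po.tmp" &&
            mv "$po.tmp" "$po"
    done
}

# Example:
#   unfuzzy_dir po-dir
```

The three sed expressions handle a line with only the fuzzy flag, fuzzy as the
first of several flags, and fuzzy after another flag, in that order.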