Babel: current work

This page is old. The current site for babel, the multilingual framework to localize LaTeX, LuaLaTeX and XeLaTeX, is here.

Updated: 2015-12-11

Babel currently provides translations for some strings and dates. Useful additions would be, for example, time, currency, addresses and personal names. I’m currently working on a new way to define languages descriptively rather than programmatically.

The idea is to create a set of ini files like those in the LaTeX repository (the keys are tentative). Why the ini syntax? Because it’s easy to create, read, edit, parse and process.
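To illustrate how little machinery the ini syntax needs, here is a minimal sketch in Python using the standard configparser module. The section and key names (identification, captions, date, and so on) are hypothetical placeholders, not the tentative keys of the actual files.

```python
import configparser

# Hypothetical ini content: section and key names are illustrative only,
# not the actual (tentative) keys used by babel.
SAMPLE_INI = """
[identification]
name = Spanish
tag = es

[captions]
chapter = Capítulo
figure = Figura

[date]
months = enero, febrero, marzo
"""


def load_language(text):
    """Parse ini text into a plain dict of dicts (section -> key -> value)."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key case as written; configparser lowercases keys by default.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


language = load_language(SAMPLE_INI)
print(language["captions"]["chapter"])
```

The result is plain data, so a tool can read, check or convert a language description without executing any TeX code.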

The main sources are the ldf files as well as the CLDR. However, the latter is intended for displaying plain text, while TeX is about fine typesetting, which is making things a bit harder than expected.

Some additional decisions must be made in this regard – for example, several languages have several names for a single caption, depending on the class; there should also be keys for the treatment of labels and for the order of captions and their corresponding numbers (not all languages place the number after the caption, as LaTeX assumes). And are \alph labels just the exemplarCharacters in CLDR? (Certainly not, at least in general.)

More interesting are changes in the sentence structure or related to it. For example, in Basque the number precedes the name (including chapters); in Hungarian “from (1)” is “(1)-ből”, but “from (3)” is “(3)-ból”; in Spanish an item labelled “3.^o” may be referred to as “3.^er ítem”; and so on.
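The Hungarian case shows why a single fixed template such as “from (N)” is not enough: the suffix depends on the number itself. A minimal sketch in Python, assuming a hypothetical lookup table that holds only the two cases mentioned above (a real description would need a rule covering every number):

```python
# Suffix meaning "from" after a reference number. Only the two cases
# given in the text are recorded; the table itself is a hypothetical
# data structure, not part of babel.
FROM_SUFFIX = {1: "ből", 3: "ból"}


def from_reference(number):
    """Format a reference such as "(1)-ből" for the recorded numbers."""
    suffix = FROM_SUFFIX.get(number)
    if suffix is None:
        raise KeyError(f"no suffix rule recorded for {number}")
    return f"({number})-{suffix}"


print(from_reference(1))  # (1)-ből
print(from_reference(3))  # (3)-ból
```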

Even more interesting are right-to-left, vertical and bidi typesetting. Babel has provided basic support for bidi text as part of the style for Hebrew, but it is somewhat unsatisfactory and internally replaces some hardwired commands with other hardwired commands (generic marks would be much better).

2016-01-23

I don’t think the mechanism for loading hyphenation patterns in LuaTeX is satisfactory. Information is loaded three times (from language.dat and language.dat.lua when the format is built, and again from language.dat.lua when typesetting the document), which can lead to inconsistent data. I’m currently working on a revamped loader based solely on language.dat, read at run time.
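As a rough illustration of what reading language.dat as the single source involves, here is a sketch in Python. It is not the loader itself (that would live in Lua and TeX), and it assumes a simplified view of the format: % starts a comment, a line holds a language name followed by a patterns file and an optional exceptions file, and a line starting with = declares a synonym for the preceding language. The sample entries are for demonstration only.

```python
SAMPLE_DAT = """\
% sample language.dat
english hyphen.tex
=usenglish
german dehypht.tex   % trailing comment
"""


def parse_language_dat(text):
    """Return {language: {"patterns", "exceptions", "synonyms"}}."""
    languages = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line.startswith("="):
            # A synonym applies to the most recently declared language.
            if current is None:
                raise ValueError("synonym before any language")
            languages[current]["synonyms"].append(line[1:].strip())
            continue
        fields = line.split()
        current = fields[0]
        languages[current] = {
            "patterns": fields[1] if len(fields) > 1 else None,
            "exceptions": fields[2] if len(fields) > 2 else None,
            "synonyms": [],
        }
    return languages


table = parse_language_dat(SAMPLE_DAT)
print(table["english"])
```

With one file parsed once at run time, there is no second copy of the data that could drift out of step with the first.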