I recently found myself writing a document containing mostly English and a little bit of Hebrew on Linux in OpenOffice.org Writer (using a prerelease of the upcoming Fedora 11).
I was pleased to see that since the last time I looked, right-to-left (RTL) text entry has improved by leaps and bounds in both Linux and OpenOffice.org. I was able to easily find the necessary control panel to allow me to add a Hebrew keyboard layout and configure how to switch to it, and it even generated a printed keyboard diagram showing all the Hebrew keys. Best of all, entry of Hebrew vowels is now supported and works astoundingly well. I was even able to cut and paste voweled text from a Web page into OpenOffice.org, and it “just worked.” It’s really quite impressive.
However, there was one annoying quirk that I just couldn’t figure out. I came up with a workaround, but I’m wondering if there’s a “right” solution that I was unable to find.
I wanted to typeset a single Hebrew word at the end of an English sentence, then another lone Hebrew word at the beginning of the next English sentence, like this: “… from תנ”ךa. חַנָּה called …” As soon as I started typing the second Hebrew word, Writer decided that both words, with the period and spaces in between them, were part of a continuous span of Hebrew text, and therefore it incorrectly typeset them all RTL, like this: “… from תנ”ך. חַנָּה called …” Doh!
I searched and searched for the correct way to tell Writer, “This is the end of this Hebrew phrase, regardless of what comes after it,” but I couldn’t find anything. The workaround I eventually came up with in Writer (and, for that matter, in HTML — the same workaround is employed above in the first example) is to insert a tiny English letter with a white foreground color (to make it invisible) after the first Hebrew word, to force the editor back into English mode before the period.
I put the invisible letter before the period rather than after it because I was typesetting with full justification and I didn’t want there to be a tiny visible gap if the first Hebrew word ended up being the last word on the line.
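For the HTML version of this workaround, a minimal sketch might look like the following. The helper names, the letter "a", and the inline style values are all illustrative assumptions, not the markup actually used in this post.

```python
from html import escape

# Hypothetical style: white text at a very small size, so the letter is
# effectively invisible on a white page.
INVISIBLE_STYLE = "color:#ffffff;font-size:1px"

def invisible_ltr_letter(letter="a"):
    """Return a tiny white Latin letter wrapped in a span (hypothetical helper)."""
    return f'<span style="{INVISIBLE_STYLE}">{escape(letter)}</span>'

def end_hebrew_run(hebrew_word, following_text):
    """Place the invisible letter right after the Hebrew word, before the period."""
    return f"{escape(hebrew_word)}{invisible_ltr_letter()}{escape(following_text)}"

first_word = "תנ”ך"
snippet = end_hebrew_run(first_word, ". חַנָּה called")
print(snippet)
```

The letter goes before the period, matching the placement described above. White text only disappears on a white background, so this is a sketch for print or light-themed pages.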
Is this a known problem in typesetting circles? Is there a better solution?
I found the codes that Andrew mentioned above documented at http://www.user.uni-hannover.de/nhtcapri/bidirectional-text.html. The ones I wanted were U+202A to start an LTR sequence of characters and U+202C to end it (in HTML, that’s &#x202A; and &#x202C;).
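As a rough sketch of how that pair of characters can be used, the Python below wraps each English sentence, Hebrew word included, in U+202A ... U+202C. The helper name `ltr` and the choice to wrap whole sentences are assumptions made for illustration. The code only builds the string, and how it displays depends on the application rendering it.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING: starts an LTR sequence
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: ends it

def ltr(text):
    """Wrap text in an explicit left-to-right embedding (hypothetical helper)."""
    return f"{LRE}{text}{PDF}"

# Each sentence becomes its own LTR sequence, so the period after the first
# Hebrew word belongs to the English sentence and not to a Hebrew run.
first_sentence = ltr("… from תנ”ך.")
second_sentence = ltr("חַנָּה called …")
paragraph = f"{first_sentence} {second_sentence}"

# repr() shows the otherwise invisible control characters as escapes.
print(repr(paragraph))
```

The resulting string can be pasted into any application that accepts Unicode text.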
These characters don’t appear in the “Insert Special Character” dialog in OpenOffice, at least not as far as I could tell. However, you can enter the Unicode codes manually. I believe there’s a way to do this directly under Linux, although I haven’t figured it out for certain, and there are tools to allow you to do this under Windows and then cut and paste the result into OpenOffice or any other application.
The other thing you can do is save your document as HTML, enter the HTML Unicode entities shown above where you want them, and then cut and paste the text from the browser back into OpenOffice. I tested this and it works — OpenOffice obeys the Unicode text direction control characters when they are present.
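If you would rather generate the HTML entities than type them by hand, something like this Python sketch would do it. The function names are hypothetical, and it uses hexadecimal numeric character references, which is only one of the forms HTML accepts.

```python
from html import unescape

LRE = "\u202A"  # start of an LTR sequence
PDF = "\u202C"  # end of the sequence

def as_entity(ch):
    """Return the hexadecimal HTML numeric character reference for one character."""
    return f"&#x{ord(ch):04X};"

def ltr_html(fragment):
    """Wrap an HTML fragment in the two direction-control entities (hypothetical helper)."""
    return f"{as_entity(LRE)}{fragment}{as_entity(PDF)}"

marked = ltr_html("… from תנ”ך.")
print(marked)

# Decoding the entities, as a browser would, gives back the raw control characters.
decoded = unescape(marked)
```

Here `decoded` begins with U+202A and ends with U+202C, which is the text you would get when copying the rendered page out of a browser.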
More than a year and a half ago :-).
I’m curious — when was the last time you looked? I used OO writer to paste vowelled Hebrew and do some additional editing when helping my daughter print her bat mitzva speech, a year and a half ago. The only problems I encountered were ones similar to what you’re experiencing now (and I don’t have a better solution).
The gotcha comes if, like me, the person reading it has an aggregator with a dark background. Although if your ultimate output is print, then no worries.
In DavkaWriter you can select a range of text and explicitly assign it a language.
In Unicode I believe there are codes that indicate “text until the next mark is explicitly RTL” (or LTR) to handle this. I don’t know if OpenOffice honors those.
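For reference, here is a small Python sketch that lists the direction-control characters in question, using the names and bidirectional classes from the standard library's Unicode database. The selection of seven characters is my own. It shows what the codes are, not whether any given application honors them.

```python
import unicodedata

# Direction-control characters: two marks, two embeddings, the pop, two overrides.
BIDI_CONTROLS = ["\u200E", "\u200F", "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"]

def describe(ch):
    """Return (code point, bidirectional class, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))

table = [describe(ch) for ch in BIDI_CONTROLS]
for code, bidi_class, name in table:
    print(f"{code}  {bidi_class:<3}  {name}")
```

The embedding and override characters open a span that stays in effect until the matching POP DIRECTIONAL FORMATTING character, while the two marks are single invisible characters with a strong direction of their own.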
In extremis, I’ve been known to “edit visually” — that is, enter text in the wrong logical order so that it typesets correctly.