fantasai | 11 Jun 2005 18:27

Re: Arabic letters separated by markup

Andreas Prilop wrote on the Unicode mailing list[1]:
> Does the Unicode standard only deal with plain text or
> does it also deal with text in markup languages like SGML/HTML?
> 
> I wonder whether Arabic letters should join when they are
> separated by markup. Here's an example:
> 
>  http://www.unics.uni-hannover.de/nhtcapri/temp/nastaliq.html
> 
> Current programs display the letters separated by markup
> differently: Internet Explorer 6 and StarOffice 7 join the
> letters, but Mozilla 1.7 does not.
> 
> Is it left to the rules of SGML/HTML to decide or
> has the Unicode standard any opinion about this?

In semantic markup languages like HTML, it's really the domain of the
formatting system used to process the markup, not the markup system
itself. [1] So, for web pages, this behavior would be governed by the
Unicode and CSS specs. I haven't read the Unicode book cover to cover,
but since there's an argument here, I'm guessing it's not covered by
Unicode quite yet. :)

Like many other people here, I think that the goal should be to make
the text as readable as possible, even if it means ignoring some of
the styling.

Therefore, these are the rules I suggest:

  For characters within the same inline sequence.

Erik van der Poel | 13 Jun 2005 19:08

Re: Arabic letters separated by markup


[I'm not on the www-style list.]

fantasai wrote:
>  For characters within the same inline sequence.
> 
>   1. Shaping and joining behavior MUST NOT be affected by element
>      boundaries.

If the CSS "display" property is set to "none" for a particular element, 
then perhaps the characters in adjacent displayable elements should not 
be joined to the characters in the "display: none" element.

(Maybe you already thought of this, and that is what is meant by "same 
inline sequence"?)

>   4. Obligatory ligatures MUST NOT be broken if the formatting rules
>      introduce no extra space between the affected characters, even
>      if this means some of the characters are rendered in the wrong
>      font or as part of the wrong visual element.

Perhaps the spec could say that an implementation MAY honor such things 
as a color change (which may not be possible in current font 
technologies such as OpenType?) or MAY instead use the isolated forms of 
the individual characters. I don't know whether the obligatory ligature 
rules should trump the style rules.

>   5. Combining characters MUST be rendered as the combined grapheme
>      cluster if the system is capable of rendering the combination,
>      even if this means some of the characters are rendered in the

fantasai | 14 Jun 2005 00:14

Re: Arabic letters separated by markup


Erik van der Poel wrote:
> [I'm not on the www-style list.]
> 
> fantasai wrote:
> 
>>  For characters within the same inline sequence.
>>
>>   1. Shaping and joining behavior MUST NOT be affected by element
>>      boundaries.
> 
> If the CSS "display" property is set to "none" for a particular element, 
> then perhaps the characters in adjacent displayable elements should not 
> be joined to the characters in the "display: none" element.
> 
> (Maybe you already thought of this, and that is what is meant by "same 
> inline sequence"?)

No, I hadn't thought of that. But if an element is display: none, then
for all rendering purposes it is to be treated as if it wasn't there.
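A minimal sketch of that reading, assuming a simplified model in which an inline sequence is just a list of spans carrying a "display" value. The Span class and the shaping_run function are hypothetical names for illustration, not taken from any spec or browser. Spans with display: none are dropped before the text reaches the shaper, so the letters on either side become adjacent and can join:

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for one inline element's text and style."""
    text: str
    display: str = "inline"


def shaping_run(spans):
    """Return (text, owners) for one inline sequence.

    text is the string to shape as a single run; owners[i] is the index
    of the span that contributed text[i]. Spans with display "none" are
    skipped entirely, as if they were not in the document.
    """
    chars = []
    owners = []
    for index, span in enumerate(spans):
        if span.display == "none":
            continue
        for ch in span.text:
            chars.append(ch)
            owners.append(index)
    return "".join(chars), owners


# BEH in one element, a hidden NOON, then YEH in a third element.
spans = [Span("\u0628"), Span("\u0646", display="none"), Span("\u064A")]
text, owners = shaping_run(spans)
```

Keeping the owners list alongside the text is one way to shape across element boundaries while still remembering which element each character came from.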

>>   4. Obligatory ligatures MUST NOT be broken if the formatting rules
>>      introduce no extra space between the affected characters, even
>>      if this means some of the characters are rendered in the wrong
>>      font or as part of the wrong visual element.
> 
> Perhaps the spec could say that an implementation MAY honor such things 
> as a color change (which may not be possible in current font 
> technologies such as OpenType?)


Steve Zilles | 16 Jun 2005 19:29

Re: Arabic letters separated by markup

At 03:14 PM 6/13/2005, fantasai wrote:

> Erik van der Poel wrote:
>> [I'm not on the www-style list.]
>>
>> fantasai wrote:
>>
>>>  For characters within the same inline sequence.
>>>
>>>   1. Shaping and joining behavior MUST NOT be affected by element
>>>      boundaries.
>>
>> If the CSS "display" property is set to "none" for a particular element,
>> then perhaps the characters in adjacent displayable elements should not
>> be joined to the characters in the "display: none" element.
>>
>> (Maybe you already thought of this, and that is what is meant by "same
>> inline sequence"?)
>
> No, I hadn't thought of that. But if an element is display: none, then
> for all rendering purposes it is to be treated as if it wasn't there.
>
>>>   4. Obligatory ligatures MUST NOT be broken if the formatting rules
>>>      introduce no extra space between the affected characters, even
>>>      if this means some of the characters are rendered in the wrong
>>>      font or as part of the wrong visual element.
>>
>> Perhaps the spec could say that an implementation MAY honor such things
>> as a color change (which may not be possible in current font
>> technologies such as OpenType?)
>
> Of course if the system is somehow capable of honoring both the style
> rules and the ligature formation, it should be allowed to do so. :)
>
>> or MAY instead use the isolated forms of the individual characters. I
>> don't know whether the obligatory ligature rules should trump the
>> style rules.
>
> Yeah, I'm not too set on this one. But I don't know how critical it is
> for the affected scripts. If the font isn't changing at all, though, then
> the spec should require that the ligature be formed across element
> boundaries. I suspect it might be simpler just to make the exception apply
> even in cases where the font changes.

For what it is worth, the following text from the XSL 1.0 REC concerns when a ligature substitution is to be done. From section 4.7.2, Line Building:

...substitutions may occur because of addition of hyphens or spelling changes due to hyphenation, or glyph image construction from syllabification, or ligature formation.

Substitutions that replace a sequence of glyph-areas with a single glyph-area should only occur when the margin, border, and padding in the inline-progression-direction (start- and end-), baseline-shift, and letter-spacing values are zero, treat-as-word-space is false, and the values of all other relevant traits match (i.e., alignment-adjust, alignment-baseline, color trait, background traits, dominant-baseline-identifier, font traits, text-depth, text-altitude, glyph-orientation-horizontal, glyph-orientation-vertical, line-height, lineheight-shift-adjustment, text-decoration, text-shadow).

This indicates a bias to honoring the author's/user's styling choices over ligature formation. I am not sure how well these paragraphs have been tested in practice.
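The quoted condition can be sketched as a predicate over two adjacent glyph-areas. This is only an illustration under loud assumptions: styles are plain dicts with numeric spacing values, the trait lists are a shortened subset of the ones named above, and may_substitute is a hypothetical name rather than anything defined by XSL.

```python
# Traits that must be zero for a substitution to be allowed
# (simplified: the REC restricts margin, border and padding to the
# inline-progression-direction; here each is a single number).
SPACING_TRAITS = ("margin", "border", "padding",
                  "baseline-shift", "letter-spacing")

# A shortened, illustrative subset of the traits that must match.
MATCHING_TRAITS = ("color", "font-family", "font-size",
                   "text-decoration", "line-height")


def may_substitute(left, right):
    """Return True if two adjacent glyph-areas may be replaced by a
    single glyph-area (for example, a ligature) under the rule above."""
    for style in (left, right):
        if any(style.get(trait, 0) != 0 for trait in SPACING_TRAITS):
            return False
        if style.get("treat-as-word-space", False):
            return False
    return all(left.get(trait) == right.get(trait)
               for trait in MATCHING_TRAITS)


base = {"color": "black", "font-family": "serif", "font-size": 12}
same = may_substitute(base, dict(base))
recolored = may_substitute(base, {**base, "color": "red"})
```

Under this reading a color change alone is enough to block the ligature, which is the opposite priority from rule 4 as proposed.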


>>>   5. Combining characters MUST be rendered as the combined grapheme
>>>      cluster if the system is capable of rendering the combination,
>>>      even if this means some of the characters are rendered in the
>>>      wrong font or as part of the wrong visual element. The combined
>>>      grapheme cluster SHOULD be rendered as part of the base
>>>      character's element, or, in the case of combining jamos, the
>>>      initial character's element.
>>
>> Here again, shouldn't the style rules trump the Unicode rules?
>> Otherwise, why should we even allow tags to be inserted between such
>> characters?
>
> In this case, I think it's more important for the grapheme cluster to
> be rendered as one unit. An 'a' with an acute accent should have its
> acute accent on top, and a Hangul syllable expressed as individual
> pieces should be presented as its proper syllable block. Breaking
> ligatures like alef-lam looks weird, but it wouldn't be as bad as
> breaking such combinations: alef and lam appear individually quite
> frequently, but combining vowels and diacritics don't.
>
> ~fantasai
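A rough sketch of rule 5 as quoted, assuming each character already carries the index of the element it came from. It is deliberately simplified: it attaches only characters with a nonzero canonical combining class to the preceding base, which is not the full grapheme cluster definition and does not cover conjoining Hangul jamos. The function name clusters_with_owner is hypothetical.

```python
import unicodedata


def clusters_with_owner(text, owners):
    """Group text into (cluster, owner) pairs.

    A character with a nonzero canonical combining class is appended to
    the preceding cluster, and the cluster keeps the element index of
    its base character, whatever element the mark itself came from.
    """
    clusters = []
    for ch, owner in zip(text, owners):
        if clusters and unicodedata.combining(ch) != 0:
            cluster_text, cluster_owner = clusters[-1]
            clusters[-1] = (cluster_text + ch, cluster_owner)
        else:
            clusters.append((ch, owner))
    return clusters


# 'a' in element 0; the combining acute accent and 'b' in element 1.
result = clusters_with_owner("a\u0301b", [0, 1, 1])
```

Here the accent is rendered with the 'a' in element 0 even though the markup places it in element 1, which is the "wrong visual element" trade-off the rule accepts.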

         Steve
=====================================
Steve Zilles
115 Lansberry Court,
Los Gatos, CA 95032-4710
steve <at> zilles.org

Erik van der Poel | 13 Jun 2005 22:06

Re: Arabic letters separated by markup


>>   4. Obligatory ligatures MUST NOT be broken if the formatting rules
>>      introduce no extra space between the affected characters, even
>>      if this means some of the characters are rendered in the wrong
>>      font or as part of the wrong visual element.
> 
> Perhaps the spec could say that an implementation MAY honor such things 
> as a color change (which may not be possible in current font 
> technologies such as OpenType?)

It should be possible to implement multi-color obligatory ligatures by 
creating 2 or more glyphs for each ligature, possibly with kerning. I 
haven't checked to see whether any APIs can kern across runs or change 
colors within a run, but that's a separate issue.

>>   5. Combining characters MUST be rendered as the combined grapheme
>>      cluster if the system is capable of rendering the combination,
>>      even if this means some of the characters are rendered in the
>>      wrong font or as part of the wrong visual element. The combined
>>      grapheme cluster SHOULD be rendered as part of the base
>>      character's element, or, in the case of combining jamos, the
>>      initial character's element.
> 
> Here again, shouldn't the style rules trump the Unicode rules? 
> Otherwise, why should we even allow tags to be inserted between such 
> characters?

Perhaps tags would be inserted between such characters for reasons other 
than style. I.e. some other semantic. So if there is no style change 
across the tag(s), the characters should be combined and presented in 
the usual way.

If there is a style change across the tag(s) but the implementation 
cannot honor it, it's hard to say whether the author considers that 
style change (e.g. color) to be more important than the normal 
presentation of the character sequence.

We are talking about rather strange cases here, so the implementors 
might not get around to implementing them soon even if the specs were 
embellished.

Erik

Mark Davis | 13 Jun 2005 23:44

Re: Arabic letters separated by markup


It is always possible to replace a ligature by a sequence of shapes which,
when put together, form the ligature shape. These then can each have the
style of the corresponding original character. However, this does depend on
the rendering system placing the glyphs exactly adjacent to one another so
that the result is reasonable. (In earlier lives I remember that we created
Arabic shaping forms to slightly overlap, so that if there were variances in
positioning it would not leave a gap.)

However, whether it is worthwhile for the average font designer to go to the
(perhaps considerable) extra trouble to do this is questionable, so the
rendering system always needs to be prepared to deal with real ligatures;
and applying a uniform style to them based on some combination of the styles
of the component characters is a perfectly reasonable approach.
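One way to picture "some combination of the styles of the component characters", purely as a sketch: the two policies below ("first" and "majority") and the function name ligature_style are assumptions made up for illustration, and styles are reduced to hashable values such as color names.

```python
from collections import Counter


def ligature_style(component_styles, policy="first"):
    """Pick one style for a ligature from its components' styles.

    "first" takes the style of the first character in logical order.
    "majority" takes the most common style, with ties going to the
    style that appears earliest.
    """
    if not component_styles:
        raise ValueError("a ligature needs at least one component")
    if policy == "first":
        return component_styles[0]
    if policy == "majority":
        counts = Counter(component_styles)
        best = max(counts.values())
        for style in component_styles:
            if counts[style] == best:
                return style
    raise ValueError("unknown policy: " + repr(policy))


first = ligature_style(["red", "blue", "blue"])
majority = ligature_style(["red", "blue", "blue"], policy="majority")
```

Either policy keeps the ligature intact and gives a predictable result; which one better matches author intent is exactly the open question in this thread.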

Mark

----- Original Message ----- 
From: "Erik van der Poel" <erik <at> vanderpoel.org>
To: "fantasai" <fantasai.lists <at> inkedblade.net>
Cc: "Unicode Mailing List" <unicode <at> unicode.org>; <www-international <at> w3.org>
Sent: Monday, June 13, 2005 13:06
Subject: Re: Arabic letters separated by markup

> >>   4. Obligatory ligatures MUST NOT be broken if the formatting rules
> >>      introduce no extra space between the affected characters, even
> >>      if this means some of the characters are rendered in the wrong
> >>      font or as part of the wrong visual element.
> >
> > Perhaps the spec could say that an implementation MAY honor such things
> > as a color change (which may not be possible in current font
> > technologies such as OpenType?)
>
> It should be possible to implement multi-color obligatory ligatures by
> creating 2 or more glyphs for each ligature, possibly with kerning. I
> haven't checked to see whether any APIs can kern across runs or change
> colors within a run, but that's a separate issue.
>
> >>   5. Combining characters MUST be rendered as the combined grapheme
> >>      cluster if the system is capable of rendering the combination,
> >>      even if this means some of the characters are rendered in the
> >>      wrong font or as part of the wrong visual element. The combined
> >>      grapheme cluster SHOULD be rendered as part of the base
> >>      character's element, or, in the case of combining jamos, the
> >>      initial character's element.
> >
> > Here again, shouldn't the style rules trump the Unicode rules?
> > Otherwise, why should we even allow tags to be inserted between such
> > characters?
>
> Perhaps tags would be inserted between such characters for reasons other
> than style. I.e. some other semantic. So if there is no style change
> across the tag(s), the characters should be combined and presented in
> the usual way.
>
> If there is a style change across the tag(s) but the implementation
> cannot honor it, it's hard to say whether the author considers that
> style change (e.g. color) to be more important than the normal
> presentation of the character sequence.
>
> We are talking about rather strange cases here, so the implementors
> might not get around to implementing them soon even if the specs were
> embellished.
>
> Erik
>
>
>

John Hudson | 14 Jun 2005 00:38

Re: Arabic letters separated by markup

Mark Davis wrote:

> It is always possible to replace a ligature by a sequence of shapes which,
> when put together, form the ligature shape.

Yes, although I can think of some traditional Greek ligatures that would be hard to break 
into shape sequences that obviously correspond to the underlying characters. Knowing which 
parts of a Latin fi ligature correspond to which underlying characters, and hence which 
might be rendered as separate and independently coloured shapes, is not difficult. 
Something like the attached graphic is trickier.

John Hudson

-- 

Tiro Typeworks        www.tiro.com
Vancouver, BC        tiro <at> tiro.com

Currently reading:
Truth and tolerance, by Benedict XVI, Cardinal Ratzinger as was
War (revised edition), by Gwynne Dyer
