Aaron Boodman | 1 Jul 2009 02:01

Re: Extensions i18n Design Doc Draft


On Tue, Jun 30, 2009 at 4:58 PM, Nebojša Ćirić<cira <at> chromium.org> wrote:
>> * This document should propose a specific JavaScript API for
>> programmatically resolving messages.
>
> What do you think about having a getMsg(key, optional_namespace) function
> for the JavaScript API (similar to the gadgets API)?
> So for __MSG_greeting__ the script would call getMsg("greeting") and get
> the translation for the current locale.
> The namespace would be necessary only if we go with multiple catalogs.

sgtm. I think chrome.i18n.getMessage("greeting") would fit better with
our other APIs.

- a
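
A minimal sketch of the lookup discussed above, not the actual chrome.i18n implementation. The catalog layout, the sample strings, the fallback to a default locale, and the helper name resolveReferences are all assumptions made for illustration; only the __MSG_greeting__ reference and the getMessage("greeting") call shape come from the thread.

```javascript
// Hypothetical per-locale catalogs; structure and strings are made up.
const catalogs = {
  en: { greeting: "Hello" },
  sr: { greeting: "Zdravo" },
};

// Resolve a message key for a locale, falling back to a default locale
// (the fallback rule is an assumption, the thread does not specify one).
function getMessage(key, locale, defaultLocale = "en") {
  const has = (obj, k) => Object.prototype.hasOwnProperty.call(obj, k);
  const catalog = catalogs[locale] || {};
  if (has(catalog, key)) return catalog[key];
  const fallback = catalogs[defaultLocale] || {};
  return has(fallback, key) ? fallback[key] : "";
}

// Replace __MSG_key__ references embedded in static text.
function resolveReferences(text, locale) {
  return text.replace(/__MSG_([A-Za-z0-9_]+?)__/g, (_, key) =>
    getMessage(key, locale));
}
```

With these definitions, a script would call getMessage("greeting", "sr"), and static text such as "<b>__MSG_greeting__</b>" could be passed through resolveReferences before display.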

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev <at> googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---

Jói | 1 Jul 2009 15:16

Re: Extensions i18n Design Doc Draft


Sorry to jump in a bit late.  I'm really glad to see i18n of
extensions being addressed so early on.  I have a couple of high-level
comments on the format used for the catalogs, based on my experience
writing GRIT.

It's likely that whatever file format is used will end up as a source
file for localization, since most extension authors will just use our
format directly rather than generating it from some more
fully-featured format.  To this end, the format should include at
least the following:

a) A way to add a description of the message for translators (some
kind of attribute that is empty by default); someone already mentioned
this.

b) A way to distinguish between two messages that are textually the
same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
(as an adjective).  An attribute of the message that is empty by
default is ideal.  I would keep this separate from the description
attribute, as this facilitates calculating a message ID as a hash over
the message contents plus the 'meaning' attribute (this is a useful
approach to avoid translating each message more than once, see how it
is used in GRIT).

c) A way to demarcate bits of the message that should not be
translated - generally these are called "placeholders" but that
conflicts with how that term is currently used in the document.  It's
important to do this, otherwise translators are going to receive
messages that contain "code" bits that shouldn't be translated, and
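which will cause errors in the running program if they are incorrectly
translated.  Consider for example a message like "Hello $USER$, how
are you?" and the implications if the translator translates $USER$.
Ideally, you could use a format such as XML which allows the extension
author to mark any piece of text as a placeholder, but for a simpler
approach compatible with more formats, you could require a specific
format for non-translateables, e.g. $SOMETHING$ and/or printf-style
format specifiers.

For more ideas on the resource format, you could look at GRIT's .grd
format or at http://xml.coverpages.org/xliff.html for inspiration.
Both are probably more complex than what we'd like to have for
extension message catalogs, and so as long as the format supports the
things I mentioned above, I believe it should be fine.

Finally, keep in mind that messages may contain embedded line-breaks,
so it's good to have a format that supports this naturally.

Cheers,
Jói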
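
Point (b) can be sketched roughly as follows. This is only an illustration of hashing the message contents together with a separate 'meaning' attribute; the hash function (an FNV-1a variant), the field names text, meaning, and description, and the sample entries are assumptions, and this is not the algorithm GRIT actually uses.

```javascript
// FNV-1a (32-bit) over UTF-16 code units; a stand-in hash, not GRIT's.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Derive an ID from the message text plus its 'meaning'.
// The description is deliberately left out of the hash.
function messageId(message) {
  const meaning = message.meaning || "";
  // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
  return fnv1a(message.text + "\u0000" + meaning);
}

// Hypothetical catalog entries.
const openVerb = { text: "Open", meaning: "verb", description: "Button that opens a file" };
const openAdjective = { text: "Open", meaning: "adjective", description: "Status label" };
const openVerbAgain = { text: "Open", meaning: "verb", description: "Menu item" };
```

Here openVerb and openVerbAgain share an ID, so the string would only need to be translated once, while openAdjective gets its own ID even though the English text is identical. Keeping the description out of the hash means that rewording the note for translators does not invalidate existing translations.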

Aaron Boodman | 1 Jul 2009 19:56

Re: Extensions i18n Design Doc Draft


2009/7/1 Jói <joi.sigurdsson <at> gmail.com>:
> b) A way to distinguish between two messages that are textually the
> same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
> (as an adjective).  An attribute of the message that is empty by
> default is ideal.  I would keep this separate from the description
> attribute, as this facilitates calculating a message ID as a hash over
> the message contents plus the 'meaning' attribute (this is a useful
> approach to avoid translating each message more than once, see how it
> is used in GRIT).

Good point.

> c) A way to demarcate bits of the message that should not be
> translated - generally these are called "placeholders" but that
> conflicts with how that term is currently used in the document.  It's
> important to do this, otherwise translators are going to receive
> messages that contain "code" bits that shouldn't be translated, and
> which will cause errors in the running program if they are incorrectly
> translated.  Consider for example a message like "Hello $USER$, how
> are you?" and the implications if the translator translates $USER$.
> Ideally, you could use a format such as XML which allows the extension
> author to mark any piece of text as a placeholder, but for a simpler
> approach compatible with more formats, you could require a specific
> format for non-translateables, e.g. $SOMETHING$ and/or printf-style
> format specifiers.

So I think this is what Cira meant by "sprintf" in his original
document. However, I have to admit I'm not crazy about that. It seems
like overkill. I prefer something simpler like $SOMETHING$.
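
A rough sketch of what the simpler $SOMETHING$ convention could look like in practice. The function names, the token pattern (uppercase letters, digits, and underscores between dollar signs), and the sample translations are assumptions for illustration; the thread only fixes the general $NAME$ shape.

```javascript
// Substitute $NAME$ tokens with runtime values; unknown tokens are left as-is.
function fillPlaceholders(template, values) {
  return template.replace(/\$([A-Z0-9_]+)\$/g, (token, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : token);
}

// List the token names used in a message, sorted for comparison.
function placeholderNames(template) {
  return Array.from(template.matchAll(/\$([A-Z0-9_]+)\$/g), (m) => m[1]).sort();
}

// A translation passes this check if it uses exactly the same tokens
// as the source message, i.e. the translator did not translate a token.
function samePlaceholders(source, translation) {
  return placeholderNames(source).join(",") ===
    placeholderNames(translation).join(",");
}
```

For example, fillPlaceholders("Hello $USER$, how are you?", { USER: "Ana" }) substitutes the name, and samePlaceholders can flag a translation in which $USER$ was accidentally translated.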
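
> For more ideas on the resource format, you could look at GRIT's .grd
> format or at http://xml.coverpages.org/xliff.html for inspiration.
> Both are probably more complex than what we'd like to have for
> extension message catalogs, and so as long as the format supports the
> things I mentioned above, I believe it should be fine.
>
> Finally, keep in mind that messages may contain embedded line-breaks,
> so it's good to have a format that supports this naturally.

I realized that for the message format, there is one other consideration:

We cannot parse untrusted JSON or XML in the browser, so we will need
to do this in a sandboxed process. We already have a nice mechanism
for doing this with JSON, but we'd have to come up with something new
for XML.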

Erik Kay | 1 Jul 2009 20:12

Re: Extensions i18n Design Doc Draft

On Wed, Jul 1, 2009 at 10:56 AM, Aaron Boodman <aa <at> chromium.org> wrote:

> 2009/7/1 Jói <joi.sigurdsson <at> gmail.com>:
> > b) A way to distinguish between two messages that are textually the
> > same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
> > (as an adjective).  An attribute of the message that is empty by
> > default is ideal.  I would keep this separate from the description
> > attribute, as this facilitates calculating a message ID as a hash over
> > the message contents plus the 'meaning' attribute (this is a useful
> > approach to avoid translating each message more than once, see how it
> > is used in GRIT).
>
> Good point.
>
> > c) A way to demarcate bits of the message that should not be
> > translated - generally these are called "placeholders" but that
> > conflicts with how that term is currently used in the document.  It's
> > important to do this, otherwise translators are going to receive
> > messages that contain "code" bits that shouldn't be translated, and
> > which will cause errors in the running program if they are incorrectly
> > translated.  Consider for example a message like "Hello $USER$, how
> > are you?" and the implications if the translator translates $USER$.
> > Ideally, you could use a format such as XML which allows the extension
> > author to mark any piece of text as a placeholder, but for a simpler
> > approach compatible with more formats, you could require a specific
> > format for non-translateables, e.g. $SOMETHING$ and/or printf-style
> > format specifiers.
>
> So I think this is what Cira meant by "sprintf" in his original
> document. However, I have to admit I'm not crazy about that. It seems
> like overkill. I prefer something simpler like $SOMETHING$.
>
> > For more ideas on the resource format, you could look at GRIT's .grd
> > format or at http://xml.coverpages.org/xliff.html for inspiration.
> > Both are probably more complex than what we'd like to have for
> > extension message catalogs, and so as long as the format supports the
> > things I mentioned above, I believe it should be fine.
> >
> > Finally, keep in mind that messages may contain embedded line-breaks,
> > so it's good to have a format that supports this naturally.
>
> I realized that for the message format, there is one other consideration:
>
> We cannot parse untrusted JSON or XML in the browser, so we will need
> to do this in a sandboxed process. We already have a nice mechanism
> for doing this with JSON, but we'd have to come up with something new
> for XML.

We're already planning to do sandboxed XML for extension autoupdate, so we could depend on that too.

Erik
 



Aaron Boodman | 1 Jul 2009 20:34

Re: Extensions i18n Design Doc Draft


On Wed, Jul 1, 2009 at 11:12 AM, Erik Kay<erikkay <at> chromium.org> wrote:
> On Wed, Jul 1, 2009 at 10:56 AM, Aaron Boodman <aa <at> chromium.org> wrote:
>>
>> 2009/7/1 Jói <joi.sigurdsson <at> gmail.com>:
>> > b) A way to distinguish between two messages that are textually the
>> > same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
>> > (as an adjective).  An attribute of the message that is empty by
>> > default is ideal.  I would keep this separate from the description
>> > attribute, as this facilitates calculating a message ID as a hash over
>> > the message contents plus the 'meaning' attribute (this is a useful
>> > approach to avoid translating each message more than once, see how it
>> > is used in GRIT).
>>
>> Good point.
>>
>> > c) A way to demarcate bits of the message that should not be
>> > translated - generally these are called "placeholders" but that
>> > conflicts with how that term is currently used in the document.  It's
>> > important to do this, otherwise translators are going to receive
>> > messages that contain "code" bits that shouldn't be translated, and
>> > which will cause errors in the running program if they are incorrectly
>> > translated.  Consider for example a message like "Hello $USER$, how
>> > are you?" and the implications if the translator translates $USER$.
>> > Ideally, you could use a format such as XML which allows the extension
>> > author to mark any piece of text as a placeholder, but for a simpler
>> > approach compatible with more formats, you could require a specific
>> > format for non-translateables, e.g. $SOMETHING$ and/or printf-style
>> > format specifiers.
>>
>> So I think this is what Cira meant by "sprintf" in his original
>> document. However, I have to admit I'm not crazy about that. It seems
>> like overkill. I prefer something simpler like $SOMETHING$.
>>
>> > For more ideas on the resource format, you could look at GRIT's .grd
>> > format or at http://xml.coverpages.org/xliff.html for inspiration.
>> > Both are probably more complex than what we'd like to have for
>> > extension message catalogs, and so as long as the format supports the
>> > things I mentioned above, I believe it should be fine.
>> >
>> > Finally, keep in mind that messages may contain embedded line-breaks,
>> > so it's good to have a format that supports this naturally.
>>
>> I realized that for the message format, there is one other consideration:
>>
>> We cannot parse untrusted JSON or XML in the browser, so we will need
>> to do this in a sandboxed process. We already have a nice mechanism
>> for doing this with JSON, but we'd have to come up with something new
>> for XML.
>
> We're already planning to do sandboxed XML for extension autoupdate, so we
> could depend on that too.

I don't think it's the same.

For autoupdate, we only need to parse the XML so that we can pick a
few fields out of it.

For i18n, we need to actually use the XML later, at runtime. We don't
want to convert it to some other format because we also need to handle
the --load-extension case, where we won't want to convert to an
intermediate format.

This means we need to sanitize the XML, more like what we do with the
manifest today. So we'd need to serialize it to some intermediate
format and send it back to the browser process. We already have a way
to do this for JSON -- I'm just saying we'd need to do something
similar for XML.

- a
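
The sanitizing step described above might look roughly like this for a JSON catalog. It is a sketch under stated assumptions and not Chromium's actual mechanism: the function name sanitizeCatalog, the allowed field names, and the catalog shape are hypothetical, and the process boundary itself is not modeled.

```javascript
// Fields a catalog entry may carry in this sketch; the names are hypothetical.
const ALLOWED_FIELDS = ["message", "description"];

// Intended to run in the sandboxed process: parse the untrusted text and
// rebuild a plain structure holding only string values under known fields.
function sanitizeCatalog(untrustedJson) {
  let parsed;
  try {
    parsed = JSON.parse(untrustedJson);
  } catch (e) {
    return null; // malformed input is rejected outright
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return null;
  }

  // A null-prototype object avoids surprises with keys such as "__proto__".
  const clean = Object.create(null);
  for (const key of Object.keys(parsed)) {
    const entry = parsed[key];
    if (entry === null || typeof entry !== "object") continue;
    const cleanEntry = {};
    for (const field of ALLOWED_FIELDS) {
      if (typeof entry[field] === "string") cleanEntry[field] = entry[field];
    }
    if ("message" in cleanEntry) clean[key] = cleanEntry;
  }
  // Re-serialize so the receiving process only sees output produced here.
  return JSON.stringify(clean);
}
```

The point of the sketch is that the browser process never consumes the original bytes: it receives a freshly serialized intermediate form, which is the property an XML equivalent would also need.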


Erik Kay | 1 Jul 2009 21:37

Re: Extensions i18n Design Doc Draft

On Wed, Jul 1, 2009 at 11:34 AM, Aaron Boodman <aa <at> chromium.org> wrote:
> On Wed, Jul 1, 2009 at 11:12 AM, Erik Kay<erikkay <at> chromium.org> wrote:
> > On Wed, Jul 1, 2009 at 10:56 AM, Aaron Boodman <aa <at> chromium.org> wrote:
> >>
> >> 2009/7/1 Jói <joi.sigurdsson <at> gmail.com>:
> >> > b) A way to distinguish between two messages that are textually the
> >> > same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
> >> > (as an adjective).  An attribute of the message that is empty by
> >> > default is ideal.  I would keep this separate from the description
> >> > attribute, as this facilitates calculating a message ID as a hash over
> >> > the message contents plus the 'meaning' attribute (this is a useful
> >> > approach to avoid translating each message more than once, see how it
> >> > is used in GRIT).
> >>
> >> Good point.
> >>
> >> > c) A way to demarcate bits of the message that should not be
> >> > translated - generally these are called "placeholders" but that
> >> > conflicts with how that term is currently used in the document.  It's
> >> > important to do this, otherwise translators are going to receive
> >> > messages that contain "code" bits that shouldn't be translated, and
> >> > which will cause errors in the running program if they are incorrectly
> >> > translated.  Consider for example a message like "Hello $USER$, how
> >> > are you?" and the implications if the translator translates $USER$.
> >> > Ideally, you could use a format such as XML which allows the extension
> >> > author to mark any piece of text as a placeholder, but for a simpler
> >> > approach compatible with more formats, you could require a specific
> >> > format for non-translateables, e.g. $SOMETHING$ and/or printf-style
> >> > format specifiers.
> >>
> >> So I think this is what Cira meant by "sprintf" in his original
> >> document. However, I have to admit I'm not crazy about that. It seems
> >> like overkill. I prefer something simpler like $SOMETHING$.
> >>
> >> > For more ideas on the resource format, you could look at GRIT's .grd
> >> > format or at http://xml.coverpages.org/xliff.html for inspiration.
> >> > Both are probably more complex than what we'd like to have for
> >> > extension message catalogs, and so as long as the format supports the
> >> > things I mentioned above, I believe it should be fine.
> >> >
> >> > Finally, keep in mind that messages may contain embedded line-breaks,
> >> > so it's good to have a format that supports this naturally.
> >>
> >> I realized that for the message format, there is one other consideration:
> >>
> >> We cannot parse untrusted JSON or XML in the browser, so we will need
> >> to do this in a sandboxed process. We already have a nice mechanism
> >> for doing this with JSON, but we'd have to come up with something new
> >> for XML.
> >
> > We're already planning to do sandboxed XML for extension autoupdate, so we
> > could depend on that too.
>
> I don't think it's the same.
>
> For autoupdate, we only need to parse the XML so that we can pick a
> few fields out of it.
>
> For i18n, we need to actually use the XML later, at runtime. We don't
> want to convert it to some other format because we also need to handle
> the --load-extension case, where we won't want to convert to an
> intermediate format.
>
> This means we need to sanitize the XML, more like what we do with the
> manifest today. So we'd need to serialize it to some intermediate
> format and send it back to the browser process. We already have a way
> to do this for JSON -- I'm just saying we'd need to do something
> similar for XML.

Fair enough.  I guess all I was saying is that we're going to be doing sandboxed parsing of XML already.  Rewriting it as part of that work doesn't seem like a large addition.

Erik



Evan Martin | 1 Jul 2009 16:17

Re: Extensions i18n Design Doc Draft


What are your thoughts on just using GRIT?  We have experience with it
and JavaScript can parse XML just fine.

2009/7/1 Jói <joi.sigurdsson <at> gmail.com>:
>
> Sorry to jump in a bit late.  I'm really glad to see i18n of
> extensions being addressed so early on.  I have a couple of high-level
> comments on the format used for the catalogs, based on my experience
> writing GRIT.
>
> It's likely that whatever file format is used will end up as a source
> file for localization, since most extension authors will just use our
> format directly rather than generating it from some more
> fully-featured format.  To this end, the format should include at
> least the following:
>
> a) A way to add a description of the message for translators (some
> kind of attribute that is empty by default); someone already mentioned
> this.
>
> b) A way to distinguish between two messages that are textually the
> same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
> (as an adjective).  An attribute of the message that is empty by
> default is ideal.  I would keep this separate from the description
> attribute, as this facilitates calculating a message ID as a hash over
> the message contents plus the 'meaning' attribute (this is a useful
> approach to avoid translating each message more than once, see how it
> is used in GRIT).
>
> c) A way to demarcate bits of the message that should not be
> translated - generally these are called "placeholders" but that
> conflicts with how that term is currently used in the document.  It's
> important to do this, otherwise translators are going to receive
> messages that contain "code" bits that shouldn't be translated, and
> which will cause errors in the running program if they are incorrectly
> translated.  Consider for example a message like "Hello $USER$, how
> are you?" and the implications if the translator translates $USER$.
> Ideally, you could use a format such as XML which allows the extension
> author to mark any piece of text as a placeholder, but for a simpler
> approach compatible with more formats, you could require a specific
> format for non-translateables, e.g. $SOMETHING$ and/or printf-style
> format specifiers.
>
> For more ideas on the resource format, you could look at GRIT's .grd
> format or at http://xml.coverpages.org/xliff.html for inspiration.
> Both are probably more complex than what we'd like to have for
> extension message catalogs, and so as long as the format supports the
> things I mentioned above, I believe it should be fine.
>
> Finally, keep in mind that messages may contain embedded line-breaks,
> so it's good to have a format that supports this naturally.
>
> Cheers,
> Jói
>
>
> On Jun 30, 8:01 pm, Aaron Boodman <a... <at> chromium.org> wrote:
>> On Tue, Jun 30, 2009 at 4:58 PM, Nebojša Ćirić<c... <at> chromium.org> wrote:
>> >> * This document should propose a specific JavaScript API for
>> >> programmatically resolving messages.
>>
>> > What do you think about having a getMsg(key, optional_namespace) function
>> > for the JavaScript API (similar to the gadgets API)?
>> > So for __MSG_greeting__ the script would call getMsg("greeting") and get
>> > the translation for the current locale.
>> > The namespace would be necessary only if we go with multiple catalogs.
>>
>> sgtm. I think chrome.i18n.getMessage("greeting") would fit better with
>> our other APIs.
>>
>> - a
> >
>


Jói | 1 Jul 2009 18:15

Re: Extensions i18n Design Doc Draft


I think it probably makes sense to use a format simpler than the .grd
format for extension developers, at least most of them.  Mostly so as
not to scare them off i18n by something that might appear "big" at
first.

You could still use GRIT to do translations of extensions.  We did this
for Google Desktop gadgets, where we had a fairly simple resource
format for gadgets and then added a parser and generator to GRIT, so
that the simpler format could be used as an input to GRIT and GRIT
could output translated versions of the English source file.

Cheers,
Jói

On Jul 1, 10:17 am, Evan Martin <e... <at> chromium.org> wrote:
> What are your thoughts on just using GRIT?  We have experience with it
> and JavaScript can parse XML just fine.
>
> 2009/7/1 Jói <joi.sigurds... <at> gmail.com>:
>
>
>
>
>
> > Sorry to jump in a bit late.  I'm really glad to see i18n of
> > extensions being addressed so early on.  I have a couple of high-level
> > comments on the format used for the catalogs, based on my experience
> > writing GRIT.
>
> > It's likely that whatever file format is used will end up as a source
> > file for localization, since most extension authors will just use our
> > format directly rather than generating it from some more
> > fully-featured format.  To this end, the format should include at
> > least the following:
>
> > a) A way to add a description of the message for translators (some
> > kind of attribute that is empty by default); someone already mentioned
> > this.
>
> > b) A way to distinguish between two messages that are textually the
> > same, but have separate meanings, e.g. "Open" (as a verb) and "Open"
> > (as an adjective).  An attribute of the message that is empty by
> > default is ideal.  I would keep this separate from the description
> > attribute, as this facilitates calculating a message ID as a hash over
> > the message contents plus the 'meaning' attribute (this is a useful
> > approach to avoid translating each message more than once, see how it
> > is used in GRIT).
>
> > c) A way to demarcate bits of the message that should not be
> > translated - generally these are called "placeholders" but that
> > conflicts with how that term is currently used in the document.  It's
> > important to do this, otherwise translators are going to receive
> > messages that contain "code" bits that shouldn't be translated, and
> > which will cause errors in the running program if they are incorrectly
> > translated.  Consider for example a message like "Hello $USER$, how
> > are you?" and the implications if the translator translates $USER$.
> > Ideally, you could use a format such as XML which allows the extension
> > author to mark any piece of text as a placeholder, but for a simpler
> > approach compatible with more formats, you could require a specific
> > format for non-translateables, e.g. $SOMETHING$ and/or printf-style
> > format specifiers.
>
> > For more ideas on the resource format, you could look at GRIT's .grd
> > format or at http://xml.coverpages.org/xliff.html for inspiration.
> > Both are probably more complex than what we'd like to have for
> > extension message catalogs, and so as long as the format supports the
> > things I mentioned above, I believe it should be fine.
>
> > Finally, keep in mind that messages may contain embedded line-breaks,
> > so it's good to have a format that supports this naturally.
>
> > Cheers,
> > Jói
>
> > On Jun 30, 8:01 pm, Aaron Boodman <a... <at> chromium.org> wrote:
> >> On Tue, Jun 30, 2009 at 4:58 PM, Nebojša Ćirić<c... <at> chromium.org> wrote:
> >> >> * This document should propose a specific JavaScript API for
> >> >> programmatically resolving messages.
>
> >> > What do you think about having a getMsg(key, optional_namespace) function
> >> > for the JavaScript API (similar to the gadgets API)?
> >> > So for __MSG_greeting__ the script would call getMsg("greeting") and get
> >> > the translation for the current locale.
> >> > The namespace would be necessary only if we go with multiple catalogs.
>
> >> sgtm. I think chrome.i18n.getMessage("greeting") would fit better with
> >> our other APIs.
>
> >> - a

Nebojša Ćirić | 1 Jul 2009 19:10

Re: Extensions i18n Design Doc Draft

I've added a new section for "Message format"; it is not finalized yet.

I also changed the format of the message container to be more extensible (separate attributes for comment, message, and type).

Thanks for the suggestions,
 Cira

P.S. I am going on vacation today and won't be able to answer questions or suggestions for the next 3 weeks.


