Charles Lindsey | 1 Feb 2010 19:06

Extending news to EAI


There is now an experimental protocol for UTF-8 headers in Email (RFC5335
and its relations). This was the product of the IMA WG. There has been
recent discussion of applying this to Netnews, and the conclusion seems to
be that the IMA WG is not the place to do this, and that a private draft
would be the way forward. However, this list would be a reasonable
place to discuss it.

Essentially, under this protocol, UTF-8 may be freely used in Email
headers, but a downgrading mechanism is needed whenever mail passes to a
server that does not advertise the UTF8SMTP capability.
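
As a rough illustration of that decision point, here is a minimal Python
sketch. It assumes the sender already holds the set of keywords the next hop
advertised in its EHLO response, and it uses plain RFC 2047 encoded-words as a
stand-in for the downgrade step; the real experimental downgrading rules are
more involved. The function names and the sample Subject are hypothetical.

```python
from email.header import Header, decode_header


def needs_downgrade(ehlo_keywords):
    """Return True when the next hop did not advertise UTF8SMTP."""
    return "UTF8SMTP" not in {k.upper() for k in ehlo_keywords}


def to_ascii_header(value):
    """Encode a UTF-8 header value as RFC 2047 encoded-words.

    Illustrative stand-in only, not the full downgrade procedure.
    """
    return Header(value, "utf-8").encode()


def from_ascii_header(value):
    """Reverse to_ascii_header() so the round trip can be checked."""
    return "".join(
        part.decode(charset or "ascii") if isinstance(part, bytes) else part
        for part, charset in decode_header(value)
    )


subject = "Prøve på æøå"  # hypothetical header value
advertised = {"8BITMIME", "PIPELINING"}  # hypothetical EHLO keywords

if needs_downgrade(advertised):
    subject_on_wire = to_ascii_header(subject)
else:
    subject_on_wire = subject
```

With Netnews the first branch would rarely be needed, since the transport is
already 8-bit clean.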

This is much what the USEFOR WG wanted to do in its earlier days, but the
decision was then taken to postpone it until the base documents were
complete, and then to bring it up again as an Experimental protocol. So
maybe now is the time to embark on it.

It is much easier with Netnews than with Email, since the underlying
transport (whether NNTP or UUCP) is already 8-bit clean. I would not
expect it to become the norm on the Big-8 groups for quite some time, but
it would be very useful for National hierarchies, such as the Scandinavian
ones where the inability to have Newsgroup Names with their own special
characters in them is a right pain (apparently).

So the experimental protocol would start off with the extensions allowed
by RFC5335, and then add UTF-8 in the Newsgroups header. It would be up to
individual hierarchies to encourage deployment of the experiment within
their groups.

It has already been established that the existing transport mechanisms
will move such articles around without problem.

Julien ÉLIE | 3 Feb 2010 23:00

Re: Extending news to EAI


Hi Charles,

> So the experimental protocol would start off with the extensions allowed
> by RFC5335, and then add UTF-8 in the Newsgroups header.

I think it will also require changing the possible values for "argument"
here:

   control         =  "Control:" SP *WSP control-command *WSP CRLF
   control-command =  verb *( 1*WSP argument )
   verb            =  token
   argument        =  1*( %x21-7E )

We need to be able to use UTF-8 in "argument".
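
To make the limitation concrete, here is a small Python sketch. The first
pattern transcribes the "argument" rule quoted above; the second is a purely
hypothetical extension (not taken from any RFC) that additionally accepts any
non-ASCII character, without trying to exclude unsuitable code points.

```python
import re

# argument = 1*( %x21-7E ), as in the grammar quoted above.
ASCII_ARGUMENT = re.compile(r"[\x21-\x7E]+")

# Hypothetical extension: also accept any character outside US-ASCII.
# A real specification would probably restrict this further.
EXTENDED_ARGUMENT = re.compile(r"(?:[\x21-\x7E]|[^\x00-\x7F])+")


def is_ascii_argument(text):
    """True if text matches the current, ASCII-only rule."""
    return ASCII_ARGUMENT.fullmatch(text) is not None


def is_extended_argument(text):
    """True if text matches the hypothetical UTF-8 aware rule."""
    return EXTENDED_ARGUMENT.fullmatch(text) is not None


group = "dk.test.utf8-æøå"
print(is_ascii_argument(group), is_extended_argument(group))
```

Whitespace is still rejected by both patterns, so arguments remain separated
the same way in a control-command.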

Could it also be possible to extend dist-list to allow UTF-8 distributions?

   distribution    =  "Distribution:" SP dist-list CRLF
   dist-list       =  *WSP dist-name
                      *( [FWS] "," [FWS] dist-name ) *WSP
   dist-name       =  ALPHA / DIGIT
                      *( ALPHA / DIGIT / "+" / "-" / "_" )
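
A sketch of what such an extension could look like, in Python. The ASCII
check follows the dist-name rule above; the relaxed check is only an
assumption about one possible extension (any Unicode letter or digit where
the grammar has ALPHA / DIGIT), and the helper names are made up.

```python
import re

# dist-name = ALPHA / DIGIT *( ALPHA / DIGIT / "+" / "-" / "_" )
ASCII_DIST_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9+\-_]*")


def is_ascii_dist_name(name):
    """True if name matches the current ASCII-only dist-name rule."""
    return ASCII_DIST_NAME.fullmatch(name) is not None


def is_unicode_dist_name(name):
    """Hypothetical relaxed rule: Unicode letters and digits are allowed."""
    if not name or not name[0].isalnum():
        return False
    return all(ch.isalnum() or ch in "+-_" for ch in name[1:])


def split_dist_list(value):
    """Split a Distribution field body on commas, dropping surrounding
    whitespace (a simplification of the [FWS] handling in dist-list)."""
    return [item.strip() for item in value.split(",")]


names = split_dist_list(" fr, région ,world")
print([(n, is_ascii_dist_name(n), is_unicode_dist_name(n)) for n in names])
```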

Incidentally, can hostnames exist in UTF-8 (for the Path: header field),
or are they always canonicalized to ASCII?
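
One relevant data point for the Path: question is that internationalized
domain names are normally carried in an ASCII form (IDNA "xn--" labels).
A minimal Python sketch using the standard library's "idna" codec, which
implements the older IDNA 2003 rules; the hostname is made up:

```python
# Hypothetical hostname with a non-ASCII label.
hostname = "news.bücher.example"

# Each label is converted separately; pure-ASCII labels pass through.
ascii_form = hostname.encode("idna")

# Decoding the ASCII form gives back the Unicode hostname.
round_trip = ascii_form.decode("idna")

print(ascii_form, round_trip)
```

Whether a Path: entry would carry the ASCII form or raw UTF-8 is exactly the
open question here; the sketch only shows that an ASCII form is available.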

> It has already been established that the existing transport mechanisms
> will move such articles around without problem.

Charles Lindsey | 5 Feb 2010 15:54

Re: Extending news to EAI


In <D599DB8FD691431D8E6C6E6B75B56400 <at> Iulius> Julien ÉLIE <julien <at> trigofacile.com> writes:

>Hi Charles,

>> So the experimental protocol would start off with the extensions allowed
>> by RFC5335, and then add UTF-8 in the Newsgroups header.

>I think it will also require changing the possible values for "argument"
>here:

>   control         =  "Control:" SP *WSP control-command *WSP CRLF
>   control-command =  verb *( 1*WSP argument )
>   verb            =  token
>   argument        =  1*( %x21-7E )

>We need to be able to use UTF-8 in "argument".

>Could it also be possible to extend dist-list to allow UTF-8 distributions?

>   distribution    =  "Distribution:" SP dist-list CRLF
>   dist-list       =  *WSP dist-name
>                      *( [FWS] "," [FWS] dist-name ) *WSP
>   dist-name       =  ALPHA / DIGIT
>                      *( ALPHA / DIGIT / "+" / "-" / "_" )

Yes, there are various minor things that would need to be extended apart
from the Newsgroups header (including the newsgroups line and the
checkgroups message).

Julien ÉLIE | 5 Feb 2010 20:32

Re: Extending news to EAI


Hi Charles,

> The real difficulty will be in user agents. I can read the danish group in
> Opera, but for some reason it doesn't like yours. Ensuring there is a
> working Path between your server and news.dotsrc.org might be worth a try.

I have only a dozen or so feeds, and none of them carries the Danish UTF-8
group.  So I doubt a post to dk.test.utf8-æøå (that is to say dk.test.utf8-æøå)
on my news server would propagate very far...

I don't mind creating their group, though.

It is strange that Opera can read dk.test.utf8-æøå but not trigofacile.test.υτφ8...

I believe the French hierarchy fr.* may be willing to participate in the
experiment.  Newsgroup creation requests are made in fr.usenet.forums.evolution
(see "Noms de groupes en UTF-8", where one person is in favour -- nobody
replied to say they were against it).

-- 
Julien ÉLIE

« Un petit pås pøur møi, un grånd bønd pøur l'humanité ! » (Kerøzen) 

Charles Lindsey | 8 Feb 2010 13:18

Re: Extending news to EAI


In <220E6FECA0D44BC48B94BD47C438B522 <at> Iulius> Julien ÉLIE <julien <at> trigofacile.com> writes:

>Hi Charles,

>> The real difficulty will be in user agents. I can read the danish group in
>> Opera, but for some reason it doesn't like yours. Ensuring there is a
>> working Path between your server and news.dotsrc.org might be worth a try.

>I have only a dozen or so feeds, and none of them carries the Danish UTF-8
>group.  So I doubt a post to dk.test.utf8-æøå (that is to say dk.test.utf8-æøå)
>on my news server would propagate very far...

It is available on news.dotsrc.org, so you either need to peer with them,
or at least subscribe to them.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131            Web: http://www.cs.man.ac.uk/~chl
Email: chl <at> clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Julien ÉLIE | 8 Feb 2010 21:17

Re: Extending news to EAI


Hi Charles,

> It is available on news.dotsrc.org, so you either need to peer with them,
> or at least subscribe to them.

I have just subscribed to news.dotsrc.org and created the dk.test.utf8-æøå
(C3 A6 C3 B8 C3 A5 in hexadecimal) newsgroup on my news server.
Amusingly, it shows up as dk.test.utf8-ÃŠÃžÃ¥ on my ISO-8859-15 terminal
(0xA6 and 0xB8 are among the few characters that differ between ISO-8859-1
and ISO-8859-15!).
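
The byte values and the terminal rendering described above can be reproduced
with a short Python sketch. It uses only the standard codecs; the variable
names are arbitrary.

```python
name = "dk.test.utf8-æøå"

# UTF-8 encoding of the group name; the last six bytes are "æøå".
raw = name.encode("utf-8")
suffix_hex = " ".join(f"{b:02X}" for b in raw[-6:])

# What a terminal shows if it interprets those bytes as single-byte charsets.
as_latin9 = raw.decode("iso-8859-15")
as_latin1 = raw.decode("iso-8859-1")

# Byte values whose meaning differs between ISO-8859-1 and ISO-8859-15.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]

print(suffix_hex)
print(as_latin9)
print(as_latin1)
print([f"0x{b:02X}" for b in differing])
```

Only eight byte values differ between the two charsets, and the UTF-8
encoding of "æøå" happens to contain two of them.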

The good news is that the pullnews program works fine with dk.test.utf8-æøå,
so I could easily pull the existing articles and have them show up on my news
server (news.trigofacile.com).

-- 
Julien ÉLIE

« Le travail n'est pas une bonne chose. Si ça l'était,
  les riches l'auraient accaparé. » 

