Wm... | 1 Jun 2006 17:26

Re: Bug found - Unicode equivalence

Thu, 1 Jun 2006 14:55:11 
<1149170111.10989.5.camel <at> ptpc3lin.op.ph.ic.ac.uk>  Edward Grace 
<ej.grace <at> imperial.ac.uk>

>I have found a bug that I think should be fixed.  It concerns annoying
>ambiguities between the way Linux treats two ways of representing an o
>with umlaut against just one way for Mac OS X.  As with the case
>insensitivity this leads to clashes.

[snip]

>2) Run it on a Linux system.  This will generate a directory Two_Files, 
>containing two files.
>
>3) Do a Unison sync between that machine and a Mac OS X system.
>
>4) This will trigger the bug, since the two files resolve to the same 
>name on OS X.
>
>I hope this helps people track and squash this particularly obscure beastie!

Is the problem not with OSX rather than Unison?

If you sync from OSX to Linux you will get one file.

If you have two files on Linux and sync to OSX and OSX can't handle that 
surely Unison isn't the problem?

Or have I misunderstood?  What are you expecting Unison to do?

Edward Grace | 1 Jun 2006 19:12

Re: Bug found - Unicode equivalence

> If you have two files on Linux and sync to OSX and OSX can't handle  
> that
> surely Unison isn't the problem?

Hi.  As Benjamin points out in another post, this is a more complex
superset of the problem with case-sensitive / case-insensitive file
systems.  The "blame" is not really an issue of OS X vs Linux; the
two systems simply have different assumptions and conventions
regarding their files.  Quite frankly, I don't think either is
inherently right or wrong; they are just different.

> Or have I misunderstood?  What are you expecting Unison to do?

I think you have misunderstood.

As alluded to by Benjamin, a complete, machine-independent Unicode
layer between the file system and Unison is likely to be "involved".
I think, on balance, that it is right for Unison to rely on the
system libraries; however, I think the default behaviour for a
filename clash is incorrect.

I expect Unison to accept that filename clashes and problem filenames
are possible, and to act gracefully when they occur.  If this is the
default, then managing case insensitivity should come for free!
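
To make "act gracefully" concrete, here is a minimal sketch of a
pre-transfer clash check.  It is not how Unison works internally; the
names clash_key and find_clashes are made up for illustration, and the
key (canonical decomposition plus case folding) only approximates what
a normalizing, case-insensitive file system actually does.

```python
import unicodedata
from collections import defaultdict


def clash_key(name: str) -> str:
    # Hypothetical comparison key: canonical decomposition (NFD) plus
    # case folding. An approximation, not any file system's real rule.
    return unicodedata.normalize("NFD", name).casefold()


def find_clashes(names):
    # Group names by key; any group with more than one member would
    # collapse to a single name on the receiving side.
    groups = defaultdict(list)
    for name in names:
        groups[clash_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["f\u00f6o.txt", "fo\u0308o.txt", "README", "readme", "notes.txt"]
clashes = find_clashes(names)
for group in clashes:
    # Print escaped forms so the two spellings are visibly different.
    print("would clash:", [n.encode("unicode_escape").decode("ascii") for n in group])
```

A synchronizer could run a check like this on the incoming file list
and skip or report the clashing names instead of aborting the whole
transfer.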

If I understand correctly, the process of failure is as follows:

1) The source list of files to transfer from Linux -> Mac OS X goes
over.  There are two distinct file names in this byte stream.  They
[...]

Wm... | 3 Jun 2006 23:35

Re: Bug found - Unicode equivalence

Thu, 1 Jun 2006 18:12:44 
<2FB423A2-7DF7-48C8-B4F5-A3A6BD69D030 <at> imperial.ac.uk>  Edward Grace 
<ej.grace <at> imperial.ac.uk>

[explanation snipped, other posts read]

>Does that make sense?

Yes, thank you.

-- 
Wm ...
Please reply to the list
e-mail replies will be replied to in the list unless marked as private


Fred Frigerio | 1 Jun 2006 20:43

RE: Bug found - Unicode equivalence

I would expect Unison to fail gracefully?

Fred Frigerio
Locust USA

> -----Original Message-----
> From: Wm... [mailto:wm-unison <at> tarrcity.demon.co.uk] 
> Sent: Thursday, June 01, 2006 11:26 AM
> To: unison-users <at> yahoogroups.com
> Subject: Re: [unison-users] Bug found - Unicode equivalence
> 
> Thu, 1 Jun 2006 14:55:11
> <1149170111.10989.5.camel <at> ptpc3lin.op.ph.ic.ac.uk>  Edward 
> Grace <ej.grace <at> imperial.ac.uk>
> 
> >I have found a bug that I think should be fixed.  It concerns annoying
> >ambiguities between the way Linux treats two ways of representing an o
> >with umlaut against just one way for Mac OS X.  As with the case
> >insensitivity this leads to clashes.
> 
> [snip]

Benjamin Pierce | 1 Jun 2006 17:59

Re: Bug found - Unicode equivalence

I agree with Edward that this should be viewed as a bug in Unison:  
the situation is indeed just the same as with syncing case-sensitive  
filesystems with "case preserving but insensitive" ones.  Unison goes  
to considerable trouble to handle the latter situation correctly, but  
(as Edward discovered) not the former.

Unfortunately, I'm less sanguine about the ease of fixing things.   
There are lots of ways in which Unison doesn't deal well with Unicode  
and other character encoding issues -- basically, Unison itself just  
ignores all such issues and takes whatever it gets from the lower- 
level OCaml / Posix filesystem libraries, string libraries, etc.   
Doing all of this right would be very valuable, but at the moment no  
one is signed up to do it.  (Volunteers welcome, of course! :-)
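
As a small illustration of why taking names as-is from the lower-level
libraries leads to this: the two spellings of "o with umlaut" are
different byte strings in UTF-8, so a byte-wise comparison treats them
as different names.  This sketch only shows the encoding facts; the
variable names are illustrative and nothing here is Unison code.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS (one code point)
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS (two code points)

nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")
print(nfc_bytes.hex(), nfd_bytes.hex())

# Byte-wise, the names differ ...
same_bytes = nfc_bytes == nfd_bytes
# ... but they are canonically equivalent once normalized to one form.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(same_bytes, same_after_nfc)
```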

Regards,

      - Benjamin

On Jun 1, 2006, at 11:26 AM, Wm... wrote:

> Thu, 1 Jun 2006 14:55:11
> <1149170111.10989.5.camel <at> ptpc3lin.op.ph.ic.ac.uk>  Edward Grace
> <ej.grace <at> imperial.ac.uk>
>
>> I have found a bug that I think should be fixed.  It concerns annoying
>> ambiguities between the way Linux treats two ways of representing an o
>> with umlaut against just one way for Mac OS X.  As with the case
>> insensitivity this leads to clashes.

Trevor Jim | 2 Jun 2006 23:37

Re: Bug found - Unicode equivalence

On Jun 1, 2006, at 11:59 AM, Benjamin Pierce wrote:

> Unfortunately, I'm less sanguine about the ease of fixing things.
> There are lots of ways in which Unison doesn't deal well with Unicode
> and other character encoding issues -- basically, Unison itself just
> ignores all such issues and takes whatever it gets from the lower-
> level OCaml / Posix filesystem libraries, string libraries, etc.

To be fair, this is a very hard problem.  Consider

     http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties

which says

     An implementation must not use the Unicode utilities implemented
     by its native platform (for decomposition and comparison), unless
     those algorithms are equivalent to the HFS Plus algorithms defined
     here, and are guaranteed to be so forever. This is rarely the case.
     Platform algorithms tend to evolve with the Unicode standard. The
     HFS Plus algorithms cannot evolve because such evolution would
     invalidate existing HFS Plus volumes.

In other words, Unicode is evolving and Apple has implemented
something that might once have matched up with some version of
Unicode (or not).

To truly solve the problem means understanding the particular
[...]

Kai Steinbach | 2 Jun 2006 11:54

Re: Bug found - Unicode equivalence

Hi Benjamin,

This topic has popped up on the mailing list a few times before, so I
thought it was worth putting it into a little FAQ page:
http://alliance.seas.upenn.edu/~bcpierce/unison/wiki/index.php?n=Main.UnisonFAQCharacterEncoding

I left some room for others to contribute their experience with their
own environments, as well as ways to make it easy to work around this
limitation. Are there any Python / Perl scripts out there to automate
renaming a whole tree? I might polish and publish my own
rename_to_ascii.vbs some day ... though this would only help the
Windows guys, sorry. :(
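
For anyone who wants a starting point on other platforms, here is a
rough Python sketch of the same idea.  It is not a port of
rename_to_ascii.vbs; the function names to_ascii and plan_renames are
made up for illustration.  It only builds a list of proposed renames
and never touches the file system, because the conversion is lossy:
characters with no ASCII decomposition are dropped, so two names can
collapse to the same result and should be reviewed before renaming.

```python
import os
import unicodedata


def to_ascii(name: str) -> str:
    # Decompose accented characters, then drop everything non-ASCII.
    # Lossy: characters without an ASCII base letter simply disappear.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def plan_renames(root: str):
    # Walk bottom-up so entries inside a directory are listed before
    # the directory itself. Returns (old_path, new_path) pairs only.
    plan = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = to_ascii(name)
            if new_name and new_name != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new_name)))
    return plan
```

Calling plan_renames(".") and printing the pairs gives a dry run; the
actual renaming (and checking for collisions among the new names) is
left to the user.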

Final note: I have my fingers crossed that one day someone can spend
the time on the OCaml / Posix basis and Unison and make it all just do
The Right Thing. That would be great ;-)

Regards,
Kai

On 6/1/06, Benjamin Pierce <bcpierce <at> cis.upenn.edu> wrote:
[snip]
> Unfortunately, I'm less sanguine about the ease of fixing things.
> There are lots of ways in which Unison doesn't deal well with Unicode
> and other character encoding issues -- basically, Unison itself just
> ignores all such issues and takes whatever it gets from the lower-
> level OCaml / Posix filesystem libraries, string libraries, etc.
> Doing all of this right would be very valuable, but at the moment no
> one is signed up to do it.  (Volunteers welcome, of course! :-)
