Thomas Manson | 7 Oct 19:23

Re: CVS migration help

Hi Brian,
 
on my new system :
 
LANG=en_US.UTF-8
 
thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ll
total 32
drwxr-xr-x 2 thomas thomas  4096 2008-10-06 18:07 .
drwxr-xr-x 9 thomas thomas  4096 2008-10-06 18:07 ..
-r--r--r-- 1 thomas thomas 23274 2008-01-20 00:56 Sp?cifications.doc,v
thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ls -N | hexdump -C
00000000  53 70 e9 63 69 66 69 63  61 74 69 6f 6e 73 2e 64  |Sp.cifications.d|
00000010  6f 63 2c 76 0a                                    |oc,v.|
00000015

On my old system, where the files came from:

LANG=fr_FR@euro

[root@home documentation]# ll
total 24
-r--r--r--  1 paquerette dev 23274 jan 20  2008 Spécifications.doc,v
[root@home documentation]# ls -N | hexdump -C
00000000  53 70 e9 63 69 66 69 63  61 74 69 6f 6e 73 2e 64  |Spécifications.d|
00000010  6f 63 2c 76 0a                                    |oc,v.|
00000015
 
>Have you checked that the names in the .dat files actually are encoded in UTF-8?
 
How would I do that?


On my old system, Python is too old:

[root@home documentation]# yum info python
==============================================================
WARNING: Additional commands may be required after running yum
==============================================================
Loading "smeserver" plugin
Loading "installonlyn" plugin
Loading "fastestmirror" plugin
Setting up repositories
Loading mirror speeds from cached hostfile
Reading repository metadata in from local files
Installed Packages
Name   : python
Arch   : i386
Version: 2.3.4
Release: 14.4.el4_6.1
Size   : 20 M
Repo   : installed
Summary: An interpreted, interactive, object-oriented programming language.

I'll try to build python from sources and then get bzr...
 
 
On Tue, Oct 7, 2008 at 18:29, Brian de Alwis <bsd <at> cs.ubc.ca> wrote:
Hi Thomas.

On 7-Oct-2008, at 9:03 AM, Thomas Manson wrote:
 unfortunately it crashes in the same way that bzr cvsps-import does :
 
 
thomas@home:~/temp/bzr$ cat ../cvs2svn-tmp/git-blob.dat  ../cvs2svn-tmp/git-dump.dat |  bzr fast-import -
bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 43-45: invalid data

So that indicates bzr thinks that the filenames in the dumpfile are in UTF-8.  Have you checked that the names in the .dat files actually are encoded in UTF-8?  Maybe cvs2svn's dumps aren't re-encoding the filenames?  If not, try fiddling with your LANG/LC_* env vars to match whatever encoding is in use in the file system.  You might get some traction by setting LANG=fr_FR.ISO8859-1, or whatever works on your system.

It might be worth doing an `ls -N | hexdump -C' or something similar to ensure that the filenames are encoded in latin1.

[I tried using cvs2svn to create a dumpfile from a toy project with accents, but gave up after 10 minutes.  I personally used tailor to convert my projects, but none of the files involved accents.]

Brian.

-- 
"Amusement to an observing mind is study." - Benjamin Disraeli


John Arbash Meinel | 7 Oct 19:40

Re: CVS migration help


Thomas Manson wrote:
> Hi Brian,
>  
> on my new system :
>  
> LANG=en_US.UTF-8
>  
> thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ll
> total 32
> drwxr-xr-x 2 thomas thomas  4096 2008-10-06 18:07 .
> drwxr-xr-x 9 thomas thomas  4096 2008-10-06 18:07 ..
> -r--r--r-- 1 thomas thomas 23274 2008-01-20 00:56 Sp?cifications.doc,v
> thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ls -N | hexdump -C
> 00000000  53 70 e9 63 69 66 69 63  61 74 69 6f 6e 73 2e 64  |Sp.cifications.d|
> 00000010  6f 63 2c 76 0a                                    |oc,v.|
> 00000015
> 
> On my old system, from which the files came from  :
>  
> LANG=fr_FR@euro
>  

^- The fact that it is a single byte means that it *is not* in
UTF-8; it would take 2 bytes to encode é.

Now:

>>> print '\xe9'.decode('latin1')
é

>>> '\xe9'.decode('latin1').encode('utf-8')
'\xc3\xa9'

Anyway, *most* current systems would assume that paths are in UTF-8
(Linux doesn't actually specify an encoding; a filename is just a
NUL-terminated byte string), which causes problems because we have to
"guess" what things really are.

In this case, your filename is probably in Latin-1 encoding.
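
A minimal Python 3 sketch of that guess, using the bytes from the hexdump above. The function name classify_name is made up for illustration, and the "latin-1" label is only a fallback: Latin-1 assigns a character to every byte value, so it can never be ruled out by decoding alone.

```python
def classify_name(raw: bytes) -> str:
    """Return a rough label for how a filename's bytes are most plausibly encoded."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte string decodes as Latin-1, so this is a guess, not a proof.
        return "not utf-8 (possibly latin-1)"


# The 20 filename bytes shown by `ls -N | hexdump -C` (trailing newline dropped).
name_from_hexdump = bytes.fromhex(
    "53 70 e9 63 69 66 69 63 61 74 69 6f 6e 73 2e 64 6f 63 2c 76"
)
print(classify_name(name_from_hexdump))
```

The lone 0xe9 byte is what makes the UTF-8 decode fail: in UTF-8 it announces a multi-byte sequence, but the next byte (0x63, "c") is not a continuation byte.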

This is partially why cvsps-import doesn't support it, because we don't
really know what encoding to use for filenames. (Mostly because nobody
had non-ascii filenames and wanted us to make it work.)

For example, code like this *could* do what you want:

=== modified file 'cvsps/parser.py'
--- cvsps/parser.py     2007-02-08 22:33:44 +0000
+++ cvsps/parser.py     2008-10-07 17:39:30 +0000
@@ -174,6 +174,7 @@
         if ':' not in line:
             return
         fname, version = line[1:].rsplit(':', 1)
+        fname = fname.decode(self._encoding)
         fname = self._cache(fname)
         versions = version.split('->')
         assert len(versions) == 2

It just uses the same encoding for filenames that we use for the log
content and the committer names.
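
As a standalone illustration of what the one-line patch does, here is a Python 3 sketch. The function parse_member_line and the sample line are hypothetical stand-ins, not the plugin's real API; only the split-then-decode step mirrors the patch.

```python
def parse_member_line(line: bytes, encoding: str = "latin-1"):
    """Split a cvsps-style member line of the form <TAB>path:old->new.

    Hypothetical helper for illustration; `encoding` plays the role of
    self._encoding in the patch.
    """
    if b":" not in line:
        return None
    # Drop the leading tab, then split on the last colon only,
    # since the path itself may contain colons.
    fname, version = line[1:].rsplit(b":", 1)
    old, new = version.strip().split(b"->")
    # The decode step is the part the patch adds: bytes -> text.
    return fname.decode(encoding), old.decode("ascii"), new.decode("ascii")


line = b"\tRessources/documentation/Sp\xe9cifications.doc:1.1->1.2"
print(parse_member_line(line))
```

Decoding only fixes the in-memory name. Whether the file is then found on disk depends on how that text is encoded again when the path is opened.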

John
=:->

Thomas Manson | 7 Oct 20:00

Re: CVS migration help

I've applied your patch and run it again:
 
paquerette@home:~/temp$ bzr cvsps-import cvs/files/ . bazaar --use-cvs
Creating cvsps dump file: bazaar/staging/ROOT.dump
Read 120 patchsets (string cache hits: 0, total: 16950)
Failed while processing: Patchset(2, HEAD, paquerette, 2006/10/22 22:08:38)
Processed 1 patches (0 new, 1 existing) on 0 branches (1 tags) in 1.1s (0.00 patch/s)
bzr: ERROR: Could not find the cvs versioned file for crf-irp-monitor/Ressources/documentation/Spécifications.doc. Looking for it at /home/paquerette/temp/cvs/files/crf-irp-monitor/Ressources/documentation/Spécifications.doc,v and /home/paquerette/temp/cvs/files/crf-irp-monitor/Ressources/documentation/Attic/Spécifications.doc,v.
I think the issue is the change of encoding between my old server and my new server.
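
If that is the cause, the name on disk is still the Latin-1 byte 0xe9, while a UTF-8 locale turns "é" into the two bytes 0xc3 0xa9 when the path is looked up, so the two never match. One possible workaround, sketched below in Python 3, is to rename the repository files to UTF-8. The helper names plan_renames and apply_renames are hypothetical, and the Latin-1 source encoding is an assumption.

```python
import os


def plan_renames(names, source_encoding="latin-1"):
    """Return (old, new) byte-name pairs for names that are not valid UTF-8."""
    plan = []
    for raw in names:
        try:
            raw.decode("utf-8")  # plain ASCII or already UTF-8: leave alone
        except UnicodeDecodeError:
            # Assumption: anything that is not UTF-8 is in source_encoding.
            plan.append((raw, raw.decode(source_encoding).encode("utf-8")))
    return plan


def apply_renames(directory: bytes, plan):
    """Perform the planned renames inside one directory (bytes paths)."""
    for old, new in plan:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))


print(plan_renames([b"Sp\xe9cifications.doc,v", b"README,v"]))
```

Passing a bytes path to os.listdir (for example os.listdir(b".")) returns the raw byte names that plan_renames expects. Running this on a copy of the repository first would be prudent.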
 
I've compiled Python 2.6, installed cElementTree-1.0.5-20051216.tar.gz,
then installed bzr 1.7.1 from source.
 
But I got another issue:
 
[root@home temp]# bzr whoami 'Paquerette <dev.mansonthomas <at> gmail.com>'
/usr/local/lib/python2.6/site-packages/bzrlib/lazy_import.py:195: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  module = __import__(module_python_path, scope, scope, [])
Unable to load plugin 'launchpad' from '/usr/local/lib/python2.6/site-packages/bzrlib/plugins'
I think I'm going to open the window and jump :op
 


 

Vincent Ladeuil | 8 Oct 08:04

Re: CVS migration help

>>>>> "thomas" == Thomas Manson <dev.mansonthomas <at> gmail.com> writes:

<snip/>

    thomas> I've compiled python 2.6,

Eerk, why did you do that? Stay with python-2.5. Support for 2.6
is under consideration and some patches have been applied starting
with 1.8, but using 2.6 *now* in *your* case will just make
things harder.

    thomas> installed
    thomas> cElementTree-1.0.5-20051216.tar.gz <http://effbot.org/media/downloads/cElementTree-1.0.5-20051216.tar.gz>
    thomas> then installed bzr 1.7.1 from sources

    thomas> but I got another issue :

    thomas> [root@home temp]# bzr whoami 'Paquerette <dev.mansonthomas <at> gmail.com>'
    thomas> /usr/local/lib/python2.6/site-packages/bzrlib/lazy_import.py:195:
    thomas> DeprecationWarning: the sha module is deprecated; use the hashlib module
    thomas> instead
    thomas>   module = __import__(module_python_path, scope, scope, [])
    thomas> Unable to load plugin 'launchpad' from
    thomas> '/usr/local/lib/python2.6/site-packages/bzrlib/plugins'
    thomas> I think I'm going to open the window and jump :op

Don't jump now :)

Either get a more recent version of bzr or *better* stick with
python-2.5.

        Vincent

Thomas Manson | 8 Oct 10:03

Re: CVS migration help

I've recompiled Python 2.5.2 and reinstalled bzr 1.7.1.
 
[root@home bzr-1.7.1]# bzr whoami 'Paquerette <dev.mansonthomas <at> gmail.com>'
Unable to load plugin 'launchpad' from '/usr/local/lib/python2.5/site-packages/bzrlib/plugins'
I give up...
 
What I'll do is reinstall CVS on my new server with its repository,
keep a copy of my Eclipse workspace using CVS so I can browse the CVS history,
and make a bzr import of my existing project and use bzr from now on.
 
Anyway many thanks for your help everybody ;)
Thomas.



Re: CVS migration help

Thomas Manson wrote:
> I've recompiled python 2.5.2 and reinstalled bzr 1.7.1.
>  
> [root@home bzr-1.7.1]# bzr whoami 'Paquerette <dev.mansonthomas <at> gmail.com>'
> Unable to load plugin 'launchpad' from '/usr/local/lib/python2.5/site-packages/bzrlib/plugins'
> I give up...

'Unable to load plugin' is not a fatal error.

Colin D Bennett | 8 Oct 18:04

Re: CVS migration help

On Wed, 8 Oct 2008 10:03:20 +0200
"Thomas Manson" <dev.mansonthomas <at> gmail.com> wrote:

...
> I give up...
> 
> What I'll do, is re install cvs on my new server with its repository
> Have a copy of my eclipse workspace using cvs to be able to browse cvs
> history.
> And make a bzr import of my existing project and use bzr from now on.
> 
> Anyway many thanks for your help everybody ;)
> Thomas.

Did you try using Tailor?  It has worked great for me to import from and
stay synchronized with CVS projects.  Here is my example configuration:

----- begin file: grub.tailor -----
[grub]
patch-name-format = None
source = cvs:grub2
target = bzr:grub

[cvs:grub2]
repository = :pserver:anonymous@cvs.savannah.gnu.org:/sources/grub
module = grub2
encoding = iso-8859-1

[bzr:grub2]
----- end file: grub.tailor -----

Then, just execute Tailor with the following command:

  tailor --config=../grub.tailor

I create a subdirectory underneath the directory where 'grub.tailor'
is, and then I run the above 'tailor' command in that subdirectory,
since it will check out the CVS tree into the current working
directory.  This CVS checkout is then also a bzr branch, which is
updated every time you re-run the 'tailor' command above if there are
any new commits to CVS.

Regards,
Colin

Thomas Manson | 9 Oct 01:03

Re: CVS migration help

Hi Colin,
 
I'll try tomorrow, thanks for your config file.
 
Thomas


Thomas Manson | 9 Oct 23:16

Re: CVS migration help

Hi Colin,
I have the same issue:
 
thomas@home:~/temp/bzr2$ tailor --config=../crf.tailor
paquerette@cvs.paquerette.com's password:
paquerette@cvs.paquerette.com's password:
paquerette@cvs.paquerette.com's password:
paquerette@cvs.paquerette.com's password:
Traceback (most recent call last):
  File "/usr/lib/python2.5/logging/__init__.py", line 744, in emit
    msg = self.format(record)
  File "/usr/lib/python2.5/logging/__init__.py", line 630, in format
    return fmt.format(record)
  File "/usr/lib/python2.5/logging/__init__.py", line 421, in format
    s = self._fmt % record.__dict__
  File "/var/lib/python-support/python2.5/vcpx/shwrap.py", line 89, in __str__
    r = '$'+repr(self)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7562: ordinal not in range(128)
 
But unlike the other tools, it doesn't stop the conversion... so I have a conversion.
 
Now I have to find which file is missing (I guess only one is).
 
I'm converting the other projects.
 
Thanks very much !
 
Thomas.


 


Brian de Alwis | 7 Oct 19:58

Re: CVS migration help

> > Have you checked that the names in the .dat files actually are
> > encoded in UTF-8?
>
> How would I do that?

Use the 'hexdump -C' and look for some ASCII-encoded portion of the  
filenames.  Or if you like vi, check out bvi (a binary vi;  
bvi.sourceforge.net).

As pointed out by John, the accented characters are encoded as  
multiple bytes using UTF-8, but as a single byte if using ISO8859-1  
(latin1).
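
The same check can be scripted. Below is a Python 3 sketch that reports lines of a dump that fail to decode as UTF-8. The function name find_non_utf8_lines and the sample data are made up for illustration. The "M " / "D " prefix filter assumes the fast-import convention that file-change lines start that way, and blob contents can contain similar-looking lines, so hits still need a look by eye.

```python
def find_non_utf8_lines(data: bytes, prefixes=(b"M ", b"D ")):
    """Yield (line_number, offset_of_bad_byte, line) for path-like lines
    that cannot be decoded as UTF-8."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        if not line.startswith(prefixes):
            continue  # skip blob data and other commands
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the index of the first undecodable byte.
            yield number, exc.start, line


# Invented sample in the style of a fast-import stream.
sample = (
    b"M 100644 :1 docs/README\n"
    b"M 100644 :2 docs/Sp\xe9cifications.doc\n"
)
for number, offset, line in find_non_utf8_lines(sample):
    print(number, offset, line)
```

For a real file, the data would come from something like open("../cvs2svn-tmp/git-dump.dat", "rb").read(). Reading in binary mode matters, because text mode would attempt the very decoding being tested.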

> LANG=fr_FR@euro

I'm not sure what charset the '@euro' modifier corresponds to.
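
As far as I know, on glibc-based systems the '@euro' modifier conventionally selects ISO-8859-15 (Latin-9). Treat that as an assumption and confirm it by running `locale charmap` on the old machine. For this filename the distinction should not matter, as the Python 3 sketch below shows: both charsets put é at byte 0xe9, and they differ at only a few positions, such as 0xa4.

```python
import locale

# é sits at the same byte value in both charsets.
e_acute = b"\xe9"
print(e_acute.decode("latin-1"), e_acute.decode("iso8859-15"))

# One of the few positions where they differ: the currency sign
# in Latin-1 becomes the euro sign in ISO-8859-15.
print(b"\xa4".decode("latin-1"), b"\xa4".decode("iso8859-15"))

# Python's own alias table, which may differ from the C library's view.
print(locale.normalize("fr_FR@euro"))
```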

Brian.

