Barry Haddow | 1 Feb 11:27 2011

Re: Errors training GIZA++

Hi Nakul

The arguments to clean-corpus-n.perl are, in order:
input source-language target-language output min-length max-length

e.g.
./clean-corpus-n.perl europarl fr en europarl.cleaned 1 80

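which assumes I have the files europarl.fr and europarl.en: the input argument is the shared file stem, and the two language arguments are the file extensions.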
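To make the argument order concrete, here is a minimal Python sketch of the length filtering the script performs. It is not the real clean-corpus-n.perl, which is written in Perl and may apply further checks (for example on the source/target length ratio). The function name clean_corpus and its signature are hypothetical and were chosen to mirror the command-line arguments above.

```python
from itertools import zip_longest


def clean_corpus(stem, src, tgt, out, min_len, max_len):
    """Simplified sketch of a clean-corpus-n style length filter (hypothetical helper).

    Reads <stem>.<src> and <stem>.<tgt> in parallel and keeps a sentence pair
    only if BOTH sides have between min_len and max_len tokens (inclusive).
    Kept pairs are written to <out>.<src> and <out>.<tgt>.
    Returns (kept, total).
    """
    kept = total = 0
    with open(f"{stem}.{src}", encoding="utf-8") as in_src, \
         open(f"{stem}.{tgt}", encoding="utf-8") as in_tgt, \
         open(f"{out}.{src}", "w", encoding="utf-8") as out_src, \
         open(f"{out}.{tgt}", "w", encoding="utf-8") as out_tgt:
        for s_line, t_line in zip_longest(in_src, in_tgt):
            # The corpus must be line-aligned: line i of one file translates line i of the other.
            if s_line is None or t_line is None:
                raise ValueError("source and target files have different line counts")
            total += 1
            s_toks, t_toks = s_line.split(), t_line.split()
            # Empty lines (0 tokens) and overlong lines are dropped on both sides together.
            if min_len <= len(s_toks) <= max_len and min_len <= len(t_toks) <= max_len:
                # Write the pair back with whitespace collapsed to single spaces.
                out_src.write(" ".join(s_toks) + "\n")
                out_tgt.write(" ".join(t_toks) + "\n")
                kept += 1
    return kept, total
```

Called as clean_corpus("europarl", "fr", "en", "europarl.cleaned", 1, 80), it would read europarl.fr and europarl.en and write europarl.cleaned.fr and europarl.cleaned.en, mirroring the command line above. Note that the first argument is a single stem, not a glob: the shell expands corpus/* into two separate file names, which shifts every later argument by one position.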

If you have train-factored-phrase-model.perl, then you have an old version of
Moses; I'd recommend using the latest version from svn. In any case, if you want
to use train-model.perl, you first have to run make release in the scripts
directory. By all means try Tom's packages; they might make things easier.

It should be possible to run GIZA by hand, but I've never done this. From the
GIZA output you posted earlier, it looked like there was a problem with the
vcb files, but try cleaning the corpus first to see if that fixes the
problem.
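If you want to inspect the vcb files yourself before rerunning GIZA, a quick structural check can show whether they are malformed. This is a minimal sketch that assumes the usual GIZA++ vocabulary layout of one entry per line with three whitespace-separated fields (integer id, word, integer count). The function check_vcb is hypothetical and is not part of GIZA++ or Moses.

```python
def check_vcb(path):
    """Return a list of (line_number, message) problems found in a .vcb file.

    Hypothetical helper. Assumes each line is 'id word count', where id and
    count are integers and each id appears only once.
    """
    problems = []
    seen_ids = set()
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            fields = line.split()
            if len(fields) != 3:
                # Blank lines and lines with stray whitespace-separated tokens land here.
                problems.append((n, f"expected 3 fields, got {len(fields)}"))
                continue
            try:
                vid, _count = int(fields[0]), int(fields[2])
            except ValueError:
                problems.append((n, "id and count must be integers"))
                continue
            if vid in seen_ids:
                problems.append((n, f"duplicate id {vid}"))
            seen_ids.add(vid)
    return problems
```

An empty result only means the file has the expected shape; it does not prove the vocabulary matches the corpus GIZA is reading.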

best regards - Barry

On Tuesday 01 February 2011 04:20, nakul sharma wrote:
> Hi Barry,
> ./clean-corpus-n.perl in trunk/scripts/training returned the following error:
>
> ./clean-corpus-n.perl corpus/* txt txt clean 1 50
> clean-corpus.perl: processing
> corpus/200EnglishSens.txt.corpus/200HindiSens.txt & .txt to txt, cutoff
> clean-1