5 Sep 2007 15:49
Re: Raised f0 in clustergen
On Tue, 04/09/2007 at 07:48 -0400, Alan W Black wrote:
> Nickolay V. Shmyrev wrote:
> > This leads to a more general question: how can we model speech
> > parameters better? Probably we should model the logarithm of f0
> > instead of f0, as in HTS, and adjust the distance matrix for mcep.
> > What base should the logarithm use, then?
> >
> > Are there articles on this topic?
> >
>
> Yes, they are appropriate topics. I had noticed before that tuning the
> F0 by hand could make the voice sound better (even if the generated F0
> was not actually close to the source speaker's).
>
> The F0 generation in clustergen is really quite different from HTS: a
> smoothed F0 over the whole sentence is generated (through unvoiced
> regions), and it is predicted by a separate model from the mcep
> model. This is producing pretty good values (correlation and RMSE)
> compared to other F0 models I've built in the past. However, I've not
> really done listening tests on them.
>
> Though I have seen on other systems that playing with the F0 values can
> even improve the sound of the voice. Log F0 vs. absolute F0 may make
> a difference, though in some experiments I've found it produces a
> flatter F0 (smaller variance), which I believe is the bigger problem.
> I've not done listening tests here, but we are deep in the process of
> building new prosodic models for Festival (clustergen and otherwise)
> based on the new story data we now have access to.

Very interesting, thanks.
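The points above about log F0 and about scoring predicted contours by correlation and RMSE can be made concrete with a small sketch. It is not clustergen's code: the function names are hypothetical, unvoiced frames are assumed to be marked with 0, and linear interpolation plus a plain moving average stand in for whatever smoothing clustergen actually applies. It also addresses the question about the logarithm base in one respect: changing the base multiplies every log value by the same constant (log_b x = ln x / ln b), so correlation is unchanged and RMSE is only rescaled.

```python
import math


def interpolate_unvoiced(f0):
    """Fill unvoiced frames (assumed to be 0) by linear interpolation
    between the nearest voiced neighbours; the edges are held at the
    nearest voiced value, so the contour runs through unvoiced regions."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    out = list(f0)
    for i, v in enumerate(f0):
        if v > 0:
            continue
        prev = max((j for j in voiced if j < i), default=None)
        nxt = min((j for j in voiced if j > i), default=None)
        if prev is None:
            out[i] = f0[nxt]
        elif nxt is None:
            out[i] = f0[prev]
        else:
            w = (i - prev) / (nxt - prev)
            out[i] = (1 - w) * f0[prev] + w * f0[nxt]
    return out


def smooth(xs, width=5):
    """Centred moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def to_log(f0, base=math.e):
    """Convert a fully voiced (all positive) contour to the log domain."""
    return [math.log(v, base) for v in f0]


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def correlation(a, b):
    """Pearson correlation between two equal-length contours."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Hypothetical reference and predicted contours in Hz (0 = unvoiced).
    ref = smooth(interpolate_unvoiced([0, 110, 120, 0, 150, 140, 0, 125]))
    pred = smooth(interpolate_unvoiced([0, 115, 118, 0, 135, 138, 0, 128]))
    print("linear: rmse %.3f Hz, corr %.3f" % (rmse(ref, pred), correlation(ref, pred)))
    for base in (math.e, 2, 10):
        lr, lp = to_log(ref, base), to_log(pred, base)
        print("log base %.3g: rmse %.5f, corr %.3f" % (base, rmse(lr, lp), correlation(lr, lp)))
```

Running it shows the same correlation for every log base while the RMSE shrinks or grows by the factor 1 / ln(base), so for a model that is linear in the log values the choice of base is a matter of units rather than modelling. Whether the log domain gives a flatter, lower-variance contour, as reported above, depends on the model and data and is not something this sketch can settle.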