Thomas Goirand | 22 Jan 2012 10:08

[patch] man page fixes

Hi,

Please also apply these man page fixes. I'm currently carrying this patch
in the Debian packaging to reduce lintian warnings, which I find quite
annoying when working on MLMMJ: with too many warnings, I won't see
anything... By the way, hyphens used as minus signs break groff
indentation, so it's a good thing to fix them.

Cheers,

Thomas Goirand (zigo)
Ben Schmidt | 22 Jan 2012 14:56

Re: [patch] man page fixes

Hi, Thomas,

Thanks for this. I have a few issues/questions.

1. This doesn't apply cleanly to current sources in version control.
Would you be able to provide a patch that does? I can probably resolve
the clashes OK, but I know little about groff, so I'm not sure if other
man-page changes/additions might also require fixing, so it'd be better
if someone who knows more what they're doing could look at it.

2. My system (Mac OS X) doesn't like the UTF-8 encoding. The existing
Latin-1 encoding works for me (in fact, the ø is replaced by just an o
for me automagically somewhere). I guess this is locale-related. This
means we need to figure out how to do an encoding conversion appropriate
to the host system as part of the build/install process, or find a groff
directive that makes it interpret the file as a particular encoding, or
something, rather than just change the encoding. I'm happy to change the
encoding to UTF-8 if we can figure out how to make all systems interpret
the files properly. Any ideas?

3. Could we keep the separate issues in separate patches? If the
encoding change is in one patch, the hyphen issue in another, and the
content changes in another, that'd be nice (and I can then easily apply
any that have no issues while continuing to discuss any that do).
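
For illustration, the build-time encoding conversion raised in point 2 could
look roughly like the Python below. This is only a sketch: the names
`reencode` and `convert_man_page` are hypothetical, nothing like them exists
in the MLMMJ build, and choosing the right target encoding for the host is
still the open question.

```python
from pathlib import Path


def reencode(data: bytes, source: str = "latin-1", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


def convert_man_page(src: Path, dst: Path,
                     source: str = "latin-1", target: str = "utf-8") -> None:
    """Write a re-encoded copy of one man page (hypothetical install step)."""
    dst.write_bytes(reencode(src.read_bytes(), source, target))
```

Going the other way (UTF-8 sources, Latin-1 on install) is the same call with
the two encoding arguments swapped.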

Cheers, and thanks again,

Ben.

On 22/01/12 8:08 PM, Thomas Goirand wrote:

Thomas Goirand | 22 Jan 2012 20:13

Re: [patch] man page fixes

On 01/22/2012 09:56 PM, Ben Schmidt wrote:
> Hi, Thomas,
> 
> Thanks for this. I have a few issues/questions.
> 
> 1. This doesn't apply cleanly to current sources in version control.
> Would you be able to provide a patch that does?

Sorry, my patch is from MLMMJ 1.2.17, as I haven't upgraded the Debian
package yet (I'm waiting for you to release something).

> I can probably resolve
> the clashes OK, but I know little about groff, so I'm not sure if other
> man-page changes/additions might also require fixing, so it'd be better
> if someone who knows more what they're doing could look at it.

The issue arises when you have something with a dash "like-this". Groff will
then try to wrap it, and you might end up with something displayed like-
this (i.e. with a break to the next line where you really don't want
one). Adding a \ in front of the - makes it so that groff won't do the
word break.

If you check with lintian (a Debian package checking tool), it
will emit a warning like "hyphen-instead-of-minus". The
extended description in lintian is as follows:

    This manual page seems to contain a hyphen where a minus sign was
intended. By default, "-" chars are interpreted as hyphens (U+2010) by
groff, not as minus signs (U+002D). Since options to programs use minus
signs (U+002D), this means for example in UTF-8 locales that you cannot

Ben Schmidt | 23 Jan 2012 01:37

Re: [patch] man page fixes

On 23/01/12 6:13 AM, Thomas Goirand wrote:
> On 01/22/2012 09:56 PM, Ben Schmidt wrote:
>> Hi, Thomas,
>>
>> Thanks for this. I have a few issues/questions.
>>
>> 1. This doesn't apply cleanly to current sources in version control.
>> Would you be able to provide a patch that does?
>
> Sorry, my patch is from MLMMJ 1.2.17, as I haven't upgraded the Debian
> package yet (I'm waiting for you to release something).

It's a wise move to wait until a release, of course. Just a little
tricky for me to apply old patches.

>> I can probably resolve the clashes OK, but I know little about groff,
>> so I'm not sure if other man-page changes/additions might also
>> require fixing, so it'd be better if someone who knows more what
>> they're doing could look at it.
>
> The issue arises when you have something with a dash "like-this". Groff will
> then try to wrap it, and you might end up with something displayed like-
> this (i.e. with a break to the next line where you really don't want
> one). Adding a \ in front of the - makes it so that groff won't do the
> word break.

Thanks a lot for that detailed clarification. I'll do a semi-automated
find-replace on the current man pages and escape all the dashes.
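
A rough Python sketch of what such a find-replace might look like follows.
The function name `escape_hyphens` is hypothetical, and the rules it applies
are assumptions: skip `.\"` comment lines, leave existing `\-` alone, and
leave `\s-N` point-size changes alone. Other escapes that take numeric
arguments are not handled, so the resulting diff still needs reviewing by
hand.

```python
import re

# A hyphen that is not already escaped ("\-") and is not part of a
# "\s-N" point-size change.
UNESCAPED_HYPHEN = re.compile(r"(?<!\\)(?<!\\s)-")


def escape_hyphens(roff_source: str) -> str:
    """Escape bare hyphens in roff source, leaving comment lines untouched."""
    lines = []
    for line in roff_source.split("\n"):
        if line.startswith('.\\"'):
            # roff comment line: keep exactly as written
            lines.append(line)
        else:
            lines.append(UNESCAPED_HYPHEN.sub(r"\\-", line))
    return "\n".join(lines)
```

Note that this escapes every remaining dash, including genuine hyphens in
prose, which is what "escape all the dashes" asks for but may not be what is
wanted everywhere.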

>> 2. My system (Mac OS X) doesn't like the UTF-8 encoding. The existing

Ben Schmidt | 23 Jan 2012 03:06

Re: [patch] man page fixes

>> All man pages should be using UTF-8 in Debian, and I believe that you
>> should set your Mac to use UTF-8 if possible. If not, do we care? Is
>> MLMMJ used on the Apple platform?
>>
>> Also, what type of encoding do you use? Why is your encoding more valid
>> than UTF-8? What if the user is, let's say, Chinese, Russian, or who knows what?
>>
>> It really doesn't make sense to use any type of specific encoding,
>> everyone should be using UTF-8, IMO.
>
> I agree, it makes sense for everyone to use UTF-8 these days. However,
> I'd prefer not to expect or assume that. I didn't mention my system
> because I think it is particularly important, but simply to point out
> that there is at least one system out there that this change will break.
> There may be others. I would like to find a way to make this change that
> won't break any system. Does anyone know how to do this, or another
> project that has solved this problem whose work we can copy or imitate?

I think I've solved this.

It seems Debian is non-standard in requiring UTF-8 man pages, as Groff
does not support UTF-8 input:
http://www.gnu.org/software/groff/manual/html_node/Input-Encodings.html

However, Groff supports character escapes which can be used compatibly:
http://manpages.ubuntu.com/manpages/gutsy/man7/groff_char.7.html
http://manpages.debian.net/cgi-bin/man.cgi?query=groff_char&apropos=0&sektion=0&manpath=Debian+6.0+squeeze&format=html&locale=en

So I'll replace ø with \[/o] and everything should be good, though a
little ugly.
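
As a sketch of how that replacement could be automated, the Python below maps
a few non-ASCII letters to the named glyph escapes listed in groff_char(7).
The table is deliberately small, and the names `GROFF_ESCAPES` and
`to_groff_escapes` are hypothetical. Any non-ASCII character it does not know
is left in place and reported rather than guessed at.

```python
# Named glyph escapes as listed in groff_char(7); extend as needed.
GROFF_ESCAPES = {
    "ø": r"\[/o]",
    "Ø": r"\[/O]",
    "å": r"\[oa]",
    "Å": r"\[oA]",
    "æ": r"\[ae]",
    "Æ": r"\[AE]",
}


def to_groff_escapes(text: str) -> tuple[str, set[str]]:
    """Return the converted text and the set of non-ASCII characters left unmapped."""
    out = []
    unmapped = set()
    for ch in text:
        if ch in GROFF_ESCAPES:
            out.append(GROFF_ESCAPES[ch])
        else:
            if ord(ch) > 127:
                unmapped.add(ch)  # no escape known here; keep the character
            out.append(ch)
    return "".join(out), unmapped
```

The result is pure ASCII for the mapped characters, so it no longer matters
which encoding the host system assumes for the file.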

Thomas Goirand | 23 Jan 2012 08:11

Re: [patch] man page fixes

On 01/23/2012 08:37 AM, Ben Schmidt wrote:
> Just a little
> tricky for me to apply old patches.

For these man pages, my intention was to point at the issues, and make
sure they don't re-occur, because really, this has been recurrent with
MLMMJ.

> It seems Debian is non-standard in requiring UTF-8 man pages, as Groff
> does not support UTF-8 input:
> http://www.gnu.org/software/groff/manual/html_node/Input-Encodings.html

From the same page:
"By its very nature, -Tutf8 supports all input encodings"

So it's absolutely standard (and recommended).

> So I'll replace ø with \[/o] and everything should be good, though a
> little ugly.

I'm happy if you've found a solution; however, I still think UTF-8 is
the only choice.

Thomas

P.S.: Please do *not* Cc: me, I'm subscribed to the list.

Ben Schmidt | 23 Jan 2012 17:39

Re: [patch] man page fixes

Hi, Thomas,

>> It seems Debian is non-standard in requiring UTF-8 man pages, as Groff
>> does not support UTF-8 input:
>> http://www.gnu.org/software/groff/manual/html_node/Input-Encodings.html
>
> From the same page:
> "By its very nature, -Tutf8 supports all input encodings"
>
> So it's absolutely standard (and recommended).

My interpretation of this is, "When the output/terminal encoding is
UTF-8, naturally all supported input encodings can be accommodated,
since Unicode is a superset of them all." (The paragraph then explains
how other output encodings have restrictions on which input encodings
they can accommodate.)

That doesn't by any means mean that UTF-8 is a supported input encoding.
On the contrary, since it's not on the list of supported input
encodings, and there is no documentation regarding how to instruct groff
that its input is UTF-8, I believe it isn't. If Debian supports it, they
must have patched groff, or just be happily sweeping the issue under the
carpet (if groff thinks everything is Latin-1 I presume it will just
handle text transparently, so it might not matter if it is actually fed
and outputs UTF-8 rather than Latin-1--until complicated wrapping or
collation gets involved).
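
The "transparent" pass-through guessed at above can be illustrated with a
short Python sketch. It only models a program that treats each input byte as
one Latin-1 character; it makes no claim about what groff or Debian's groff
actually does, and the function name `latin1_passthrough` is hypothetical.

```python
def latin1_passthrough(text: str) -> tuple[bytes, int]:
    """Feed UTF-8 bytes through a byte-per-character (Latin-1) view.

    Returns the bytes that come out the other side and the number of
    characters the Latin-1 view believes it handled.
    """
    utf8_bytes = text.encode("utf-8")
    seen_as_latin1 = utf8_bytes.decode("latin-1")  # one character per byte
    return seen_as_latin1.encode("latin-1"), len(seen_as_latin1)
```

For "ø" the bytes come back unchanged (0xC3 0xB8), so a UTF-8 terminal would
still display the right glyph, but the Latin-1 view counts two characters
where the reader sees one. That mismatch is the kind of thing that would only
show up once line widths or wrapping matter.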

>> So I'll replace ø with \[/o] and everything should be good, though a
>> little ugly.
>

Thomas Goirand | 27 Jan 2012 05:47

Re: [patch] man page fixes

On 01/24/2012 12:39 AM, Ben Schmidt wrote:
>>> It seems Debian is non-standard in requiring UTF-8 man pages, as Groff
>>> does not support UTF-8 input:
>>> http://www.gnu.org/software/groff/manual/html_node/Input-Encodings.html
>>
>> From the same page:
>> "By its very nature, -Tutf8 supports all input encodings"
>>
>> So it's absolutely standard (and recommended).
> 
> My interpretation of this is, "When the output/terminal encoding is
> UTF-8, naturally all supported input encodings can be accommodated,
> since Unicode is a superset of them all." (The paragraph then explains
> how other output encodings have restrictions on which input encodings
> they can accommodate.)
> 
> That doesn't by any means mean that UTF-8 is a supported input encoding.
> On the contrary, since it's not on the list of supported input
> encodings, and there is no documentation regarding how to instruct groff
> that its input is UTF-8, I believe it isn't. If Debian supports it, they
> must have patched groff, or just be happily sweeping the issue under the
> carpet (if groff thinks everything is Latin-1 I presume it will just
> handle text transparently, so it might not matter if it is actually fed
> and outputs UTF-8 rather than Latin-1--until complicated wrapping or
> collation gets involved).

This doesn't make sense at all. If there's a parameter to use UTF-8, how
could it not be supported?

Thomas

Ben Schmidt | 27 Jan 2012 06:37

Re: [patch] man page fixes

On 27/01/12 3:47 PM, Thomas Goirand wrote:
> On 01/24/2012 12:39 AM, Ben Schmidt wrote:
>>>> It seems Debian is non-standard in requiring UTF-8 man pages, as Groff
>>>> does not support UTF-8 input:
>>>> http://www.gnu.org/software/groff/manual/html_node/Input-Encodings.html
>>>
>>> From the same page:
>>> "By its very nature, -Tutf8 supports all input encodings"
>>>
>>> So it's absolutely standard (and recommended).
>>
>> My interpretation of this is, "When the output/terminal encoding is
>> UTF-8, naturally all supported input encodings can be accommodated,
>> since Unicode is a superset of them all." (The paragraph then explains
>> how other output encodings have restrictions on which input encodings
>> they can accommodate.)
>>
>> That doesn't by any means mean that UTF-8 is a supported input encoding.
>> On the contrary, since it's not on the list of supported input
>> encodings, and there is no documentation regarding how to instruct groff
>> that its input is UTF-8, I believe it isn't. If Debian supports it, they
>> must have patched groff, or just be happily sweeping the issue under the
>> carpet (if groff thinks everything is Latin-1 I presume it will just
>> handle text transparently, so it might not matter if it is actually fed
>> and outputs UTF-8 rather than Latin-1--until complicated wrapping or
>> collation gets involved).
>
> This doesn't make sense at all. If there's a parameter to use UTF-8, how
> could it not be supported?


