Adam Tlałka | 6 Oct 14:05

[PATCH 0/0] SIGWINCH problem with terminal apps

Welcome,

I've observed then very often a X11 terminal app is not getting proper
window sizes afer terminal resize operation. This could be seen with
mc, jed, vim or other curses and not curses aware apps.

I wrote a simple program which just does nothing but uses SIGWINCH
handler so I can observe values reported by ioctl(1,TIOCGWINS,&ws) call
inside my signal handler. What is interesting that from time to time it
obtain unchanged values. It means values which were valid just before
terminal resize.

In drivers/char/vt.c and drivers/char/tty_io.c variables
vc->vc_tty->winsize and tty->winsize , real_tty->winsize are updated
after kill_pgrp(pgrp, SIGWINCH, 1) calls. I am not very familiar with
mutex design and how it corresponds to kill_pgrp() kernel function but
it seems that locking is not working here as we expect. An app can read
tty winsize data through ioctl() call in SIGWINCH handler and obtain
uchanged values.

So as a quick solution I made patches which move mentioned updates
before kill_pgrp() calls. As I tested modified kernel there is no
observed effect now. So I send patchs.

There are some places where kill_pgrp() call is used and some variable
is changed after it. It should be considered if this code is always
working properly or some race scheduler condition exists.

Signed-off-by: Adam Tla/lka <atlka <at> pg.gda.pl>

(Continue reading)

Alan Cox | 6 Oct 15:13

Re: [PATCH 0/0] SIGWINCH problem with terminal apps

> So as a quick solution I made patches which move mentioned updates
> before kill_pgrp() calls. As I tested modified kernel there is no
> observed effect now. So I send patchs.

NAK - this might happen to make the race miss on your box but it's not a
fix of any kind.

> There are some places where kill_pgrp() call is used and some variable
> is changed after it. It should be considered if this code is always
> working properly or some race scheduler condition exists.

The code was never race free, the scheduler change made the problem show
up more. Later 2.6.27-rc has patches that use the termios lock across
TIOCG/SWINSZ and deal with the problem properly.

Alan
Adam Tlałka | 6 Oct 20:28

Re: [PATCH 0/0] SIGWINCH problem with terminal apps

Hi,

Mon, 6 Oct 2008 14:13:06 +0100 - Alan Cox <alan <at> lxorguk.ukuu.org.uk>:
> > So as a quick solution I made patches which move mentioned updates
> > before kill_pgrp() calls. As I tested modified kernel there is no
> > observed effect now. So I send patchs.
> 
> NAK - this might happen to make the race miss on your box but it's
> not a fix of any kind.

It depends. If mutexes are not working properly only in case of signal
sending then moving variables modification before kill_pgrp() call
could be a quite good enough solution too. Mutexes seems to be faster
and more efficient then semaphores for example.

> The code was never race free, the scheduler change made the problem
> show up more. Later 2.6.27-rc has patches that use the termios lock
> across TIOCG/SWINSZ and deal with the problem properly.

Maybe but what with older versions. Anyway the problem is if mutexes
are usable here or not.

Regards

--

-- 
Adam Tlałka
Alan Cox | 7 Oct 00:14

Re: [PATCH 0/0] SIGWINCH problem with terminal apps

> > show up more. Later 2.6.27-rc has patches that use the termios lock
> > across TIOCG/SWINSZ and deal with the problem properly.
> 
> Maybe but what with older versions. Anyway the problem is if mutexes
> are usable here or not.

Well if mutexes don't work your kernel will eat your computer fairly
rapidly so I think we may safely conclude that mutex locks work
Adam Tlałka | 7 Oct 22:28

Re: [PATCH 0/0] SIGWINCH problem with terminal apps

Mon, 6 Oct 2008 23:14:34 +0100 - Alan Cox <alan <at> lxorguk.ukuu.org.uk>:

> > > show up more. Later 2.6.27-rc has patches that use the termios
> > > lock across TIOCG/SWINSZ and deal with the problem properly.
> > 
> > Maybe but what with older versions. Anyway the problem is if mutexes
> > are usable here or not.
> 
> Well if mutexes don't work your kernel will eat your computer fairly
> rapidly so I think we may safely conclude that mutex locks work
> 

OK, so if this race problem raises only while kill_pgrp() is
used the proposed patch eliminates this problem. Of course we should
change code in all places where pgrp_kill() is used in conjuction
with mutexes and some internal variables are modified.
What do you think about it?

Regards

--

-- 
Adam Tlałka       mailto:atlka <at> pg.gda.pl    ^v^ ^v^ ^v^
System  & Network Administration Group       - - - ~~~~~~
Computer Center, Gdańsk University of Technology, Poland
Adam Tlałka | 10 Oct 03:12

[PATCH 0/1] SIGWINCH problem with terminal apps still alive

Welcome,

now we have 2.6.26.6 kernel and still terminal resize leads to
undesired effects. It is very inconvenient to wait for 2.6.27 for
corrections.

As Alan Cox previously said mutexes generally work but as we can
observe in case of kill_pgrp() call inside mutex lock we got
race because of rescheduling so lock is not working here.
Rearanging code so the variable change is placed before kill_pgrp()
call removes mentioned race situaction.   

Signed-off-by: Adam Tla/lka <atlka <at> pg.gda.pl>

I strongly suggest to patch actual 2.6.26.x kernel to remove this very
nasty pts behaviour.

Regards

--

-- 
Adam Tlałka       mailto:atlka <at> pg.gda.pl    ^v^ ^v^ ^v^
--- drivers/char/tty_io_orig.c	2008-10-10 02:30:18.000000000 +0200
+++ drivers/char/tty_io.c	2008-10-10 02:33:38.000000000 +0200
@@ -3014,6 +3014,9 @@ static int tiocswinsz(struct tty_struct 
 		}
 	}
 #endif
+	tty->winsize = tmp_ws;
(Continue reading)

Adam Tlałka | 10 Oct 05:32

Re: [PATCH 0/1] SIGWINCH problem with terminal apps still alive

Welcome,

Fri, 10 Oct 2008 03:12:34 +0200 - Adam Tlałka <atlka <at> pg.gda.pl>:
> now we have 2.6.26.6 kernel and still terminal resize leads to
> undesired effects. It is very inconvenient to wait for 2.6.27 for
> corrections.

Very funny, I've posted my patch just before 2.6.27 appeared so now it
seems obsolete. Only argument for it now is the cleaner and more
optimized code. Why?
Now we have two distant places where we use ws and tty->winsize
variables:

(from drivers/char/tty_io.c:tty_do_resize())

	mutex_lock(&real_tty->termios_mutex);
=>      if (!memcmp(ws, &tty->winsize, sizeof(*ws)))
                goto done;
        /* Get the PID values and reference them so we can
           avoid holding the tty ctrl lock while sending signals */
        spin_lock_irqsave(&tty->ctrl_lock, flags);
        pgrp = get_pid(tty->pgrp);
        rpgrp = get_pid(real_tty->pgrp);
        spin_unlock_irqrestore(&tty->ctrl_lock, flags);

        if (pgrp)
                kill_pgrp(pgrp, SIGWINCH, 1);
        if (rpgrp != pgrp && rpgrp)
                kill_pgrp(rpgrp, SIGWINCH, 1);

(Continue reading)

Alan Cox | 10 Oct 11:29

Re: [PATCH 0/1] SIGWINCH problem with terminal apps still alive

On Fri, 10 Oct 2008 03:12:34 +0200
Adam Tlałka <atlka <at> pg.gda.pl> wrote:

> Welcome,
> 
> now we have 2.6.26.6 kernel and still terminal resize leads to
> undesired effects. It is very inconvenient to wait for 2.6.27 for
> corrections.
> 
> As Alan Cox previously said mutexes generally work but as we can
> observe in case of kill_pgrp() call inside mutex lock we got
> race because of rescheduling so lock is not working here.
> Rearanging code so the variable change is placed before kill_pgrp()
> call removes mentioned race situaction.   
> 
> Signed-off-by: Adam Tla/lka <atlka <at> pg.gda.pl>
> 
> I strongly suggest to patch actual 2.6.26.x kernel to remove this very
> nasty pts behaviour.

NAK again

Moving the copies around simply moves the race, it might be that it fixes
your box and unfixes other peoples.
Adam Tlałka | 10 Oct 12:35

Re: [PATCH 0/1] SIGWINCH problem with terminal apps still alive

Fri, 10 Oct 2008 10:29:06 +0100 - Alan Cox <alan <at> lxorguk.ukuu.org.uk>:

> On Fri, 10 Oct 2008 03:12:34 +0200
> Adam Tlałka <atlka <at> pg.gda.pl> wrote:
> 
> > Welcome,
> > 
> > now we have 2.6.26.6 kernel and still terminal resize leads to
> > undesired effects. It is very inconvenient to wait for 2.6.27 for
> > corrections.
> > 
> > As Alan Cox previously said mutexes generally work but as we can
> > observe in case of kill_pgrp() call inside mutex lock we got
> > race because of rescheduling so lock is not working here.
> > Rearanging code so the variable change is placed before kill_pgrp()
> > call removes mentioned race situaction.   
> > 
> > Signed-off-by: Adam Tla/lka <atlka <at> pg.gda.pl>
> > 
> > I strongly suggest to patch actual 2.6.26.x kernel to remove this
> > very nasty pts behaviour.
> 
> NAK again
> 
> Moving the copies around simply moves the race, it might be that it
> fixes your box and unfixes other peoples.
> 

I don't think so. Race appears because of kill_pgrp() call which
generates SIGWINCH so it leads to reschedule and ioctl() which reads
(Continue reading)

Adam Tlałka | 10 Oct 13:56

Re: [PATCH 0/1] SIGWINCH problem with terminal apps still alive

Fri, 10 Oct 2008 12:35:17 +0200 - Adam Tlałka <atlka <at> pg.gda.pl>:

> Fri, 10 Oct 2008 10:29:06 +0100 - Alan Cox <alan <at> lxorguk.ukuu.org.uk>:
> > 
> > NAK again
> > 
> > Moving the copies around simply moves the race, it might be that it
> > fixes your box and unfixes other peoples.
> > 
> 
> I don't think so. Race appears because of kill_pgrp() call which
> generates SIGWINCH so it leads to reschedule and ioctl() which reads
> termios sizes before they are updated - from time to time. So if we
> update variables before signal generation there will be no race.
> Moving the point of variables update eliminates
> possibility of reading old values. So even if after kill_pgrp() the
> other process will not lock here on this mutex values obtained will be
> sane.
> 
> Whats more we could protect by mutex variable only test and change
> operations and it stil will work correctly.
> 
> Because now we have 2.6.27 I tested this kind of code in
> tty_io.c(tty_do_resize):
> 
> ...
> 
> So it works, and change of tty->winsize and real_tty->winsize are
> protected . Why another process should wait more if winsize is
> already properly set?
(Continue reading)


Gmane