Felix Fietkau | 8 Aug 2012 16:25

[PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

commit b74713d04effbacd3d126ce94cec18742187b6ce
"ath9k: Handle fatal interrupts properly" introduced a race condition, where
IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
while the reset is still queued without addressing the IRQ cause.
This leads to an IRQ storm that prevents the system from even getting to
the reset code.

Fix this by disabling IRQs in the handler without touching intr_ref_cnt.

Cc: Rajkumar Manoharan <rmanohar@...>
Cc: Sujith Manoharan <c_manoha@...>
Signed-off-by: Felix Fietkau <nbd@...>
---
 drivers/net/wireless/ath/ath9k/mac.c  |   18 ++++++++++++------
 drivers/net/wireless/ath/ath9k/mac.h  |    1 +
 drivers/net/wireless/ath/ath9k/main.c |    4 +++-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/mac.c b/drivers/net/wireless/ath/ath9k/mac.c
index 7990cd5..b42be91 100644
--- a/drivers/net/wireless/ath/ath9k/mac.c
+++ b/drivers/net/wireless/ath/ath9k/mac.c
@@ -773,15 +773,10 @@ bool ath9k_hw_intrpend(struct ath_hw *ah)
 }
 EXPORT_SYMBOL(ath9k_hw_intrpend);

-void ath9k_hw_disable_interrupts(struct ath_hw *ah)
+void ath9k_hw_kill_interrupts(struct ath_hw *ah)
 {
 	struct ath_common *common = ath9k_hw_common(ah);

Rajkumar Manoharan | 8 Aug 2012 16:43

Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
> commit b74713d04effbacd3d126ce94cec18742187b6ce
> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
> while the reset is still queued without addressing the IRQ cause.
> This leads to an IRQ storm that prevents the system from even getting to
> the reset code.
> 
> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
>
It is safer not to re-enable interrupts on FATAL errors rather than enabling
it and then checking it on irq for bailing out. It would be better if you kill
the interrupts on processing fatal interrupts.

-Rajkumar
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Felix Fietkau | 8 Aug 2012 17:00

Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On 2012-08-08 4:43 PM, Rajkumar Manoharan wrote:
> On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
>> commit b74713d04effbacd3d126ce94cec18742187b6ce
>> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
>> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
>> while the reset is still queued without addressing the IRQ cause.
>> This leads to an IRQ storm that prevents the system from even getting to
>> the reset code.
>> 
>> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
>>
> It is safer not to re-enable interrupts on FATAL errors rather than enabling
> it and then checking it on irq for bailing out. It would be better if you kill
> the interrupts on processing fatal interrupts.
A fatal interrupt isn't the only place where this race shows up.
Anything that queues a reset is affected, so skipping the interrupt
enable in the IRQ handler is not enough (aside from the fact that it
would mess up irq disable refcounting).

Also, how is it safer? It's not like the interrupt handler does any real
processing before running into that check.

- Felix

Rajkumar Manoharan | 8 Aug 2012 17:20

Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On Wed, Aug 08, 2012 at 05:00:39PM +0200, Felix Fietkau wrote:
> On 2012-08-08 4:43 PM, Rajkumar Manoharan wrote:
> > On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
> >> commit b74713d04effbacd3d126ce94cec18742187b6ce
> >> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
> >> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
> >> while the reset is still queued without addressing the IRQ cause.
> >> This leads to an IRQ storm that prevents the system from even getting to
> >> the reset code.
> >> 
> >> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
> >>
> > It is safer not to re-enable interrupts on FATAL errors rather than enabling
> > it and then checking it on irq for bailing out. It would be better if you kill
> > the interrupts on processing fatal interrupts.
> A fatal interrupt isn't the only place where this race shows up.
> Anything that queues a reset is affected, so skipping the interrupt
> enable in the IRQ handler is not enough (aside from the fact that it
> would mess up irq disable refcounting).
> 
> Also, how is it safer? It's not like the interrupt handler does any real
> processing before running into that check.
> 
Agreed. I was confused by the subject of the commit you mentioned. Sorry for the noise.

-Rajkumar

Sujith Manoharan | 8 Aug 2012 17:25

Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

Rajkumar Manoharan wrote:
> It is safer not to re-enable interrupts on FATAL errors rather than enabling
> it and then checking it on irq for bailing out. It would be better if you kill
> the interrupts on processing fatal interrupts.

I am not sure I understand.

The original issue was the race between reset-work and the ISR which resulted in
frequent disconnects when a BB-WATCHDOG interrupt occurred or TX hung, which was
fixed by introducing the SC_OP_HW_RESET flag. Later, the work_pending() race was
fixed. Still, this is a race that can happen and I think fixing it by bypassing
the ref-count maintenance and disabling interrupts is okay.

Sujith
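
Editorial sketch (not part of the thread): the race discussed above can be
modelled in user space. The C code below is a simplified illustration, not
the driver's code. Every name in it (model_hw, model_isr, model_scenario and
so on) is hypothetical, and the meaning given to ref_cnt (0 = enabled,
negative = disabled) is an assumption made for the model. It shows an
interrupt handler that returns while a reset is queued, either leaving the
interrupt unmasked (the storm) or masking it without touching the reference
count (the approach the patch description names). The iteration limit stands
in for a system that never gets to run the reset work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hardware and driver state. */
struct model_hw {
    int  ref_cnt;        /* assumed: 0 = enabled, negative = disabled */
    bool irq_unmasked;   /* interrupt line can fire */
    bool cause_pending;  /* interrupt cause not yet addressed */
    bool reset_queued;   /* reset work scheduled but not yet run */
};

/* Mask the interrupt only; the reference count is left alone. */
static void model_kill_interrupts(struct model_hw *hw)
{
    hw->irq_unmasked = false;
}

/* Ref-counted disable: take a reference, then mask. */
static void model_disable_interrupts(struct model_hw *hw)
{
    hw->ref_cnt--;
    model_kill_interrupts(hw);
}

/* Ref-counted enable: unmask only when the last reference is dropped. */
static void model_enable_interrupts(struct model_hw *hw)
{
    assert(hw->ref_cnt < 0);   /* must pair with an earlier disable */
    if (++hw->ref_cnt == 0)
        hw->irq_unmasked = true;
}

/* Handler: bails out while a reset is queued, optionally masking first. */
static void model_isr(struct model_hw *hw, bool mask_when_reset_queued)
{
    if (hw->reset_queued) {
        if (mask_when_reset_queued)
            model_kill_interrupts(hw);
        return;                /* "handled", but the cause is still pending */
    }
    hw->cause_pending = false; /* normal processing addresses the cause */
}

/* Deferred reset: balanced disable/enable around clearing the cause. */
static void model_reset_work(struct model_hw *hw)
{
    model_disable_interrupts(hw);
    hw->cause_pending = false;
    hw->reset_queued = false;
    model_enable_interrupts(hw);
}

/* Fire the interrupt until it is masked, cleared, or the limit is hit.
 * Returns how many times the handler ran. */
static int model_scenario(bool mask_when_reset_queued, int limit,
                          struct model_hw *hw)
{
    int calls = 0;

    hw->ref_cnt = 0;
    hw->irq_unmasked = true;
    hw->cause_pending = true;
    hw->reset_queued = true;

    while (hw->irq_unmasked && hw->cause_pending && calls < limit) {
        model_isr(hw, mask_when_reset_queued);
        calls++;
    }
    return calls;
}
```

In this model, model_scenario(false, 1000, &hw) runs the handler until the
limit is reached, while model_scenario(true, 1000, &hw) runs it once. Because
the handler masks the interrupt without changing ref_cnt, the later balanced
disable/enable pair in model_reset_work() still brings the count back to 0
and unmasks the interrupt.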