Dan Williams | 10 Feb 09:44
Picon
Favicon

[PATCH v8 00/13] libsas error handling + discovery v8

Changes since v7: http://marc.info/?l=linux-scsi&m=132739159230095&w=2

1/ Dropped "libsas: feed the scsi_block_when_processing_errors() meter"
   [1].  Patches 1-3 are just resends of the patches affected by that
   rebase.

2/ Fixed up the phy identify data that is exported to userspace.
   sas_phy/phyX/device_type and sas_phy/phyX/target_port_protocols are
   now populated more reliably.  Patches 4-5

3/ User-disabled ssp phys are now no longer inadvertantly re-enabled by
   error recovery (they need an explicit reset from userspace).  This
   enables scripted hotplug testing. Patches 6-8

4/ Fixed up the eh_strategy_handlers to prevent them being called from
   outside eh context, added an abort handler, and made sure all resets
   are notified to the lldd. Patches 9-12

5/ A small diet for sas_task to move slow path infrastructue out of the
   fast path use case. Patch 13

[1]: http://marc.info/?l=linux-scsi&m=132677728817896&w=2

---

[PATCH 01/13] libsas: close scsi_remove_target() vs libata-eh race
[PATCH 02/13] libsas: improve debug statements
[PATCH 03/13] libsas: async ata scanning
[PATCH 04/13] libsas: set attached device type and target protocols for local phys
[PATCH 05/13] libsas: fixup target_port_protocols for expanders that don't report sata
(Continue reading)

Dan Williams | 10 Feb 09:44
Picon
Favicon

[PATCH v8 01/13] libsas: close scsi_remove_target() vs libata-eh race

ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support
ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
(Continue reading)

Jeff Garzik | 10 Feb 19:32
Favicon

Re: [PATCH v8 01/13] libsas: close scsi_remove_target() vs libata-eh race

On 02/10/2012 03:44 AM, Dan Williams wrote:
> ata_port lifetime in libata follows the host.  In libsas it follows the
> scsi_target.  Once scsi_remove_device() has caused all commands to be
> completed it allows scsi_remove_target() to immediately proceed to
> freeing the ata_port causing bug reports like:
>
> [  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
> [  848.400262] general protection fault: 0000 [#1] SMP
> [  848.406244] CPU 4
> [  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support
ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
> [  848.432060]
> [  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
> [  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
> [  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
> [  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
> [  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
> [  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
> [  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
> [  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
> [  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
> [  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
> [  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
> [  848.555327] Stack:
> [  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
> [  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
> [  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
(Continue reading)

Dan Williams | 10 Feb 20:09
Picon
Favicon

Re: [PATCH v8 01/13] libsas: close scsi_remove_target() vs libata-eh race

On Fri, Feb 10, 2012 at 12:44 AM, Dan Williams <dan.j.williams <at> intel.com> wrote:
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 37a9e73..852b1b1 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -661,6 +661,10 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
>        struct ata_port *ap = dev->sata_dev.ap;
>        struct sas_ha_struct *ha = dev->port->ha;
>
> +       /* hold a reference over eh since we may be racing with final
> +        * remove once all commands are completed
> +        */
> +       kref_get(&dev->kref);
>        ata_port_printk(ap, KERN_DEBUG, "sas eh calling libata port error handler");
>        ata_scsi_port_error_handler(ha->core.shost, ap);

The last rebase ate the matching call to "sas_put_device()" that needs
to be here.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dan Williams | 10 Feb 09:44
Picon
Favicon

[PATCH v8 02/13] libsas: improve debug statements

It's difficult to determine which domain_device is triggering error recovery,
so convert messages like:

  sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
  sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
  ...
  ata7: sas eh calling libata port error handler
  ata8: sas eh calling libata port error handler

...into:

  sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
  sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
  ...
  sas: ata7: end_device-21:1: dev error handler
  sas: ata8: end_device-20:0:5: dev error handler

which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.

As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.

Cc: Douglas Gilbert <dgilbert <at> interlog.com>
[dgilbert <at> interlog.com: clarify table2table with 'U']
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_ata.c      |   43 ++++++++++++++++-----
 drivers/scsi/libsas/sas_expander.c |   74 +++++++++++++++++++++++++-----------
(Continue reading)

Dan Williams | 10 Feb 09:44
Picon
Favicon

[PATCH v8 03/13] libsas: async ata scanning

libsas ata error handling is already async but this does not help the
scan case.  Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.

Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.

Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.

Cc: Xiangliang Yu <yuxiangl <at> marvell.com>
Cc: Luben Tuikov <ltuikov <at> yahoo.com>
Acked-by: Jack Wang <jack_wang <at> usish.com>
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/ata/libata-core.c           |   34 +++++++++-------
 drivers/ata/libata-scsi.c           |   13 ++++++
 drivers/ata/libata.h                |    1 
 drivers/scsi/aic94xx/aic94xx_init.c |    1 
 drivers/scsi/isci/init.c            |    1 
 drivers/scsi/libsas/sas_ata.c       |   74 ++++++++++++++++++++++++++++++-----
 drivers/scsi/libsas/sas_discover.c  |   22 +++++-----
 drivers/scsi/libsas/sas_internal.h  |    9 ++++
 drivers/scsi/libsas/sas_scsi_host.c |   18 ---------
 drivers/scsi/mvsas/mv_init.c        |    1 
 drivers/scsi/pm8001/pm8001_init.c   |    1 
 include/linux/libata.h              |    1 
 include/scsi/libsas.h               |    1 
 include/scsi/sas_ata.h              |   12 +++---
(Continue reading)

Jeff Garzik | 10 Feb 19:34
Favicon

Re: [PATCH v8 03/13] libsas: async ata scanning

On 02/10/2012 03:44 AM, Dan Williams wrote:
> libsas ata error handling is already async but this does not help the
> scan case.  Move initial link recovery out from under host->scan_mutex,
> and delay synchronization with eh until after all port probe/recovery
> work has been queued.
>
> Device ordering is maintained with scan order by still calling
> sas_rphy_add() in order of domain discovery.
>
> Since we now scan the domain list when invoking libata-eh we need to be
> careful to check for fully initialized ata ports.
>
> Cc: Xiangliang Yu<yuxiangl <at> marvell.com>
> Cc: Luben Tuikov<ltuikov <at> yahoo.com>
> Acked-by: Jack Wang<jack_wang <at> usish.com>
> Signed-off-by: Dan Williams<dan.j.williams <at> intel.com>
> ---
>   drivers/ata/libata-core.c           |   34 +++++++++-------
>   drivers/ata/libata-scsi.c           |   13 ++++++
>   drivers/ata/libata.h                |    1
>   drivers/scsi/aic94xx/aic94xx_init.c |    1
>   drivers/scsi/isci/init.c            |    1
>   drivers/scsi/libsas/sas_ata.c       |   74 ++++++++++++++++++++++++++++++-----
>   drivers/scsi/libsas/sas_discover.c  |   22 +++++-----
>   drivers/scsi/libsas/sas_internal.h  |    9 ++++
>   drivers/scsi/libsas/sas_scsi_host.c |   18 ---------
>   drivers/scsi/mvsas/mv_init.c        |    1
>   drivers/scsi/pm8001/pm8001_init.c   |    1
>   include/linux/libata.h              |    1
>   include/scsi/libsas.h               |    1
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 04/13] libsas: set attached device type and target protocols for local phys

Before:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
none
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
none

After:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
end device
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
sata

Also downgrade the phy_list_lock to _irq instead of _irqsave since
libsas will never call sas_get_port_device with interrupts disbled.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_discover.c |   12 ++++++++----
 drivers/scsi/libsas/sas_internal.h |   17 +++++++++++++++++
 drivers/scsi/libsas/sas_port.c     |    2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index ed3f8c0..7a3ae48 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -69,7 +69,6 @@ void sas_init_dev(struct domain_device *dev)
  */
 static int sas_get_port_device(struct asd_sas_port *port)
 {
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 05/13] libsas: fixup target_port_protocols for expanders that don't report sata

If discovery returns 0 for target_port_protocols but shows an attached
sata device, just report SAS_PROTOCOL_SATA in the identify data so
userspace can reliably search for sata devices in the domain.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 14e3244..05acd9e 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -251,6 +251,8 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	phy->phy->identify.device_type = dr->attached_dev_type;
 	phy->phy->identify.initiator_port_protocols = phy->attached_iproto;
 	phy->phy->identify.target_port_protocols = phy->attached_tproto;
+	if (!phy->attached_tproto && dr->attached_sata_dev)
+		phy->phy->identify.target_port_protocols = SAS_PROTOCOL_SATA;
 	phy->phy->identify.phy_identifier = phy_id;
 	phy->phy->minimum_linkrate_hw = dr->hmin_linkrate;
 	phy->phy->maximum_linkrate_hw = dr->hmax_linkrate;

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 06/13] libsas: don't recover end devices attached to disabled phys

If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery.  This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh.  Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).

This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.

aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.

Cc: Luben Tuikov <ltuikov <at> yahoo.com>
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/aic94xx/aic94xx_tmf.c  |    2 +-
 drivers/scsi/libsas/sas_ata.c       |    5 ++---
 drivers/scsi/libsas/sas_init.c      |    3 +++
 drivers/scsi/libsas/sas_scsi_host.c |    3 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
index 50b914f..cf90409 100644
--- a/drivers/scsi/aic94xx/aic94xx_tmf.c
+++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
@@ -192,7 +192,7 @@ int asd_I_T_nexus_reset(struct domain_device *dev)
 	ASD_DPRINTK("sending %s reset to %s\n",
 		    reset_type ? "hard" : "soft", dev_name(&phy->dev));
 	res = sas_phy_reset(phy, reset_type);
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 07/13] scsi_transport_sas: 'enable' phys on reset

If userspace requests a phy reset, treat that as a request for the phy
to be enabled since that is the effect on hardware.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/scsi_transport_sas.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 7d69a25..f7565fc 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -615,6 +615,7 @@ do_sas_phy_reset(struct device *dev, size_t count, int hard_reset)
 	error = i->f->phy_reset(phy, hard_reset);
 	if (error)
 		return error;
+	phy->enabled = 1;
 	return count;
 };

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

It is possible for a host to get "locked out" from talking to sata
devices in the domain if, for example, its sas address changes but the
expander topology has existing affiliations with the old address.  If
the system is booted userspace can write to
/sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
if this condition exists for the root device the module parameter can be
used to promote all ata resets to hard resets.

After the system is booted this state can be cleared via
/sys/module/libsas/parameters/force_hard_reset

Cc: Xiangliang Yu <yuxiangl <at> marvell.com>
Cc: Luben Tuikov <ltuikov <at> yahoo.com>
Cc: Jack Wang <jack_wang <at> usish.com>
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 Documentation/kernel-parameters.txt |    6 ++++++
 drivers/scsi/libsas/sas_init.c      |    6 +++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 81c287f..ffefa3b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1283,6 +1283,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			If there are multiple matching configurations changing
 			the same attribute, the last one is used.

+	libsas.force_hard_reset=
+			[LIBSAS] Clear SATA affiliations with every reset, for
(Continue reading)

James Bottomley | 29 Feb 22:55

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> It is possible for a host to get "locked out" from talking to sata
> devices in the domain if, for example, its sas address changes but the
> expander topology has existing affiliations with the old address.  If
> the system is booted userspace can write to
> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
> if this condition exists for the root device the module parameter can be
> used to promote all ata resets to hard resets.

I don't quite understand this.  Are you saying we can't (or shouldn't)
execute 

/sys/class/sas_phy/<phy-X>/hard_reset

on the root device for some reason?

> After the system is booted this state can be cleared via
> /sys/module/libsas/parameters/force_hard_reset

I really don't think a module parameter for this is such a good idea ...
it effectively promotes all soft resets to being hard ones, which can
have a lot of unintended consequences.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

(Continue reading)

Douglas Gilbert | 29 Feb 23:40
Picon

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On 12-02-29 04:55 PM, James Bottomley wrote:
> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>> It is possible for a host to get "locked out" from talking to sata
>> devices in the domain if, for example, its sas address changes but the
>> expander topology has existing affiliations with the old address.  If
>> the system is booted userspace can write to
>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>> if this condition exists for the root device the module parameter can be
>> used to promote all ata resets to hard resets.

A point of order: SAS has link resets and hard resets. The
hard reset is a superset of link reset. A "link reset sequence
serves as a hard reset for SATA devices" and hence is
sufficient to reset a SATA device. To reset a SAS device
(e.g. a SAS disk) you need a SAS hard reset. Therefore a link
reset is the appropriately sized "gun" to reset a SATA device.

I have a SAS-2 expander that annoyingly powers up with the
programmed maximum physical link rate of its phys at 3 Gbps
even though its hardware maximum rate is 6 Gbps. For expander
phys connected to SAS-2 disks I can up the programmed maximum
value to 6 Gbps on the expander phy then do a link reset on
that phy. So without upsetting Linux (or any other OS) I can
switch that path from 3 Gbps to 6 Gbps. Can't do that with a
SATA disk without the OS finding out.

Also to clear a SATA affiliation you should be using a SMP
PHY CONTROL (phy_op=6) function.

Doug Gilbert
(Continue reading)

Dan Williams | 1 Mar 00:27
Picon
Favicon

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert <dgilbert <at> interlog.com> wrote:
> On 12-02-29 04:55 PM, James Bottomley wrote:
>>
>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>
>>> It is possible for a host to get "locked out" from talking to sata
>>> devices in the domain if, for example, its sas address changes but the
>>> expander topology has existing affiliations with the old address.  If
>>> the system is booted userspace can write to
>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>> if this condition exists for the root device the module parameter can be
>>> used to promote all ata resets to hard resets.
>
>
> A point of order: SAS has link resets and hard resets. The
> hard reset is a superset of link reset. A "link reset sequence
> serves as a hard reset for SATA devices" and hence is
> sufficient to reset a SATA device. To reset a SAS device
> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
> reset is the appropriately sized "gun" to reset a SATA device.
>
> I have a SAS-2 expander that annoyingly powers up with the
> programmed maximum physical link rate of its phys at 3 Gbps
> even though its hardware maximum rate is 6 Gbps. For expander
> phys connected to SAS-2 disks I can up the programmed maximum
> value to 6 Gbps on the expander phy then do a link reset on
> that phy. So without upsetting Linux (or any other OS) I can
> switch that path from 3 Gbps to 6 Gbps. Can't do that with a
> SATA disk without the OS finding out.

(Continue reading)

Douglas Gilbert | 1 Mar 01:23
Picon

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On 12-02-29 06:27 PM, Dan Williams wrote:
> On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert<dgilbert <at> interlog.com>  wrote:
>> On 12-02-29 04:55 PM, James Bottomley wrote:
>>>
>>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>>
>>>> It is possible for a host to get "locked out" from talking to sata
>>>> devices in the domain if, for example, its sas address changes but the
>>>> expander topology has existing affiliations with the old address.  If
>>>> the system is booted userspace can write to
>>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>>> if this condition exists for the root device the module parameter can be
>>>> used to promote all ata resets to hard resets.
>>
>>
>> A point of order: SAS has link resets and hard resets. The
>> hard reset is a superset of link reset. A "link reset sequence
>> serves as a hard reset for SATA devices" and hence is
>> sufficient to reset a SATA device. To reset a SAS device
>> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
>> reset is the appropriately sized "gun" to reset a SATA device.
>>
>> I have a SAS-2 expander that annoyingly powers up with the
>> programmed maximum physical link rate of its phys at 3 Gbps
>> even though its hardware maximum rate is 6 Gbps. For expander
>> phys connected to SAS-2 disks I can up the programmed maximum
>> value to 6 Gbps on the expander phy then do a link reset on
>> that phy. So without upsetting Linux (or any other OS) I can
>> switch that path from 3 Gbps to 6 Gbps. Can't do that with a
>> SATA disk without the OS finding out.
(Continue reading)

Dan Williams | 1 Mar 01:35
Picon
Favicon

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On Wed, Feb 29, 2012 at 4:23 PM, Douglas Gilbert <dgilbert <at> interlog.com> wrote:
> On 12-02-29 06:27 PM, Dan Williams wrote:
>>
>> On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert<dgilbert <at> interlog.com>
>>  wrote:
>>>
>>> On 12-02-29 04:55 PM, James Bottomley wrote:
>>>>
>>>>
>>>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>>>
>>>>>
>>>>> It is possible for a host to get "locked out" from talking to sata
>>>>> devices in the domain if, for example, its sas address changes but the
>>>>> expander topology has existing affiliations with the old address.  If
>>>>> the system is booted userspace can write to
>>>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>>>> if this condition exists for the root device the module parameter can
>>>>> be
>>>>> used to promote all ata resets to hard resets.
>>>
>>>
>>>
>>> A point of order: SAS has link resets and hard resets. The
>>> hard reset is a superset of link reset. A "link reset sequence
>>> serves as a hard reset for SATA devices" and hence is
>>> sufficient to reset a SATA device. To reset a SAS device
>>> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
>>> reset is the appropriately sized "gun" to reset a SATA device.
>>>
(Continue reading)

Dan Williams | 1 Mar 00:22
Picon
Favicon

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On Wed, Feb 29, 2012 at 1:55 PM, James Bottomley
<James.Bottomley <at> hansenpartnership.com> wrote:
> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>> It is possible for a host to get "locked out" from talking to sata
>> devices in the domain if, for example, its sas address changes but the
>> expander topology has existing affiliations with the old address.  If
>> the system is booted userspace can write to
>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>> if this condition exists for the root device the module parameter can be
>> used to promote all ata resets to hard resets.
>
> I don't quite understand this.  Are you saying we can't (or shouldn't)
> execute
>
> /sys/class/sas_phy/<phy-X>/hard_reset
>
> on the root device for some reason?

The case I ran into was accidentally changing the host sas address
between reboots.  If the sata device had been a root device then I
would not have been able boot the system.  But now that I think about
it, if Linux could not boot then neither could the pre-os
option-rom/efi driver.

>> After the system is booted this state can be cleared via
>> /sys/module/libsas/parameters/force_hard_reset
>
> I really don't think a module parameter for this is such a good idea ...
> it effectively promotes all soft resets to being hard ones, which can
> have a lot of unintended consequences.
(Continue reading)

James Bottomley | 1 Mar 15:27

Re: [PATCH v8 08/13] libsas: libsas.force_hard_reset module parameter

On Wed, 2012-02-29 at 15:22 -0800, Dan Williams wrote:
> On Wed, Feb 29, 2012 at 1:55 PM, James Bottomley
> <James.Bottomley <at> hansenpartnership.com> wrote:
> > On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> >> It is possible for a host to get "locked out" from talking to sata
> >> devices in the domain if, for example, its sas address changes but the
> >> expander topology has existing affiliations with the old address.  If
> >> the system is booted userspace can write to
> >> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
> >> if this condition exists for the root device the module parameter can be
> >> used to promote all ata resets to hard resets.
> >
> > I don't quite understand this.  Are you saying we can't (or shouldn't)
> > execute
> >
> > /sys/class/sas_phy/<phy-X>/hard_reset
> >
> > on the root device for some reason?
> 
> The case I ran into was accidentally changing the host sas address
> between reboots.  If the sata device had been a root device then I
> would not have been able boot the system.  But now that I think about
> it, if Linux could not boot then neither could the pre-os
> option-rom/efi driver.
> 
> >> After the system is booted this state can be cleared via
> >> /sys/module/libsas/parameters/force_hard_reset
> >
> > I really don't think a module parameter for this is such a good idea ...
> > it effectively promotes all soft resets to being hard ones, which can
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 09/13] libsas: enforce eh strategy handlers only in eh context

The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions).  However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0b9b7b..1cabedc 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -463,11 +463,14 @@ EXPORT_SYMBOL_GPL(sas_get_local_phy);
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
-	struct scsi_lun lun;
 	int res;
+	struct scsi_lun lun;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
(Continue reading)

James Bottomley | 29 Feb 23:05

Re: [PATCH v8 09/13] libsas: enforce eh strategy handlers only in eh context

On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> The strategy handlers may be called in places that are problematic for
> libsas (i.e. sata resets outside of domain revalidation filtering /
> libata link recovery), or problematic for userspace (non-blocking ioctl
> to sleeping reset functions).  However, these routines are also called
> for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
> as long as we are running in the host's error handler.
> 
> Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
> ---
>  drivers/scsi/libsas/sas_scsi_host.c |   15 +++++++++++----
>  1 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
> index f0b9b7b..1cabedc 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -463,11 +463,14 @@ EXPORT_SYMBOL_GPL(sas_get_local_phy);
>  /* Attempt to send a LUN reset message to a device */
>  int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
>  {
> -	struct domain_device *dev = cmd_to_domain_dev(cmd);
> -	struct sas_internal *i =
> -		to_sas_internal(dev->port->ha->core.shost->transportt);
> -	struct scsi_lun lun;
>  	int res;
> +	struct scsi_lun lun;
> +	struct Scsi_Host *host = cmd->device->host;
> +	struct domain_device *dev = cmd_to_domain_dev(cmd);
> +	struct sas_internal *i = to_sas_internal(host->transportt);
(Continue reading)

Dan Williams | 1 Mar 01:28
Picon
Favicon

Re: [PATCH v8 09/13] libsas: enforce eh strategy handlers only in eh context

On Wed, Feb 29, 2012 at 2:05 PM, James Bottomley
<James.Bottomley <at> hansenpartnership.com> wrote:
> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>> The strategy handlers may be called in places that are problematic for
>> libsas (i.e. sata resets outside of domain revalidation filtering /
>> libata link recovery), or problematic for userspace (non-blocking ioctl
>> to sleeping reset functions).  However, these routines are also called
>> for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
>> as long as we are running in the host's error handler.
>>
>> Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
>> ---
>>  drivers/scsi/libsas/sas_scsi_host.c |   15 +++++++++++----
>>  1 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
>> index f0b9b7b..1cabedc 100644
>> --- a/drivers/scsi/libsas/sas_scsi_host.c
>> +++ b/drivers/scsi/libsas/sas_scsi_host.c
>> @@ -463,11 +463,14 @@ EXPORT_SYMBOL_GPL(sas_get_local_phy);
>>  /* Attempt to send a LUN reset message to a device */
>>  int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
>>  {
>> -     struct domain_device *dev = cmd_to_domain_dev(cmd);
>> -     struct sas_internal *i =
>> -             to_sas_internal(dev->port->ha->core.shost->transportt);
>> -     struct scsi_lun lun;
>>       int res;
>> +     struct scsi_lun lun;
>> +     struct Scsi_Host *host = cmd->device->host;
(Continue reading)

James Bottomley | 1 Mar 15:29

Re: [PATCH v8 09/13] libsas: enforce eh strategy handlers only in eh context

On Wed, 2012-02-29 at 16:28 -0800, Dan Williams wrote:
> On Wed, Feb 29, 2012 at 2:05 PM, James Bottomley
> <James.Bottomley <at> hansenpartnership.com> wrote:
> > On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> >> The strategy handlers may be called in places that are problematic for
> >> libsas (i.e. sata resets outside of domain revalidation filtering /
> >> libata link recovery), or problematic for userspace (non-blocking ioctl
> >> to sleeping reset functions).  However, these routines are also called
> >> for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
> >> as long as we are running in the host's error handler.
> >>
> >> Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
> >> ---
> >>  drivers/scsi/libsas/sas_scsi_host.c |   15 +++++++++++----
> >>  1 files changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
> >> index f0b9b7b..1cabedc 100644
> >> --- a/drivers/scsi/libsas/sas_scsi_host.c
> >> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> >> @@ -463,11 +463,14 @@ EXPORT_SYMBOL_GPL(sas_get_local_phy);
> >>  /* Attempt to send a LUN reset message to a device */
> >>  int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
> >>  {
> >> -     struct domain_device *dev = cmd_to_domain_dev(cmd);
> >> -     struct sas_internal *i =
> >> -             to_sas_internal(dev->port->ha->core.shost->transportt);
> >> -     struct scsi_lun lun;
> >>       int res;
> >> +     struct scsi_lun lun;
(Continue reading)

Dan Williams | 6 Mar 20:17
Picon
Favicon

Re: [PATCH v8 09/13] libsas: enforce eh strategy handlers only in eh context

On Thu, Mar 1, 2012 at 6:29 AM, James Bottomley
<James.Bottomley <at> hansenpartnership.com> wrote:
>> Hmm, what about something like:
>>
>>   if (current != host->ehandler) {
>>     schedule_reset_to_run_in_eh_context():
>>     wait_for_eh();
>>   } else
>>     do_reset();
>
> I think that would work for me ... as long as the wait doesn't cause
> entangled deadlocks (I can't think of any at the moment, but I'll think
> a bit more deeply about it).

So there is a deadlock due to:

commit d7a1bb0a04ca835bffc0a91e64ab827dfba7d8f5
Author: James Smart <James.Smart <at> Emulex.Com>
Date:   Wed Mar 8 14:50:12 2006 -0500

    [SCSI] Block I/O while SG reset operation in progress - the midlayer patch

    The scsi midlayer portion of the patch

    Signed-off-by: James Smart <James.Smart <at> emulex.com>
    Signed-off-by: James Bottomley <James.Bottomley <at> SteelEye.com>

...this adds shost->tmf_in_progress to scsi_host_in_recovery(), so I
can't "wait for eh" because the exit condition for that wait is
"!scsi_host_in_recovery()".  But since sg_reset is opened O_NONBLOCK
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 10/13] libsas: add sas_eh_abort_handler

When recovering failed eh-cmnds let the lldd attempt an abort via
scsi_abort_eh_cmnd before escalating.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   21 +++++++++++++++++++++
 include/scsi/libsas.h               |    1 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 1cabedc..82f7532 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -460,6 +460,27 @@ struct sas_phy *sas_get_local_phy(struct domain_device *dev)
 }
 EXPORT_SYMBOL_GPL(sas_get_local_phy);

+int sas_eh_abort_handler(struct scsi_cmnd *cmd)
+{
+	int res;
+	struct sas_task *task = TO_SAS_TASK(cmd);
+	struct Scsi_Host *host = cmd->device->host;
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
+	if (current != host->ehandler)
+		return FAILED;
+
+	if (!i->dft->lldd_abort_task)
+		return FAILED;
+
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 11/13] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler

sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd.  If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.

Cc: Xiangliang Yu <yuxiangl <at> marvell.com>
Cc: Luben Tuikov <ltuikov <at> yahoo.com>
Cc: Jack Wang <jack_wang <at> usish.com>
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 82f7532..86ffd8f 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -505,25 +505,22 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 	return FAILED;
 }

-/* Attempt to send a phy (bus) reset */
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_phy *phy = sas_get_local_phy(dev);
-	struct Scsi_Host *host = cmd->device->host;
 	int res;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
(Continue reading)

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 12/13] isci: use sas eh strategy handlers

...now that the strategy handlers guarantee eh context and and notify
the driver of bus reset.

Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/isci/init.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 9a28270..87ac0f6 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -157,6 +157,9 @@ static struct scsi_host_template isci_sht = {
 	.sg_tablesize			= SG_ALL,
 	.max_sectors			= SCSI_DEFAULT_MAX_SECTORS,
 	.use_clustering			= ENABLE_CLUSTERING,
+	.eh_abort_handler		= sas_eh_abort_handler,
+	.eh_device_reset_handler        = sas_eh_device_reset_handler,
+	.eh_bus_reset_handler           = sas_eh_bus_reset_handler,
 	.target_destroy			= sas_target_destroy,
 	.ioctl				= sas_ioctl,
 	.shost_attrs			= isci_host_attrs,

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dan Williams | 10 Feb 09:45
Picon
Favicon

[PATCH v8 13/13] libsas: trim sas_task of slow path infrastructure

The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Christoph Hellwig <hch <at> lst.de>
Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
---
 drivers/scsi/libsas/sas_expander.c  |   20 ++++++++++----------
 drivers/scsi/libsas/sas_init.c      |   23 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_scsi_host.c |    8 ++++++--
 drivers/scsi/mvsas/mv_sas.c         |   20 ++++++++++----------
 drivers/scsi/pm8001/pm8001_sas.c    |   30 +++++++++++++++---------------
 include/scsi/libsas.h               |   14 +++++++++-----
 6 files changed, 71 insertions(+), 44 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 05acd9e..0ab3796 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -51,14 +51,14 @@ static void smp_task_timedout(unsigned long _task)
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
 	spin_unlock_irqrestore(&task->task_state_lock, flags);

-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }

 static void smp_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
(Continue reading)

Jack Wang | 15 Feb 02:09

RE: [PATCH v8 13/13] libsas: trim sas_task of slow path infrastructure

Dear Dan,

You seems forget change sas_alloc_task to sas_alloc_slow_task in pm8001.
I'll post a patch to fix this later.
Thanks for your work.

Jack
[PATCH v8 13/13] libsas: trim sas_task of slow path infrastructure
> 
> The timer and the completion are only used for slow path tasks (smp, and
> lldd tmfs), yet we incur the allocation space and cpu setup time for
> every fast path task.
> 
> Cc: Christoph Hellwig <hch <at> lst.de>
> Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
> ---
>  drivers/scsi/libsas/sas_expander.c  |   20 ++++++++++----------
>  drivers/scsi/libsas/sas_init.c      |   23 +++++++++++++++++++++--
>  drivers/scsi/libsas/sas_scsi_host.c |    8 ++++++--
>  drivers/scsi/mvsas/mv_sas.c         |   20 ++++++++++----------
>  drivers/scsi/pm8001/pm8001_sas.c    |   30 +++++++++++++++---------------
>  include/scsi/libsas.h               |   14 +++++++++-----
>  6 files changed, 71 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_expander.c
> b/drivers/scsi/libsas/sas_expander.c
> index 05acd9e..0ab3796 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -51,14 +51,14 @@ static void smp_task_timedout(unsigned long _task)
(Continue reading)


Gmane