Cedric Kimaru | 3 Jun 2012 03:25

Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

Fellow Cluster Compatriots,
I'm looking for some guidance here. Whenever my RHEL 5.7 cluster gets into "LEAVE_START_WAIT" on a given iSCSI volume, the following occurs:

  1. I can't do any read/write I/O to the volume.
  2. I can't unmount it from any node.
  3. In-flight/pending I/Os are impossible to identify or kill, since lsof on the mount fails. Basically, all I/O operations stall or fail.

So my questions are:

  1. What does the output from group_tool -v really indicate, e.g. "00030005 LEAVE_START_WAIT 12 c000b0002 1"? The group_tool man page doesn't list these fields.
  2. Does anyone have a list of what these fields represent?
  3. Corrective actions: how do I get out of this state without rebooting the entire cluster?
  4. Is it possible to determine the offending node?
Thanks,
-Cedric


//misc output

root@bl13-node13:~# clustat
Cluster Status for cluster3 @ Sat Jun  2 20:47:08 2012
Member Status: Quorate

 Member Name                     ID   Status
 ------ ----                     ---- ------
 bl01-node01                        1 Online, rgmanager
 bl04-node04                        4 Online, rgmanager
 bl05-node05                        5 Online, rgmanager
 bl06-node06                        6 Online, rgmanager
 bl07-node07                        7 Online, rgmanager
 bl08-node08                        8 Online, rgmanager
 bl09-node09                        9 Online, rgmanager
 bl10-node10                       10 Online, rgmanager
 bl11-node11                       11 Online, rgmanager
 bl12-node12                       12 Online, rgmanager
 bl13-node13                       13 Online, Local, rgmanager
 bl14-node14                       14 Online, rgmanager
 bl15-node15                       15 Online, rgmanager


 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:httpd                  bl05-node05                    started
 service:nfs_disk2              bl08-node08                    started


root@bl13-node13:~# group_tool -v
type             level name            id       state node id local_done
fence            0     default         0001000d none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     clvmd           0001000c none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk1  00020005 none       
[4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk2  00040005 none       
[4 5 6 7 8 9 10 11 13 14 15]
dlm              1     cluster3_disk7  00060005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk8  00080005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk9  000a0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     disk10          000c0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     rgmanager       0001000a none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk3  00020001 none       
[1 5 6 7 8 9 10 11 12 13]
dlm              1     cluster3_disk6  00020008 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk1  00010005 none       
[4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk7  00050005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk8  00070005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk9  00090005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     disk10          000b0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk3  00010001 none       
[1 5 6 7 8 9 10 11 12 13]
gfs              2     cluster3_disk6  00010008 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
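
A rough sketch of how the output above could be scanned for groups that are not in the "none" state. It pairs each row with the column labels from the header line that group_tool -v prints ("type level name id state node id local_done"). That pairing is an assumption based only on the header order, not on documented group_tool semantics, and the key name event_id (used for the second "id" column) is a hypothetical label chosen just to keep the keys unique. The sample text is a trimmed copy of the output above.

```python
# Column labels copied from the group_tool -v header line. The second "id"
# column is called "event_id" here only to keep the keys unique (assumed name).
FIELDS = ["type", "level", "name", "id", "state", "node", "event_id", "local_done"]

# Trimmed copy of the output posted above.
SAMPLE = """\
type             level name            id       state node id local_done
dlm              1     cluster3_disk2  00040005 none
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk7  00050005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
"""


def parse_group_tool(text):
    """Return one dict per group row, with its member list attached."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type "):
            continue  # skip blank lines and the header row
        if line.startswith("["):
            # A bracketed line lists the members of the group parsed just before it.
            if groups:
                groups[-1]["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        group = dict(zip(FIELDS, line.split()))
        group["members"] = []
        groups.append(group)
    return groups


def stuck_groups(groups):
    """Groups whose state column is anything other than 'none'."""
    return [g for g in groups if g.get("state", "none") != "none"]


if __name__ == "__main__":
    for g in stuck_groups(parse_group_tool(SAMPLE)):
        print(g["type"], g["name"], g["state"],
              "node column:", g.get("node"),
              "members:", g["members"])
```

Run against the sample, this flags only the gfs group for cluster3_disk2. The value in its node column is 12, and 12 does not appear in that group's member list. Whether that makes node 12 the offending node is not something the output alone confirms.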

root@bl13-node13:~# gfs2_tool list
253:15 cluster3:cluster3_disk6
253:16 cluster3:cluster3_disk3
253:18 cluster3:disk10
253:17 cluster3:cluster3_disk9
253:19 cluster3:cluster3_disk8
253:21 cluster3:cluster3_disk7
253:22 cluster3:cluster3_disk2
253:23 cluster3:cluster3_disk1

root@bl13-node13:~# lvs
    Logging initialised at Sat Jun  2 20:50:03 2012
    Set umask from 0022 to 0077
    Finding all logical volumes
  LV                            VG                            Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lv_cluster3_Disk7             vg_Cluster3_Disk7             -wi-ao   3.00T                                     
  lv_cluster3_Disk9             vg_Cluster3_Disk9             -wi-ao 200.01G                                     
  lv_Cluster3_libvert           vg_Cluster3_libvert           -wi-a- 100.00G                                     
  lv_cluster3_disk1             vg_cluster3_disk1             -wi-ao 100.00G                                     
  lv_cluster3_disk10            vg_cluster3_disk10            -wi-ao  15.00T                                     
  lv_cluster3_disk2             vg_cluster3_disk2             -wi-ao 220.00G                                     
  lv_cluster3_disk3             vg_cluster3_disk3             -wi-ao 330.00G                                     
  lv_cluster3_disk4_1T-kvm-thin vg_cluster3_disk4_1T-kvm-thin -wi-a-   1.00T                                     
  lv_cluster3_disk5             vg_cluster3_disk5             -wi-a- 555.00G                                     
  lv_cluster3_disk6             vg_cluster3_disk6             -wi-ao   2.00T                                     
  lv_cluster3_disk8             vg_cluster3_disk8             -wi-ao   2.00T
emmanuel segura | 3 Jun 2012 19:17

Re: Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

Hello Cedric,

Are you using GFS or GFS2? If you are using GFS, I recommend switching to GFS2.

--
this is my life and I live it as long as God wills
<div>
</div>
Cedric Kimaru | 4 Jun 2012 15:29

Re: Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

Hi Emmanuel,
 Yes, I'm running gfs2. I'm also trying this out on RHEL 6.2 with three nodes to see if this happens upstream.
Looks like I may have to open a BZ to get more info on this.
 
root <at> bl13-node13:~# gfs2_tool list
253:15 cluster3:cluster3_disk6
253:16 cluster3:cluster3_disk3
253:18 cluster3:disk10
253:17 cluster3:cluster3_disk9
253:19 cluster3:cluster3_disk8
253:21 cluster3:cluster3_disk7
253:22 cluster3:cluster3_disk2
253:23 cluster3:cluster3_disk1
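
For anyone wanting to check this programmatically, here is a minimal Python sketch that parses the listing above. It assumes each line has the form `major:minor cluster:fsname`, which is how the output reads here; the names `Gfs2Mount` and `parse_gfs2_tool_list` are made up for illustration and are not part of any cluster tool.

```python
from typing import List, NamedTuple


class Gfs2Mount(NamedTuple):
    """One line of `gfs2_tool list` output (field names are our own labels)."""
    major: int
    minor: int
    cluster: str
    fsname: str


def parse_gfs2_tool_list(text: str) -> List[Gfs2Mount]:
    """Parse lines shaped like '253:22 cluster3:cluster3_disk2'.

    Lines that do not match that shape are skipped rather than raising.
    """
    mounts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or ":" not in parts[0] or ":" not in parts[1]:
            continue
        major, minor = parts[0].split(":", 1)
        cluster, fsname = parts[1].split(":", 1)
        if not (major.isdigit() and minor.isdigit()):
            continue
        mounts.append(Gfs2Mount(int(major), int(minor), cluster, fsname))
    return mounts


# Sample copied from the listing above (shortened).
SAMPLE = """\
253:21 cluster3:cluster3_disk7
253:22 cluster3:cluster3_disk2
253:23 cluster3:cluster3_disk1
"""

if __name__ == "__main__":
    for mount in parse_gfs2_tool_list(SAMPLE):
        print(mount.cluster, mount.fsname, f"{mount.major}:{mount.minor}")
```

Since the problem volume (cluster3_disk2) shows up in this listing, it is one of the gfs2 mounts on this node.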

thanks,
-Cedric

On Sun, Jun 3, 2012 at 1:17 PM, emmanuel segura <emi2fast <at> gmail.com> wrote:
Hello Cedric

Are you using gfs or gfs2? If you are using gfs, I recommend switching to gfs2.

2012/6/3 Cedric Kimaru <rhel_cluster <at> ckimaru.com>
Fellow Cluster Compatriots,
I'm looking for some guidance here. Whenever my RHEL 5.7 cluster gets into "LEAVE_START_WAIT" on a given iSCSI volume, the following occurs:
  1. I can't read or write to the volume.
  2. I can't unmount it from any node.
  3. In-flight/pending I/Os are impossible to identify or kill, since lsof on the mount fails. Basically, all I/O operations stall or fail.

So my questions are:

  1. What does the output from group_tool -v really indicate, e.g. "00030005 LEAVE_START_WAIT 12 c000b0002 1"? The group_tool man page doesn't list these fields.
  2. Does anyone have a list of what these fields represent?
  3. Corrective actions: how do I get out of this state without rebooting the entire cluster?
  4. Is it possible to determine the offending node?
thanks,
-Cedric


//misc output

root <at> bl13-node13:~# clustat
Cluster Status for cluster3 <at> Sat Jun  2 20:47:08 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 bl01-node01                                1 Online, rgmanager
 bl04-node04                                4 Online, rgmanager
 bl05-node05                                5 Online, rgmanager
 bl06-node06                                6 Online, rgmanager
 bl07-node07                                7 Online, rgmanager
 bl08-node08                                8 Online, rgmanager
 bl09-node09                                9 Online, rgmanager
 bl10-node10                               10 Online, rgmanager
 bl11-node11                               11 Online, rgmanager
 bl12-node12                               12 Online, rgmanager
 bl13-node13                               13 Online, Local, rgmanager
 bl14-node14                               14 Online, rgmanager
 bl15-node15                               15 Online, rgmanager


 Service Name                  Owner (Last)                  State
 ------- ----                  ----- ------                  -----
 service:httpd                 bl05-node05                   started
 service:nfs_disk2             bl08-node08                   started


root <at> bl13-node13:~# group_tool -v
type             level name            id       state node id local_done
fence            0     default         0001000d none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     clvmd           0001000c none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk1  00020005 none       
[4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk2  00040005 none       
[4 5 6 7 8 9 10 11 13 14 15]
dlm              1     cluster3_disk7  00060005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk8  00080005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk9  000a0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     disk10          000c0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     rgmanager       0001000a none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk3  00020001 none       
[1 5 6 7 8 9 10 11 12 13]
dlm              1     cluster3_disk6  00020008 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk1  00010005 none       
[4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk7  00050005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk8  00070005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk9  00090005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     disk10          000b0005 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk3  00010001 none       
[1 5 6 7 8 9 10 11 12 13]
gfs              2     cluster3_disk6  00010008 none       
[1 4 5 6 7 8 9 10 11 12 13 14 15]
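
As a rough aid for reading the output above, the following Python sketch pairs each value with the column name from the header row that group_tool -v itself prints (type, level, name, id, state, node, id, local_done) and lists the groups whose state is not "none". This is only a sketch under assumptions: it assumes every group row is followed by one bracketed member list, it labels the second "id" column `id2` purely to keep the keys unique, and it does not claim to know what the fields mean beyond their header names. `parse_group_tool` and `stuck_groups` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Column names copied from the header row printed by `group_tool -v`.
BASE_COLUMNS = ["type", "level", "name", "id", "state"]
# Only present when a group is mid-transition. The header prints "id" twice;
# "id2" is our own label for the second one.
EXTRA_COLUMNS = ["node", "id2", "local_done"]

MEMBERS_RE = re.compile(r"^\[([\d\s]*)\]$")


def parse_group_tool(text: str) -> List[Dict[str, object]]:
    """Turn `group_tool -v` output into a list of dicts, one per group."""
    groups: List[Dict[str, object]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type "):
            continue  # blank line or the header row
        match = MEMBERS_RE.match(line)
        if match:
            # Bracketed member list belongs to the group row just before it.
            if groups:
                groups[-1]["members"] = [int(n) for n in match.group(1).split()]
            continue
        fields = line.split()
        if len(fields) < len(BASE_COLUMNS):
            continue  # not a group row we recognise
        group: Dict[str, object] = dict(zip(BASE_COLUMNS + EXTRA_COLUMNS, fields))
        group["members"] = []
        groups.append(group)
    return groups


def stuck_groups(groups: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the groups whose state column is anything other than 'none'."""
    return [g for g in groups if g["state"] != "none"]


# Two groups copied from the output above.
SAMPLE = """\
type             level name            id       state node id local_done
gfs              2     cluster3_disk1  00010005 none
[4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
"""

if __name__ == "__main__":
    for g in stuck_groups(parse_group_tool(SAMPLE)):
        print(g["type"], g["name"], g["state"],
              "node", g.get("node"), "members", g["members"])
```

For cluster3_disk2 the node column reads 12, and 12 is missing from that group's member list. That would be consistent with node 12 being the one whose leave has not completed, but nothing in this thread confirms that reading.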

root <at> bl13-node13:~# gfs2_tool list
253:15 cluster3:cluster3_disk6
253:16 cluster3:cluster3_disk3
253:18 cluster3:disk10
253:17 cluster3:cluster3_disk9
253:19 cluster3:cluster3_disk8
253:21 cluster3:cluster3_disk7
253:22 cluster3:cluster3_disk2
253:23 cluster3:cluster3_disk1

root <at> bl13-node13:~# lvs
    Logging initialised at Sat Jun  2 20:50:03 2012
    Set umask from 0022 to 0077
    Finding all logical volumes
  LV                            VG                            Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lv_cluster3_Disk7             vg_Cluster3_Disk7             -wi-ao   3.00T                                     
  lv_cluster3_Disk9             vg_Cluster3_Disk9             -wi-ao 200.01G                                     
  lv_Cluster3_libvert           vg_Cluster3_libvert           -wi-a- 100.00G                                     
  lv_cluster3_disk1             vg_cluster3_disk1             -wi-ao 100.00G                                     
  lv_cluster3_disk10            vg_cluster3_disk10            -wi-ao  15.00T                                     
  lv_cluster3_disk2             vg_cluster3_disk2             -wi-ao 220.00G                                     
  lv_cluster3_disk3             vg_cluster3_disk3             -wi-ao 330.00G                                     
  lv_cluster3_disk4_1T-kvm-thin vg_cluster3_disk4_1T-kvm-thin -wi-a-   1.00T                                     
  lv_cluster3_disk5             vg_cluster3_disk5             -wi-a- 555.00G                                     
  lv_cluster3_disk6             vg_cluster3_disk6             -wi-ao   2.00T                                     
  lv_cluster3_disk8             vg_cluster3_disk8             -wi-ao   2.00T


--
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



--
this is my life and I live it for as long as God wills


<div>
<div class="gmail_quote">On Sun, Jun 3, 2012 at 1:17 PM, emmanuel segura <span dir="ltr">&lt;<a href="mailto:emi2fast <at> gmail.com" target="_blank">emi2fast <at> gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">Hello Cedric<br><br>Are you using gfs or gfs2? if you are using gfs&nbsp; i recommend to use gfs2<br><br><div class="gmail_quote">
2012/6/3 Cedric Kimaru <span dir="ltr">&lt;<a href="mailto:rhel_cluster <at> ckimaru.com" target="_blank">rhel_cluster <at> ckimaru.com</a>&gt;</span><br><blockquote class="gmail_quote">
<div><div class="h5">Fellow Cluster Compatriots,<br>I'm looking for some guidance here. Whenever my rhel 5.7 cluster get's into "LEAVE_START_WAIT" on on a given iscsi volume, the following occurs: <br><ol>
<li>I can't r/w io to the volume.</li>
<li>Can't unmount it, from any node.</li>
<li>In flight/pending IO's are impossible to determine or kill since lsof on the mount fails. Basically all IO operations stall/fail.<br>
</li>
</ol>
<p>So my questions are:</p>
<ol>
<li>What does the output from group_tool -v really indicate, "00030005 LEAVE_START_WAIT 12 c000b0002 1" ? Man on group_tool doesn't list these fields.<br>
</li>
<li>Does anyone have a list of what these fields represent ?</li>
<li>Corrective actions. How do i get out of this state without rebooting the entire cluster ?</li>
<li>Is it possible to determine the offending node ?<br>
</li>
</ol>thanks,<br>-Cedric<br><br><br>//misc output<br><br>
root <at> bl13-node13:~# clustat <br>Cluster Status for cluster3  <at>  Sat Jun&nbsp; 2 20:47:08 2012<br>Member Status: Quorate<br><br>&nbsp;Member Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ID&nbsp;&nbsp; Status<br>&nbsp;------ ----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ---- ------<br>

bl01-node01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 Online, rgmanager<br>&nbsp;bl04-node04&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4 Online, rgmanager<br>&nbsp;bl05-node05&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5 Online, rgmanager<br>&nbsp;bl06-node06&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 6 Online, rgmanager<br>

&nbsp;bl07-node07&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7 Online, rgmanager<br>&nbsp;bl08-node08&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8 Online, rgmanager<br>&nbsp;bl09-node09&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9 Online, rgmanager<br>&nbsp;bl10-node10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 10 Online, rgmanager<br>

&nbsp;bl11-node11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 11 Online, rgmanager<br>&nbsp;bl12-node12&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 12 Online, rgmanager<br>&nbsp;bl13-node13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 13 Online, Local, rgmanager<br>

 bl14-node14                                     14 Online, rgmanager
 bl15-node15                                     15 Online, rgmanager

 Service Name                  Owner (Last)                  State
 ------- ----                  ----- ------                  -----
 service:httpd                 bl05-node05                   started
 service:nfs_disk2             bl08-node08                   started

root <at> bl13-node13:~# group_tool -v
type             level name            id       state node id local_done
fence            0     default         0001000d none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     clvmd           0001000c none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk1  00020005 none
[4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk2  00040005 none
[4 5 6 7 8 9 10 11 13 14 15]
dlm              1     cluster3_disk7  00060005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk8  00080005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk9  000a0005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     disk10          000c0005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     rgmanager       0001000a none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
dlm              1     cluster3_disk3  00020001 none
[1 5 6 7 8 9 10 11 12 13]
dlm              1     cluster3_disk6  00020008 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk1  00010005 none
[4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk7  00050005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk8  00070005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk9  00090005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     disk10          000b0005 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]
gfs              2     cluster3_disk3  00010001 none
[1 5 6 7 8 9 10 11 12 13]
gfs              2     cluster3_disk6  00010008 none
[1 4 5 6 7 8 9 10 11 12 13 14 15]

root <at> bl13-node13:~# gfs2_tool list
253:15 cluster3:cluster3_disk6
253:16 cluster3:cluster3_disk3
253:18 cluster3:disk10
253:17 cluster3:cluster3_disk9
253:19 cluster3:cluster3_disk8
253:21 cluster3:cluster3_disk7
253:22 cluster3:cluster3_disk2
253:23 cluster3:cluster3_disk1

root <at> bl13-node13:~# lvs
    Logging initialised at Sat Jun  2 20:50:03 2012
    Set umask from 0022 to 0077
    Finding all logical volumes
  LV                            VG                            Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lv_cluster3_Disk7             vg_Cluster3_Disk7             -wi-ao   3.00T
  lv_cluster3_Disk9             vg_Cluster3_Disk9             -wi-ao 200.01G
  lv_Cluster3_libvert           vg_Cluster3_libvert           -wi-a- 100.00G
  lv_cluster3_disk1             vg_cluster3_disk1             -wi-ao 100.00G
  lv_cluster3_disk10            vg_cluster3_disk10            -wi-ao  15.00T
  lv_cluster3_disk2             vg_cluster3_disk2             -wi-ao 220.00G
  lv_cluster3_disk3             vg_cluster3_disk3             -wi-ao 330.00G
  lv_cluster3_disk4_1T-kvm-thin vg_cluster3_disk4_1T-kvm-thin -wi-a-   1.00T
  lv_cluster3_disk5             vg_cluster3_disk5             -wi-a- 555.00G
  lv_cluster3_disk6             vg_cluster3_disk6             -wi-ao   2.00T
  lv_cluster3_disk8             vg_cluster3_disk8             -wi-ao   2.00T
</div></div>
<span class="HOEnZb">--<br>
Linux-cluster mailing list<br><a href="mailto:Linux-cluster <at> redhat.com" target="_blank">Linux-cluster <at> redhat.com</a><br><a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a><br></span>
</blockquote>
</div>
<span class="HOEnZb"><br><br clear="all"><br>-- <br>this is my life and I will live it for as long as God wills<br></span><br>
</blockquote>
</div>
<br>
</div>
Dan Riley | 4 Jun 2012 16:52

Re: Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

Hi Cedric,

About the only doc I've found that describes the barrier state transitions is in the cluster2 architecture doc

http://people.redhat.com/teigland/cluster2-arch.txt

When group membership changes, there's a barrier operation that stops the group, changes the membership,
and restarts the group, so that all members agree on the membership change synchronization. 
LEAVE_START_WAIT means that a node (12) left the group, but restarting the group hasn't completed
because not all the nodes have acknowledged agreement.  You should do 'group_tool -v' on the different
nodes of the cluster and look for a node where the final 'local_done' flag is 0, or where the group
membership is inconsistent with the other nodes.  Dumping the debug buffer for the group on the various
nodes may also identify which node is being waited on.  In the cases where we've found inconsistent group
membership, fencing the node with the inconsistency let the group finish starting.
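
The cross-node check described above can be sketched roughly as follows. This is only an illustration, assuming the `group_tool -v` output of each node has already been saved as text (the collection step, for example over ssh, is not shown). The helper names `parse_groups` and `find_suspects` are made up for this sketch and are not part of the cluster tools, and the outputs attributed to nodes 04 and 05 are invented sample data. The parsing relies only on the column layout visible in the output quoted in this thread.

```python
from collections import Counter


def parse_groups(text):
    """Parse `group_tool -v` text into {(type, name): info}.

    Assumes the layout seen in this thread: a group line
    (type level name id state [node id local_done]) followed by a
    bracketed member list such as "[4 5 6]".
    """
    groups, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type "):
            continue  # skip blank lines and the header row
        if line.startswith("["):
            if current is not None:
                groups[current]["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # not a group line in the expected layout
        current = (fields[0], fields[2])
        groups[current] = {
            "state": fields[4],
            # the last three columns are only present while a change is pending
            "local_done": int(fields[7]) if len(fields) >= 8 else None,
            "members": [],
        }
    return groups


def find_suspects(outputs, group):
    """Flag nodes whose local_done is 0 or whose member list differs
    from the most common member list among the nodes."""
    views = {node: parse_groups(text).get(group) for node, text in outputs.items()}
    views = {node: view for node, view in views.items() if view is not None}
    counts = Counter(tuple(view["members"]) for view in views.values())
    majority = counts.most_common(1)[0][0] if counts else ()
    suspects = {}
    for node, view in sorted(views.items()):
        reasons = []
        if view["local_done"] == 0:
            reasons.append("local_done is 0")
        if tuple(view["members"]) != majority:
            reasons.append("membership differs")
        if reasons:
            suspects[node] = reasons
    return suspects


# Taken from the node 13 output quoted in this thread.
NODE13 = """\
type             level name            id       state node id local_done
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
"""
# Hypothetical output from another node, invented for illustration only.
NODE04 = """\
type             level name            id       state node id local_done
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 0
[4 5 6 7 8 9 10 11 12 13 14 15]
"""
outputs = {
    "bl13-node13": NODE13,
    "bl05-node05": NODE13,  # hypothetical: assumed to agree with node 13
    "bl04-node04": NODE04,
}
print(find_suspects(outputs, ("gfs", "cluster3_disk2")))
```

Comparing against the most common member list is a design shortcut for the sketch; with only two or three nodes sampled it can easily pick the wrong "majority", so the flagged nodes are candidates to inspect, not a verdict.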

[as an aside--is there a plan to reengineer the RH cluster group membership protocol stack to take
advantage of the virtual synchrony capabilities of Corosync/TOTEM?]

-dan

On Jun 2, 2012, at 9:25 PM, Cedric Kimaru wrote:

> Fellow Cluster Compatriots,
> I'm looking for some guidance here. Whenever my rhel 5.7 cluster gets into "LEAVE_START_WAIT" on a given iscsi volume, the following occurs:
> 	• I can't r/w io to the volume.
> 	• Can't unmount it, from any node.
> 	• In flight/pending IOs are impossible to determine or kill since lsof on the mount fails. Basically all IO operations stall/fail.
> So my questions are:
> 
> 	• What does the output from group_tool -v really indicate, "00030005 LEAVE_START_WAIT 12 c000b0002 1"? Man on group_tool doesn't list these fields.
> 	• Does anyone have a list of what these fields represent?
> 	• Corrective actions. How do I get out of this state without rebooting the entire cluster?
> 	• Is it possible to determine the offending node?
> thanks,
> -Cedric
> 
> 
> //misc output
> 
> root <at> bl13-node13:~# group_tool -v
> type             level name            id       state node id local_done
> fence            0     default         0001000d none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     clvmd           0001000c none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk1  00020005 none        
> [4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk2  00040005 none        
> [4 5 6 7 8 9 10 11 13 14 15]
> dlm              1     cluster3_disk7  00060005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk8  00080005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk9  000a0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     disk10          000c0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     rgmanager       0001000a none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk3  00020001 none        
> [1 5 6 7 8 9 10 11 12 13]
> dlm              1     cluster3_disk6  00020008 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk1  00010005 none        
> [4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
> [4 5 6 7 8 9 10 11 13 14 15]
> gfs              2     cluster3_disk7  00050005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk8  00070005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk9  00090005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     disk10          000b0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk3  00010001 none        
> [1 5 6 7 8 9 10 11 12 13]
> gfs              2     cluster3_disk6  00010008 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
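
On the question of what the fields in the stuck line are: the header row printed by `group_tool -v` already names the columns, so one rough way to read the line is to pair it with that header. The sketch below does only that. The function name `label_fields` is made up for illustration, and the second `id` column is renamed `id2` purely to keep the dictionary keys unique; what the values mean beyond the header labels is not spelled out in this thread.

```python
HEADER = "type level name id state node id local_done"
LINE = "gfs 2 cluster3_disk2 00030005 LEAVE_START_WAIT 12 c000b0002 1"


def label_fields(header, line):
    """Pair each whitespace-separated value with its header column name."""
    names = []
    for name in header.split():
        # the header repeats "id"; number the repeats so keys stay unique
        names.append(name if name not in names else f"{name}{names.count(name) + 1}")
    return dict(zip(names, line.split()))


for column, value in label_fields(HEADER, LINE).items():
    print(f"{column:>10}: {value}")
```

Read this way, `node` is 12 and `local_done` is 1 on the node where the output was taken, which matches the description of node 12 having left the group.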

Cedric Kimaru | 5 Jun 2012 16:14

Re: Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

Hi Dan,
Thanks for the response and the breadcrumb. The link to David's document will hopefully shed more light on this state.
I tried fencing the node with the pending sync restart, 12 in my case, but that didn't get the volume out of the weeds. Attempting to restart gfs2 from the other nodes also fails, since that requires an unmount, which it can't do ... weeds, weeds, weeds.

Now, could you elaborate on which diags you are referring to, glock?

thanks,
-Cedric

On Mon, Jun 4, 2012 at 10:52 AM, Dan Riley <dan131riley <at> gmail.com> wrote:
> Hi Cedric,
>
> About the only doc I've found that describes the barrier state transitions is in the cluster2 architecture doc
>
> http://people.redhat.com/teigland/cluster2-arch.txt
>
> When group membership changes, there's a barrier operation that stops the group, changes the membership, and restarts the group, so that all members agree on the membership change synchronization.  LEAVE_START_WAIT means that a node (12) left the group, but restarting the group hasn't completed because not all the nodes have acknowledged agreement.  You should do 'group_tool -v' on the different nodes of the cluster and look for a node where the final 'local_done' flag is 0, or where the group membership is inconsistent with the other nodes.  Dumping the debug buffer for the group on the various nodes may also identify which node is being waited on.  In the cases where we've found inconsistent group membership, fencing the node with the inconsistency let the group finish starting.
>
> [as an aside--is there a plan to reengineer the RH cluster group membership protocol stack to take advantage of the virtual synchrony capabilities of Corosync/TOTEM?]
>
> -dan




