Malcolm Davis | 14 Jun 2012 00:35

SEDNA: kernel unable to handle kernel paging request

I am able to reproduce the paging error consistently with one of our data files.

Two large XML files (100 MB and 200 MB) are being loaded at the same time.

There is a single application with several open connections to the SEDNA
instance.

There is nothing unusual in the SEDNA event.log.

We have more than 200 Amazon instances using the same configuration without a
problem.  Many of the SEDNA instances store much larger files.

SEDNA: 3.5.161

OS: Ubuntu 11.10 (GNU/Linux 3.0.0-14-virtual x86_64)

free -t -m
             total       used       free     shared    buffers     cached
Mem:           592        576         15          0          4        464
-/+ buffers/cache:        106        485
Swap:         1183          1       1182
Total:        1776        577       1198
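
Editorial sketch, not part of the original message: the `free` output above reports 576 MB "used", but most of that is page cache. A minimal Python example, assuming the older procps layout that still prints a "-/+ buffers/cache" row (newer versions of `free` may show an "available" column instead), reads the figure that applies to applications. The function name is hypothetical.

```python
from typing import Tuple


def app_memory_from_free(output: str) -> Tuple[int, int]:
    """Return (used, free) in MB from the '-/+ buffers/cache' row of `free -m`."""
    for line in output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            used, free = line.split(":", 1)[1].split()
            return int(used), int(free)
    raise ValueError("no '-/+ buffers/cache' row found")


# Values copied from the message above.
sample = """\
             total       used       free     shared    buffers     cached
Mem:           592        576         15          0          4        464
-/+ buffers/cache:        106        485
Swap:         1183          1       1182
Total:        1776        577       1198
"""

used, free = app_memory_from_free(sample)
print(f"applications use {used} MB; {free} MB is free or reclaimable cache")
```

Read this way, the instance is not out of memory at the moment the snapshot was taken, and swap is almost untouched.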

ps ux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ubuntu     624  0.0  0.3  26308  2212 ?        Ssl  14:54   0:01 /home/ubuntu/sedna/bin/se_gov -back
ubuntu     705  0.5 35.7 240060 216700 ?       Ssl  14:54   0:48 /home/ubuntu/sedna/bin/se_sm -backg
ubuntu     776  0.1  9.9 1104856 60400 ?       Sl   16:43   0:02

Ivan Shcheklein | 14 Jun 2012 16:06

Re: SEDNA: kernel unable to handle kernel paging request

Hi Malcolm,

"Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.235823] BUG: unable to handle kernel paging request at ffff8800113ace00" is definitely a bug in the kernel, hardware, or virtualization software, not in Sedna, though there is a small chance that some bug in Sedna triggers it.

Have you tried to reproduce this on a different machine (not on Amazon)?

Is it reproducible on another Amazon machine?

Do you use the latest Amazon image and the latest kernel?

Ivan
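
Editorial sketch, not part of Ivan's reply: the kern.log excerpt in the quoted message reports "Oops: 0003" and "PTE 80100000113ac065". The low bits of an x86 page-fault error code and of a page-table entry have fixed architectural meanings, so they can be decoded mechanically. The Python below is a minimal illustration; the function and table names are hypothetical, and only a few flag bits are covered.

```python
from typing import List

# x86 page-fault error code: (meaning when bit is 0, meaning when bit is 1).
FAULT_BITS = {
    0: ("page not present", "protection violation on a present page"),
    1: ("read access", "write access"),
    2: ("kernel mode", "user mode"),
}

# A subset of the x86-64 page-table entry flag bits.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}


def decode_fault_code(code: int) -> List[str]:
    """Describe bits 0-2 of a page-fault error code."""
    return [names[(code >> bit) & 1] for bit, names in sorted(FAULT_BITS.items())]


def decode_pte_flags(pte: int) -> List[str]:
    """List the names of the known flag bits that are set in a PTE value."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if (pte >> bit) & 1]


print(decode_fault_code(0x0003))
print(decode_pte_flags(0x80100000113AC065))
```

The decoded values describe a kernel-mode write to a page that is present but not mapped writable. That is consistent with the fault being raised inside xen_set_pte, in the kernel's Xen paravirtualization code rather than in user space, but on its own it does not identify the root cause.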

I am able to reproduce the paging error consistently with one of our data files.

Two large XML files (100 MB and 200 MB) are being loaded at the same time.

There is a single application with several open connections to the SEDNA
instance.

There is nothing unusual in the SEDNA event.log.

We have more than 200 Amazon instances using the same configuration without a
problem.  Many of the SEDNA instances store much larger files.

SEDNA: 3.5.161

OS: Ubuntu 11.10 (GNU/Linux 3.0.0-14-virtual x86_64)

free -t -m
             total       used       free     shared    buffers     cached
Mem:           592        576         15          0          4        464
-/+ buffers/cache:        106        485
Swap:         1183          1       1182
Total:        1776        577       1198

ps ux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ubuntu     624  0.0  0.3  26308  2212 ?        Ssl  14:54   0:01 /home/ubuntu/sedna/bin/se_gov -back
ubuntu     705  0.5 35.7 240060 216700 ?       Ssl  14:54   0:48 /home/ubuntu/sedna/bin/se_sm -backg
ubuntu     776  0.1  9.9 1104856 60400 ?       Sl   16:43   0:02 /home/ubuntu/sedna/bin/se_trn
ubuntu     785  2.7  6.7 1104888 41148 ?       Sl   16:43   1:13 /home/ubuntu/sedna/bin/se_trn
ubuntu     788  0.8  9.8 1104792 59812 ?       Sl   16:43   0:22 /home/ubuntu/sedna/bin/se_trn
ubuntu     792  0.8  0.0      0     0 ?        Zl   16:43   0:23 [se_trn] <defunct>
ubuntu     795  3.4 39.8 1105280 241416 ?      Sl   16:43   1:34 /home/ubuntu/sedna/bin/se_trn
ubuntu     798  1.2  8.8 1104952 53368 ?       Sl   16:43   0:34 /home/ubuntu/sedna/bin/se_trn
ubuntu     826  0.0  0.2  73080  1560 ?        S    17:10   0:00 sshd: ubuntu@pts/0
ubuntu     827  0.0  1.1  26824  7244 pts/0    Ss   17:10   0:00 -bash
ubuntu     949  0.0  0.4  74028  2440 ?        S    17:11   0:00 sshd: ubuntu@notty
ubuntu     950  0.0  0.1  12788  1120 ?        Ss   17:11   0:00 /usr/lib/openssh/sftp-server
ubuntu     954  0.0  0.2  16748  1240 pts/0    R+   17:27   0:00 ps ux
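
Editorial sketch, not part of the original message: one se_trn process in the listing above is a zombie (STAT "Zl", shown as defunct), and its PID, 792, is the same PID the kernel log names in the oops. A small Python filter, assuming each `ps ux` row sits on a single line, picks such rows out. The function name is hypothetical, and the sample holds only two rows from the listing.

```python
def defunct_processes(ps_output: str):
    """Yield (pid, command) for rows whose STAT column starts with 'Z'."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")
    for line in lines[1:]:
        # Split at most cmd_col times so the command keeps its spaces.
        fields = line.split(None, cmd_col)
        if len(fields) > cmd_col and fields[stat_col].startswith("Z"):
            yield int(fields[pid_col]), fields[cmd_col]


sample = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ubuntu     788  0.8  9.8 1104792 59812 ?       Sl   16:43   0:22 /home/ubuntu/sedna/bin/se_trn
ubuntu     792  0.8  0.0      0     0 ?        Zl   16:43   0:23 [se_trn] <defunct>
"""

for pid, command in defunct_processes(sample):
    print(pid, command)
```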


Excerpt from kern.log showing the BUG:

Jun 13 14:54:50 ip-10-244-165-112 kernel: [   27.612603] init: plymouth-upstart-bridge main process (533) killed by TERM signal
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.235823] BUG: unable to handle kernel paging request at ffff8800113ace00
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] IP: [<ffffffff81006c25>] xen_set_pte+0x25/0xe0
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] PGD 1c04067 PUD 1c08067 PMD f37067 PTE 80100000113ac065
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Oops: 0003 [#1] SMP
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] CPU 0
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Modules linked in: acpiphp
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005]
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Pid: 792, comm: se_trn Not tainted 3.0.0-14-virtual #23-Ubuntu
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RIP: e030:[<ffffffff81006c25>]  [<ffffffff81006c25>] xen_set_pte+0x25/0xe0
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RSP: e02b:ffff880023899cb8  EFLAGS: 00010297
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RAX: 0000000000000000 RBX: ffff8800113ace00 RCX: 800000025f83d027
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RDX: 0000000000000000 RSI: 800000025f83d027 RDI: ffff8800113ace00
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RBP: ffff880023899cd8 R08: ffffea00003c4db0 R09: 00003ffffffff000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] R10: 0000000000000008 R11: 0000000000000293 R12: 800000025f83d027
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] R13: 800000025f83d027 R14: 00007fd3189c0000 R15: 0000000000000000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] FS: 00007fd326f7a740(0000) GS:ffff8800266a9000(0000) knlGS:0000000000000000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] CR2: ffff8800113ace00 CR3: 0000000023873000 CR4: 0000000000002620
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Process se_trn (pid: 792, threadinfo ffff880023898000, task ffff88002344ae00)
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Stack:
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005]  0000000000000008 00003ffffffff000 ffff8800236ed2c0 ffffea000070fc38
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005]  ffff880023899ce8 ffffffff81006cf4 ffff880023899d78 ffffffff8112b22b
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005]  000000000000000a 0000000000000000 0000020000000000 ffff8800113ace00
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Call Trace:
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff81006cf4>] xen_set_pte_at+0x14/0x20
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff8112b22b>] __do_fault+0x22b/0x510
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff8112e74a>] handle_pte_fault+0xfa/0x210
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff81005cce>] ? xen_pmd_val+0xe/0x10
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff81004759>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff8112ec18>] handle_mm_fault+0x1f8/0x350
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff81073d1b>] ? set_current_blocked+0x5b/0x70
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff8160850e>] do_page_fault+0x14e/0x530
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] [<ffffffff81605215>] page_fault+0x25/0x30
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Code: 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 60 ba 02 00 83 f8 01 74 13 <4c> 89 23 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 00 00 ff 14
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RIP [<ffffffff81006c25>] xen_set_pte+0x25/0xe0
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005]  RSP <ffff880023899cb8>
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] CR2: ffff8800113ace00
Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] ---[ end trace b231ecb3aa501510 ]---
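
Editorial sketch, not part of the original message: when comparing oops reports across many instances, it helps to reduce each one to a few fields. The Python below is a minimal example that pulls the faulting address, the faulting kernel function, and the affected process out of a log like the one above. It assumes every log entry is on one line, as in kern.log itself; the pattern table and function name are hypothetical.

```python
import re

OOPS_PATTERNS = {
    "address": re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)"),
    "function": re.compile(r"RIP: .*\] (\w+)\+0x"),
    "pid": re.compile(r"Pid: (\d+), comm:"),
    "comm": re.compile(r"comm: (\S+)"),
}


def summarize_oops(log_text: str) -> dict:
    """Return the first match of each pattern found in the log text."""
    summary = {}
    for key, pattern in OOPS_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            summary[key] = match.group(1)
    return summary


# Three entries copied from the log above, one entry per line.
sample = "\n".join([
    "Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.235823] BUG: unable to handle kernel paging request at ffff8800113ace00",
    "Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] Pid: 792, comm: se_trn Not tainted 3.0.0-14-virtual #23-Ubuntu",
    "Jun 13 16:53:16 ip-10-244-165-112 kernel: [ 7133.236005] RIP: e030:[<ffffffff81006c25>]  [<ffffffff81006c25>] xen_set_pte+0x25/0xe0",
])

print(summarize_oops(sample))
```

For this log the summary names se_trn (PID 792) as the process and xen_set_pte as the kernel function in which the fault occurred.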

I don't know if the problem is with SEDNA, Linux, or a simple configuration
issue.

Any ideas on the root cause or where I should start?

Thanks,
Malcolm


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Sedna-discussion mailing list
Sedna-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sedna-discussion

Malcolm Davis | 14 Jun 2012 18:04

Re: SEDNA: kernel unable to handle kernel paging request

Hello Ivan,

Thanks for the response.

The bug is only reproducible on our Ubuntu SEDNA image.  
a. We have built a specialized SEDNA image from which new Amazon instances
are created.  
b. Each of our clients gets their own SEDNA image.  
c. Only SEDNA runs on the image.

One client fails on a fresh Amazon instance.  We have tried three new Amazon
instances with the same result.

The client does not have unusually large files. (Some of our clients have 3
times the data).

I will start by building a new SEDNA image with the latest Ubuntu build, and
then retest.

Thanks,
Malcolm

Ivan Shcheklein | 14 Jun 2012 22:10

Re: SEDNA: kernel unable to handle kernel paging request

Fine. Let us know your results.


BTW, do you use swap? How do you create it?

Malcolm Davis | 14 Jun 2012 23:47

Re: SEDNA: kernel unable to handle kernel paging request

I will let you know.

I did create swap using the following commands:
sudo apt-get install dphys-swapfile
sudo losetup /dev/loop0 /var/swap
sudo swapon /dev/loop0
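
Editorial sketch, not part of the original message: after commands like these, the kernel lists active swap areas in /proc/swaps, with sizes in kilobytes. A minimal Python parser is shown below; the function name is hypothetical and the sample values are made up for illustration, not taken from Malcolm's instance.

```python
def active_swap(proc_swaps_text: str):
    """Return a list of (device, size_mb, used_mb) parsed from /proc/swaps text."""
    entries = []
    for line in proc_swaps_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        device, _kind, size_kb, used_kb, _priority = fields[:5]
        entries.append((device, int(size_kb) // 1024, int(used_kb) // 1024))
    return entries


# Hypothetical /proc/swaps content for a swap area on /dev/loop0.
sample = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/dev/loop0                              partition\t1048576\t2048\t-1\n"
)

print(active_swap(sample))
```

On a live system the same function could be fed `open("/proc/swaps").read()` to confirm that the loop device is in use as swap.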

sysctl.conf is also modified based on:
http://modis.ispras.ru/sedna/install.html

The database name is the same for all clients, so a data file is pre-created
with a -data-file-init-size of 5000 (5 GB).

The new image will use the latest Ubuntu, 12.04 (Precise Pangolin).

The problem occurred again today with a different client.  We have created
hundreds of these instances in the last 3 months without an issue.

The new client issue makes me think the problem is Amazon-related.

Thanks,
Malcolm

Malcolm Davis | 15 Jun 2012 19:07

Re: SEDNA: kernel unable to handle kernel paging request

Amazon update: the "kernel BUG: unable to handle kernel paging request" error
was a precursor to an Amazon outage.

The next day Amazon had an EC2 outage due to EBS issues, which I think
corresponds to the problem: the OS tries to page while the drive is failing.
The EBS outage was in the region we use.

Things are working now.  I have already synced data with new instances.

OS upgrade: we did migrate to the latest server version of Ubuntu.  The
Ubuntu upgrade is noticeable, and I think it was important.

Thanks again for everything,
Malcolm
