Matthew Tovbin | 2 Apr 2012 19:48

Re: preflist_exhausted with map-reduce and riak-js

Jon,


Thanks for the reply. I tried increasing the 'map_js_vm_count' and 'reduce_js_vm_count' on the cluster to the values you suggested 24/18 and even larger ones 32/24 and 128/128, but with no success - I still get the same error in the console log of the node.

What are we missing?!

-Matthew 



On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmeredith <at> basho.com> wrote:
Hi list,

I spend some time looking at this issue today.  I suspect it is due to all the Javascript VMs being busy inside Riak due to many parallel MapReduce jobs.

If you're seeing this issue please check for this message in the console.log file
16:40:50.037 [notice] JS call failed: All VMs are busy.
I've filed https://github.com/basho/riak_pipe/issues/44 to track improvements to error reporting.

If you are seeing the log message, the number of VMs allocated for map and reduce tasks can be configured in the app.config - defaults are 8 for map and 6 for reduce, increasing them will use more memory but should reduce the incidence of the error message.

{riak_kv, [
             ...
            {map_js_vm_count, 24 },
            {reduce_js_vm_count, 18 },
            ...]

If you were affected by it and changing this does not resolve your issue, I'll keep digging.

Cheers, Jon.

On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Guys,

Any updates on the issue?!

-Matthew 



On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Here is a log from one of the servers:

==> /mnt/dataraid/log/crash.log <==
2012-03-13 18:24:44 =CRASH REPORT====
  crasher:
    initial call: riak_pipe_builder:init/1
    pid: <0.20949.24>
    registered_name: []
    exception exit: {sink_died,shutdown}
      in function  gen_fsm:terminate/7
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
    messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
    links: [<0.175.0>]
    dictionary: [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 251
  neighbours:
2012-03-13 18:24:44 =SUPERVISOR REPORT====
     Supervisor: {local,riak_pipe_builder_sup}
     Context:    child_terminated
     Reason:     {sink_died,shutdown}
     Offender:   [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]


==> /mnt/dataraid/log/console.log <==
2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
2012-03-13 18:24:56.420 [info] <0.71.0> <at> riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]


==> /mnt/dataraid/log/erlang.log.1 <==
18:24:44.634 [error] gen_fsm <0.20949.24> in state wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
(riak <at> 1)1> 18:24:44.641 [error] Pipe worker startup failed:fitting was gone before startup
(riak <at> 1)1> 
18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
(riak <at> 1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
(riak <at> 1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
(riak <at> 1)1> 



==> /mnt/dataraid/log/error.log <==
2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated






-Matthew 



On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Hi,

I got the same problem today on 1.1.0.

As suggested, I updated all the nodes to 1.1.1. The error remains the same: 

{ stack: [Getter/Setter],
  arguments: undefined,
  type: undefined,
  message: 'HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object ....."}}} 


-Matthew 



On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmeredith <at> basho.com> wrote:
Hi Arne,

Could you try with 1.1.1 please.  We've fixed a number of bugs with how MapReduce handles errors that could explain some of what you are seeing.

Jon


On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <arne+riak <at> claassen.net> wrote:
I'm running 1.1.0 built from source on Centos 6. I set up two nodes on localhost following the instructions on the basho site for testing and everything was going fine until i needed to fetch all items in a bucket.

For this i was using node.js with riak-js and it's db.getAll call,which from the debug logs looks like it uses /mapred

The bucket only has about 1200 items in it, each a json payload of about 3KB. Initially, it'd keep getting inconsistent results, sometimes the data would come back, sometimes data would be undefined. So i kept running it, trying to see if there was a pattern, adding more debugging code, etc. But still it would be about a request every couple of seconds. Then it just stopped working altogether and now i always get:

{ [Error: HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}] statusCode: 500 }

in the console. I haven't tried restarting riak, just because i'd rather find out what's happening than it going away and coming back later when i don't expect it.

Any suggestions on how to troubleshoot this? I can still access items individually and i can even list the keys via ?keys=true, it's just he map reduce that seems to be busted.

cheers,
arne
_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



--
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.


_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com







--
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.


_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Jon Meredith | 2 Apr 2012 21:27
Gravatar

Re: preflist_exhausted with map-reduce and riak-js

Very strange.

Are you seeing the
16:40:50.037 [notice] JS call failed: All VMs are busy.
message in your console.log files on any of the nodes in your cluster?

Jon

On Mon, Apr 2, 2012 at 11:48 AM, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Jon,

Thanks for the reply. I tried increasing the 'map_js_vm_count' and 'reduce_js_vm_count' on the cluster to the values you suggested 24/18 and even larger ones 32/24 and 128/128, but with no success - I still get the same error in the console log of the node.

What are we missing?!

-Matthew 



On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmeredith <at> basho.com> wrote:
Hi list,

I spend some time looking at this issue today.  I suspect it is due to all the Javascript VMs being busy inside Riak due to many parallel MapReduce jobs.

If you're seeing this issue please check for this message in the console.log file
16:40:50.037 [notice] JS call failed: All VMs are busy.
I've filed https://github.com/basho/riak_pipe/issues/44 to track improvements to error reporting.

If you are seeing the log message, the number of VMs allocated for map and reduce tasks can be configured in the app.config - defaults are 8 for map and 6 for reduce, increasing them will use more memory but should reduce the incidence of the error message.

{riak_kv, [
             ...
            {map_js_vm_count, 24 },
            {reduce_js_vm_count, 18 },
            ...]

If you were affected by it and changing this does not resolve your issue, I'll keep digging.

Cheers, Jon.

On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Guys,

Any updates on the issue?!

-Matthew 



On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Here is a log from one of the servers:

==> /mnt/dataraid/log/crash.log <==
2012-03-13 18:24:44 =CRASH REPORT====
  crasher:
    initial call: riak_pipe_builder:init/1
    pid: <0.20949.24>
    registered_name: []
    exception exit: {sink_died,shutdown}
      in function  gen_fsm:terminate/7
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
    messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
    links: [<0.175.0>]
    dictionary: [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 251
  neighbours:
2012-03-13 18:24:44 =SUPERVISOR REPORT====
     Supervisor: {local,riak_pipe_builder_sup}
     Context:    child_terminated
     Reason:     {sink_died,shutdown}
     Offender:   [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]


==> /mnt/dataraid/log/console.log <==
2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
2012-03-13 18:24:56.420 [info] <0.71.0> <at> riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]


==> /mnt/dataraid/log/erlang.log.1 <==
18:24:44.634 [error] gen_fsm <0.20949.24> in state wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
(riak <at> 1)1> 18:24:44.641 [error] Pipe worker startup failed:fitting was gone before startup
(riak <at> 1)1> 
18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
(riak <at> 1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
(riak <at> 1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
(riak <at> 1)1> 



==> /mnt/dataraid/log/error.log <==
2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated






-Matthew 



On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin <matthew <at> tovbin.com> wrote:
Hi,

I got the same problem today on 1.1.0.

As suggested, I updated all the nodes to 1.1.1. The error remains the same: 

{ stack: [Getter/Setter],
  arguments: undefined,
  type: undefined,
  message: 'HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object ....."}}} 


-Matthew 



On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmeredith <at> basho.com> wrote:
Hi Arne,

Could you try with 1.1.1 please.  We've fixed a number of bugs with how MapReduce handles errors that could explain some of what you are seeing.

Jon


On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <arne+riak <at> claassen.net> wrote:
I'm running 1.1.0 built from source on Centos 6. I set up two nodes on localhost following the instructions on the basho site for testing and everything was going fine until i needed to fetch all items in a bucket.

For this i was using node.js with riak-js and it's db.getAll call,which from the debug logs looks like it uses /mapred

The bucket only has about 1200 items in it, each a json payload of about 3KB. Initially, it'd keep getting inconsistent results, sometimes the data would come back, sometimes data would be undefined. So i kept running it, trying to see if there was a pattern, adding more debugging code, etc. But still it would be about a request every couple of seconds. Then it just stopped working altogether and now i always get:

{ [Error: HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}] statusCode: 500 }

in the console. I haven't tried restarting riak, just because i'd rather find out what's happening than it going away and coming back later when i don't expect it.

Any suggestions on how to troubleshoot this? I can still access items individually and i can even list the keys via ?keys=true, it's just he map reduce that seems to be busted.

cheers,
arne
_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



--
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.


_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com







--
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.





--
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.

_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Gmane