Mohit Anchlia | 27 Jul 2012 00:32

Cluster load

I added new regions and the performance didn't improve. I think it is still
a load-balancing issue. I want to make sure that my rows are getting
distributed across the cluster. Could you please tell me the best way to
see the load?

Here is what I see:

[root@dsdb4 ~]# hadoop fs -lsr /hbase/SESSION_TIMELINE1/
drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641
drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs
-rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs/hlog.1343334723359
drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/S_T_MTX
-rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.regioninfo
drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854
drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs
-rwxr-xr-x 3 root root 124 2012-07-26 13:32

syed kather | 27 Jul 2012 02:53

Re: Cluster load

First, check whether the data in HBase is consistent by running hbck
(bin/hbase hbck). If all the regions are consistent, check the number of
splits for the table in the master web UI at localhost:60010.

Mohit Anchlia | 27 Jul 2012 03:07

Re: Cluster load

Is there a way to see how much data each node has per HBase table?

Khang Pham | 27 Jul 2012 09:16

Re: Cluster load

Hi,

By node, do you mean a RegionServer node?

If you are referring to RegionServer nodes, you can go to the HBase master
web interface at master:60010/master.jsp to see the load for each
RegionServer. That's the overall load. If you want to see the load per node
per table, you will need to query the .META. table (column: info:server).
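A minimal sketch of that per-table count, in plain Java so it runs without a cluster. It assumes the .META. rows have already been fetched (for example with a scan of .META. in the shell) into a map from region name to its info:server value; the helper name, host names and encoded region names below are hypothetical. Region count per server is only a proxy for load: for data volume you would still combine it with the per-region directory sizes in HDFS.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionsPerServer {

    // Counts how many regions of one table each server hosts.
    // metaRows maps a .META. row key (the region name) to its info:server value.
    static Map<String, Integer> regionsPerServer(Map<String, String> metaRows, String table) {
        Map<String, Integer> counts = new TreeMap<>();
        // Region names look like "<table>,<startkey>,<id>.<encoded>."
        String prefix = table + ",";
        for (Map.Entry<String, String> row : metaRows.entrySet()) {
            if (row.getKey().startsWith(prefix)) {
                counts.merge(row.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical .META. rows; encoded names and hosts are made up.
        Map<String, String> meta = new TreeMap<>();
        meta.put("SESSION_TIMELINE1,,1343334722986.enc0.", "rs1:60020");
        meta.put("SESSION_TIMELINE1,0,1343334722986.enc1.", "rs2:60020");
        meta.put("SESSION_TIMELINE1,1,1343334722986.enc2.", "rs1:60020");
        meta.put("OTHER_TABLE,,1343334722000.enc3.", "rs2:60020");
        System.out.println(regionsPerServer(meta, "SESSION_TIMELINE1"));
        // prints {rs1:60020=2, rs2:60020=1}
    }
}
```

Matching on "<table>," rather than just the table name keeps a table such as SESSION_TIMELINE10 from being counted with SESSION_TIMELINE1.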

--K

Alex Baranau | 27 Jul 2012 16:21

Re: Cluster load

From what you posted above, I guess one of the regions
(0a5f6fadd0435898c6f4cf11daa9895a; note that it has 2 files of 2GB each [1],
while the other regions are "empty") is getting hit with writes. You may
want to run the "flush 'mytable'" command from the hbase shell before
looking at HDFS; this way you make sure your data is flushed to HDFS (and
not held in MemStores).

You may want to check the START/END keys of this region (via the master web
UI or in .META.). Then you can compare them with the keys generated by your
app. This should give you some info about what's going on.
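That comparison can be sketched in plain Java. HBase orders row keys by unsigned, byte-by-byte (lexicographic) comparison, and a region holds the keys from its start key up to, but not including, the next region's start key. The helpers `compare`, `regionIndex` and `histogram` are hypothetical and the sample keys are made up; they only illustrate bucketing your application's keys against region boundaries read from the master UI or .META.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RegionLookup {

    // Unsigned lexicographic comparison: the order HBase sorts row keys in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Index of the region holding rowKey. startKeys is sorted and startKeys[0]
    // is the empty start key; region i covers [startKeys[i], startKeys[i + 1]).
    static int regionIndex(byte[][] startKeys, byte[] rowKey) {
        int idx = 0;
        for (int i = 1; i < startKeys.length; i++) {
            if (compare(rowKey, startKeys[i]) < 0) {
                break;
            }
            idx = i;
        }
        return idx;
    }

    // How many of the sampled row keys land in each region.
    static int[] histogram(byte[][] startKeys, byte[][] rowKeys) {
        int[] counts = new int[startKeys.length];
        for (byte[] key : rowKeys) {
            counts[regionIndex(startKeys, key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Start keys "", "0", "1", ..., "9", as produced by ASCII-digit split keys.
        byte[][] startKeys = new byte[11][];
        startKeys[0] = new byte[0];
        for (int i = 0; i <= 9; i++) {
            startKeys[i + 1] = String.valueOf(i).getBytes(StandardCharsets.UTF_8);
        }
        // Hypothetical sample of row keys taken from the application.
        String[] sample = {"0abc", "3def", "3ghi", "9xyz"};
        byte[][] rowKeys = new byte[sample.length][];
        for (int i = 0; i < sample.length; i++) {
            rowKeys[i] = sample[i].getBytes(StandardCharsets.UTF_8);
        }
        System.out.println(Arrays.toString(histogram(startKeys, rowKeys)));
        // prints [0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 1]
    }
}
```

A linear scan keeps the sketch short; with many regions a binary search over the start keys would do the same job faster.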

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1]

-rwxr-xr-x 3 root root 1993369 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717
-rwxr-xr-x 3 root root 2003372 2012-07-26 13:57 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/7665015246030620502

syed kather | 27 Jul 2012 16:52

Re: Cluster load

Alex Baranau,

Can you please tell me how you found that region
"0a5f6fadd0435898c6f4cf11daa9895a" has 2GB of data? I am very interested to
know.

Thanks and Regards,
S SYED ABDUL KATHER

Alex Baranau | 27 Jul 2012 20:07

Re: Cluster load

-rwxr-xr-x 3 root root 1993369 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717

"1993369" is the size. Oh, sorry: it is 2MB, not 2GB. Yeah, that doesn't
tell a lot. It looks like all the data is still in the MemStore. As I said,
you should try flushing the table so that you can see where the data was
written.

Of course, it is always great to set up monitoring and see what is going on ;)

Anyhow, the piece pasted above means:

table: SESSION_TIMELINE1, region: 0a5f6fadd0435898c6f4cf11daa9895a,
column family: S_T_MTX, HFile (created by a MemStore flush):
1566523617482885717, size: 1993369 bytes.
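To make that breakdown mechanical, here is a small sketch that splits such a path into its parts. It assumes the directory layout shown in this thread (`/hbase/<table>/<encoded region>/<family>/<hfile>`); later HBase releases changed the layout, so the fixed depth is an assumption. `StoreFilePath.parse` is a hypothetical helper.

```java
public class StoreFilePath {

    // The pieces of /<root>/<table>/<encoded region>/<column family>/<hfile>.
    static final class Parts {
        final String table;
        final String region;
        final String family;
        final String hfile;

        Parts(String table, String region, String family, String hfile) {
            this.table = table;
            this.region = region;
            this.family = family;
            this.hfile = hfile;
        }

        @Override
        public String toString() {
            return "table=" + table + ", region=" + region
                    + ", family=" + family + ", hfile=" + hfile;
        }
    }

    static Parts parse(String path) {
        // "/hbase/T/R/F/H".split("/") yields ["", "hbase", "T", "R", "F", "H"].
        String[] p = path.split("/");
        if (p.length != 6 || !p[0].isEmpty()) {
            throw new IllegalArgumentException("unexpected store file path: " + path);
        }
        return new Parts(p[2], p[3], p[4], p[5]);
    }

    public static void main(String[] args) {
        String path = "/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a"
                + "/S_T_MTX/1566523617482885717";
        System.out.println(parse(path));
    }
}
```

The root directory name (p[1]) is deliberately not checked, since the HBase root directory is configurable.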

By the way, 2MB looks odd: it is a very small flush size in this case (in
other cases it may happen; long story). Maybe compression does very well :)

Alex Baranau

syed kather | 27 Jul 2012 20:21

Re: Cluster load

Thank you so much for the valuable information. I have not used any
monitoring tool yet. Could you please suggest a good one?

Syed Abdul Kather
Sent from Samsung S3

Alex Baranau | 27 Jul 2012 20:48

Re: Cluster load

You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other
third-party tools like [3] (I'm a little biased here;)).

[0] http://hbase.apache.org/book.html#hbase_metrics
[1] http://hbase.apache.org/metrics.html
[2] http://wiki.apache.org/hadoop/GangliaMetrics
[3] http://sematext.com/spm/hbase-performance-monitoring/index.html

Note that the metric values may seem a bit ugly/weird: as they say, you
have to refer to Lars' book, HBase: The Definitive Guide, to understand how
some of them are calculated. There is ongoing work on revising the metrics;
they should look much better in the next releases.
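For the "read metrics from JMX directly" option, a minimal sketch using only the JDK's javax.management API. The `connect` helper assumes the target JVM (e.g. a RegionServer) was started with JMX remoting enabled on some port; host and port are placeholders, and HBase's own MBean and attribute names are described in the metrics links above rather than reproduced here. To keep the sketch runnable without a cluster, `main` reads a standard attribute from the current JVM instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxRead {

    // Reads one attribute of one MBean through any MBeanServerConnection,
    // local or remote.
    static Object readAttribute(MBeanServerConnection conn, String objectName,
                                String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    // Opens a remote JMX connection using the standard RMI service URL.
    // host/port are placeholders for whatever your JVM options configure.
    static JMXConnector connect(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        return JMXConnectorFactory.connect(url);
    }

    public static void main(String[] args) throws Exception {
        // Demo against this JVM's own platform MBean server.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        Object uptime = readAttribute(local, "java.lang:type=Runtime", "Uptime");
        System.out.println("Uptime (ms): " + uptime);
    }
}
```

Against a remote process you would call `connect(...)`, use `getMBeanServerConnection()` on the returned connector, and close the connector when done.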

Alex Baranau

Mohit Anchlia | 28 Jul 2012 01:24

Re: Cluster load

I did flush, but I still see all my keys going to the first region, even
though my keys have 0-9 as the first character. Is there an easy way to see
why that might be? The hbase shell scan only shows values in hex.

 SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa9895a.
   column=info:regioninfo, timestamp=1343334723073, value=REGION => {NAME =>
   'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa9895a.', STARTKEY => '',
   ENDKEY => '0', ENCODED => 0a5f6fadd0435898c6f4cf11daa9895a, TABLE => {{NAME =>
   'SESSION_TIMELINE1', FAMILIES => [{NAME => 'S_T_MTX', BLOOMFILTER => 'NONE',
   REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',

Alex Baranau | 28 Jul 2012 01:51

Re: Cluster load

Can you scan your table and show one record?

I guess you might be confusing Bytes.toBytes("0") with byte[] {(byte) 0},
which I mentioned in the other thread. It looks like the first region holds
records whose keys start with any byte below "0", which is (byte) 48 (the
region's end key is exclusive). Hence, if you set the first byte of your key
to anything from (byte) 0 to (byte) 9, all of them will fall into the first
region, which holds records with prefixes (byte) 0 to (byte) 47.

Could you check that?
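A quick way to see the difference in plain Java. This assumes Bytes.toBytes(String) encodes the string as UTF-8, so `"0".getBytes(UTF_8)` stands in for it here; `sortsBefore` is a hypothetical helper that compares bytes as unsigned values, the way HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsChar {

    // Unsigned byte comparison, matching the order HBase sorts row keys in.
    static boolean sortsBefore(byte a, byte b) {
        return (a & 0xFF) < (b & 0xFF);
    }

    public static void main(String[] args) {
        // Stand-in for Bytes.toBytes("0"): the UTF-8 encoding of the character '0'.
        byte charZero = "0".getBytes(StandardCharsets.UTF_8)[0];
        System.out.println("\"0\" as a byte: " + charZero); // 48

        // Raw bucket prefixes 0..9 all sort before 48, i.e. into the region ["", "0").
        for (int bucket = 0; bucket <= 9; bucket++) {
            System.out.println("raw " + bucket + " sorts before \"0\": "
                    + sortsBefore((byte) bucket, charZero));
        }
    }
}
```

Every line of the loop prints `true`, which is exactly why all ten buckets end up in the first region.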

Alex Baranau

Mohit Anchlia | 28 Jul 2012 02:43

Re: Cluster load

I thought that if I pass Bytes.toBytes("0"), row keys starting with "0"
would go into that region. Here is my code that creates the row key and the
splits using the admin util. I am also including the output of an hbase
shell scan after the code.

public static byte[][] splitRegionsSessionTimeline(int start, int end) {
    byte[][] splitKeys = new byte[end][];
    // the first region starting with empty key will be created
    // automatically
    for (int i = 0; i < splitKeys.length; i++) {
        splitKeys[i] = Bytes.toBytes(String.valueOf(i));
    }
    return splitKeys;
}

public static byte[] getRowKey(MetricType metricName, Long timestamp, Short bucketNo, char rowDelim) {

Alex Baranau | 28 Jul 2012 03:03

Re: Cluster load

Yeah, your row keys start with \x00, which is (byte) 0. This is not the
same as "0" (which is (byte) 48). You know what to fix now ;)
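One possible fix, sketched under the assumption that the row key keeps a raw bucket byte 0-9 as its first byte (the full getRowKey code is cut off above, so this is inferred from the \x00 prefix): generate the split keys as raw bytes instead of ASCII digits. The alternative is to write the bucket into the key as an ASCII digit so that split keys "0".."9" match it. `rawByteSplitKeys` is a hypothetical helper; the resulting array would be passed as the split keys when creating the table, e.g. via HBaseAdmin.createTable(desc, splitKeys).

```java
import java.util.Arrays;

public class RawByteSplits {

    // Split keys for row keys whose first byte is a raw bucket number in [0, buckets).
    // The first region ["", {1}) is created automatically and takes bucket 0,
    // so split keys start at 1. Assumes the bucket fits in one byte (1..256 buckets).
    static byte[][] rawByteSplitKeys(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256: " + buckets);
        }
        byte[][] splitKeys = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splitKeys[i - 1] = new byte[] {(byte) i};
        }
        return splitKeys;
    }

    public static void main(String[] args) {
        // With 10 buckets: split keys \x01 .. \x09, giving 10 regions.
        System.out.println(Arrays.deepToString(rawByteSplitKeys(10)));
        // prints [[1], [2], [3], [4], [5], [6], [7], [8], [9]]
    }
}
```

Skipping a split key for bucket 0 avoids an always-empty region: nothing but the empty key sorts below \x00.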

Alex Baranau

Mohit Anchlia | 28 Jul 2012 20:07

Re: Cluster load

Thanks for checking! I'll make the required changes to my split keys. Is it
possible to alter the splits, or is the only way to re-create the table?

Suraj Varma | 29 Jul 2012 06:38

Re: Cluster load

You can also do an online merge to merge the regions together and then
re-split them; see https://issues.apache.org/jira/browse/HBASE-1621
--S

Mohit Anchlia | 30 Jul 2012 19:56

Re: Cluster load

I made the required changes, and it seems to be load balancing pretty well.
I do have a follow-up question about how to interpret the output of the
hbase shell. If I want to visually calculate the length of the row key, can
I assume that \x00\x00 is equal to 2 bytes? I am just trying to get my head
around the hex format displayed in the shell.

 \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11\xBF
   column=S_T_MTX:\x00\x00?\xB8, timestamp=1343670017892, value=1343670136312

Alex Baranau | 30 Jul 2012 20:58

Re: Cluster load

Glad to hear that answers & suggestions helped you!

The format you are seeing is the output of
org.apache.hadoop.hbase.util.Bytes.toStringBinary(..) method [1]. As you
can see below, for "printable characters" it outputs the character itself,
while for "non-printable" characters it outputs data in format "\xNN" (e.g.
"\x00").

I.e. in your case "\x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11\xBF" ->
"\x00\x00\x00" + ":" + "\x00\x01\x7F\xFF\xFE\xC7" + "'" + "\x05\x11" + "\xBF",
which is 3+1+6+1+2+1=14 bytes.
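To check such a decoding by hand, here is a sketch that produces the same notation. It approximates the behaviour described above (printable ASCII kept, other bytes written as \xNN); it is not the HBase source, and `toPrintable` is a hypothetical name.

```java
public class PrintableBytes {

    // Renders bytes like the shell output above: printable ASCII (except the
    // backslash) is kept, every other byte becomes \xNN in uppercase hex.
    static String toPrintable(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The row key from this thread, written out byte by byte.
        byte[] rowKey = {
            0x00, 0x00, 0x00, ':', 0x00, 0x01, 0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xC7,
            '\'', 0x05, 0x11, (byte) 0xBF
        };
        System.out.println(toPrintable(rowKey));
        // prints \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11\xBF
    }
}
```

Writing the key out as an array also makes the length obvious: the initializer has 14 elements.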

I'd rather use the Bytes.toBytesBinary(String) method, which converts the
string back to a byte array. Or, if you are using the ResultScanner API to
fetch data, just call Result.getRow().length.
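A sketch of the reverse direction, in the spirit of Bytes.toBytesBinary, useful for counting key lengths straight from shell output. `fromPrintable` is a hypothetical helper that assumes plain-ASCII input and treats each `\xNN` escape as one byte and every other character as one byte.

```java
import java.io.ByteArrayOutputStream;

public class BinaryStringLength {

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Turns the shell's notation back into bytes: each \xNN escape is one byte,
    // any other character is one byte (assumes the string is plain ASCII).
    static byte[] fromPrintable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x'
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(c); // keeps the low 8 bits of the character
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String rowKey = "\\x00\\x00\\x00:\\x00\\x01\\x7F\\xFF\\xFE\\xC7'\\x05\\x11\\xBF";
        String qualifier = "\\x00\\x00?\\xB8";
        System.out.println(fromPrintable(rowKey).length);    // 14
        System.out.println(fromPrintable(qualifier).length); // 4
    }
}
```

For the row key in this thread the result is 14 bytes: the \x05\x11 pair between the apostrophe and \xBF contributes two of them.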

Alex Baranau

[1]

  /**
   * Write a printable representation of a byte array. Non-printable
   * characters are hex escaped in the format \\x%02X, eg:
   * \x00 \x05 etc
   *
   * @param b array to write out
   * @param off offset to start at

Mohit Anchlia | 30 Jul 2012 21:37

Re: Cluster load

Thanks! Really appreciate your help.
