samar kumar | 27 Jun 2012 12:49

direct Hfile Read and Writes

Hi HBase users,
 I have seen APIs that support reading and writing HFiles directly. I
understand that a direct write creates HFiles in the location specified, and
that it should be much faster since we skip all the lookups to ZK, the catalog
table, and the RS. But can anyone point me to a particular case where we would
want to read/write directly?

   1. Since the data we need might be distributed across regions, how
   would reading an HFile directly be helpful?
   2. Is there any use case for writing HFiles directly? If we write HFiles,
   will that data be accessible from the hbase shell?

Regards,
Samar
shixing | 27 Jun 2012 16:33

Re: direct Hfile Read and Writes

> 1. Since the data we need might be distributed across regions, how
> would reading an HFile directly be helpful?

You can read the HFilePrettyPrinter source; it shows how to create an
HFile.Reader and use it to read an HFile.
Or you can run ./hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f
hdfs://xxxx/xxx/hfile to print the file's key/values and have a look.

> 2. Is there any use case for writing HFiles directly? If we write HFiles,
> will that data be accessible from the hbase shell?

You can read the HFileOutputFormat source; it shows how to create an
HFile.Writer and use it to write KeyValues directly to an HFile.
If you want to read the data from the hbase shell, you first have to load the
HFile into the regionservers. For details on bulk load, see
http://hbase.apache.org/book.html#arch.bulk.load .

Jerry Lam | 27 Jun 2012 19:22

Re: direct Hfile Read and Writes

Hi Samar:

I have used LoadIncrementalHFiles successfully in the past. Basically, once
you have written the HFiles yourself, you can use LoadIncrementalHFiles to
merge them with the HFiles currently managed by HBase. Once they are loaded
into HBase, the records in the new HFiles are accessible by clients.

HTH,

Jerry

Anoop Sam John | 28 Jun 2012 05:37

RE: direct Hfile Read and Writes

When you need to bulk load a huge amount of data into HBase at one time, it is better to go with the
direct HFile write.
First, the HFiles are written directly into HDFS using the MR framework. For this, HBase provides
utility classes and the ImportTsv tool itself.
Then, using LoadIncrementalHFiles, these files are loaded into the regions managed by the RS.
Once these two steps are over, clients can read the data normally.
Loading this much data the normal way, with HTable#put(), would take a lot of time.

-Anoop-

samar kumar | 28 Jun 2012 10:40

Re: direct Hfile Read and Writes

Thanks for the replies. I am aware of the APIs, but can anyone give me a
little more insight into the details? After creating the HFiles and
calling LoadIncrementalHFiles, how does it internally change the RS, catalog
tables, etc.?
Can anyone explain the flow?
Thanks,
Samar

Stack | 29 Jun 2012 00:39

Re: direct Hfile Read and Writes

On Thu, Jun 28, 2012 at 1:40 AM, samar kumar
<samar.opensource@...> wrote:
> Thanks for the replies. I am aware of the APIs, but can anyone give me a
> little more insight into the details? After creating the HFiles and
> calling LoadIncrementalHFiles, how does it internally change the RS, catalog
> tables, etc.?
> Can anyone explain the flow?

It does not change the catalog tables; the catalog tables only hold the
region layout, not which files make up a region.  Loading an hfile into
a region just atomically updates the set of hfiles the currently open
region is using, swapping in the new one.
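
The "atomic swap" idea can be sketched in plain Java. This is only a
toy model under stated assumptions, not HBase's actual code: the class
StoreFileSet, its methods, and the file paths are all hypothetical. It
shows readers holding one complete snapshot of the file list while a
load replaces the whole list in a single step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of an open region's file set; not HBase's real classes.
final class StoreFileSet {
    // Readers always see one complete, immutable snapshot of the file list.
    private final AtomicReference<List<String>> files =
            new AtomicReference<>(Collections.<String>emptyList());

    // Returns the current snapshot; it never changes after it is handed out.
    List<String> snapshot() {
        return files.get();
    }

    // Adds a bulk-loaded file by swapping in a new list in one atomic step.
    void bulkLoad(String hfilePath) {
        files.updateAndGet(current -> {
            List<String> next = new ArrayList<>(current);
            next.add(hfilePath);
            return Collections.unmodifiableList(next);
        });
    }

    public static void main(String[] args) {
        StoreFileSet store = new StoreFileSet();
        store.bulkLoad("/hypothetical/region1/cf/hfile-a");
        List<String> before = store.snapshot();
        store.bulkLoad("/hypothetical/region1/cf/hfile-b");
        System.out.println(before.size() + " file(s) in old snapshot");
        System.out.println(store.snapshot().size() + " file(s) in new snapshot");
    }
}
```

Nothing resembling the catalog is touched here: only the in-memory
file set of the one region changes, which is the point made above.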

Study the bulk load tool though.  Notice how it will make the
proffered set of hfiles fit the current region layout, splitting hfiles
if necessary so they fit within the current region confines.  See how it
also threads the updating, because the update is slow when done serially.
Back up further to see how the files are created using the total order
partitioner.  Leverage what the bulk loader does if you want to avoid
long-solved issues.
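
As a rough illustration of "fitting" a file to the region layout, here
is a small plain-Java sketch. It is not the bulk load tool's code; the
class RegionFit, the region names, the start keys, and the row keys are
all made up. It assumes each region is identified by its start key and
that the first region starts at the empty key, then groups a sorted run
of rows by owning region. Mapping a key to the range that contains it
is also the basic idea behind a total order partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the region start keys and row keys are made up.
final class RegionFit {
    // Finds the region that owns a row: the one with the greatest start key <= row.
    // The first region's start key is the empty string, so a match always exists.
    static String regionFor(TreeMap<String, String> regionsByStartKey, String row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    // Splits one sorted run of row keys into per-region groups.
    static Map<String, List<String>> splitToFit(
            TreeMap<String, String> regionsByStartKey, List<String> sortedRows) {
        Map<String, List<String>> pieces = new TreeMap<>();
        for (String row : sortedRows) {
            String region = regionFor(regionsByStartKey, row);
            pieces.computeIfAbsent(region, r -> new ArrayList<>()).add(row);
        }
        return pieces;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");   // rows before "m"
        regions.put("m", "region-2");  // rows from "m" onward
        List<String> hfileRows = Arrays.asList("apple", "kiwi", "mango", "pear");
        System.out.println(splitToFit(regions, hfileRows));
    }
}
```

In this example the single input run straddles the boundary at "m", so
it comes out as two pieces, one per region.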

St.Ack

