fang fang chen (JIRA) | 14 Jun 2012 15:19

[Created] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration

fang fang chen created PIG-2753:
-----------------------------------

             Summary: In distributed mapreduce mode, pig can not get correct hbase configuration
                 Key: PIG-2753
                 URL: https://issues.apache.org/jira/browse/PIG-2753
             Project: Pig
          Issue Type: Bug
          Components: piggybank, site
    Affects Versions: 0.9.1
         Environment: OS:Red Hat Enterprise Linux Server release 5.5 (Tikanga)

            Reporter: fang fang chen
            Assignee: fang fang chen

Hadoop/Hbase/Zookeeper/pig node distribution:
hadoop nodes: {node1=[namenode, secondarynamenode, jobtracker], node2=[datanode, tasktracker]}
hbase nodes: {node1=[master, regionserver]}
pig nodes: {node1, node2}
zookeeper nodes: {node1}

Operating on an HBase table from the Pig shell on node1, for example:

test = LOAD 'hbase://table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'd:sWords','-loadKey true') AS (ID: bytearray  , Words:chararray );
result = FOREACH test GENERATE ID, com.pig.test(Words);
--result = FOREACH AA GENERATE com.pig.test(Words), ID;
--dump result;

store result into 'table' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:drools_cat');

Daniel Dai (JIRA) | 17 Jun 2012 10:31

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393501#comment-13393501
] 

Daniel Dai commented on PIG-2753:
---------------------------------

Where do you put your hbase configuration?

fang fang chen (JIRA) | 19 Jun 2012 03:34

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396441#comment-13396441
] 

fang fang chen commented on PIG-2753:
-------------------------------------

HBase is configured on the jobtracker node; there is no HBase configuration on the tasktracker node.
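
One way to confirm which nodes actually see an HBase configuration is to ask the class loader whether hbase-site.xml is visible. The sketch below is a minimal check under the assumption that HBase picks up its site file from the classpath; the class name HBaseConfCheck is hypothetical and is not part of Pig or HBase. It uses only the standard Java class loader.

```java
import java.net.URL;

/** Hypothetical helper: reports whether a resource is visible on the classpath. */
public class HBaseConfCheck {

    /** Returns the resource URL as a string, or null if it is not on the classpath. */
    public static String locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = HBaseConfCheck.class.getClassLoader();
        }
        URL url = loader.getResource(resourceName);
        return url == null ? null : url.toString();
    }

    public static void main(String[] args) {
        // Defaults to hbase-site.xml; another resource name may be passed as an argument.
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        String where = locate(name);
        System.out.println(where == null
            ? name + " is NOT on the classpath"
            : name + " found at " + where);
    }
}
```

Running it with the same classpath on node1 and node2 would show whether the tasktracker side has any hbase-site.xml to load.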

fang fang chen (JIRA) | 19 Jun 2012 03:36

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396444#comment-13396444
] 

fang fang chen commented on PIG-2753:
-------------------------------------

This is caused by the order in which Pig loads its configuration. In
org.apache.pig.backend.hadoop.hbase.HBaseStorage:

    public void setLocation(String location, Job job) throws IOException {
        job.getConfiguration().setBoolean("pig.noSplitCombination", true);
        m_conf = job.getConfiguration();                 // comment 1
        HBaseConfiguration.addHbaseResources(m_conf);    // comment 2
        // Make sure the HBase, ZooKeeper, and Guava jars get shipped.
        TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
            org.apache.hadoop.hbase.client.HTable.class,
            com.google.common.collect.Lists.class,
            org.apache.zookeeper.ZooKeeper.class);

        String tablename = location;
        if (location.startsWith("hbase://")) {
            tablename = location.substring(8);
        }
        if (m_table == null) {

Comment 1: The configuration is first loaded from job.xml. At this point the HBase settings are correct, i.e.
"hbase.zookeeper.quorum" is "node1".
Comment 2: The HBase configuration files are then loaded on top (first hbase-default.xml, then hbase-site.xml).
If there is no HBase configuration on the tasktracker side, Pig will then load configuration from ...
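
The effect of this load order can be reproduced without Hadoop. The sketch below is a simplified model, not Hadoop's Configuration class: LoadOrderDemo and its maps are hypothetical, and it assumes, as the comment above describes, that a resource added later overwrites keys that were loaded earlier. The value "localhost" for the bundled default is likewise an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the load order; not Hadoop's Configuration class. */
public class LoadOrderDemo {

    /** Values shipped to the task in job.xml (correct for this cluster). */
    public static Map<String, String> jobXml() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("hbase.zookeeper.quorum", "node1");
        return m;
    }

    /** Assumed bundled defaults, used when no hbase-site.xml is present. */
    public static Map<String, String> hbaseDefaults() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("hbase.zookeeper.quorum", "localhost");
        return m;
    }

    /** Applies resources in order; a later resource overwrites earlier keys. */
    @SafeVarargs
    public static Map<String, String> load(Map<String, String>... resources) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (Map<String, String> resource : resources) {
            conf.putAll(resource);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Same order as setLocation: job.xml first, HBase files second.
        Map<String, String> conf = load(jobXml(), hbaseDefaults());
        System.out.println("quorum seen by the task: "
            + conf.get("hbase.zookeeper.quorum"));
    }
}
```

In this model the task ends up with "localhost" rather than "node1", which matches the symptom reported in this issue.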

fang fang chen (JIRA) | 19 Jun 2012 03:40

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396445#comment-13396445
] 

fang fang chen commented on PIG-2753:
-------------------------------------

The same problem occurs in the following code from class org.apache.pig.backend.hadoop.hbase.HBaseStorage:
    public void setStoreLocation(String location, Job job) throws IOException {
        if (location.startsWith("hbase://")){
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, location.substring(8));
        }else{
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, location);
        }

        String serializedSchema = getUDFProperties().getProperty(contextSignature + "_schema");
        if (serializedSchema!= null) {
            schema_ = (ResourceSchema) ObjectSerializer.deserialize(serializedSchema);
        }

        // This overwrites the original, correct configuration from job.xml.
        m_conf = HBaseConfiguration.addHbaseResources(job.getConfiguration());
    }

Daniel Dai (JIRA) | 19 Jun 2012 20:59

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396983#comment-13396983
] 

Daniel Dai commented on PIG-2753:
---------------------------------

It seems we changed this on trunk: job.xml is now used to override the HBase configuration. Can you try trunk?
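
The ordering described here, with job.xml overriding the HBase files, can be sketched as follows. This illustrates the idea only and is not the actual change on trunk: OverrideOrderDemo is a hypothetical class that uses plain maps in place of Hadoop's Configuration, and the sample keys and values in main are assumed placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: HBase files are loaded first, then job.xml wins. */
public class OverrideOrderDemo {

    /** Starts from the HBase file values, then re-applies job.xml on top. */
    public static Map<String, String> merge(Map<String, String> hbaseFiles,
                                            Map<String, String> jobXml) {
        Map<String, String> conf = new LinkedHashMap<>(hbaseFiles);
        conf.putAll(jobXml); // job.xml entries overwrite the file values
        return conf;
    }

    public static void main(String[] args) {
        // Assumed placeholder values standing in for the HBase files.
        Map<String, String> hbaseFiles = new LinkedHashMap<>();
        hbaseFiles.put("hbase.zookeeper.quorum", "localhost");
        hbaseFiles.put("hbase.zookeeper.property.clientPort", "2181");

        // Value shipped with the job from the submitting node.
        Map<String, String> jobXml = new LinkedHashMap<>();
        jobXml.put("hbase.zookeeper.quorum", "node1");

        Map<String, String> conf = merge(hbaseFiles, jobXml);
        System.out.println("quorum seen by the task: "
            + conf.get("hbase.zookeeper.quorum"));
    }
}
```

With this order, keys present only in the HBase files are kept, while any key that was shipped in job.xml keeps the value from the submitting node.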

fang fang chen (JIRA) | 20 Jun 2012 15:04

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397482#comment-13397482
] 

fang fang chen commented on PIG-2753:
-------------------------------------

Do you mean PIG-2115?

Daniel Dai (JIRA) | 20 Jun 2012 20:20

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397715#comment-13397715
] 

Daniel Dai commented on PIG-2753:
---------------------------------

Yes, does it work for you?

fang fang chen (JIRA) | 21 Jun 2012 07:00

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398191#comment-13398191
] 

fang fang chen commented on PIG-2753:
-------------------------------------

Yes, PIG-2115 has fixed this issue.

fang fang chen (JIRA) | 21 Jun 2012 07:02

[Resolved] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


     [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fang fang chen resolved PIG-2753.
---------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.10.0

Duplicate of PIG-2115.

fang fang chen (JIRA) | 21 Jun 2012 07:06

[Updated] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


     [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fang fang chen updated PIG-2753:
--------------------------------

    Attachment: 2753.patch

fang fang chen (JIRA) | 21 Jun 2012 07:06

[Commented] (PIG-2753) In distributed mapreduce mode, pig can not get correct hbase configuration


    [
https://issues.apache.org/jira/browse/PIG-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398196#comment-13398196
] 

fang fang chen commented on PIG-2753:
-------------------------------------

I also generated a patch for this issue, based on the released pig-0.9.1.
