hbase not working with blueprints #120

rmelick · 2016-02-25T08:22:38Z

Hi,

I'm having trouble with HBase not starting up correctly. I'm not sure if it's caused by us using a blueprint to configure the cluster.

I've checked in the script we use to do the deploy, and the blueprints, on my fork (rmelick/docker-ambari@3639ed4), so it should be easy to try it out.

The main issue we see is that the HBase master does not start. If I run hbase shell and then status from either the amb1 or amb2 Docker container, I get an exception like the following:

hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2314)
    at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:769)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:53140)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

I have attached the logs from the HBase master and the region server:
hbase-hbase-master-amb1.service.consul.txt
hbase-hbase-regionserver-amb2.service.consul.txt

I noticed a few things in the logs, and I'm wondering if they might be the cause.
Region Server

2016-02-25 08:02:16,627 ERROR [regionserver/amb2.service.consul/172.17.0.5:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=amb2.service.consul, but now=amb2.node.dc1.consul
...
2016-02-25 08:02:21,039 WARN  [RS_OPEN_REGION-amb2:16020-0] zookeeper.ZKAssign: regionserver:16020-0x1531764a33d0004, quorum=amb1.service.consul:2181, baseZNode=/hbase-unsecure Attempt to transition the unassigned node for edd1c60f2ce63d38d14eca31f437c6b0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was amb2.node.dc1.consul,16020,1456387334788 not the expected amb2.service.consul,16020,1456387334788
...

These seem like they might be related to #94.
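
For anyone checking their own region server log for the same symptom, here is a minimal sketch that pulls out the two hostnames from the "different hostname" error. The regular expression is derived only from the single log line quoted above, so it is an assumption that other HBase versions word the message the same way; the function name is hypothetical.

```python
import re

# Pattern based solely on the HRegionServer error quoted in this issue;
# other HBase versions may phrase the message differently.
MISMATCH = re.compile(
    r"Master passed us a different hostname to use; "
    r"was=(?P<was>\S+), but now=(?P<now>\S+)"
)


def find_hostname_mismatches(log_text):
    """Return (was, now) hostname pairs reported by the region server."""
    return [(m.group("was"), m.group("now")) for m in MISMATCH.finditer(log_text)]


if __name__ == "__main__":
    # Sample line copied from the region server log in this issue.
    sample = (
        "2016-02-25 08:02:16,627 ERROR "
        "[regionserver/amb2.service.consul/172.17.0.5:16020] "
        "regionserver.HRegionServer: Master passed us a different hostname "
        "to use; was=amb2.service.consul, but now=amb2.node.dc1.consul\n"
    )
    for was, now in find_hostname_mismatches(sample):
        print(f"region server registered as {was}, master resolved it as {now}")
```

If the two names differ, as they do here (service.consul versus node.dc1.consul), the master and the region server disagree about the region server's hostname, which matches the ZKAssign warning that follows.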

In the master log, the following jumped out at me:

2016-02-25 08:07:20,038 FATAL [amb1:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
        at java.lang.Thread.run(Thread.java:745)
2016-02-25 08:07:20,039 FATAL [amb1:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2016-02-25 08:07:20,039 FATAL [amb1:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
        at java.lang.Thread.run(Thread.java:745)

Is there an ordering dependency between deploying the HBase master and the region server? I noticed that the master would usually fail to deploy, with messages about not finding any region servers:

2016-02-25 08:00:37,436 INFO  [amb1:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

Could this be caused by our blueprint not specifying the HBase components in the correct order?
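
To make the question concrete, here is a rough sketch that lists which host groups in a blueprint carry HBase components, in the order they are declared. It assumes the usual Ambari blueprint layout (a top-level host_groups list whose entries have a name and a components list); the inline blueprint and the function name are hypothetical and are not the blueprint from the fork linked above.

```python
import json

# Ambari component names for HBase.
HBASE_COMPONENTS = {"HBASE_MASTER", "HBASE_REGIONSERVER", "HBASE_CLIENT"}


def hbase_components_by_host_group(blueprint):
    """Map host group name -> HBase components, preserving declaration order."""
    result = {}
    for group in blueprint.get("host_groups", []):
        names = [
            component.get("name")
            for component in group.get("components", [])
            if component.get("name") in HBASE_COMPONENTS
        ]
        if names:
            result[group.get("name")] = names
    return result


if __name__ == "__main__":
    # Hypothetical blueprint fragment, for illustration only.
    example = json.loads("""
    {
      "host_groups": [
        {"name": "master", "components": [{"name": "ZOOKEEPER_SERVER"},
                                          {"name": "HBASE_MASTER"}]},
        {"name": "worker", "components": [{"name": "HBASE_REGIONSERVER"},
                                          {"name": "DATANODE"}]}
      ]
    }
    """)
    for group, components in hbase_components_by_host_group(example).items():
        print(group, components)
```

This only shows where the components are declared; whether the declaration order affects the start order is exactly the open question here.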


casertap · 2016-09-04T12:26:46Z

I have the same problem, did you find a solution?

JackyYangPassion · 2016-11-07T01:41:44Z

I have the same problem, and I have solved it.
The solution is:

hgebrael · 2020-09-03T14:30:12Z

What is the solution?
