hbase not working with blueprints #120

Open
rmelick opened this issue Feb 25, 2016 · 3 comments

rmelick commented Feb 25, 2016

Hi,

I'm having trouble with HBase not starting up correctly. I'm not sure whether this is caused by our use of a blueprint to configure the cluster.

I've checked in the script we use to do the deploy, and the blueprints, on my fork (rmelick/docker-ambari@3639ed4), so it should be easy to try it out.

The main issue is that the HBase master does not start. If I run hbase shell and then status from either the amb1 or amb2 Docker container, I get an exception like the following:

hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2314)
    at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:769)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:53140)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

I have attached the logs from the HBase master and the region server:
hbase-hbase-master-amb1.service.consul.txt
hbase-hbase-regionserver-amb2.service.consul.txt

I noticed a few things in the logs that I suspect might be the cause.
Region Server

2016-02-25 08:02:16,627 ERROR [regionserver/amb2.service.consul/172.17.0.5:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=amb2.service.consul, but now=amb2.node.dc1.consul
...
2016-02-25 08:02:21,039 WARN  [RS_OPEN_REGION-amb2:16020-0] zookeeper.ZKAssign: regionserver:16020-0x1531764a33d0004, quorum=amb1.service.consul:2181, baseZNode=/hbase-unsecure Attempt to transition the unassigned node for edd1c60f2ce63d38d14eca31f437c6b0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was amb2.node.dc1.consul,16020,1456387334788 not the expected amb2.service.consul,16020,1456387334788
...

These look like they might be related to #94.
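To see how often the two names disagree across a full region server log, the messages quoted above can be pulled out with a short script. This is only a minimal sketch: the regular expressions are written against the two log lines in this comment, and the names `find_hostname_mismatches`, `HOSTNAME_CHANGE`, and `ZK_MISMATCH` are made up for illustration, not part of HBase.

```python
import re

# Matches the HRegionServer message quoted in this issue.
HOSTNAME_CHANGE = re.compile(
    r"Master passed us a different hostname to use; "
    r"was=(?P<was>[^,\s]+), but now=(?P<now>\S+)"
)

# Matches the ZKAssign message quoted in this issue.
ZK_MISMATCH = re.compile(
    r"the server that tried to transition was (?P<actual>\S+) "
    r"not the expected (?P<expected>\S+)"
)


def find_hostname_mismatches(lines):
    """Return (kind, first_host, second_host) for every matching log line."""
    found = []
    for line in lines:
        match = HOSTNAME_CHANGE.search(line)
        if match:
            found.append(("regionserver", match.group("was"), match.group("now")))
            continue
        match = ZK_MISMATCH.search(line)
        if match:
            # Server names look like host,port,startcode; keep only the host.
            actual = match.group("actual").split(",")[0]
            expected = match.group("expected").split(",")[0]
            found.append(("zkassign", actual, expected))
    return found


# Shortened copies of the two log lines from this issue.
SAMPLE_LINES = [
    "2016-02-25 08:02:16,627 ERROR regionserver.HRegionServer: Master passed us "
    "a different hostname to use; was=amb2.service.consul, but now=amb2.node.dc1.consul",
    "2016-02-25 08:02:21,039 WARN zookeeper.ZKAssign: Attempt to transition the "
    "unassigned node failed, the server that tried to transition was "
    "amb2.node.dc1.consul,16020,1456387334788 not the expected "
    "amb2.service.consul,16020,1456387334788",
]

if __name__ == "__main__":
    for kind, first, second in find_hostname_mismatches(SAMPLE_LINES):
        print(kind, first, second)
```

Running it over the attached region server log instead of `SAMPLE_LINES` would show whether the same host is always seen as both amb2.service.consul and amb2.node.dc1.consul.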

In the master log, the following jumped out at me:

2016-02-25 08:07:20,038 FATAL [amb1:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
        at java.lang.Thread.run(Thread.java:745)
2016-02-25 08:07:20,039 FATAL [amb1:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2016-02-25 08:07:20,039 FATAL [amb1:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1005)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:799)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:191)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1783)
        at java.lang.Thread.run(Thread.java:745)

Is there an ordering dependency between deploying the HBase master and the region server? I noticed that the master would usually fail to deploy, logging messages about not finding any region servers:

2016-02-25 08:00:37,436 INFO  [amb1:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

Could this be caused by our blueprint not specifying the HBase components in the correct order?
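For anyone comparing blueprints, a small script can list which host group each HBase component lands in. This is a minimal sketch that assumes the usual Ambari blueprint layout (a top-level `host_groups` list whose entries have a `name` and a `components` list). The blueprint below is a hypothetical two-group example, not the one from the linked fork, and `component_host_groups` is an illustrative helper. It does not answer whether component order matters; it only makes the placement easy to inspect.

```python
import json


def component_host_groups(blueprint):
    """Map each component name to the host groups that list it, in file order."""
    placement = {}
    for group in blueprint.get("host_groups", []):
        for component in group.get("components", []):
            placement.setdefault(component["name"], []).append(group["name"])
    return placement


# Hypothetical blueprint for illustration only (not the one from this issue).
EXAMPLE_BLUEPRINT = json.loads("""
{
  "host_groups": [
    {
      "name": "master",
      "components": [
        {"name": "ZOOKEEPER_SERVER"},
        {"name": "HBASE_MASTER"}
      ]
    },
    {
      "name": "worker",
      "components": [
        {"name": "HBASE_REGIONSERVER"}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    placement = component_host_groups(EXAMPLE_BLUEPRINT)
    for name in ("HBASE_MASTER", "HBASE_REGIONSERVER"):
        print(name, placement.get(name, []))
```

To check a real blueprint, load it with `json.load` from the file and pass the resulting dictionary to `component_host_groups`.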

casertap commented Sep 4, 2016

I have the same problem. Did you find a solution?

JackyYangPassion commented Nov 7, 2016

I had the same problem, and I have solved it.
The solution is:

hgebrael commented Sep 3, 2020

What is the solution?
