Needed to restart HBase. Now Master won't start

Roy23
Hi,

I needed to restart HBase, and now it cannot find the master. This is a standalone installation. Here are some logs
[master.out] from when bin/hbase-start is run:

$ cat current_fail_log | grep WARN
2013-09-26 00:12:57,024 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/dev01.ec2.salami.pw,45420,1380154376596 already deleted, and this is not a retry
2013-09-26 00:12:57,329 WARN org.apache.hadoop.hbase.master.snapshot.SnapshotManager: Couldn't delete working snapshot directory: file:/ht_data/hbase/hbase/.hbase-snapshot/.tmp
2013-09-26 00:12:58,457 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-09-26 00:12:58,513 WARN org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while trying to get failed servers.
2013-09-26 00:12:58,824 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
2013-09-26 00:12:59,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/root-region-server already deleted, and this is not a retry

$ cat current_fail_log | grep ERROR
2013-09-26 00:16:17,115 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master


Here are other settings:
/etc/hosts:
127.0.0.1 localhost
10.241.54.111 dev01.ec2.salami.pw

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

hbase-site.xml
<configuration>
<property>
    <name>hbase.rootdir</name>
    <value>file:///ht_data/hbase/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/ht_data/hbase/zookeeper</value>
  </property>
  <property>
     <name>hbase.zookeeper.quorum</name>
      <value></value>
  </property>
  <property>
     <name>hbase.zookeeper.property.clientPort</name>
     <value>2181</value>
  </property>
</configuration>

Using localhost or 127.0.0.1 as the value of hbase.zookeeper.quorum does not help; it yields a connection refused error.
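
To double-check which ZooKeeper settings are actually present in hbase-site.xml, a small script can list them. This is only a sketch: the function name zk_properties is made up for illustration, and SAMPLE is an inline, shortened copy of the config posted above rather than the real file, which you would read from your own conf directory.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the posted hbase-site.xml, used here as sample input.
SAMPLE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///ht_data/hbase/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value></value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>"""


def zk_properties(xml_text):
    """Return {name: value} for every property whose name mentions zookeeper."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if "zookeeper" in name:
            props[name] = value
    return props


if __name__ == "__main__":
    for name, value in zk_properties(SAMPLE).items():
        # Flag empty values so they stand out in the output.
        print(name, "=", value if value else "(empty)")
```

An empty hbase.zookeeper.quorum shows up as "(empty)", which makes it easy to spot before the next restart.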

Any ideas as to why this is happening? I would very much appreciate any help. I will keep adding to this thread if I encounter new issues. Right now we desperately need to get HBase up and running. Please feel free to ask for any additional information or logs.

Thanks -  Roy.

Re: Needed to restart HBase. Now Master won't start

Sergey Shelukhin
What are the lines directly before
2013-09-26 00:16:17,115 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
?
They might be useful (for example, an exception stack trace or some other
explanation of why it failed to start).
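
One way to pull out the lines that precede each ERROR entry is sketched below. The function name lines_before_errors and the two-line sample are illustrative only (the first sample line is abbreviated); on a real system you would feed it the lines of the master log, and GNU grep's -B option gives a similar result from the shell.

```python
from collections import deque


def lines_before_errors(lines, context=5):
    """Yield (error_line, preceding_lines) for each line containing ' ERROR '."""
    recent = deque(maxlen=context)  # rolling window of the last `context` lines
    for line in lines:
        if " ERROR " in line:
            yield line, list(recent)
        recent.append(line)


if __name__ == "__main__":
    # Abbreviated sample input, not a full log.
    sample = [
        "2013-09-26 00:13:57,372 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: exiting",
        "2013-09-26 00:16:17,115 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master",
    ]
    for error, before in lines_before_errors(sample):
        print("\n".join(before))
        print(error)
```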




Re: Needed to restart HBase. Now Master won't start

Roy23
2013-09-26 00:12:58,620 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-09-26 00:12:58,620 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2013-09-26 00:12:58,621 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030
2013-09-26 00:12:58,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
2013-09-26 00:12:58,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker dev01.ec2.salami.pw,56591,1380154377002 exiting
2013-09-26 00:12:58,723 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
2013-09-26 00:12:58,723 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.cacheFlusher exiting
2013-09-26 00:12:58,723 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free -188320.23 KB of total=2.03 MB
2013-09-26 00:12:58,723 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.compactionChecker exiting
2013-09-26 00:12:58,723 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2013-09-26 00:12:58,723 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server dev01.ec2.salami.pw,56591,1380154377002
2013-09-26 00:12:58,724 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@309fe84e
2013-09-26 00:12:58,724 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
2013-09-26 00:12:58,724 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server dev01.ec2.salami.pw,56591,1380154377002; all regions closed.
2013-09-26 00:12:58,724 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.logSyncer exiting
2013-09-26 00:12:58,724 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in file:/ht_data/hbase/hbase/.logs/dev01.ec2.salami.pw,56591,1380154377002
2013-09-26 00:12:58,728 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Moved 1 log files to /ht_data/hbase/hbase/.oldlogs
2013-09-26 00:12:58,728 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002 closing leases
2013-09-26 00:12:58,728 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002 closed leases
2013-09-26 00:12:58,824 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x141579c7f5e0000, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:619)
2013-09-26 00:12:58,826 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:54353 which had sessionid 0x141579c7f5e0000
2013-09-26 00:12:59,050 INFO org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: dev01.ec2.salami.pw,45420,1380154376596.splitLogManagerTimeoutMonitor exiting
2013-09-26 00:12:59,522 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
2013-09-26 00:12:59,523 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x141579c7f5e0001 type:delete cxid:0x2e zxid:0x18 txntype:-1 reqpath:n/a Error Path:/hbase/root-region-server Error:KeeperErrorCode = NoNode for /hbase/root-region-server
2013-09-26 00:12:59,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/root-region-server already deleted, and this is not a retry
2013-09-26 00:12:59,527 INFO org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of -ROOT-,,0.70236052
2013-09-26 00:13:06,938 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum can use java.util.zip.CRC32
2013-09-26 00:13:07,205 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: dev01.ec2.salami.pw,45420,1380154376596.timerUpdater exiting
2013-09-26 00:13:08,471 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.periodicFlusher exiting
2013-09-26 00:13:08,471 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.leaseChecker closing leases
2013-09-26 00:13:08,472 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002.leaseChecker closed leases
2013-09-26 00:13:08,472 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2013-09-26 00:13:08,472 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2013-09-26 00:13:08,472 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Small Compaction Thread to finish...
2013-09-26 00:13:08,477 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [dev01.ec2.salami.pw,56591,1380154377002]
2013-09-26 00:13:08,478 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x141579c7f5e0002
2013-09-26 00:13:08,478 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=-ROOT-,,0.70236052 is on server=null server being checked: dev01.ec2.salami.pw,56591,1380154377002
2013-09-26 00:13:08,479 INFO org.apache.hadoop.hbase.master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server dev01.ec2.salami.pw,56591,1380154377002
2013-09-26 00:13:08,480 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-09-26 00:13:08,481 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:54355 which had sessionid 0x141579c7f5e0002
2013-09-26 00:13:08,481 INFO org.apache.zookeeper.ZooKeeper: Session: 0x141579c7f5e0002 closed
2013-09-26 00:13:08,481 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server dev01.ec2.salami.pw,56591,1380154377002; zookeeper connection closed.
2013-09-26 00:13:08,481 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer:0;dev01.ec2.salami.pw,56591,1380154377002 exiting
2013-09-26 00:13:40,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x141579c7f5e0000, timeout of 40000ms exceeded
2013-09-26 00:13:40,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x141579c7f5e0000
2013-09-26 00:13:57,369 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: Master:0;dev01.ec2.salami.pw,45420,1380154376596.oldLogCleaner exiting
2013-09-26 00:13:57,372 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Master:0;dev01.ec2.salami.pw,45420,1380154376596.archivedHFileCleaner exiting
2013-09-26 00:16:17,115 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Master not initialized after 200 seconds
        at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
        at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:149)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2079)
2013-09-26 00:16:17,117 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-23,5,main]
2013-09-26 00:16:17,117 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2013-09-26 00:16:17,117 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2013-09-26 00:16:17,117 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

Re: Needed to restart HBase. Now Master won't start

Roy23
Tried running it again, now the error seems different:

java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-09-26 02:18:31,405 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs/dev01.ec2.salami.pw,45258,1380161701409
2013-09-26 02:18:31,405 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
2013-09-26 02:18:33,280 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-09-26 02:18:33,281 WARN org.apache.zookeeper.ClientCnxn: Session 0x141580c419a0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-09-26 02:18:36,200 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-09-26 02:18:36,200 WARN org.apache.zookeeper.ClientCnxn: Session 0x141580c419a0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-09-26 02:18:37,985 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-09-26 02:18:37,986 WARN org.apache.zookeeper.ClientCnxn: Session 0x141580c419a0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-09-26 02:18:39,739 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-09-26 02:18:39,739 WARN org.apache.zookeeper.ClientCnxn: Session 0x141580c419a0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-09-26 02:18:39,839 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs/dev01.ec2.salami.pw,45258,1380161701409
2013-09-26 02:18:39,840 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
2013-09-26 02:18:39,840 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs/dev01.ec2.salami.pw,45258,1380161701409
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1195)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1184)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1134)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:899)
        at java.lang.Thread.run(Thread.java:619)
2013-09-26 02:18:40,520 INFO org.apache.zookeeper.ZooKeeper: Session: 0x141580c419a0001 closed
2013-09-26 02:18:40,520 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-09-26 02:18:40,520 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server dev01.ec2.salami.pw,45258,1380161701409; zookeeper connection closed.
2013-09-26 02:18:40,520 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer:0;dev01.ec2.salami.pw,45258,1380161701409 exiting
2013-09-26 02:18:40,520 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2013-09-26 02:18:40,521 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

Re: Needed to restart HBase. Now Master won't start

Michael Webster
The last stack trace just looks like ZooKeeper isn't running. Can you run
netstat and see if there are any connections open on port 2181?

I don't know what would cause the original failure, though. I haven't seen
"java.lang.RuntimeException: Master not initialized after 200 seconds"
before. That would indicate some kind of startup problem; maybe the ROOT
region couldn't get assigned? Also, did you make any changes to
hbase-site.xml between restarts?
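
As a complement to netstat, a quick TCP connect test shows whether anything is accepting connections on the ZooKeeper client port. This is a minimal sketch: the function name port_is_open is made up for illustration, and the host and port defaults are taken from the log lines earlier in the thread (localhost/127.0.0.1:2181).

```python
import socket


def port_is_open(host="127.0.0.1", port=2181, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False


if __name__ == "__main__":
    state = "accepting connections" if port_is_open() else "not reachable"
    print("127.0.0.1:2181 is", state)
```

A "not reachable" result is consistent with the Connection refused errors in the stack trace, but it does not say why ZooKeeper is down.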





--

Michael Webster, Software Engineer
Bronto Software

Re: Needed to restart HBase. Now Master won't start

Roy23
Hi Michael,

Thanks for getting back to me; I really appreciate it.

So, for port 2181:

$ ps aux | grep :2181
suman    16607  0.0  0.0   8104   924 pts/0    S+   12:55   0:00 grep --color=auto :2181

$ sudo netstat -tulp | grep :2181
$

I did make changes to hbase-site.xml between restarts. Could it be that HBase had not yet shut down when I made the changes?

Also, this line in hbase-env.sh is commented out, so ZooKeeper should be restarted manually, since HBase won't start it automatically.
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
#export HBASE_MANAGES_ZK=true
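
To confirm whether hbase-env.sh really contains an active (uncommented) export of this variable, something like the following could be used. It is only a sketch: active_export is a hypothetical helper, it scans the file text with a simple regular expression rather than evaluating the shell script, and what HBase does when the variable is left unset depends on its start scripts, so check that for your version.

```python
import re


def active_export(env_text, name):
    """Return the value of the last uncommented `export NAME=value` line, else None."""
    pattern = re.compile(r"^\s*export\s+" + re.escape(name) + r"=(\S*)")
    value = None
    for line in env_text.splitlines():
        match = pattern.match(line)  # lines starting with '#' do not match
        if match:
            value = match.group(1)  # a later export overrides an earlier one
    return value


if __name__ == "__main__":
    sample = "#export HBASE_MANAGES_ZK=true\n"
    print(active_export(sample, "HBASE_MANAGES_ZK"))
```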


Re: Needed to restart HBase. Now Master won't start

Bryan Whitehead
This *looks like* HBase was started in standalone mode with a semi-empty
hbase-site.xml config with only hbase.rootdir set and no external ZooKeeper
setup.

This means ZooKeeper was started using /tmp/hbase-hbase/zookeeper[...]. At
some point /tmp got cleaned up, but since ZooKeeper was already running,
everything ran normally until your restart. Unfortunately, I don't know how
to recover from such a scenario.
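
A simple way to test the "data directory got cleaned up" theory is to check whether the ZooKeeper data directory still exists and contains anything. This is a rough sketch: dir_status is a hypothetical helper, and the path in the example is the hbase.zookeeper.property.dataDir value from the hbase-site.xml posted earlier in the thread; substitute whichever directory your instance actually used.

```python
import os


def dir_status(path):
    """Classify a directory as 'missing', 'empty' or 'populated'."""
    if not os.path.isdir(path):
        return "missing"
    return "populated" if os.listdir(path) else "empty"


if __name__ == "__main__":
    # Path taken from the hbase-site.xml posted earlier in this thread.
    print(dir_status("/ht_data/hbase/zookeeper"))
```

A "missing" or "empty" result would support the cleanup theory; "populated" only shows that some files are there, not that they are intact.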

