Regions in Transition: FAILED_CLOSE status


jeff saremi
Why are a few hundred of our regions in this state, and what can we do to fix it?
I have run hbck several times (is running it once enough?) to no avail.

An Internet search did not turn up anything useful either.

I have restarted all masters and all region servers with no luck.

Jeff

Re: Regions in Transition: FAILED_CLOSE status

jeff saremi
Our write code throws exceptions like the following:

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 10331 actions: NotServingRegionException: 10331 times,
  at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:258)
  at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(AsyncProcess.java:238)
  at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
  at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
  at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
  at com.microsoft.bing.malta.hbaseClient11$$anon$2.run(ImageFeaturesHdfsToHbaseInjector.scala:115)
  at java.lang.Thread.run(Thread.java:745)
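
The summary line of that exception already says what went wrong: every one of the 10331 failed actions hit NotServingRegionException. A minimal Python sketch for tallying the causes from such a line, assuming the message keeps the shape shown in the trace above ("Failed <N> actions: <Cause>: <count> times,"); `summarize_failures` is a hypothetical helper, not part of HBase:

```python
import re

# Assumed message shape, copied from the trace above:
#   "Failed <N> actions: <Cause>: <count> times,"
_SUMMARY = re.compile(r"Failed (\d+) actions?: (.*)")
_CAUSE = re.compile(r"([\w.$]+): (\d+) times?,")


def summarize_failures(message):
    """Return (total_failed, {cause: count}) or None if the line does not match."""
    match = _SUMMARY.search(message)
    if match is None:
        return None
    total = int(match.group(1))
    causes = {name: int(count) for name, count in _CAUSE.findall(match.group(2))}
    return total, causes


line = (
    "org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: "
    "Failed 10331 actions: NotServingRegionException: 10331 times,"
)
print(summarize_failures(line))
```

If a single cause accounts for the whole total, as it does here, the problem is more likely on the cluster side (regions not being served) than in the write code itself.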


________________________________
From: jeff saremi <[hidden email]>
Sent: Tuesday, May 23, 2017 11:36:11 AM
To: [hidden email]
Subject: Regions in Transition: FAILED_CLOSE status


Re: Regions in Transition: FAILED_CLOSE status

Vladimir Rodionov-2
You should check the RS logs to see why the regions cannot be assigned.
Get the RS name from the master log, then check that RS's log.

-Vlad

On Tue, May 23, 2017 at 11:47 AM, jeff saremi <[hidden email]>
wrote:

> Our write code throws exceptions like the following:

Re: Regions in Transition: FAILED_CLOSE status

jeff saremi
Are dead region servers to blame? Could this be stale information in ZK?

________________________________
From: Vladimir Rodionov <[hidden email]>
Sent: Tuesday, May 23, 2017 12:20:16 PM
To: [hidden email]
Subject: Re: Regions in Transition: FAILED_CLOSE status


Re: Regions in Transition: FAILED_CLOSE status

Vladimir Rodionov-2
When the Master attempts to assign a region to an RS and the assignment fails, there
should be something in the RS log file (check for errors)
that explains the reason for the failure.

How many unassigned regions do you have? You can try to assign them
manually in the hbase shell.
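
With a few hundred regions, typing `assign` by hand gets tedious. A rough Python sketch that turns a list of encoded region names into a batch of shell commands; it assumes the names are the usual 32-character hex strings, and the sample name and the `assign_script` helper are made up for illustration:

```python
import re

# Assumption: encoded region names are 32 lowercase hex characters.
_ENCODED = re.compile(r"[0-9a-f]{32}")


def assign_script(encoded_names):
    """Build hbase shell input with one assign command per region."""
    lines = []
    for name in encoded_names:
        if _ENCODED.fullmatch(name) is None:
            raise ValueError("not an encoded region name: %r" % (name,))
        lines.append("assign '%s'" % name)
    lines.append("exit")
    return "\n".join(lines) + "\n"


# Placeholder name, not taken from any real cluster.
print(assign_script(["0123456789abcdef0123456789abcdef"]), end="")
```

The output could then be piped into `hbase shell`. Whether `assign` alone clears a region stuck in FAILED_CLOSE is not settled in this thread, so try it on one region first.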

On Tue, May 23, 2017 at 1:25 PM, jeff saremi <[hidden email]> wrote:

> Are dead region servers to blame? Could this be stale information in
> ZK?

Re: Regions in Transition: FAILED_CLOSE status

James Moore
How many region servers are dead? And were they colocated with DataNodes?

On Tue, May 23, 2017 at 5:20 PM, Vladimir Rodionov <[hidden email]>
wrote:

> When the Master attempts to assign a region to an RS and the assignment fails, there
> should be something in the RS log file (check for errors)
> that explains the reason for the failure.

Re: Regions in Transition: FAILED_CLOSE status

Vladimir Rodionov-2
My bad, this is FAILED_CLOSE, not a failed assignment.

Anyway, start with the Master log, find the name of a region in FAILED_CLOSE, and check the
log of the RS that hosts this region.
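
The two-step lookup above can be roughed out in Python. This sketch only does substring and pattern matching, since the exact log wording varies between HBase versions; it assumes that encoded region names appear in the master log as 32-character hex strings, and both helper names are hypothetical:

```python
import re
from collections import Counter

# Assumption: encoded region names show up as 32 lowercase hex characters.
_ENCODED_REGION = re.compile(r"\b[0-9a-f]{32}\b")


def regions_in_state(log_lines, state="FAILED_CLOSE"):
    """Count encoded region names found on lines that mention `state`."""
    counts = Counter()
    for line in log_lines:
        if state in line:
            counts.update(_ENCODED_REGION.findall(line))
    return counts


def lines_mentioning(log_lines, needle):
    """Return the lines (for example from an RS log) that contain `needle`."""
    return [line for line in log_lines if needle in line]
```

Run `regions_in_state` over the master log first, then feed one of the resulting names to `lines_mentioning` over the log of the region server that hosted it.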

On Tue, May 23, 2017 at 2:35 PM, James Moore <[hidden email]> wrote:

> How many region servers are dead? And were they colocated with DataNodes?

Re: Regions in Transition: FAILED_CLOSE status

jeff saremi
Vladimir, thanks a lot for helping us out.

So I checked the number of RSs in the master console. It was more than what we had allotted.

Then I went to the list of FAILED_CLOSE regions, copied the server names, and issued a delete against those nodes in ZK.

I restarted the masters (I don't think I needed to do this step) and now all regions show as fine.

Happy now!
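
One way to read the check described above is as a comparison between the servers the cluster has registered and the hosts that were actually allotted. A minimal Python sketch of that comparison, assuming server names in the usual `host,port,startcode` form where the startcode is the server's start time; the host names are invented and nothing here talks to ZooKeeper:

```python
def parse_server_name(server_name):
    """Split 'host,port,startcode' into (host, port, startcode)."""
    host, port, startcode = server_name.split(",")
    return host, int(port), int(startcode)


def stale_servers(registered, allotted_hosts):
    """Return registered server names that look stale.

    A name is reported when its host is not in `allotted_hosts`, or when a
    newer registration (higher startcode) exists for the same host and port.
    """
    newest = {}
    for name in registered:
        host, port, startcode = parse_server_name(name)
        key = (host, port)
        if key not in newest or startcode > newest[key]:
            newest[key] = startcode
    stale = []
    for name in registered:
        host, port, startcode = parse_server_name(name)
        if host not in allotted_hosts or startcode < newest[(host, port)]:
            stale.append(name)
    return stale


# Invented example data.
registered = [
    "rs1.example.com,16020,1495500000000",
    "rs1.example.com,16020,1495560000000",
    "rs9.example.com,16020,1495500000000",
]
print(stale_servers(registered, {"rs1.example.com"}))
```

Anything the function reports is only a candidate for cleanup. Confirm that the server is really gone before deleting its node in ZK.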

________________________________
From: Vladimir Rodionov <[hidden email]>
Sent: Tuesday, May 23, 2017 2:41:30 PM
To: [hidden email]
Subject: Re: Regions in Transition: FAILED_CLOSE status
