Bucket Cache Failure In HBase 1.3.1

Bucket Cache Failure In HBase 1.3.1

Saad Mufti-3
Hi,

I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
configured to use two attached EBS volumes of 50 GB each, and to be on the
safe side I provisioned the cache at a bit less than the total, 98 GB per
instance. My tables have column families set to prefetch on open.
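For reference, the relevant settings look roughly like this. This is only a
sketch: the path, size and family name are illustrative rather than our exact
EMR values, and the cache keys of course really live in hbase-site.xml on each
region server; the code below just names them.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch of the setup described above; all values are examples.
public class BucketCacheSetupSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // File-backed bucket cache on an attached EBS volume.
    conf.set("hbase.bucketcache.ioengine", "file:/mnt1/hbase/bucketcache");
    // When the value is greater than 1 it is the cache capacity in MB; ~98 GB here.
    conf.set("hbase.bucketcache.size", "100352");

    // Column family flagged to prefetch its blocks when a region opens.
    HColumnDescriptor cf = new HColumnDescriptor("d");
    cf.setPrefetchBlocksOnOpen(true);
  }
}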

On some instances, during cluster startup the bucket cache starts throwing
errors, and eventually it gets completely disabled on that instance. The
instance still stays up as a valid region server, and the only clue in the
region server UI is that the bucket cache tab reports a count of 0 and a size
of 0 bytes.

I have already opened a ticket with AWS to see if there are problems with
the EBS volumes, but wanted to tap the open source community's hive-mind to
see what kind of problem would cause the bucket cache to get disabled. If
the application depends on the bucket cache for performance, wouldn't it be
better to just remove that region server from the pool if its bucket cache
cannot be recovered/enabled?
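In the meantime I am considering doing that removal externally with a small
watchdog along the following lines. This is only a sketch: the JMX attribute it
keys on is a guess on my part, so I would verify the name against a healthy
region server's /jmx output before relying on it.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Rough watchdog sketch: poll a region server's JMX servlet and flag the server
// when the cache metrics suggest the bucket cache has been disabled. The
// attribute name checked below ("blockCacheCount") is a placeholder to verify.
public class BucketCacheWatchdog {
  public static void main(String[] args) throws Exception {
    String host = args.length > 0 ? args[0] : "localhost";
    URL url = new URL("http://" + host
        + ":16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server");
    StringBuilder body = new StringBuilder();
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    }
    // Placeholder check: a cache count stuck at zero well after startup would be
    // the signal to alert, or to drain and remove the server from the pool.
    if (body.toString().contains("\"blockCacheCount\" : 0")) {
      System.out.println(host + ": cache count is 0, candidate for removal");
    }
  }
}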

The errors look like the following. I would appreciate any insight, thanks:

2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057] bucket.BucketCache: Failed reading block 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
        at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

and

2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7] bucket.BucketCache: Failed writing to bucket cache
java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
        at java.lang.Thread.run(Thread.java:748)

and later
2018-02-25 01:13:47,783 INFO  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4] bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4 exiting, cacheEnabled=false
2018-02-25 01:13:47,864 WARN  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.BucketCache: Failed syncing IO engine
java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
        at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
        at java.lang.Thread.run(Thread.java:748)

----
Saad

Re: Bucket Cache Failure In HBase 1.3.1

Ted Yu-3
Here is the related code for disabling the bucket cache:

    if (this.ioErrorStartTime > 0) {
      if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
        LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
          "ms, disabling cache, please check your IOEngine");
        disableCache();
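
If I am reading the 1.3 code correctly, the toleration window used above is
configurable and defaults to one minute, roughly:

// My reading of how BucketCache picks up the toleration window; please
// double-check the key name against the source in your build.
int ioErrorsTolerationDuration = conf.getInt(
    "hbase.bucketcache.ioengine.errors.tolerated.duration",
    60 * 1000);  // default: one minute of unresolved IO errors, then disableCache()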

Can you search the region server log to see if the above occurred?

Was this server the only one with a disabled cache?

Cheers


Re: Bucket Cache Failure In HBase 1.3.1

ramkrishna vasudevan
From the logs, it seems there was some issue with the file used by the bucket
cache. Probably the volume where the file was mounted had some issues.
If you can confirm that, then this issue should be pretty straightforward.
If not, let us know and we can help.
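
One quick way to confirm that, outside HBase, is to hit the same mount point
with a plain FileChannel and see whether writes and fsyncs fail there as well.
Just a sketch: the mount point comes from your log and the file name is made up.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Standalone write/fsync check against the volume backing the bucket cache file,
// to see whether the IO failures reproduce outside HBase.
public class VolumeWriteCheck {
  public static void main(String[] args) throws Exception {
    FileChannel ch = FileChannel.open(
        Paths.get("/mnt1/hbase/volume-check.tmp"),   // same mount as /mnt1/hbase/bucketcache
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
    for (int i = 0; i < 1024; i++) {                 // about 64 MB of positional writes
      buf.clear();
      ch.write(buf, (long) i * buf.capacity());
      ch.force(true);                                // FileChannel.force, the call FileIOEngine.sync() fails on above
    }
    ch.close();
    System.out.println("writes and fsyncs on /mnt1 succeeded");
  }
}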

Regards
Ram


Re: Bucket Cache Failure In HBase 1.3.1

Saad Mufti
In reply to this post by Ted Yu-3
Thanks for the feedback. You are right, the bucket cache is getting disabled
due to too many I/O errors from the underlying files making up the bucket
cache. We still do not know the exact underlying cause, but we are working with
our vendor to test a patch they provided that seems to have resolved the issue
for now. They say that if it works out well they will eventually try to promote
the patch to the open source versions.

Cheers.

----
Saad



Re: Bucket Cache Failure In HBase 1.3.1

Saad Mufti
In reply to this post by ramkrishna vasudevan
Thanks, see my other reply. We have a patch from the vendor, but until it gets
promoted to open source we still don't know the real underlying cause. You're
right, though: the cache got disabled due to too many I/O errors in a short
timespan.

Cheers.

----
Saad



Re: Bucket Cache Failure In HBase 1.3.1

Ted Yu-3
In reply to this post by Saad Mufti
Did the vendor say whether the patch is for HBase or some other component?

Thanks


Re: Bucket Cache Failure In HBase 1.3.1

Saad Mufti
I think it is for HBase itself, but I'll have to wait for more details, as they
haven't shared the source code with us. I imagine they want to do a bunch more
testing and other process work first.

----
Saad
