Any Repercussions of using Multiwal


Any Repercussions of using Multiwal

Sachin Jain
Hi,

I ran into a situation where I was getting
*RegionTooBusyException* with a log message like:

    *Above Memstore limit, regionName = X ... memstore size = Y and
blockingMemstoreSize = Z*

This hinted at *hotspotting* of a particular region, so I fixed my
keyspace partitioning to get a more uniform distribution across regions.
That did not completely fix the problem, but it definitely delayed it a
bit.

Next, I enabled *multiWal*. As I recall, there is a configuration that
triggers flushing of memstores when the WAL threshold is reached. After
doing this, the problem seems to have gone away.
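
For reference, a minimal sketch of how multiwal can be switched on in
hbase-site.xml (this assumes the standard hbase.wal.provider setting from
the HBase reference guide; grouping strategy and group count are left at
their defaults):

    <!-- Switch the WAL provider from a single WAL per RegionServer to
         multiwal (RegionGroupingProvider): regions are assigned to one
         of several WAL groups, each with its own log. -->
    <property>
      <name>hbase.wal.provider</name>
      <value>multiwal</value>
    </property>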

But this raises a couple of questions:

1. Are there any repercussions of using *multiWal* in a production
environment?
2. If there are no repercussions and only benefits to using *multiWal*, why
is it not turned on by default? Other consumers could then turn it off in
certain (whatever) scenarios.

PS: *HBase Configuration*
Single node (local setup), v1.3.1, Ubuntu, 16-core machine.

Thanks
-Sachin

Re: Any Repercussions of using Multiwal

Yu Li
Hi Sachin,

We have been using multiwal in production here at Alibaba for over 2 years
and have seen no problems. Facebook is also running multiwal online. Please
refer to HBASE-14457 <https://issues.apache.org/jira/browse/HBASE-14457>
for more details.

There's also a JIRA, HBASE-15131
<https://issues.apache.org/jira/browse/HBASE-15131>, proposing to turn on
multiwal by default; it is still under discussion, so please feel free to
add your voice there.

Regarding the issue you hit: what is hbase.regionserver.maxlogs set to in
your env? By default it's 32, which means the number of un-archived WALs on
each RS shouldn't exceed 32. However, with multiwal enabled, each group is
allowed 32 logs, so with the default two groups a single RS can have 64
WALs.
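
As a rough sketch of where the 64 comes from (assuming the default bounded
grouping strategy, whose hbase.wal.regiongrouping.numgroups knob defaults
to 2):

    <!-- Each WAL group gets its own rolling budget, so the effective
         un-archived WAL cap per RS is roughly
         hbase.regionserver.maxlogs * numgroups, i.e. 32 * 2 = 64. -->
    <property>
      <name>hbase.wal.regiongrouping.numgroups</name>
      <value>2</value>
    </property>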

Let me further explain how this leads to RegionTooBusyException (the two
blocking settings are shown below for reference):
1. If the number of un-archived WALs exceeds the setting, HBase checks the
oldest WAL and flushes all regions involved in it.
2. If the data ingestion speed is high and the WAL keeps rolling, many
small HFiles get flushed out, and compaction cannot keep up.
3. When the HFile count of one store exceeds
hbase.hstore.blockingStoreFiles (10 by default), the flush is delayed for
up to hbase.hstore.blockingWaitTime (90s by default).
4. When data ingestion continues while the flush is delayed, the memstore
size can exceed the upper limit, which throws RegionTooBusyException.
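
For reference, a sketch of the two blocking settings with their default
values (defaults only, not a tuning recommendation):

    <!-- A flush is delayed once a store accumulates this many HFiles... -->
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>10</value>
    </property>
    <!-- ...and the delay lasts at most this long, in milliseconds (90s). -->
    <property>
      <name>hbase.hstore.blockingWaitTime</name>
      <value>90000</value>
    </property>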

Hope this information helps.

Best Regards,
Yu


Re: Any Repercussions of using Multiwal

Sachin Jain
Thanks Yu!
It was certainly helpful.

> Regarding the issue you hit: what is hbase.regionserver.maxlogs set to
> in your env? By default it's 32, which means the number of un-archived
> WALs on each RS shouldn't exceed 32. However, with multiwal enabled, each
> group is allowed 32 logs, so with the default two groups a single RS can
> have 64 WALs.

I used the default configuration for this. By multiWal, I understood there
is a different WAL per region. Can you please explain how you arrived at 64
WALs for a region server?

> with multiwal enabled, each group is allowed 32 logs, so with the
> default two groups a single RS can have 64 WALs.

I thought one of the side effects of having multiwal enabled is that there
will be a *large amount of data waiting in un-archived WALs*. So if a
region server fails, it would take more time to replay the WAL files, and
hence it could *compromise availability*.

Wdyt?

Thanks
-Sachin



Re: Any Repercussions of using Multiwal

Anoop John
You can configure this max WAL count (as Yu said,
hbase.regionserver.maxlogs). When the total un-archived WAL file count
exceeds it, we do force flushes and so release some of the WALs. As Yu
mentioned, when we use multiWAL and have, say, 2 WAL groups, this WAL count
effectively becomes 32 * 2 = 64. But you can configure it to a lower value
than the default 32, for example as sketched below.
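
For example (16 is purely an illustrative value, not a recommendation):

    <!-- Halving the per-group budget keeps the effective cap with
         2 WAL groups at the old single-WAL default: 16 * 2 = 32. -->
    <property>
      <name>hbase.regionserver.maxlogs</name>
      <value>16</value>
    </property>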

-Anoop-
