[DISCUSS] No regions on Master node in 2.0

stack-3
I would like to start a discussion on whether Master should be carrying
regions or not. No hurry. I see this thread going on a while and what with
2.0 being a ways out yet, there is no need to rush to a decision.

First, some background.

Currently in the master branch, HMaster hosts 'system tables': e.g.
hbase:meta. HMaster is doing more than just gardening the cluster,
bootstrapping and keeping all up and serving healthy as in branch-1; in
master branch, it is actually in the write path for the most critical
system regions.

Master is this way because HMaster and HRegionServer have so much in
common that they should be just one binary, w/ HMaster as any other
server and the HMaster function a minor appendage runnable by any running
HRegionServer.

I like this idea, but the unification work was just never finished. What is
in master branch is a compromise. HMaster is not a RegionServer but a
sort-of RegionServer doing partial serving. So we have an HMaster role, a new
part-RegionServer-carrying-special-regions role, and then a full-on
HRegionServer role. We need to fix this messiness. We could revert to the
plain branch-1 roles, or carry the
HMaster-function-is-something-any-RegionServer-could-execute idea through to
completion.

More background, from a time long past, with good comments by the likes of
our Francis Liu and the Mighty Matteo Bertozzi, is here [1], on unifying
master and meta-serving. Slightly related are old discussions on being able
to scale by splitting meta, with good comments by our Elliott Clark [2].

Also for consideration, the landscape has since changed. [1] was written
before we had ProcedureV2 available to us, with which we can record
intermediate transition states locally on the Master rather than remotely,
as intermediate updates made over rpc to an hbase:meta hosted on another node.
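
As a concrete illustration of that change (a minimal sketch in plain Java
with hypothetical types, not the actual org.apache.hadoop.hbase.procedure2
API): with ProcedureV2, each step of an operation is persisted to a
Master-local log before it runs, so recovery is a local replay rather than
a read of remote meta.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical sketch only: the proc-v2 WAL is modeled as an in-memory
    // deque; the real store is a durable write-ahead log on the Master.
    public class AssignProcedureSketch {
      enum State { REGION_CLOSED, OPENING, OPENED }

      private final Deque<State> localWal = new ArrayDeque<>();

      void transition(State next) {
        localWal.push(next); // local durable record, no rpc to a remote meta
        // ... perform the actual step against the RegionServer here ...
      }

      State recover() {
        // After a Master crash, replay the local log to find where we were.
        return localWal.isEmpty() ? State.REGION_CLOSED : localWal.peek();
      }

      public static void main(String[] args) {
        AssignProcedureSketch p = new AssignProcedureSketch();
        p.transition(State.OPENING);
        p.transition(State.OPENED);
        System.out.println("resume from: " + p.recover()); // prints OPENED
      }
    }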

Enough on the background.

Let me provoke discussion by making the statement that we should undo
HMaster carrying any regions, ever; that the HMaster function is work enough
for a single dedicated server, and that it is important enough that it cannot
take a background role on a serving RegionServer (I could walk back from this
position given evidence that the HMaster role can be backgrounded). Notions
of a Master carrying system tables only are just not on, given system tables
will be too big for a single server, especially once hbase:meta is split (so
we can scale). This simple distinction of HMaster and RegionServer roles is
also what our users know and have gotten used to, so there needs to be a good
reason to change it (we can still pursue the single binary that can take the
HMaster or HRegionServer role, determined at runtime).

Thanks,
St.Ack

1.
https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.j5yqy7n04bkn
2.
https://docs.google.com/document/d/1eCuqf7i2dkWHL0PxcE1HE1nLRQ_tCyXI4JsOB6TAk60/edit#heading=h.80vcerzbkj93

Re: [DISCUSS] No regions on Master node in 2.0

Elliott Clark-3
# Without meta on master, we double assign and lose data.

That is currently a fact that I have seen over and over on multiple loaded
clusters. Trading some abstract cleanup of the deployment model against
losing data is a no-brainer for me. Region assignment, region split, and
region merge are all risky, and all places where HBase can lose data. Meta
being hosted on the master makes that communication easier and less flaky.
Run ITBLL on a loop that creates a new table every time, and without meta on
master everything will fail pretty reliably in ~2 days. With meta on master
things pass much more often.

# Master hosting the system tables locates the system tables as close as
possible to the machine that will be mutating the data.

Data locality is something that we all work for: short-circuit local reads,
caching blocks in the JVM, etc. Bringing data closer to the interested party
has a long history of making things faster and better. Master is in charge
of just about all mutations of all system tables. It's in charge of
changing meta, changing acls, creating new namespaces, etc. So put the
memstore as close as possible to the system that's going to mutate meta.
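
To make the locality point concrete, here is a hedged sketch (hypothetical
interfaces, not HBase's real classes) of the two write paths being compared:
a mutation applied to an in-process memstore versus one marshalled through
an rpc stub first.

    // Hypothetical sketch: same logical operation, different distance to the data.
    interface MetaRegion { void apply(byte[] row, byte[] value); }

    class LocalMetaWriter {
      private final MetaRegion region; // hosted in this JVM: no rpc hop, no queueing
      LocalMetaWriter(MetaRegion region) { this.region = region; }
      void publish(byte[] row, byte[] value) {
        region.apply(row, value); // straight into the local memstore
      }
    }

    class RemoteMetaWriter {
      private final MetaRegion stub; // rpc stub to a region on another server
      RemoteMetaWriter(MetaRegion stub) { this.stub = stub; }
      void publish(byte[] row, byte[] value) {
        // serialize, queue behind user requests, cross the network, deserialize
        stub.apply(row, value);
      }
    }

    public class MetaWritePathSketch {
      public static void main(String[] args) {
        MetaRegion local = (row, value) -> System.out.println("applied locally");
        new LocalMetaWriter(local).publish(new byte[0], new byte[0]);
      }
    }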

# If you want to make meta faster, then moving it to other regionservers
makes things worse.

Meta can get pretty hot. Putting it with other regions that clients will be
trying to access makes everything worse: it means meta is competing with
user requests. Either meta gets served while user requests don't, which
causes still more requests to meta, or user regions get served and the
clients waiting on meta get starved.
At FB we've seen read throughput to meta double or more by moving it to
master. Writes to meta are also much faster since there's no rpc hop, no
queueing, no fighting with reads. So far it has been the single biggest
thing to make meta faster.


Re: [DISCUSS] No regions on Master node in 2.0

张铎(Duo Zhang)
Agree on the performance concerns. IMO we should not hurt the performance
of small (maybe normal?) clusters when scaling for huge clusters.
And I also agree that the current implementation, which allows Master to
carry system regions, is not good (sorry for the Chinglish...). At the
least, it makes the master startup really complicated.

So IMO, we should let the master process or master machine also carry
system regions, but in another way. Start another RS instance on the same
machine or in the same JVM? Or build a new storage based on the procedure
store and convert it to a normal table when it grows too large?
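
A hedged sketch of the first option (hypothetical wiring; the real HMaster
and HRegionServer constructors are more involved):

    // Hypothetical sketch: one JVM, two roles; the extra RS carries only
    // system regions, keeping the master role itself out of the write path.
    public class MasterWithLocalRsSketch {
      public static void main(String[] args) throws InterruptedException {
        Thread master = new Thread(MasterWithLocalRsSketch::runMaster, "master");
        Thread systemRs =
            new Thread(() -> runRegionServer(/* systemTablesOnly= */ true), "system-rs");
        master.start();
        systemRs.start(); // same machine, same JVM, but a distinct RS role
        master.join();
        systemRs.join();
      }

      static void runMaster() { /* cluster gardening, assignment, procedures */ }

      static void runRegionServer(boolean systemTablesOnly) { /* serve hbase:meta etc. */ }
    }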

Thanks.


Re: [DISCUSS] No regions on Master node in 2.0

Matteo Bertozzi
# Without meta on master, we double assign and lose data.

I doubt meta on master solves this problem.
This has more to do with the fact that balancer, assignment, split, and
merge are disjoint operations that are not aware of each other.
Those operations also generally consist of multiple steps, and if the master
crashes partway through you may end up in an inconsistent state.

This is what proc-v2 should solve: since we are aware of each operation,
there is no chance of double assignment and the like, by design.

The master doesn't need the full meta to operate properly;
it just needs the "state" (at which point of the operation am I?),
which is the WAL of proc-v2. Given that, we can split meta, or host meta
remotely, without any problem, since there is only one update to meta,
publishing the location when the assignment completes.

Also, at the moment the master already has a copy of the information in
meta: a map with the RegionInfo, state, and locations. But we are still
querying meta instead of using that local map directly.
If we move meta onto master we can remove that extra copy, but that ties
meta and master together, making it impossible to offload meta if we ever
need to.
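
That in-memory copy can be pictured with a sketch like this (hypothetical
types; the real class inside the master is more elaborate):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the Master-local view described above:
    // region -> (state, location), duplicated today by queries against meta.
    public class RegionStatesSketch {
      enum State { OFFLINE, OPENING, OPEN, CLOSING }

      static final class Entry {
        final State state;
        final String serverName;
        Entry(State state, String serverName) {
          this.state = state;
          this.serverName = serverName;
        }
      }

      private final Map<String, Entry> regions = new ConcurrentHashMap<>();

      void setState(String encodedRegionName, State state, String serverName) {
        regions.put(encodedRegionName, new Entry(state, serverName));
      }

      // Serving lookups from here avoids the extra query against meta, but
      // coupling it to a master-hosted meta makes offloading meta impossible.
      Entry lookup(String encodedRegionName) {
        return regions.get(encodedRegionName);
      }
    }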


In my opinion, with the new assignment you have all the main problems
solved. We can keep regions on master as we have now, so you can configure
it that way to get more performance (avoid the remote rpc), but our design
should allow meta to be split and to be hosted somewhere else.

Matteo



Re: [DISCUSS] No regions on Master node in 2.0

Jimmy Xiang
One thing I'd like to say is that putting the system tables on master
makes the master startup much simpler and more reliable.

Even if proc-v2 can solve the problem, it makes things complicated,
right? I prefer to be sure that meta is always available, in a
consistent state.

If we really need to split meta, we should have an option for most
users to have just one meta region, and keep it on master.



Re: [DISCUSS] No regions on Master node in 2.0

Matteo Bertozzi
I think proc-v2 makes things easier than having meta hard-coded on master:
we just read the WAL and we get back to the state we were in previously.
In this case it doesn't make any difference whether meta is on master or
remote, or whether we have one meta region or a hundred.

If we hard-code meta, we need special logic to load it and from there
start the bootstrap of the other regions. Then there is no way to switch
to multiple metas if someone wants that, unless we keep two code paths,
one of which will be proc-v2. So at that point we should just keep a
single code path that does both.


Re: [DISCUSS] No regions on Master node in 2.0

Elliott Clark-4
In reply to this post by Matteo Bertozzi
Proc-v2 can't fix the fact that it's harder to get a write into meta when
going over rpc. Our try at qos doesn't fix it. As long as critical meta
operations are competing with user requests, meta will be unstable.

I am absolutely confident that meta on master makes HBase lose less data.
The ITBLL tests bear this out. The real-world experience bears this out.

Re: [DISCUSS] No regions on Master node in 2.0

Matteo Bertozzi
You are still thinking of meta as a state machine.
To simplify, meta should just be: region:location.

Not being able to access meta only means that we can't publish the new
location of the region to the client. When meta becomes available again,
that location will be published.

What you are thinking of for meta on the master is "the state",
and with proc-v2 we have that state on the master.
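
Concretely, the region:location view means assignment ends with one small
publish step. A sketch using the HBase client API (the row key and the
exact column qualifier here are illustrative, not copied from a real
cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: once an assignment completes there is exactly one publish,
    // a small put of region -> location into hbase:meta.
    public class PublishLocationSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
          Put p = new Put(Bytes.toBytes("usertable,,1460102400000.d41d8cd98f."));
          p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("server"),
              Bytes.toBytes("rs1.example.com,16020,1460100000000"));
          meta.put(p); // if meta is briefly unavailable, only this publish is delayed
        }
      }
    }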

Matteo


On Fri, Apr 8, 2016 at 8:46 AM, Elliott Clark <[hidden email]>
wrote:

> Proc v2 can't fix that it's harder to get a write into meta when going over
> rpc. Our try at qos doesn't fix it. As long as critical meta operations are
> competing with user requests meta will be unstabla
>
> I am absolutely confident that meta on master makes hbase lose less data.
> The itbll tests bear this out. The real world experience bears this out.
> On Apr 8, 2016 8:03 AM, "Matteo Bertozzi" <[hidden email]> wrote:
>
> > # Without meta on master, we double assign and lose data.
> >
> > I doubt meta on master solve this problem.
> > This has more to do on the fact that balancer, assignment, split, merge
> > are disjoint operations that are not aware of each other.
> > also those operation in general consist of multiple steps and if the
> master
> > crashes you may end up in an inconsistent state.
> >
> > this is what proc-v2 should solve. since we are aware of each operation
> > there is no chance of double assignment and similar by design.
> >
> > The master doesn't need the full meta to operate properly
> > it just need the "state" (at which point of the operation am I).
> > which is the wal of proc-v2. given that we can split meta or meta
> > remote without any problem. since we only have 1 update to meta to
> > update the location when the assignment is completed.
> >
> > also at the moment the master has a copy of the information in meta.
> > a map with the RegionInfo, state and locations. but we are still doing
> > a query on meta instead of using that local map directly.
> > if we move meta on master we can remove that extra copy, but that
> > will tight together meta and master making impossible to offload meta, if
> > we need to.
> >
> >
> > In my opinion with the new assignment you have all the main problem
> solved.
> > we can keep regions on master as we have now,
> > so you can configure it to get more performance (avoid the remote rpc).
> > but our design should allow meta to be split and to be hosted somewhere
> > else.
> >
> > Matteo
> >
> >
> > On Fri, Apr 8, 2016 at 2:08 AM, 张铎 <[hidden email]> wrote:
> >
> > > Agree on the performance concerns. IMO we should not hurt the
> performance
> > > of small(maybe normal?) clusters when scaling for huge clusters.
> > > And I also agree that the current implementation which allows Master to
> > > carry system regions is not good(sorry for the chinglish...). At least,
> > it
> > > makes the master startup really complicated.
> > >
> > > So IMO, we should let the master process or master machine to also
> carry
> > > system regions, but in another way. Start another RS instance on the
> same
> > > machine or in the same JVM? Or build a new storage based on the
> procedure
> > > store and convert it to a normal table when it is too large?
> > >
> > > Thanks.
> > >
> > > 2016-04-08 16:42 GMT+08:00 Elliott Clark <[hidden email]>:
> > >
> > > > # Without meta on master, we double assign and lose data.
> > > >
> > > > That is currently a fact that I have seen over and over on multiple
> > > loaded
> > > > clusters. Some abstract clean up of deployment vs losing data is a
> > > > no-brainer for me. Master assignment, region split, region merge are
> > all
> > > > risky, and all places that HBase can lose data. Meta being hosted on
> > the
> > > > master makes communication easier and less flakey. Running ITBLL on a
> > > loop
> > > > that creates a new table every time, and without meta on master
> > > everything
> > > > will fail pretty reliably in ~2 days. With meta on master things pass
> > > MUCH
> > > > more.
> > > >
> > > > # Master hosting the system tables locates the system tables as close
> > as
> > > > possible to the machine that will be mutating the data.
> > > >
> > > > Data locality is something that we all work for. Short circuit local
> > > reads,
> > > > Caching blocks in jvm, etc. Bringing data closer to the interested
> > party
> > > > has a long history of making things faster and better. Master is in
> > > charge
> > > > of just about all mutations of all systems tables. It's in charge of
> > > > changing meta, changing acls, creating new namespaces, etc. So put
> the
> > > > memstore as close as possible to the system that's going to mutate
> > meta.
> > > >
> > > > # If you want to make meta faster then moving it to other
> regionservers
> > > > makes things worse.
> > > >
> > > > Meta can get pretty hot. Putting it with other regions that clients
> > will
> > > be
> > > > trying to access makes everything worse. It means that meta is
> > competing
> > > > with user requests. If meta gets served and other requests don't,
> > causing
> > > > more requests to meta; or requests to user regions get served and
> other
> > > > clients get starved.
> > > > At FB we've seen read throughput to meta doubled or more by swapping
> it
> > > to
> > > > master. Writes to meta are also much faster since there's no rpc hop,
> > no
> > > > queueing, to fighting with reads. So far it has been the single
> biggest
> > > > thing to make meta faster.
> > > >
> > > > On Thu, Apr 7, 2016 at 10:11 PM, Stack <[hidden email]> wrote:
> > > >
> > > > > ...

Re: [DISCUSS] No regions on Master node in 2.0

Elliott Clark-3
On Fri, Apr 8, 2016 at 8:59 AM, Matteo Bertozzi <[hidden email]>
wrote:

> You are still thinking of meta as a state machine.
> to simplify, meta should just be: region:location
>
> not being able to access meta only means that we can't publish
> to the client the new location of the region.
> when meta becomes available, that location will be published.
>
> what you are thinking about for meta on the master is "the state".
> and with proc-v2 we have that state on the master.
>

No, writing to meta is how we publish the state to clients. That
operation will always be more reliable if we don't have to go over rpc.
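
(To make the region:location framing above concrete: a minimal sketch, using
the standard client API, of the lookup a client ultimately resolves against
hbase:meta. The table name and row key are made up.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RegionLocationLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator =
                 conn.getRegionLocator(TableName.valueOf("usertable"))) {
          // Behind this call the client consults hbase:meta (with client-side
          // caching) to map a row key to a region and that region's server.
          HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row-0001"));
          System.out.println(loc.getRegionInfo().getRegionNameAsString()
              + " -> " + loc.getServerName());
        }
      }
    }

(The write being argued over is the master-side update of exactly this
region-to-server mapping: a local memstore write if meta is on master, a
remote rpc otherwise.)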

Re: [DISCUSS] No regions on Master node in 2.0

Elliott Clark-3
Let me put it this way: removing meta from master fixes no issues seen in
day-to-day operations, but makes worse just about everything that has been
an issue on loaded clusters.

On Fri, Apr 8, 2016 at 9:05 AM, Elliott Clark <[hidden email]> wrote:

> ...

Re: [DISCUSS] No regions on Master node in 2.0

Andrew Purtell-3
In reply to this post by stack-3
> This simple distinction of HMaster and RegionServer roles is also what our users know and have gotten used to so needs to be a good reason to change it (We can still pursue the single binary that can do HMaster or HRegionServer role determined at runtime).

I have always liked the idea of a single HBase daemon with dynamic role switching. Reducing the number of separate processes to manage will make life easier for operators by degree. We would need to move meta around with the master until we fix the issues with remote meta; perhaps that gets tackled as part of the splittable meta work, but I don't know who would be doing that. However, most of the complexity is in HDFS (like NN, ZKFC, and QJM as separate daemons when they should be all-in-one, IMHO), and our reliance on a ZK quorum adds another daemon type to contemplate if you're coming to HBase without other ZK-dependent services already in production. Collapsing the daemon roles at the HBase layer provides only limited complexity reduction in the big picture. Sadly.

> On Apr 7, 2016, at 10:11 PM, Stack <[hidden email]> wrote:
>
> ...

Re: [DISCUSS] No regions on Master node in 2.0

Gary Helmling
In reply to this post by stack-3
Sorry to be late to the party here.  I'll sprinkle my comments over the
thread where they make the most sense.


> Currently in the master branch, HMaster hosts 'system tables': e.g.
> hbase:meta. HMaster is doing more than just gardening the cluster,
> bootstrapping and keeping all up and serving healthy as in branch-1; in
> master branch, it is actually in the write path for the most critical
> system regions.
>
>
I think it's important to point out that this feature exists and is usable
in branch-1 as well, including in all 1.x releases.  It is just disabled by
default in branch-1 and enabled by default in master. So this is really a
comparison of an existing, shipping feature that works and is being used
vs. ongoing development work in master.
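
(For anyone who wants to try it: if memory serves, the branch-1 switch is the
balancer property introduced by HBASE-10569; a sketch, with the property name
and value to be verified against your release:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MetaOnMasterConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Tell the balancer to keep the named tables on the master
        // (equivalent to setting this in hbase-site.xml).
        conf.set("hbase.balancer.tablesOnMaster", "hbase:meta");
        System.out.println(conf.get("hbase.balancer.tablesOnMaster"));
      }
    }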


>
> Let me provoke discussion by making the statement that we should undo
> HMaster carrying any regions ever; that the HMaster function is work enough
> > for a single dedicated server and that it is important enough that it cannot
> > take a background role on a serving RegionServer (I could go back from this
> > position if evidence HMaster role could be backgrounded). Notions of a
> > Master carrying system tables only are just not on, given system tables will
> be too big for a single server especially when hbase:meta is split (so we
> can scale).


If we really think that normal master housekeeping functions are work
enough that we shouldn't combine them with region serving, then why do we
think that those functions will _not_ also have to be scaled across
multiple servers when we encounter the meta-scaling issues that require
splitting meta?  If we really want to scale, then it seems like we need to
tackle scaling the region metadata in general across multiple active
masters, in which case meta-on-master is not really an argument either way.


> This simple distinction of HMaster and RegionServer roles is
> also what our users know and have gotten used to so needs to be a good
> reason to change it (We can still pursue the single binary that can do
> HMaster or HRegionServer role determined at runtime).
>

The distinction in roles in HBase has long been used as a criticism of
HBase's operational complexity.  I think we would be doing our users a
service by simplifying this and making it a detail they do not need to
worry about. If we can truly make this transparent to users and improve
operability at the same time, I think that would be the best outcome.

Re: [DISCUSS] No regions on Master node in 2.0

Gary Helmling
In reply to this post by Matteo Bertozzi
>
> # Without meta on master, we double assign and lose data.
>
> I doubt meta on master solves this problem.
> This has more to do with the fact that balancer, assignment, split, merge
> are disjoint operations that are not aware of each other.
> also those operations in general consist of multiple steps and if the master
> crashes you may end up in an inconsistent state.
>
>
Meta-on-master does dramatically improve things.  For example, it makes it
possible to cold-start HBase under load, where a non-meta-serving master is
never able to successfully complete initialization.  This is the difference
between a cluster being able to come to a healthy state vs. one that is
never able to complete assignments, communicate those assignments to
clients and come to a steady state.


> this is what proc-v2 should solve. since we are aware of each operation
> there is no chance of double assignment and similar by design.
>
>
Again, I think it is difficult to compare an existing feature that is
working in production use vs. one that is actively being developed in
master.

Preventing double assignment sounds great.  What happens when the update of
meta to communicate this to clients fails?  So long as meta is served
elsewhere you still have distributed state.

Until we have an alternative that is feature complete and has demonstrated
success and stability in production use, I don't see how we can even
propose removing a feature that is solving real problems.

I also think that this proposed direction will amplify our release problems
and get us further away from regular, incremental releases.  Master will
remain unreleasable indefinitely until proc v2 development is finished,
and even initial releases will have problems that need to be ironed out.
Ironing out issues in initial releases is not unexpected, but by removing
the existing solution we would be forcing a big-bang approach where
everything has to work before anyone can move over to 2.0, which increases
pressure for users to stay on 1.x releases, which increases pressure to
backport features and brings us closer to the Hadoop way.  I would much
rather see us working on incrementally improving what we have and proving
out new solutions piece by piece.

Re: [DISCUSS] No regions on Master node in 2.0

Francis Liu
Very late to the party....
IMHO having the master do only gardening, and not become part of the user access path, is a good design and something we should stick to. It's good SoC (ie it keeps gardening tasks more isolated from user workload).
> we double assign and lose data.
Given that meta only has a single writer/manager (aka the master), this IMHO is more about having a clean state machine than about remotely writing to a region. We should be able to remain in a good state in the event of write failures. After all, even writes to the filesystem involve remote writes.
> Running ITBLL on a loop that creates a new table every time, and without meta on master everything will fail pretty reliably in ~2 days.
This is interesting. I'll give it a try (see the sketch below). Just run the generator for 2 days? Creating a new table every time? Do I drop the old one?
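
(A sketch of driving the loop Elliott describes; the class is the real one
from the hbase-it module, but the loop argument order here is from memory, so
check the tool's usage output on your version:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList;
    import org.apache.hadoop.util.ToolRunner;

    public class ItbllLoopDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "loop" alternates Generator and Verify phases; roughly:
        // loop <iterations> <mappers> <nodes-per-mapper> <output-dir> <reducers>
        String[] itbllArgs = {"loop", "48", "10", "1000000", "/tmp/itbll", "10"};
        System.exit(ToolRunner.run(conf, new IntegrationTestBigLinkedList(), itbllArgs));
      }
    }
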
> Short circuit local reads, Caching blocks in jvm, etc. Bringing data closer to the interested party has a long history of making things faster and better.
AFAIK all the metadata that the master needs is already cached in memory during startup. It does not require meta to be on master.
> Master is in charge of just about all mutations of all systems tables.
Locality is not as useful here; writes still end up being remote by virtue of hdfs.
> At FB we've seen read throughput to meta doubled or more by swapping it to master. Writes to meta are also much faster since there's no rpc hop, no queueing, no fighting with reads. So far it has been the single biggest thing to make meta faster.
This can be addressed with region server groups. :-) In this case that's pretty much what you're doing: having a special region server serve the system tables, isolating them from user tables. The upside is you can have more than one "system regionserver" (a sketch follows). This is how we do things internally, so we've never experienced user region access interfering with meta.
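
(A sketch of that isolation using the rsgroup feature from HBASE-6721. The
Java class and method names here are from memory -- the shell equivalents are
add_rsgroup, move_servers_rsgroup and move_tables_rsgroup -- and the
hostname/port are made up:)

    import java.util.Collections;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.net.Address;
    import org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient;

    public class IsolateSystemTables {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection()) {
          RSGroupAdminClient admin = new RSGroupAdminClient(conn);
          // Carve out a dedicated group of "system" regionservers...
          admin.addRSGroup("system");
          admin.moveServers(
              Collections.singleton(Address.fromParts("rs-sys-1.example.com", 16020)),
              "system");
          // ...and pin hbase:meta to it, away from user-table traffic.
          admin.moveTables(Collections.singleton(TableName.META_TABLE_NAME), "system");
        }
      }
    }
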
> For example, it makes it possible to cold-start HBase under load, where a non-meta-serving master is never able to successfully complete initialization.
Is this problem here because meta is affected by user region workloads? If so, region server groups should help in this case as well.
> If we really think that normal master housekeeping functions are work enough that we shouldn't combine them with region serving, then why do we think that those functions will _not_ also have to be scaled across multiple servers when we encounter the meta-scaling issues that require splitting meta?
Based on our tests, a single master (without meta) is fine handling a few million regions; the bottlenecks are elsewhere (ie updating meta).

On Tuesday, April 12, 2016 11:55 AM, Gary Helmling <[hidden email]> wrote:

> ...



Re: [DISCUSS] No regions on Master node in 2.0

Elliott Clark-3
On Tue, Apr 19, 2016 at 1:52 PM, Francis Liu <[hidden email]> wrote:

> Locality is not as useful here writes still end up being remote by virtue
> of hdfs.
>

Removing one hop is still useful. It's the same reason that for hdfs writes
the first copy is local.

Re: [DISCUSS] No regions on Master node in 2.0

stack-3
In reply to this post by Elliott Clark-3
On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark <[hidden email]> wrote:

> # Without meta on master, we double assign and lose data.
>
> That is currently a fact that I have seen over and over on multiple loaded
> clusters. Some abstract clean up of deployment vs losing data is a
> no-brainer for me. Master assignment, region split, region merge are all
> risky, and all places that HBase can lose data. Meta being hosted on the
> master makes communication easier and less flakey. Running ITBLL on a loop
> that creates a new table every time, and without meta on master everything
> will fail pretty reliably in ~2 days. With meta on master things pass MUCH
> more.
>
>
The above is a problem of branch-1?

The discussion is about what to do in 2.0, with the assumption that master
state would be built on procedure v2, making most of the transitions now
done over zk and hbase:meta local to the master instead, with only the
final state published to a remote meta (an RPC, but if we can't make RPC
work reliably in our distributed system, that's a bigger problem).


> # Master hosting the system tables locates the system tables as close as
> possible to the machine that will be mutating the data.
>
> Data locality is something that we all work for. Short circuit local reads,
> Caching blocks in jvm, etc. Bringing data closer to the interested party
> has a long history of making things faster and better. Master is in charge
> of just about all mutations of all systems tables. It's in charge of
> changing meta, changing acls, creating new namespaces, etc. So put the
> memstore as close as possible to the system that's going to mutate meta.
>


Above is fine except for the bit where we need to be able to field reads.
Let's distribute the data to be read over the cluster rather than treat
meta reads with kid gloves, hosted on a 'special' server; let these
'reads' be like any other read the cluster takes (see next point).
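
(Concretely, hbase:meta is itself just a table; a sketch, using the plain
client API, of reading it the same way any user table is read:)

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class ScanMeta {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // hbase:meta is itself just a table; any regionserver hosting it can
        // field this scan the same way it fields a scan of a user table.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table meta = conn.getTable(TableName.META_TABLE_NAME);
             ResultScanner scanner = meta.getScanner(new Scan())) {
          for (Result r : scanner) {
            System.out.println(Arrays.toString(r.getRow()));  // region row keys
          }
        }
      }
    }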



> # If you want to make meta faster then moving it to other regionservers
> makes things worse.
>
> Meta can get pretty hot. Putting it with other regions that clients will be
> trying to access makes everything worse. It means that meta is competing
> with user requests. If meta gets served and other requests don't, causing
> more requests to meta; or requests to user regions get served and other
> clients get starved.
> At FB we've seen read throughput to meta doubled or more by swapping it to
> master. Writes to meta are also much faster since there's no rpc hop, no
> queueing, no fighting with reads. So far it has been the single biggest
> thing to make meta faster.
>
>
Is this just because meta had a dedicated server?

St.Ack


>
> On Thu, Apr 7, 2016 at 10:11 PM, Stack <[hidden email]> wrote:
>
> > ...
>

Re: [DISCUSS] No regions on Master node in 2.0

stack-3
In reply to this post by Matteo Bertozzi
On Fri, Apr 8, 2016 at 8:43 AM, Matteo Bertozzi <[hidden email]>
wrote:

> ...
> if we hard code meta, we need a special logic to load it and from there
> start the bootstrap of the other regions.
> then there is no way to switch to multiple metas if someone wants that,
> unless we keep two code path and one of that will be proc-v2.
> so at that point we should just keep a single code path that does both.
>
>
Yes. Let's not have two code paths if we can avoid it.
St.Ack


>
> On Fri, Apr 8, 2016 at 8:27 AM, Jimmy Xiang <[hidden email]> wrote:
>
> > One thing I'd like to say is that it makes the master startup much
> > simpler and more reliable to put system tables on master.
> >
> > Even if proc-v2 can solve the problem, it makes things complicated,
> > right? I prefer to be sure that meta is always available, in a
> > consistent state.
> >
> > If we really need to split meta, we should have an option for most
> > users to have just one meta region, and keep it on master.
> >
> >
> > On Fri, Apr 8, 2016 at 8:03 AM, Matteo Bertozzi <[hidden email]
> >
> > wrote:
> > > # Without meta on master, we double assign and lose data.
> > >
> > > I doubt meta on master solves this problem.
> > > This has more to do with the fact that balancer, assignment, split, merge
> > > are disjoint operations that are not aware of each other.
> > > also those operations in general consist of multiple steps and if the
> > > master crashes you may end up in an inconsistent state.
> > >
> > > this is what proc-v2 should solve. since we are aware of each operation
> > > there is no chance of double assignment and similar by design.
> > >
> > > The master doesn't need the full meta to operate properly,
> > > it just needs the "state" (at which point of the operation am I),
> > > which is the wal of proc-v2. given that, we can split meta or host meta
> > > remote without any problem, since we only have 1 update to meta to
> > > update the location when the assignment is completed.
> > >
> > > also at the moment the master has a copy of the information in meta:
> > > a map with the RegionInfo, state and locations. but we are still doing
> > > a query on meta instead of using that local map directly.
> > > if we move meta on master we can remove that extra copy, but that
> > > will tie together meta and master, making it impossible to offload meta,
> > > if we need to.
> > >
> > >
> > > In my opinion with the new assignment you have all the main problems
> > > solved.
> > > we can keep regions on master as we have now,
> > > so you can configure it to get more performance (avoid the remote rpc).
> > > but our design should allow meta to be split and to be hosted somewhere
> > > else.
> > >
> > > Matteo
> > >
> > >
> > > On Fri, Apr 8, 2016 at 2:08 AM, 张铎 <[hidden email]> wrote:
> > >
> > >> Agree on the performance concerns. IMO we should not hurt the
> > >> performance of small (maybe normal?) clusters when scaling for huge
> > >> clusters. And I also agree that the current implementation which allows
> > >> Master to carry system regions is not good (sorry for the chinglish...).
> > >> At least, it makes the master startup really complicated.
> > >>
> > >> So IMO, we should let the master process or master machine also carry
> > >> system regions, but in another way. Start another RS instance on the
> > >> same machine or in the same JVM? Or build a new storage based on the
> > >> procedure store and convert it to a normal table when it is too large?
> > >>
> > >> Thanks.
> > >>
> > >> 2016-04-08 16:42 GMT+08:00 Elliott Clark <[hidden email]>:
> > >>
> > >> > ...
> > >>
> >
>

Re: [DISCUSS] No regions on Master node in 2.0

stack-3
In reply to this post by Gary Helmling
On Tue, Apr 12, 2016 at 11:22 AM, Gary Helmling <[hidden email]> wrote:

> ...
>
> > Currently in the master branch, HMaster hosts 'system tables': e.g.
> > hbase:meta. HMaster is doing more than just gardening the cluster,
> > bootstrapping and keeping all up and serving healthy as in branch-1; in
> > master branch, it is actually in the write path for the most critical
> > system regions.
> >
> >
> I think it's important to point out that this feature exists and is usable
> in branch-1 as well, including in all 1.x releases.  It is just disabled by
> default in branch-1 and enabled by default in master. So this is really a
> comparison of an existing, shipping feature that works and is being used
> vs. ongoing development work in master.
>
>
I did not realize this facility was being used in branch-1 or even that it
worked well enough to be deployed to production in branch-1.


>
> >
> > Let me provoke discussion by making the statement that we should undo
> > HMaster carrying any regions ever; that the HMaster function is work
> > enough for a single dedicated server and that it is important enough that
> > it cannot take a background role on a serving RegionServer (I could go
> > back from this position if evidence HMaster role could be backgrounded).
> > Notions of a Master carrying system tables only are just not on, given
> > system tables will be too big for a single server especially when
> > hbase:meta is split (so we can scale).
>
>
> If we really think that normal master housekeeping functions are work
> enough that we shouldn't combine them with region serving, then why do we
> think that those functions will _not_ also have to be scaled across
> multiple servers when we encounter the meta-scaling issues that require
> splitting meta?


Master meta functions may one day grow such that they are more than one
server can manage. Chatting w/ folks who have run a system like hbase's
(smile) at scale: rather than split the master function, they took the
time to make the master more efficient so they didn't have to distribute
its duties.

We can split hbase:meta and distribute it around the cluster if an
hbase:meta region can be served like any other region in the system.




>   If we really want
> to scale, then it seems like we need to tackle scaling the region metadata
> in general across multiple active masters, in which case meta-on-master is
> not really an argument either way.
>
>
Distributing the metadata function amongst a cluster of masters has come
up before. But before we go there, a single master that does the metadata
function only, rather than the metadata function AND fielding all metadata
reads, will be able to do more metadata ops if the hbase:meta reads are
done elsewhere. Rough experiments, with more to follow, show that this
should get us to our next scaling target, 1M regions on a cluster.


>
> > This simple distinction of HMaster and RegionServer roles is
> > also what our users know and have gotten used to so needs to be a good
> > reason to change it (We can still pursue the single binary that can do
> > HMaster or HRegionServer role determined at runtime).
> >
>
> The distinction in roles in HBase has long been used as a criticism of
> HBase's operational complexity.  I think we would be doing our users a
> service by simplifying this and making it a detail they do not need to
> worry about. If we can truly make this transparent to users and improve
> operability at the same time, I think that would be the best outcome.
>

I could go the route of the floating master after we'd done some work.
First let's figure out the current state of the hbase master branch, where
we have an inbetweenie: a master that is sort-of-a-regionserver carrying
only system tables and thereby getting in the way of our being able to
scale the metadata function.

St.Ack

Re: [DISCUSS] No regions on Master node in 2.0

Gary Helmling
In reply to this post by stack-3
On Mon, Apr 25, 2016 at 11:20 AM Stack <[hidden email]> wrote:

> On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark <[hidden email]> wrote:
>
> > # Without meta on master, we double assign and lose data.
> >
> > That is currently a fact that I have seen over and over on multiple
> > loaded clusters. Some abstract clean up of deployment vs losing data is a
> > no-brainer for me. Master assignment, region split, region merge are all
> > risky, and all places that HBase can lose data. Meta being hosted on the
> > master makes communication easier and less flakey. Running ITBLL on a
> > loop that creates a new table every time, and without meta on master
> > everything will fail pretty reliably in ~2 days. With meta on master
> > things pass MUCH more.
> >
> >
> The above is a problem of branch-1?
>
> The discussion is about what to do in 2.0, with the assumption that master
> state would be built on procedure v2, making most of the transitions now
> done over zk and hbase:meta local to the master instead, with only the
> final state published to a remote meta (an RPC, but if we can't make RPC
> work reliably in our distributed system, that's a bigger problem).
>
>
But making RPC work for assignment here is precisely the problem.  There's
no reason master should have to contend with user requests to meta in order
to be able to make updates.  And until clients can actually see the change,
it doesn't really matter if the master state has been updated or not.

Sure, we could add more RPC priorities, even more handler pools and
additional queues for master requests to meta vs. user requests to meta.
Maybe with that plus adding in regionserver groups we actually start to
have something that comes close to what we already have today with meta on
master.  But why should we have to add all that complexity?  None of this
is an issue if master updates to meta are local and don't have to go
through RPC.
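
(For the record, the knobs alluded to above already exist as separate handler
pools; a sketch with illustrative values -- note this only mitigates, rather
than removes, the contention being described:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HandlerPoolConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // General handlers that user reads/writes contend for.
        conf.setInt("hbase.regionserver.handler.count", 30);
        // Separate priority pool that meta/system-table requests are routed to.
        conf.setInt("hbase.regionserver.metahandler.count", 20);
        System.out.println("priority handlers: "
            + conf.getInt("hbase.regionserver.metahandler.count", -1));
      }
    }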


>
> > # Master hosting the system tables locates the system tables as close as
> > possible to the machine that will be mutating the data.
> >
> > Data locality is something that we all work for. Short circuit local
> > reads, Caching blocks in jvm, etc. Bringing data closer to the interested
> > party has a long history of making things faster and better. Master is in
> > charge of just about all mutations of all systems tables. It's in charge
> > of changing meta, changing acls, creating new namespaces, etc. So put the
> > memstore as close as possible to the system that's going to mutate meta.
> >
>
>
> > Above is fine except for the bit where we need to be able to field reads.
> > Let's distribute the data to be read over the cluster rather than treat
> > meta reads with kid gloves, hosted on a 'special' server; let these
> > 'reads' be like any other read the cluster takes (see next point).
>
>
In my opinion, the real "special" part here is the master bit -- which I
think we should be working to make less special and more just a normal bit
of housekeeping spread across nodes -- not the regionserver role.  It only
looks special right now because the evolution has stopped in the middle.  I
really don't think enshrining master as a separate process is the right way
forward for us.


>
> > # If you want to make meta faster then moving it to other regionservers
> > makes things worse.
> >
> > Meta can get pretty hot. Putting it with other regions that clients will
> > be trying to access makes everything worse. It means that meta is competing
> > with user requests. If meta gets served and other requests don't, causing
> > more requests to meta; or requests to user regions get served and other
> > clients get starved.
> > At FB we've seen read throughput to meta doubled or more by swapping it
> > to master. Writes to meta are also much faster since there's no rpc hop, no
> > queueing, no fighting with reads. So far it has been the single biggest
> > thing to make meta faster.
> >
> >
> Is this just because meta had a dedicated server?
>
>
I'm sure that having dedicated resources for meta helps.  But I don't think
that's sufficient.  The key is that master writes to meta are local, and do
not have to contend with the user requests to meta.

It seems premature to be discussing dropping a working implementation which
eliminates painful parts of distributed consensus, until we have a complete
working alternative to evaluate.  Until then, why are we looking at dropping
features that are in use and work well?



>

Re: [DISCUSS] No regions on Master node in 2.0

stack-3
(Reviving an old thread that needs resolving before 2.0.0. Does Master
carry regions in hbase-2.0.0 or not? A strong argument by one of our
biggest users is made below that master hosting hbase:meta can be more
robust when updates are local and that we can up the throughput of meta
operations if hbase:meta is exclusively hosted by master.)

On Mon, Apr 25, 2016 at 12:35 PM, Gary Helmling <[hidden email]> wrote:

> On Mon, Apr 25, 2016 at 11:20 AM Stack <[hidden email]> wrote:
>
> > On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark <[hidden email]> wrote:
> >
> > > # Without meta on master, we double assign and lose data.
> > >
> > > That is currently a fact that I have seen over and over on multiple
> > > loaded clusters. Some abstract clean up of deployment vs losing data is
> > > a no-brainer for me. Master assignment, region split, region merge are
> > > all risky, and all places that HBase can lose data. Meta being hosted
> > > on the master makes communication easier and less flakey. Running ITBLL
> > > on a loop that creates a new table every time, and without meta on
> > > master everything will fail pretty reliably in ~2 days. With meta on
> > > master things pass MUCH more.
> > >
> > >
>

The only answer to the above observation is a demonstration that ITBLL with
meta not on master is as robust as runs that have master carrying meta.



> > The discussion is about what to do in 2.0, with the assumption that
> > master state would be built on procedure v2, making most of the
> > transitions now done over zk and hbase:meta local to the master instead,
> > with only the final state published to a remote meta (an RPC, but if we
> > can't make RPC work reliably in our distributed system, that's a bigger
> > problem).
> >

>
> But making RPC work for assignment here is precisely the problem.  There's
> no reason master should have to contend with user requests to meta in order
> to be able to make updates.  And until clients can actually see the change,
> it doesn't really matter if the master state has been updated or not.
>
>
In hbase-2.0.0, there'll be a new regime: hbase:meta writing will be
single-writer, by master only. No more contention on writes. Regarding
read contention, that is unavoidable.

In hbase-2.0.0, only the final publishing step, what we want clients to
see, will update hbase:meta. All other transitions will be internal.

> Sure, we could add more RPC priorities, even more handler pools and
> additional queues for master requests to meta vs. user requests to meta.
> Maybe with that plus adding in regionserver groups we actually start to
> have something that comes close to what we already have today with meta on
> master.  But why should we have to add all that complexity?  None of this
> is an issue if master updates to meta are local and don't have to go
> through RPC.
>
>
(Old args)  A single server carrying meta doesn't scale, etc.

A new observation is that there has been no work carrying home our
recasting of the deploy format such that master is now inline with
reads/writes and the exclusive host of the hbase:meta region.


> > > # Master hosting the system tables locates the system tables as close
> > > as possible to the machine that will be mutating the data.
> > >
> > > Data locality is something that we all work for. Short circuit local
> > > reads, Caching blocks in jvm, etc. Bringing data closer to the
> > > interested party has a long history of making things faster and better.
> > > Master is in charge of just about all mutations of all systems tables.
> > > It's in charge of changing meta, changing acls, creating new namespaces,
> > > etc. So put the memstore as close as possible to the system that's going
> > > to mutate meta.
> >
> > Above is fine except for the bit where we need to be able to field reads.
> > Let's distribute the data to be read over the cluster rather than treat
> > meta reads with kid gloves, hosted on a 'special' server; let these
> > 'reads' be like any other read the cluster takes (see next point).
> >
> >
> In my opinion, the real "special" part here is the master bit -- which I
> think we should be working to make less special and more just a normal bit
> of housekeeping spread across nodes -- not the regionserver role.  It only
> looks special right now because the evolution has stopped in the middle.  I
> really don't think enshrining master as a separate process is the right way
> forward for us.
>
>
I always liked this notion.

To be worked out is how Master and hbase:meta hosting would interplay (the
RS that is designated Master would also host hbase:meta? Would it
exclusively host hbase:meta, or would hbase:meta move with the Master
function.... [Stuff we've talked about before]).



>
> >
> > > # If you want to make meta faster then moving it to other regionservers
> > > makes things worse.
> > >
> > > Meta can get pretty hot. Putting it with other regions that clients
> > > will be trying to access makes everything worse. It means that meta is
> > > competing with user requests. If meta gets served and other requests
> > > don't, causing more requests to meta; or requests to user regions get
> > > served and other clients get starved.
> > > At FB we've seen read throughput to meta doubled or more by swapping it
> > > to master. Writes to meta are also much faster since there's no rpc hop,
> > > no queueing, no fighting with reads. So far it has been the single
> > > biggest thing to make meta faster.
> > >
> > >
> > Is this just because meta had a dedicated server?
> >
> >
> I'm sure that having dedicated resources for meta helps.  But I don't think
> that's sufficient.  The key is that master writes to meta are local, and do
> not have to contend with the user requests to meta.
>
> It seems premature to be discussing dropping a working implementation which
> eliminates painful parts of distributed consensus, until we have a complete
> working alternative to evaluate.  Until then, why are we looking at
> dropping features that are in use and work well?
>
>
>
How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
of the new Pv2-based assign vs a Master that exclusively hosts hbase:meta?

St.Ack



>
> >
>