Regarding Connection Pooling

Sachin Jain
Hi,

I was reading about connections in HBase. Here is a quote from the
ConnectionFactory API doc [0]:

> Connection encapsulates all housekeeping for a connection to the
> cluster. All tables and interfaces created from returned connection share
> zookeeper connection, meta cache, and connections to region servers and
> masters.

Suppose I am building a REST API that retrieves data from HBase in its
REST calls. I am thinking of pre-creating a connection and sharing it among
the different request threads.

Suppose I get multiple requests for keys within the same region. Will that
single connection be able to serve multiple requests to the same region
server concurrently?

Or are those requests handled serially, because once a request is made to
the region server for key1, the other requests for key2, ..., keyN have to
wait for the key1 request to complete?

Even if I create a pool of N pre-created connections, does that mean I can
serve only N parallel requests if all of those requests go to the same HBase
region server? Is this true?

[0]:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html

Thanks
-Sachin

Re: Regarding Connection Pooling

Allan Yang-2
Connection is thread-safe, so you can share it across different threads.
Requests made by different threads are handled in parallel, whether or not
the keys are in the same region.
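To make Allan's point concrete without a cluster, here is a minimal JDK-only sketch. `SharedClient` is a hypothetical stand-in for an HBase `Connection`, not the HBase API: one instance is created up front and shared by every request thread, and a latch forces all calls to be in flight at the same time, which is only possible if calls on the shared object do not block each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionDemo {

    // Hypothetical stand-in for a thread-safe HBase Connection (NOT the HBase API).
    // Its only mutable state is atomic, so one instance can be shared by all threads.
    static final class SharedClient {
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger maxInFlight = new AtomicInteger();

        String get(String key, CountDownLatch allStarted) throws InterruptedException {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);   // track peak concurrency
            allStarted.countDown();
            // Pretend the server is busy until every caller has entered get(). This can
            // only finish quickly if calls on the shared instance run concurrently.
            allStarted.await(5, TimeUnit.SECONDS);
            inFlight.decrementAndGet();
            return "value-of-" + key;
        }

        int maxInFlight() { return maxInFlight.get(); }
    }

    // Sends `threads` concurrent gets through ONE client created up front; returns peak concurrency.
    static int run(int threads) throws Exception {
        SharedClient client = new SharedClient();
        ExecutorService requestThreads = Executors.newFixedThreadPool(threads);
        CountDownLatch allStarted = new CountDownLatch(threads);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            String key = "key" + i;
            results.add(requestThreads.submit(() -> client.get(key, allStarted)));
        }
        for (Future<String> f : results) f.get();   // wait for every request to finish
        requestThreads.shutdown();
        return client.maxInFlight();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent calls: " + run(8));   // peak concurrent calls: 8
    }
}
```

In real code the shared object would be the `Connection` returned by `ConnectionFactory.createConnection(conf)`. The `Connection` Javadoc notes that the `Table` instances obtained from it are lightweight and not thread-safe, so the usual pattern is to get a `Table` per request and close it afterwards while keeping the one `Connection`.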

2017-06-12 20:44 GMT+08:00 Sachin Jain <[hidden email]>:

Re: Regarding Connection Pooling

Sachin Jain
I meant to ask: since the connection object has predefined connections to
region servers, there must already be a socket connection open to some
region server R1. When an HBase client has to make two or more get requests
to R1, how does that work over the same connection to R1?

On 12-Jun-2017 7:31 PM, "Allan Yang" <[hidden email]> wrote:

Re: Regarding Connection Pooling

Allan Yang-2
Which HBase version are you using? (I'm assuming you are using the original
blocking client; the new Netty client is only available in 2.0.) Yes, by
default there is only one socket to each RS, and the calls written to this
socket are synchronized (or queued using another thread called CallSender).
But usually this won't become a bottleneck. If it is a problem for you,
you can tune "hbase.client.ipc.pool.size".
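Allan's description (one socket per RS, with writes either synchronized or handed to a separate sender thread) can be sketched with the JDK alone. This is a hypothetical model, not HBase's RPC client or its CallSender: `SingleSocket`, `send` and `shutdownAndReadWire` are illustrative names, and a `ByteArrayOutputStream` stands in for the socket. Request threads only enqueue a frame tagged with a call id; a single sender thread writes whole frames in order, so writes to the one socket are serialized, but callers only wait long enough to enqueue, not for earlier calls to be answered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallSenderSketch {

    // Hypothetical single "socket" to one region server (not HBase's class).
    // Request threads only enqueue; one sender thread writes whole frames in order,
    // so frames from different callers never interleave on the wire.
    static final class SingleSocket {
        private static final String STOP = "";   // sentinel: real frames are never empty
        private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();
        private final ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket stream
        private final AtomicInteger nextCallId = new AtomicInteger();
        private final Thread sender = new Thread(this::drain, "call-sender");

        SingleSocket() {
            sender.setDaemon(true);
            sender.start();
        }

        // Safe to call from many threads; returns the call id as soon as the frame is queued.
        int send(String payload) {
            int id = nextCallId.incrementAndGet();
            outbox.add(id + ":" + payload + "\n");
            return id;
        }

        private void drain() {
            try {
                for (String frame = outbox.take(); !frame.isEmpty(); frame = outbox.take()) {
                    byte[] bytes = frame.getBytes(StandardCharsets.UTF_8);
                    wire.write(bytes, 0, bytes.length);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Stops the sender after it has written every queued frame; returns the wire contents.
        String shutdownAndReadWire() throws InterruptedException {
            outbox.add(STOP);
            sender.join();
            return new String(wire.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // `threads` callers each send `calls` requests over the single shared socket.
    static String run(int threads, int calls) throws InterruptedException {
        SingleSocket socket = new SingleSocket();
        ExecutorService callers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            int caller = t;
            callers.execute(() -> {
                for (int c = 0; c < calls; c++) socket.send("get t" + caller + "-c" + c);
            });
        }
        callers.shutdown();
        callers.awaitTermination(10, TimeUnit.SECONDS);
        return socket.shutdownAndReadWire();
    }

    public static void main(String[] args) throws InterruptedException {
        // 12 frames, each "id:get tX-cY", in the order they were queued
        System.out.print(run(4, 3));
    }
}
```

In this model only the act of writing a frame is serialized; waiting for each reply is not, which is one way to read Allan's remark that the single socket usually isn't a bottleneck.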

2017-06-12 23:47 GMT+08:00 Sachin Jain <[hidden email]>:

Re: Regarding Connection Pooling

Sachin Jain
Thanks Allan,

This is what I understood initially: further calls will be serial if a
request is already pending on some RS. I am running HBase 1.3.1.
Is "hbase.client.ipc.pool.size" still valid? I thought it only applied to
older versions of HBase, when HBase used to provide connection pools, which
it no longer does.

Is that right?

On Tue, Jun 13, 2017 at 8:14 AM, Allan Yang <[hidden email]> wrote:

Re: Regarding Connection Pooling

Jerry He
At a high level, you have the Connection (which is an HConnection) that you
obtained from the ConnectionFactory API.

As you mentioned, you create one such Connection.

Internally, within that Connection, there are physical RPC/socket
connections to the different region servers.
By default, simplistically speaking, one physical connection is opened to
each remote region server.
("hbase.client.ipc.pool.size" can help here by allowing multiple low-level
connections.)
RPC calls/requests to the same server are multiplexed over this one
physical connection.
It is not 'serial', however: requests and responses are interleaved
(tracked by call ids).
On the server side, the requests are processed concurrently by the handlers
as they arrive.
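Jerry's point, that calls multiplexed over one physical connection are interleaved rather than serial, can be illustrated with a small JDK-only model. Everything here is hypothetical (the `MultiplexSketch` and `Frame` classes, and two in-memory queues standing in for the two directions of one socket); it is not HBase's RPC code. The client tags each request with a call id and parks a `CompletableFuture` in a map; the simulated server hands requests to a pool of handlers that may finish in any order; one reader thread routes each response back to its caller by call id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiplexSketch {

    // One message on the (simulated) physical connection, tagged with its call id.
    static final class Frame {
        final int callId;
        final String body;
        Frame(int callId, String body) { this.callId = callId; this.body = body; }
    }

    static final ThreadFactory DAEMON = r -> { Thread t = new Thread(r); t.setDaemon(true); return t; };

    // Both directions of ONE connection to ONE server.
    final BlockingQueue<Frame> toServer = new LinkedBlockingQueue<>();
    final BlockingQueue<Frame> toClient = new LinkedBlockingQueue<>();

    // Client side: outstanding calls keyed by call id, plus the order replies came back in.
    final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    final AtomicInteger nextCallId = new AtomicInteger();
    final List<Integer> arrivalOrder = Collections.synchronizedList(new ArrayList<>());

    MultiplexSketch(int serverHandlers) {
        ExecutorService handlers = Executors.newFixedThreadPool(serverHandlers, DAEMON);
        // Server: read requests in arrival order and hand each one to a handler thread.
        DAEMON.newThread(() -> {
            try {
                while (true) {
                    Frame req = toServer.take();
                    handlers.execute(() -> {
                        sleepQuietly(req.callId % 2 == 1 ? 30 : 5);   // odd calls are "slow"
                        toClient.add(new Frame(req.callId, "value-of-" + req.body));
                    });
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }).start();
        // Client reader: one thread routes each response to its caller by call id.
        DAEMON.newThread(() -> {
            try {
                while (true) {
                    Frame resp = toClient.take();
                    arrivalOrder.add(resp.callId);
                    pending.remove(resp.callId).complete(resp.body);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }).start();
    }

    // Non-blocking: register the call, write the request, return a future for the reply.
    CompletableFuture<String> call(String key) {
        int id = nextCallId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);   // register BEFORE sending so the reply cannot arrive first
        toServer.add(new Frame(id, key));
        return reply;
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws Exception {
        MultiplexSketch conn = new MultiplexSketch(4);
        List<CompletableFuture<String>> replies = new ArrayList<>();
        for (int i = 1; i <= 6; i++) replies.add(conn.call("key" + i));
        for (CompletableFuture<String> r : replies) System.out.println(r.get(5, TimeUnit.SECONDS));
        System.out.println("responses arrived for call ids: " + conn.arrivalOrder);
    }
}
```

Running `main` prints the replies in call order, while the arrival list usually shows the fast even-numbered calls coming back before the slow odd-numbered ones, even though all six went over the same connection.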

Hope this helps clarify it.

Jerry




On Mon, Jun 12, 2017 at 9:35 PM, Sachin Jain <[hidden email]>
wrote:

Re: Regarding Connection Pooling

Sachin Jain
Since I am on HBase 1.3.1, and I guess HConnection has been deprecated since
1.0, I cannot use the hbase.client.ipc.pool.size configuration to increase
the number of connections from the client to individual region servers.

I think the best bet is to create multiple connections on the client and
serve requests from that connection pool.

Suggestions?



On 15-Jun-2017 10:34 AM, "Jerry He" <[hidden email]> wrote:

Re: Regarding Connection Pooling

Ted Yu-3
Looking at branch-1.3, hbase.client.ipc.pool.size is still effective.
See the following code in AbstractRpcClient.java:

  protected static int getPoolSize(Configuration config) {
    // HConstants.HBASE_CLIENT_IPC_POOL_SIZE is "hbase.client.ipc.pool.size"; default is 1
    return config.getInt(HConstants.HBASE_CLIENT_IPC_POOL_SIZE, 1);
  }
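For intuition about what raising `hbase.client.ipc.pool.size` changes, here is a JDK-only sketch of a per-server pool that opens up to N sockets lazily and then hands them out round-robin. `ServerConnectionPool`, `socketFor` and `assign` are hypothetical names, the sockets are just labels, `Properties` stands in for Hadoop's `Configuration`, and round-robin selection is an assumption made for illustration; only the property name and its default of 1 come from the `getPoolSize` code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class PerServerPoolSketch {

    // Hypothetical per-server socket pool (not HBase's internal class). Up to `poolSize`
    // sockets are opened lazily per server and then handed out round-robin.
    static final class ServerConnectionPool {
        private final int poolSize;
        private final Map<String, List<String>> sockets = new HashMap<>();
        private final Map<String, Integer> callsSoFar = new HashMap<>();

        ServerConnectionPool(int poolSize) { this.poolSize = Math.max(1, poolSize); }

        // Returns the socket (here just a label) that the next call to `server` should use.
        synchronized String socketFor(String server) {
            List<String> open = sockets.computeIfAbsent(server, s -> new ArrayList<>());
            if (open.size() < poolSize) {
                open.add(server + "#" + open.size());   // open another socket until the pool is full
            }
            int n = callsSoFar.merge(server, 1, Integer::sum) - 1;   // 0, 1, 2, ... per server
            return open.get(Math.floorMod(n, open.size()));
        }
    }

    // Reads the pool size like getPoolSize does (default 1), then records which socket
    // each of `calls` consecutive requests to server "rs1" would use.
    static List<String> assign(Properties conf, int calls) {
        int poolSize = Integer.parseInt(conf.getProperty("hbase.client.ipc.pool.size", "1"));
        ServerConnectionPool pool = new ServerConnectionPool(poolSize);
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < calls; i++) chosen.add(pool.socketFor("rs1"));
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(assign(new Properties(), 4));   // [rs1#0, rs1#0, rs1#0, rs1#0]
        Properties tuned = new Properties();
        tuned.setProperty("hbase.client.ipc.pool.size", "3");
        System.out.println(assign(tuned, 6));              // [rs1#0, rs1#1, rs1#2, rs1#0, rs1#1, rs1#2]
    }
}
```

Against a real cluster, the equivalent step would be setting the property on the configuration before creating the connection, e.g. calling `conf.setInt("hbase.client.ipc.pool.size", 4)` on the object from `HBaseConfiguration.create()` and then passing it to `ConnectionFactory.createConnection(conf)`; per Ted's reading of branch-1.3, the setting is still honored there.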

On Fri, Jun 16, 2017 at 6:32 PM, Sachin Jain <[hidden email]>
wrote:
