HBase - Performance issue

kzurek
The problem is that when I'm putting my data into the cluster (multithreaded client, ~30 MB/s outgoing traffic), the load is spread evenly over all RegionServers, with an average CPU wait time of 3.5% (average CPU user: 51%). When I add a similar multithreaded client that scans for, say, the last 100 samples of a randomly generated key from a chosen time range, I get high CPU wait time (20% and up) on two (or more, with a higher number of threads; default 10) random RegionServers. The machines hosting those RegionServers therefore get very hot - one consequence is that their store file count increases steadily, up to the maximum limit. The rest of the RegionServers sit at 10-12% CPU wait time and everything seems to be OK (their store file counts fluctuate, so they are being compacted and do not grow over time). Any ideas? Could I somehow prioritize writes over reads? Is that possible? If so, what would be the best way to do it, and where should it be done - on the client side or the cluster side?

Cluster specification:
HBase Version 0.94.2-cdh4.2.0
Hadoop Version 2.0.0-cdh4.2.0
There are 6x DataNodes (5x HDD each for storing data) and 1x MasterNode
Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
Key design: <UUID><TIMESTAMP> -> UUID: 1-10M, TIMESTAMP: 1-N
Table design: 1 column family with 20 columns of 8 bytes
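To illustrate the key/table design above, the table is roughly equivalent to one created like this with the 0.94 client API (a sketch only - the table name "samples" and family name "d" are placeholders, not our real schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CreateSamplesTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor("samples"); // placeholder name
        HColumnDescriptor cf = new HColumnDescriptor("d");       // the single column family
        cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);       // ROWCOL bloom filters
        desc.addFamily(cf);
        // The 20 columns of 8 bytes each are plain qualifiers written at put
        // time; they do not appear in the schema itself.
        new HBaseAdmin(conf).createTable(desc);
    }
}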

Scan client:
Multiple threads
Each thread has its own table instance with its own Scanner.
Each thread has its own range of UUIDs and randomly draws the beginning of the time range to build the rowkey properly (see above).
Each Scan requests the same number of rows, but with a random rowkey (a minimal sketch of one such thread follows below).
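To make the access pattern concrete, here is a minimal sketch of what one scan thread does (ScanWorker, the "samples" table name, and drawStartOfTimeRange are placeholders, not our real code; it assumes UUID and TIMESTAMP are packed as two big-endian longs):

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanWorker implements Runnable {
    private static final int SAMPLE_COUNT = 100;  // rows requested per scan

    private final HTable table;          // one HTable instance per thread
    private final long uuidLo, uuidHi;   // this thread's slice of the UUID space
    private final Random random = new Random();

    ScanWorker(Configuration conf, long uuidLo, long uuidHi) throws IOException {
        this.table = new HTable(conf, "samples");  // placeholder table name
        this.uuidLo = uuidLo;
        this.uuidHi = uuidHi;
    }

    // Rowkey layout <UUID><TIMESTAMP>: two big-endian longs concatenated.
    private static byte[] rowKey(long uuid, long ts) {
        return Bytes.add(Bytes.toBytes(uuid), Bytes.toBytes(ts));
    }

    public void run() {
        try {
            long uuid = uuidLo + (long) (random.nextDouble() * (uuidHi - uuidLo));
            long startTs = drawStartOfTimeRange();  // placeholder helper
            // Scan forward from the drawn timestamp, never past this UUID.
            Scan scan = new Scan(rowKey(uuid, startTs), rowKey(uuid + 1, 0L));
            scan.setCaching(SAMPLE_COUNT);  // fetch the whole batch in one RPC
            ResultScanner scanner = table.getScanner(scan);
            try {
                int n = 0;
                for (Result r : scanner) {
                    if (++n >= SAMPLE_COUNT) break;  // take the last 100 samples
                }
            } finally {
                scanner.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private long drawStartOfTimeRange() {
        return System.currentTimeMillis() - 60000L;  // placeholder: last minute
    }
}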
 

Re: HBase - Performance issue

Anoop John
Hi
           How many request handlers are configured on your RegionServers?  Can you raise this
number and see?

-Anoop-

Re: HBase - Performance issue

kzurek
I have the following settings:
 hbase.master.handler.count = 25 (default value in CDH4.2)
 hbase.regionserver.handler.count = 20 (default 10)
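For completeness, the handler count is set in hbase-site.xml on each RegionServer and needs a RegionServer restart to take effect; a minimal sketch:

<!-- hbase-site.xml on each RegionServer (restart required to take effect) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- RPC handler threads per RegionServer; the 0.94 default is 10 -->
  <value>20</value>
</property>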

Re: HBase - Performance issue

lars hofhansl-2
In reply to this post by kzurek
You may have run into https://issues.apache.org/jira/browse/HBASE-7336 (the fix is in 0.94.4).
(Although I have not observed this effect as much when short-circuit reads are enabled.)


