HBase vs Hadoop memory configuration.


HBase vs Hadoop memory configuration.

Jean-Marc Spaggiari
Hi,

I saw in another message that Hadoop only needs 1 GB...

Today I have my nodes configured with 45% of the memory for HBase and
45% for Hadoop. The last 10% is for the OS.

Should I change that to 1 GB for Hadoop, 10% for the OS and the rest
for HBase? Even when running MR jobs?

Thanks,

JM

Re: HBase vs Hadoop memory configuration.

Kevin O'Dell
Hey JM,

  I suspect they are referring to the DN process only.  It is important in
these discussions to talk about individual component memory usage.  In my
experience most HBase clusters only need 1-2 GB of heap for the DN
process.  I am not a MapReduce expert, but typically the TT process itself
only needs 1 GB of memory; you control everything else through the max
slots and the child heap (rough sketch below).  What is your current block
count per DN?
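
As a rough sketch (illustrative values, assuming the classic MRv1
property names), the slots and child heap live in mapred-site.xml:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>  <!-- map slots per TT -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>  <!-- reduce slots per TT -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>  <!-- heap per task JVM, so ~6 GB if all 6 slots are busy -->
</property>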

--
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: HBase vs Hadoop memory configuration.

Jean-Marc Spaggiari
Hi Kevin,

What do you mean by "current block count per DN"? I kept the standard settings.

fsck is telling me that I have 10893 total blocks. Since I have 8
nodes, that's about 1361 blocks per node.

Is that what you are asking?
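
For reference, the number above comes from something like this (output
format approximate):

hadoop fsck / | grep 'Total blocks'
# prints a line such as:  Total blocks (validated):  10893
# 10893 blocks / 8 datanodes ~ 1361 blocks per DN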

JM

Re: HBase vs Hadoop memory configuration.

Kevin O'Dell
JM,

  That is probably correct.  You can check the NN UI and confirm that
number, but it doesn't seem too far off for an HBase cluster.  You will be
fine with just 1 GB of heap for the DN with a block count that low.
Typically you don't need to raise the heap until you are looking at a
couple hundred thousand blocks per DN.

--
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: HBase vs Hadoop memory configuration.

Jean-Marc Spaggiari
From the UI:
15790 files and directories, 11292 blocks = 27082 total. Heap Size is
179.12 MB / 910.25 MB (19%)

I'm setting the memory in the hadoop-env.sh file using:
export HADOOP_HEAPSIZE=1024

I think that's fine for the datanodes, but does it mean each task
tracker, job tracker and namenode will also take 1 GB? So 2 GB to 4 GB
on each server? (1 NN+JT+DN+TT and 7 DN+TT) Or is it 1 GB in total?

And if we say 1 GB for the DN, how much should we reserve for the
other daemons? I want to make sure I give HBase the maximum I can
without starving Hadoop...

JM

Re: HBase vs Hadoop memory configuration.

Kevin O'Dell
JM,

  You would control those through hadoop-env.sh using
HADOOP_JOBTRACKER_OPTS and HADOOP_TASKTRACKER_OPTS, setting -Xmx to the
desired heap for each.
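
A minimal hadoop-env.sh sketch (illustrative values; assuming the
Hadoop 1.x scripts, where HADOOP_HEAPSIZE sets the default -Xmx for
every daemon they start and a later per-daemon -Xmx overrides it):

export HADOOP_HEAPSIZE=1024                # default heap, in MB, for each daemon
export HADOOP_JOBTRACKER_OPTS="-Xmx2g"     # override just the JobTracker
export HADOOP_TASKTRACKER_OPTS="-Xmx1g"    # override just the TaskTracker

So a node running NN+JT+DN+TT ends up using roughly the sum of the
per-daemon heaps, not 1 GB in total.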

--
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: HBase vs Hadoop memory configuration.

karunakar
Hi Jean,

AFAIK:

The namenode can handle roughly 1 million blocks per 1 GB of namenode
heap. How much data that represents depends on dfs.block.size:
1 million blocks * 128 MB = 128 TB of data [taking 128 MB as the block size].

Setting HADOOP_HEAPSIZE (for example, export HADOOP_HEAPSIZE=2048, in MB)
changes the heap for all the daemons. Rather than using that, use the
configurations below for the individual daemons.

You can give the namenode, datanode, jobtracker and tasktracker 2 GB of
heap each by adding the following lines to hadoop-env.sh, for example:

export HADOOP_NAMENODE_OPTS="-Xmx2g"
export HADOOP_DATANODE_OPTS="-Xmx2g"
export HADOOP_JOBTRACKER_OPTS="-Xmx2g"
export HADOOP_TASKTRACKER_OPTS="-Xmx2g"

For example: if you have a 16 GB server and are concentrating more on
HBase, and you are running a datanode, tasktracker and regionserver on
the same node, then give 4 GB to the datanode, 2-3 GB to the tasktracker
[including the child JVMs] and 6-8 GB to the regionserver.
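
A rough sketch of that 16 GB layout (illustrative values; HBASE_HEAPSIZE
lives in hbase-env.sh and is in MB):

# hadoop-env.sh
export HADOOP_DATANODE_OPTS="-Xmx4g"
export HADOOP_TASKTRACKER_OPTS="-Xmx1g"
# mapred-site.xml: mapred.child.java.opts = -Xmx512m with a few slots
# (roughly 1-2 GB of task heap in total)

# hbase-env.sh
export HBASE_HEAPSIZE=6000                 # region server heap, ~6 GB

That leaves around 2-3 GB for the OS and everything else.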

Thanks,
karunakar.



Re: HBase vs Hadoop memory configuration.

Jean-Marc Spaggiari
Thanks all for this information.

I have tried to adjust my settings to make sure the memory is used efficiently.

JM
