Re: Should I pass on HBase for this project? (for now)

stack-3
Hey Daniel.  How are things going over there?  You coming to the user
group meeting tomorrow evening?
St.Ack

Daniel Leffel wrote:

> Hi All (and St.Ack),
> I've spent the last few weeks figuring out how to use HBase for my project.
> On the surface, HBase seemed like the dream solution for this project
> and had me very excited from the beginning.
>
> However, from the moment I began implementing the project, I've had a
> frustrating go at it. I've spent weeks simply trying to build the
> environment my application will need to run in. I've sent countless
> messages to this group (and thank you all so much for answering so many of
> them, especially St.Ack).
>
> At this point, I can't tell which of the following are true:
>
>    - Maybe I'm just a freaking idiot
>    - Maybe HBase is just not equipped to do what I want it to do
>    - Maybe HBase is still too unstable, and it will do what I need it to
>    do at some point in the future
>    - Maybe I have the wrong expectations for the amount of hardware I need
>    to throw at the situation.
>
> I have Hadoop 0.16.3 running on 4 boxes (all 4 running DFS and 3 of them
> running MapRed). I'm running HBase 0.1.2 (most recent release candidate)
> with the master running on the same box as namenode and 3 region servers
> (running on the same MapRed boxes).
>
> My first, very simple task is to load a sparse table with 220 million
> rows. The average row has 2 columns or so (very few bytes per row). I
> have attempted to do this with a simple MapReduce job: in the Map phase,
> I'm simply parsing a text file, and I'm using the standard TableReduce to
> load the table.
>
> I've attempted this with various numbers of reduce tasks and various
> configurations of which machines run each daemon.
>
> The end result is always the same: at some point, region servers go
> offline. Most recently, the region servers just stopped responding, and
> logs set to DEBUG gave no useful information. If I had to guess, this was
> typical deadlock behavior.
>
> A simple table scan (just so I can find out how many rows were successfully
> inserted before all the region servers died) usually causes the same
> behavior: one by one, the region servers just die, even with no MapRed jobs
> running.
>
> At this point, I'm at a crossroads and beginning to think that I will need
> to leave HBase behind because I can't spend another week with no progress on
> this project.
>
> So, I ask the question(s) I posed in the beginning.
>
>    - Maybe I'm just a freaking idiot
>    - Maybe HBase is just not equipped to do what I want it to do
>    - Maybe HBase is still too unstable, and it will do what I need it to
>    do at some point in the future
>    - Maybe I have the wrong expectations for the amount of hardware I need
>    to throw at the situation.
>
> Can someone please point me in the right direction?
>
> Danny
>
>  
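Danny's load job, as described above, has a Map phase that parses a text file and a TableReduce that writes rows into HBase. The thread shows neither his input format nor his code, so what follows is only a minimal Python sketch of that shape. It assumes a hypothetical tab-separated input (a row key, then `family:qualifier=value` fields); `map_line` and `reduce_rows` are illustrative names, and the dict returned by `reduce_rows` merely stands in for the HBase table. It is not the HBase 0.1.2 TableReduce API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical input format (an assumption, not from the thread):
#   <row key>\t<family:qualifier>=<value>\t<family:qualifier>=<value> ...
Row = Tuple[str, Dict[str, str]]


def map_line(line: str) -> Iterator[Row]:
    """Map phase: parse one text line into (row key, {column: value})."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[0]:
        return  # skip blank or malformed lines rather than failing the task
    row_key, cells = fields[0], {}
    for field in fields[1:]:
        column, sep, value = field.partition("=")
        if sep and column:  # ignore fields without "column=value" shape
            cells[column] = value
    if cells:
        yield row_key, cells


def reduce_rows(pairs: Iterable[Row]) -> Dict[str, Dict[str, str]]:
    """Reduce stand-in: merge all cells that share a row key.

    The dict plays the role of the HBase table; in the real job a
    TableReduce implementation would write each merged row to HBase.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row_key, cells in pairs:
        table[row_key].update(cells)
    return dict(table)


if __name__ == "__main__":
    sample = [
        "user42\tinfo:name=Ann\tinfo:city=Oslo",
        "user7\tinfo:name=Bob",
        "",                        # blank line: skipped by map_line
        "user42\tstats:visits=3",  # more cells for an existing row
    ]
    pairs = (pair for line in sample for pair in map_line(line))
    table = reduce_rows(pairs)
    for row_key in sorted(table):
        print(row_key, table[row_key])
```

Merging by row key in the reduce step means cells for one row that arrive on several input lines still end up in a single row, which matches how a reduce receives all values for a key.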

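On the hardware question, a back-of-envelope calculation using only the numbers in the original post (220 million rows, about 2 columns per row, 3 region servers) gives a rough sense of per-server volume. The sketch assumes rows spread evenly across region servers, which HBase does not guarantee: the real distribution depends on how regions split and where they are assigned. `sizing` is a hypothetical helper name.

```python
# Back-of-envelope sizing for the load described in the thread.
# Assumption: rows are spread evenly across region servers, which HBase
# does not guarantee (regions split and are reassigned as the table grows).

ROWS = 220_000_000     # rows to load, from the original post
COLUMNS_PER_ROW = 2    # "2 columns or so" per row, from the original post
REGION_SERVERS = 3     # region servers in the described cluster


def sizing(rows: int, columns_per_row: int, servers: int) -> dict:
    """Return total cells and the even-split share per region server."""
    cells = rows * columns_per_row
    return {
        "cells": cells,
        "rows_per_server": rows // servers,
        "cells_per_server": cells // servers,
    }


if __name__ == "__main__":
    for name, value in sizing(ROWS, COLUMNS_PER_ROW, REGION_SERVERS).items():
        print(f"{name}: {value:,}")
```

With an even split that is about 73 million rows, or roughly 147 million cells, per region server.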

dleffel
Hi,
Planning on being there (short of any fires).

Things are going relatively well. I've learned quite a bit about massaging
the setup during high-load (write-intensive) jobs. Everything seems to be
working well now, and HBase is going to be a pivotal part of my project.

See you tomorrow!

Danny


On Mon, May 19, 2008 at 2:17 PM, stack wrote:

> Hey Daniel.  How are things going over there?  You coming to the user group
> meeting tomorrow evening?
> St.Ack