Re: Does HBase support single-row transaction?

Clint Morgan-2
Zookeeper makes good sense for distributed locking to get isolation.
But we still need transaction start, commit, and rollback to get
atomicity. I think this properly belongs in hbase.

So suppose I want to read two rows, and then update them as an
isolated, atomic action:

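getZookeeperLock(table);                   // distributed lock gives isolation
try {
  long tranId = table.beginTransaction();  // proposed: not an existing HBase call
  try {
    // Normal gets, but isolated due to the distributed lock
    row1 = table.get(rowKey1);
    row2 = table.get(rowKey2);
    BatchUpdate b1 = new BatchUpdate(rowKey1);
    b1.put(...);
    table.addUpdate(tranId, b1);           // proposed: held until commit
    BatchUpdate b2 = new BatchUpdate(rowKey2);
    b2.put(...);
    table.addUpdate(tranId, b2);
    table.commit(tranId);                  // proposed: apply all held updates
  } catch (Exception e) {
    table.rollback(tranId);                // proposed: undo whatever was applied
    throw e;
  }
} finally {
  releaseZookeeperLock(table);             // always released, even on failure
}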

So on the HBase side, we would hold on to the BatchUpdates until
table.commit() is called, and then roll through and apply them.
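A minimal single-process sketch of that buffering, for illustration only: `TransactionalRegion` and `PendingPut` are hypothetical classes, not HBase APIs, and the real difficulty (updates spread over several region servers) is ignored. Updates added under a transaction ID are kept aside and only become visible when `commit` applies them all under one lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one cell of a BatchUpdate (row, column, value).
final class PendingPut {
  final String row;
  final String column;
  final String value;

  PendingPut(String row, String column, String value) {
    this.row = row;
    this.column = column;
    this.value = value;
  }
}

// Hypothetical region that holds each transaction's updates until commit.
final class TransactionalRegion {
  private final Map<String, Map<String, String>> rows = new HashMap<>();
  private final Map<Long, List<PendingPut>> pending = new HashMap<>();
  private long nextTxId = 1;

  synchronized long beginTransaction() {
    long txId = nextTxId++;
    pending.put(txId, new ArrayList<>());
    return txId;
  }

  // Buffered only: readers cannot see this update yet.
  synchronized void addUpdate(long txId, PendingPut put) {
    bufferOf(txId).add(put);
  }

  // Apply every buffered update under one lock, so readers see all or none.
  synchronized void commit(long txId) {
    for (PendingPut p : bufferOf(txId)) {
      rows.computeIfAbsent(p.row, r -> new HashMap<>()).put(p.column, p.value);
    }
    pending.remove(txId);
  }

  // Nothing has been applied yet, so rollback just drops the buffer.
  synchronized void rollback(long txId) {
    pending.remove(txId);
  }

  synchronized String get(String row, String column) {
    Map<String, String> columns = rows.get(row);
    return columns == null ? null : columns.get(column);
  }

  private List<PendingPut> bufferOf(long txId) {
    List<PendingPut> buffer = pending.get(txId);
    if (buffer == null) {
      throw new IllegalStateException("unknown transaction " + txId);
    }
    return buffer;
  }
}
```

Because nothing is applied before `commit`, rolling back inside a single region is just dropping the buffer.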

I'm sure rollback()/commit() is tricky to implement, as the updates
could be on different region servers, so a failure on one needs to
trigger a rollback on the others. We could use timestamps/old versions
to implement rollback of BatchUpdates that have already been applied.

Alternatively, this could all be implemented above HBase: the client
keeps track of its updates and tries to roll back using timestamps.
The problem here is that if the client dies midway through, we have half
the transaction committed and lose atomicity/consistency.

We will eventually want/need atomic transactions on HBase, so I'll
look into this further. Any input would be appreciated. It would be
interesting to know what Google provides, and how...

cheers,
-clint


On Sun, May 11, 2008 at 7:48 AM, Bryan Duxbury <[hidden email]> wrote:

> Currently, it's not on our list of things to do. There are a number of
> reasons why it would be better to use Zookeeper here than to try and build
> it into HBase.
>
> That said, I think you could get everything you need if you tried Zookeeper,
> using that to acquire locks on the row you need a transaction on. It's
> supposedly very high performance and supports your use case precisely.
>
> -Bryan
>
> On May 10, 2008, at 11:52 PM, Zhou Wei wrote:
>
>> Bryan Duxbury wrote:
>>>
>>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you are
>>> thinking it does. Committing a BatchUpdate is atomic across the whole row,
>>> however. There is currently no way to make a get and a commit transactional,
>>> though there is an issue open for write-if-not-modified-since support. If
>>> this is something you need we can talk about how it might be supported.
>>
>> Thanks for answering my questions.
>>
>> So HBase is currently not suitable for transactional web applications.
>> A simple counting transaction cannot work correctly under concurrent access:
>> transaction{
>> get(x);
>> x++;
>> write(x);
>> }
>>
>> In my opinion, "write-if-not-modified-since" support may not be the best
>> way to implement single-row transactions,
>> because if the write cannot be performed, the application has to try again
>> and again, or just return an error and leave the user to retry or abort.
>> Locking, waiting, and scheduling at the region server might be
>> preferable in this case.
>> Is the single-row transaction feature currently on the HBase roadmap?
>>
>> Zhou
>>>
>>> -Bryan
>>>
>>> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>>>
>>>> Hi
>>>> Does HBase support single-row transactions as described in the Bigtable
>>>> paper?
>>>>
>>>> "Bigtable supports single-row transactions, which can be
>>>> used to perform atomic read-modify-write sequences on
>>>> data stored under a single row key." --Bigtable paper
>>>>
>>>> If so, how can I define a transaction in HBase?
>>>> Does it look like this:
>>>>
>>>> lid=startUpdate
>>>> get(lid)
>>>> ..
>>>> put(lid)
>>>> ...
>>>> commit(lid)
>>>>
>>>> Are these transactions isolated from each other?
>>>> If not, is there a way to achieve that?
>>>>
>>>> Thanks
>>>>
>>>> Zhou
>>>
>>>
>>>
>>
>
>
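Zhou's counter loses updates when two clients interleave their get and write. A minimal sketch of the write-if-not-modified-since approach Bryan mentions, assuming a hypothetical in-memory `VersionedRow` whose `writeIfNotModifiedSince` stands in for that not-yet-existing HBase operation: each client re-reads and retries until its write lands on the version it read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-row store standing in for a region server.
final class VersionedRow {
  private long value;
  private long version;

  // Read value and version together: {value, version}.
  synchronized long[] get() {
    return new long[] {value, version};
  }

  // Hypothetical "write-if-not-modified-since": succeeds only if no other
  // write happened after the caller's read.
  synchronized boolean writeIfNotModifiedSince(long readVersion, long newValue) {
    if (version != readVersion) {
      return false;
    }
    value = newValue;
    version++;
    return true;
  }
}

final class SafeCounter {
  // Zhou's get(x); x++; write(x), retried until no other writer interleaved.
  static void increment(VersionedRow row) {
    while (true) {
      long[] snapshot = row.get();
      if (row.writeIfNotModifiedSince(snapshot[1], snapshot[0] + 1)) {
        return;
      }
      // Lost the race: re-read and try again.
    }
  }

  // Runs several concurrent clients and returns the final count.
  static long runClients(VersionedRow row, int clients, int incrementsEach) {
    List<Thread> threads = new ArrayList<>();
    for (int c = 0; c < clients; c++) {
      Thread t = new Thread(() -> {
        for (int i = 0; i < incrementsEach; i++) {
          increment(row);
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for clients", e);
      }
    }
    return row.get()[0];
  }
}
```

The retry loop is exactly the cost Zhou objects to: under heavy contention a client may spin several times, whereas locking at the region server would turn those retries into waiting.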

Bryan Duxbury-2
It seems like if you wanted to do some manner of multi-row  
transactional put, the only real way to manage it is with deletes.  
That is, if the first put succeeds but the second fails, you can  
"invert" the first put into a bunch of deletes.
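A minimal sketch of inverting puts into deletes, assuming a hypothetical in-memory multi-version table keyed by row, column, and timestamp, where a read returns the newest version. It also assumes each transaction writes at a timestamp no other writer uses, so deleting exactly the cells it wrote at that timestamp exposes the previous versions again. None of these classes are HBase APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical multi-version store: row/column -> (timestamp -> value).
final class VersionedTable {
  private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

  private static String key(String row, String column) {
    return row + "/" + column;
  }

  void put(String row, String column, long ts, String value) {
    cells.computeIfAbsent(key(row, column), k -> new TreeMap<>()).put(ts, value);
  }

  // Newest version wins, as in a versioned get.
  String get(String row, String column) {
    TreeMap<Long, String> versions = cells.get(key(row, column));
    return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
  }

  // Remove exactly one version of one cell.
  void delete(String row, String column, long ts) {
    TreeMap<Long, String> versions = cells.get(key(row, column));
    if (versions != null) {
      versions.remove(ts);
    }
  }
}

// One put that a transaction has already applied.
final class AppliedPut {
  final String row;
  final String column;
  final long ts;

  AppliedPut(String row, String column, long ts) {
    this.row = row;
    this.column = column;
    this.ts = ts;
  }
}

final class Inverter {
  // "Invert" the applied puts into deletes, undoing the newest first.
  static void rollback(VersionedTable table, List<AppliedPut> applied) {
    List<AppliedPut> newestFirst = new ArrayList<>(applied);
    Collections.reverse(newestFirst);
    for (AppliedPut p : newestFirst) {
      table.delete(p.row, p.column, p.ts);
    }
  }
}
```

If two writers could share a timestamp, the delete would also remove the other writer's version, which is why the unique-timestamp assumption matters.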

Trying to make the regions themselves maintain the transactional  
state seems like a terrible idea. You'd have to not allow a region to  
get migrated to another server if it's serving a transaction. This  
would introduce a lot of potential performance problems, I think.

Can you help me understand why atomic transactions are needed? Can't  
the atomicity problems be sort of resolved by the whole row  
versioning thing? Other databases that do transactions and rollbacks  
use versioning to accomplish that, I think.

-Bryan

On May 27, 2008, at 12:29 PM, Clint Morgan wrote:

Clint Morgan-2
Responses inline:

2008/5/27 Bryan Duxbury <[hidden email]>:
> It seems like if you wanted to do some manner of multi-row transactional
> put, the only real way to manage it is with deletes. That is, if the first
> put succeeds but the second fails, you can "invert" the first put into a
> bunch of deletes.

Yes, this is what I had in mind with timestamps/multiple versions. To
roll back, you delete everything you wrote, which takes us back to the
previous version. Alternatively, you could save the original values
before they are overwritten.
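The second option, saving the original values, is an undo log of before-images. A minimal sketch, assuming a plain in-memory map as the table (`UndoLogTransaction` is hypothetical, not an HBase class): every write first records the old value, or the fact that there was none, and rollback replays the log newest-first. A real implementation would have to keep the log somewhere that survives a client crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical transaction that records before-images in an undo log.
final class UndoLogTransaction {
  private static final class BeforeImage {
    final String key;
    final boolean existed;
    final String oldValue;

    BeforeImage(String key, boolean existed, String oldValue) {
      this.key = key;
      this.existed = existed;
      this.oldValue = oldValue;
    }
  }

  private final Map<String, String> table;
  private final Deque<BeforeImage> undoLog = new ArrayDeque<>();

  UndoLogTransaction(Map<String, String> table) {
    this.table = table;
  }

  void put(String key, String value) {
    // Save the original value before it is overwritten.
    undoLog.push(new BeforeImage(key, table.containsKey(key), table.get(key)));
    table.put(key, value);
  }

  void commit() {
    undoLog.clear();  // nothing left to undo
  }

  void rollback() {
    // Newest-first, so repeated writes to one key unwind to the true original.
    while (!undoLog.isEmpty()) {
      BeforeImage b = undoLog.pop();
      if (b.existed) {
        table.put(b.key, b.oldValue);
      } else {
        table.remove(b.key);
      }
    }
  }
}
```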

> Trying to make the regions themselves maintain the transactional state seems
> like a terrible idea. You'd have to not allow a region to get migrated to
> another server if it's serving a transaction. This would introduce a lot of
> potential performance problems, I think.

I'm envisioning transactions being relatively short-lived: 100 ms to a
few seconds. I don't see this getting in the way of, e.g., region
migration any more than scanners do. But maybe I'm missing something.

So the transactional state for a region is (roughly) a transaction
lease, and a collection of the corresponding BatchUpdates.

> Can you help me understand why atomic transactions are needed? Can't the
> atomicity problems be sort of resolved by the whole row versioning thing?

Simply, we need to ensure that all updates happen together. Otherwise,
the data is in an inconsistent state. Take the standard example of
debiting one account and crediting another. If only one of these rows
gets updated, then the resulting table is corrupted and will not make
sense to the application (money has been created or destroyed).

So that is why one needs atomicity: the application-level semantics demand it.

When we encounter an exception midway through the transaction, we can
recover the old state of the modified row(s) by reverting to the
previous version. So the question is: who recognizes this and does the
rollback? I'd like HBase to do it, because it seems like the logical
place to put the behavior. If the client crashed halfway through
the transaction, then when its transaction lease expired, HBase would
revert the relevant BatchUpdates, and the integrity of our table would
be preserved!
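A minimal sketch of lease-driven cleanup, assuming a hypothetical per-region `LeaseTracker` that is handed the current time explicitly (which keeps it deterministic) and a caller-supplied rollback callback; this is not how HBase actually manages leases. Each request renews the transaction's lease, commit or rollback releases it, and a periodic sweep rolls back any transaction whose client stopped renewing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical per-region tracker of transaction leases.
final class LeaseTracker {
  private final long leaseMillis;
  private final LongConsumer rollback;  // invoked with each expired transaction id
  private final Map<Long, Long> expiresAt = new HashMap<>();

  LeaseTracker(long leaseMillis, LongConsumer rollback) {
    this.leaseMillis = leaseMillis;
    this.rollback = rollback;
  }

  // Called on begin and on every client request for the transaction.
  synchronized void renew(long txId, long nowMillis) {
    expiresAt.put(txId, nowMillis + leaseMillis);
  }

  // Called on commit or explicit rollback: the lease no longer matters.
  synchronized void release(long txId) {
    expiresAt.remove(txId);
  }

  // Periodic sweep: revert every transaction whose client went quiet.
  synchronized List<Long> expire(long nowMillis) {
    List<Long> expired = new ArrayList<>();
    Iterator<Map.Entry<Long, Long>> it = expiresAt.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, Long> entry = it.next();
      if (entry.getValue() <= nowMillis) {
        expired.add(entry.getKey());
        it.remove();
      }
    }
    for (long txId : expired) {
      rollback.accept(txId);
    }
    return expired;
  }
}
```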

> Other databases that do transactions and rollbacks use versioning to
> accomplish that, I think.

I don't know much about this. But however other (R)DBMSs implement it,
it is provided as a primitive rather than being implemented by users on
top of the underlying versioning functionality. This way the database
maintains consistency, rather than the user having to recognize
problems and revert the state themselves.

-clint

Bryan Duxbury-2
I see what you're saying. I need to think on this. Stack, care to  
weigh in?

-Bryan

On May 27, 2008, at 1:56 PM, Clint Morgan wrote:

stack-3
In reply to this post by Clint Morgan-2
Clint Morgan wrote:
> Zookeeper makes good sense for distributed locking to get isolation.
> But we still need transaction start, commit, and rollback to get
> atomicity. I think this properly belongs in hbase.
>  
Since all clients are going via ZooKeeper anyway ('isolation'), maybe
it'd be better to just run the whole transaction management out of
ZooKeeper? Clients would open a transaction on ZooKeeper and put their
edits there, so they were available for rollback and/or commit. If a
client died midway, we could ask ZooKeeper for outstanding transactions
and pick up wherever it had left off. Otherwise, on success (or
rollback), clean up the transaction log.
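A minimal sketch of that flow, using an in-memory map as a stand-in for the ZooKeeper node tree; no real ZooKeeper calls, sessions, or watches are shown, and the `/transactions/tx-N` layout is an assumption. It further assumes a protocol in which a client records its commit decision in the log before applying any edit, and that re-applying a put is harmless. Under those assumptions, recovery rolls decided transactions forward and simply discards undecided ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Consumer;

// Stand-in for a ZooKeeper subtree holding one node per open transaction.
final class TransactionLogStore {
  private final Map<String, List<String>> edits = new TreeMap<>();
  private final Set<String> committed = new HashSet<>();
  private long nextId = 0;

  // Create a node for a new transaction and return its path.
  synchronized String open() {
    String path = "/transactions/tx-" + nextId++;
    edits.put(path, new ArrayList<>());
    return path;
  }

  synchronized void logEdit(String path, String edit) {
    edits.get(path).add(edit);
  }

  // The client's commit decision, recorded before it touches any table.
  synchronized void markCommitted(String path) {
    committed.add(path);
  }

  synchronized boolean isCommitted(String path) {
    return committed.contains(path);
  }

  synchronized List<String> outstanding() {
    return new ArrayList<>(edits.keySet());
  }

  synchronized List<String> editsOf(String path) {
    return new ArrayList<>(edits.get(path));
  }

  // On success or rollback, the transaction's log is removed.
  synchronized void cleanUp(String path) {
    edits.remove(path);
    committed.remove(path);
  }
}

final class Recovery {
  // Roll decided transactions forward; discard undecided ones, whose edits
  // (under the assumed protocol) never reached the table.
  static int recover(TransactionLogStore log, Consumer<String> apply) {
    int replayed = 0;
    for (String path : log.outstanding()) {
      if (log.isCommitted(path)) {
        for (String edit : log.editsOf(path)) {
          apply.accept(edit);
          replayed++;
        }
      }
      log.cleanUp(path);
    }
    return replayed;
  }
}
```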

Alternatively, all clients would have to go via the HBase master so it
could orchestrate row access. The master would need to hold outstanding
transactions somewhere: either in an in-memory transactions catalog
table, or over in ZooKeeper itself.

St.Ack

stack-3
In reply to this post by Clint Morgan-2
Clint Morgan wrote:

> Responses inline:
>
> 2008/5/27 Bryan Duxbury <[hidden email]>:
>  
>> It seems like if you wanted to do some manner of multi-row transactional
>> put, the only real way to manage it is with deletes. That is, if the first
>> put succeeds but the second fails, you can "invert" the first put into a
>> bunch of deletes.
>>    
>
> Yes, this is what I was thinking by using the timestamp/multiple
> versions. To roll back you delete everything you wrote and then we get
> back to the previous version. Alternatively you could save the
> original values before they are overwritten.
>  

Deletes would be the way to go, I'd say (though what do we do if we
can't insert the delete for the very reason the transaction is failing?).

We'd have to do a bit of work to support this case first, though. IIRC,
deletes X out cells of the same timestamp when getting, but when
scanning, if we encounter a delete, it blocks us from seeing what's
behind the delete.

St.Ack

Clint Morgan-2
In reply to this post by stack-3
So if we wrote all operations for a transaction first to ZooKeeper, we
still need something like a Distributed Transaction Manager to
orchestrate the commit process: Send BatchUpdates to each
RegionServer, ask them to commit, then commit or rollback based on
results from all participating RegionServers. Or is there some more
clever way to use ZooKeeper? Maybe encoding a commit protocol into the
Zookeeper nodes...
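The orchestration described here is essentially two-phase commit. A minimal sketch, assuming a hypothetical `Participant` interface that each region server would implement (HBase has no such interface): the coordinator asks every participant to prepare and commits only if all vote yes; a no vote or a failure rolls everyone back. A real coordinator would also need to record its decision durably (ZooKeeper being one candidate), which is left out here.

```java
import java.util.List;

// Hypothetical view of one region server taking part in a transaction.
interface Participant {
  boolean prepare(long txId);  // vote: true means ready to commit

  void commit(long txId);

  void rollback(long txId);
}

final class TwoPhaseCoordinator {
  // Returns true if the transaction committed on every participant.
  static boolean run(long txId, List<Participant> participants) {
    boolean allYes = true;
    // Phase 1: collect votes, stopping at the first "no" or failure.
    for (Participant p : participants) {
      boolean vote;
      try {
        vote = p.prepare(txId);
      } catch (RuntimeException e) {
        vote = false;
      }
      if (!vote) {
        allYes = false;
        break;
      }
    }
    // Phase 2: one decision, sent to every participant.
    for (Participant p : participants) {
      if (allYes) {
        p.commit(txId);
      } else {
        p.rollback(txId);
      }
    }
    return allYes;
  }
}

// Toy participant for trying the coordinator out.
final class ToyRegionServer implements Participant {
  private final boolean voteYes;
  String state = "idle";

  ToyRegionServer(boolean voteYes) {
    this.voteYes = voteYes;
  }

  public boolean prepare(long txId) {
    state = "prepared";
    return voteYes;
  }

  public void commit(long txId) {
    state = "committed";
  }

  public void rollback(long txId) {
    state = "rolled back";
  }
}
```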

Looks like Google's datastore has a mechanism for keeping groups of
rows (entity groups) together on the same server (datastore node).
Then they allow transactions only on rows in the same group. This way
they don't have to worry about distributed transactions. Rather than
locking, they use optimistic concurrency control. This means they do
the transaction in a sandbox, then check for conflicts from other
transactions before committing.
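A minimal sketch of optimistic concurrency control across rows, assuming a hypothetical single in-memory store in which every row carries a version number (this is not how App Engine or HBase implement it). The transaction records the version of each row it reads, buffers its writes in a sandbox, and at commit checks that none of those versions changed before applying all writes at once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: row -> value and row -> version, committed under one lock.
final class OccStore {
  private final Map<String, Long> values = new HashMap<>();
  private final Map<String, Long> versions = new HashMap<>();

  // Value and version read together: {value, version}.
  synchronized long[] read(String row) {
    return new long[] {values.getOrDefault(row, 0L), versions.getOrDefault(row, 0L)};
  }

  // Validate the read set, then apply the write set, as one atomic step.
  synchronized boolean commit(Map<String, Long> readVersions, Map<String, Long> writes) {
    for (Map.Entry<String, Long> r : readVersions.entrySet()) {
      if (versions.getOrDefault(r.getKey(), 0L).longValue() != r.getValue().longValue()) {
        return false;  // another transaction committed this row after we read it
      }
    }
    for (Map.Entry<String, Long> w : writes.entrySet()) {
      values.put(w.getKey(), w.getValue());
      versions.put(w.getKey(), versions.getOrDefault(w.getKey(), 0L) + 1);
    }
    return true;
  }
}

final class OccTransaction {
  private final OccStore store;
  private final Map<String, Long> readVersions = new HashMap<>();
  private final Map<String, Long> writes = new HashMap<>();  // the sandbox

  OccTransaction(OccStore store) {
    this.store = store;
  }

  long get(String row) {
    Long own = writes.get(row);
    if (own != null) {
      return own;  // read our own sandboxed write
    }
    long[] valueAndVersion = store.read(row);
    readVersions.putIfAbsent(row, valueAndVersion[1]);
    return valueAndVersion[0];
  }

  void put(String row, long value) {
    writes.put(row, value);
  }

  boolean commit() {
    return store.commit(readVersions, writes);
  }
}
```

Here the validate-and-apply step is trivially atomic because everything lives in one object; with rows on different servers it becomes the distributed commit problem again.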

-clint

On Tue, May 27, 2008 at 2:13 PM, stack <[hidden email]> wrote:

stack-3
Clint Morgan wrote:
> So if we wrote all operations for a transaction first to ZooKeeper, we
> still need something like a Distributed Transaction Manager to
> orchestrate the commit process: Send BatchUpdates to each
> RegionServer, ask them to commit, then commit or rollback based on
> results from all participating RegionServers.
Yes.

> Or is there some more
> clever way to use ZooKeeper? Maybe encoding a commit protocol into the
> Zookeeper nodes...
>
>  
This page has an interesting discussion of how you can build various
cluster-wide primitives, such as locks and two-phase commit, using
ZooKeeper: http://zookeeper.wiki.sourceforge.net/ZooKeeperRecipes.
We would still need a transaction orchestrator of some sort.

> Looks like google's datastore has a mechanism for keeping groups of
> rows (entity groups) together on the same server (datastore node).
>  
From
http://code.google.com/appengine/docs/datastore/keysandentitygroups.html:

"When the application creates an entity, it can assign another entity as
the parent of the new entity. Assigning a parent to a new entity puts
the new entity in the same entity group as the parent entity."

I think I need to sign up for App Engine and use it to see if I can
figure out how the above is done.

> Then they allow transactions only on rows in the same group. This way
> they don't have to worry about distributed transactions. Rather than
> locking, they use optimistic concurrency control. This means they do
> the transaction in a sandbox, then check for conflicts from other
> transactions before committing.
We'd need HBASE-493 in place before building any kind of OCC.

St.Ack



Clint Morgan-2
> "When the application creates an entity, it can assign another entity as the
> parent of the new entity. Assigning a parent to a new entity puts the new
> entity in the same entity group as the parent entity."
>
> I think I need to sign up for app engine and use it to see if I can figure
> how the above is done.

I was thinking this may be done with a row key prefix, so that all
members of an entity group share the same prefix and are colocated.
Then the regions (or tablets, or datastore nodes) must know not to
split in the middle of such a prefix.
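A minimal sketch of a prefix-aware split, assuming a hypothetical row key layout `<group>/<entity>` (the `/` separator is an assumption, not something App Engine or HBase defines): given a region's sorted row keys, it picks the entity-group boundary nearest the middle, so no group straddles the two daughter regions.

```java
import java.util.List;

final class GroupAwareSplit {
  // Entity-group prefix of a row key, e.g. "user42/photo7" -> "user42".
  static String groupOf(String rowKey) {
    int slash = rowKey.indexOf('/');
    return slash < 0 ? rowKey : rowKey.substring(0, slash);
  }

  // True if keys[i - 1] and keys[i] belong to different entity groups.
  private static boolean isBoundary(List<String> keys, int i) {
    return !groupOf(keys.get(i - 1)).equals(groupOf(keys.get(i)));
  }

  // Index of the first row of the upper daughter region, chosen as the group
  // boundary nearest the middle; -1 if the region holds a single group.
  static int splitIndex(List<String> sortedKeys) {
    int n = sortedKeys.size();
    int mid = n / 2;
    for (int d = 0; d < n; d++) {
      int up = mid + d;
      int down = mid - d;
      if (up > 0 && up < n && isBoundary(sortedKeys, up)) {
        return up;
      }
      if (down > 0 && down < n && isBoundary(sortedKeys, down)) {
        return down;
      }
    }
    return -1;
  }
}
```

If a region holds a single group, `splitIndex` returns -1, meaning that region cannot be split without breaking the group.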

Also, it would make sense that they have one table per app engine
user, and each table stores all the kinds (types) that the application
uses...

> We'd need to have HBASE-493 in place building any kind of OCC.
I see the value of HBASE-493 for OCC with single-row transactions, but
for multi-row transactions I don't think it's useful. Basically, we would
have to hold off on all row puts if any relevant row has conflicts.
cheers,
-clint