Hi,
Is there any plan or roadmap for secondary indexing in HBase?

Zhou Shuaifeng (Frank)

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! |
There's this jira: https://issues.apache.org/jira/browse/HBASE-3340
But currently I don't know of anyone actively working on the feature.

J-D

On Thu, Feb 24, 2011 at 11:56 PM, Zhou Shuaifeng <[hidden email]> wrote:
> Is there any plan or roadmap of secondary index in hbase? |
Just had a discussion about the new secondary indexing design with Jon Gray this Tuesday at HUG-12. He said he would write up a wiki page describing his thoughts on the design. In the meantime, we will have some resources to work on it starting next week, so something will be happening soon.

Thanks J-D. Yes, HBASE-3340 is the right jira to keep track of. The new secondary indexing will be based on coprocessors. Further discussion will be at HBASE-3340. Comments are welcome.

Thanks,
Mingjie

On 02/25/2011 10:36 AM, Jean-Daniel Cryans wrote:
> There's this jira: https://issues.apache.org/jira/browse/HBASE-3340
>
> But currently I don't know of anyone actively working on the feature. |
I've started my write-up. Hopefully I will have it posted by Monday night. There are also some people at FB who may want to work on this.

There are a few different ways that secondary indexing can go, so there might be an opportunity to work on a few different mechanisms if many people are interested in working on it.

JG

> -----Original Message-----
> From: Mingjie Lai [mailto:[hidden email]]
> Sent: Friday, February 25, 2011 11:34 AM
> To: [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
> Just had a discussion regarding new secondary indexing design with Jon Gray
> this Tuesday at HUG-12. He said he would write up a wiki to describe his
> thoughts regarding design. |
Hi Jon,
We have a team of three in our Nanjing office considering this problem, and the guys you already know in Cupertino can get involved. If you'd like to influence what happens through a write-up or some shared volunteer effort, be advised we have something getting underway now.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)

--- On Fri, 2/25/11, Jonathan Gray <[hidden email]> wrote:
> From: Jonathan Gray <[hidden email]>
> Subject: RE: what's the roadmap of secondary index of hbase?
> To: "[hidden email]" <[hidden email]>
> Date: Friday, February 25, 2011, 12:25 PM
>
> I've started my write-up. Hopefully will have it posted by Monday night.
> There's also some people at FB who may want to work on this. |
Cool. Plans for a design phase that we can collaborate on?
> -----Original Message-----
> From: Andrew Purtell [mailto:[hidden email]]
> Sent: Friday, February 25, 2011 12:35 PM
> To: [hidden email]
> Subject: RE: what's the roadmap of secondary index of hbase?
>
> We have a team of three in our Nanjing office considering this problem and
> the guys you already know in Cupertino can get involved. If you'd like to
> influence what happens through write up or some shared volunteer effort,
> be advised we have something getting underway now. |
On 2/25/11 12:43 PM, Jonathan Gray wrote:
> Cool. Plans for a design phase that we can collaborate on?

Hi Jon,

I'm thinking that we could use a coprocessor that watches the Write-Ahead Log (using the WAL-edit operations from https://issues.apache.org/jira/browse/HBASE-3257, "Coprocessors: Extend server side integration API to include HLog operations"). This coprocessor would write these edits, perhaps filtering or transforming them, enqueuing the results in a global queue. A separate process would be responsible for pulling operations off the queue and using HBase client operations to do the insert into a secondary index table appropriate for that operation.

Perhaps we could use some of the work that the Lily people have done with HBase indexing (see http://www.lilyproject.org/lily/about/playground/hbaseindexes.html) in order to do the edit->hbase operation transformations and the secondary index table creation.

-Eugene |
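A rough sketch of the pipeline Eugene describes, to make the moving parts concrete. This is plain Python with in-memory structures, not HBase code: `primary_table`, `index_table`, `edit_queue`, `on_wal_edit`, `put`, `drain_queue` and the `INDEXED_COLUMN` setting are all made-up names for illustration, and the "value-rowkey" index key layout is an assumption.

```python
from collections import deque

# In-memory stand-ins for illustration only; none of these are HBase APIs.
primary_table = {}      # row key -> {column: value}
index_table = {}        # "<value>-<row key>" -> row key
edit_queue = deque()    # plays the role of the global queue

INDEXED_COLUMN = "city"  # hypothetical column chosen for indexing


def on_wal_edit(row_key, column, value):
    """What the coprocessor hook might do: filter, then enqueue."""
    if column == INDEXED_COLUMN:
        edit_queue.append((row_key, value))


def put(row_key, column, value):
    """Primary write; the hook sees the edit as it is logged."""
    primary_table.setdefault(row_key, {})[column] = value
    on_wal_edit(row_key, column, value)


def drain_queue():
    """The separate process: turn queued edits into index inserts."""
    applied = 0
    while edit_queue:
        row_key, value = edit_queue.popleft()
        index_table[f"{value}-{row_key}"] = row_key
        applied += 1
    return applied


put("1", "city", "Nanjing")
put("1", "name", "Frank")       # not indexed, so never enqueued
put("2", "city", "Cupertino")
applied = drain_queue()
print(applied, sorted(index_table))
```

Note that this sketch only ever inserts index entries. It does nothing about entries that become stale when an indexed value changes, and between `put` and `drain_queue` the index lags the primary table.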
On Fri, Feb 25, 2011 at 1:47 PM, Eugene Koontz <[hidden email]> wrote:
> I'm thinking that we could use a coprocessor that watches the Write-Ahead
> Log (using the WAL-edit operations
> https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors: Extend
> server side integration API to include HLog operations"). This coprocessor
> would write these edits, perhaps filtering or transforming them, and
> enqueing the results in a global queue. A separate process would be
> responsible for pulling operations off the queue and using HBase client
> operations to do the insert into a secondary index table appropriate for
> that operation.
> Perhaps we could use some of the work that the Lily people have done with
> HBase indexing (see
> http://www.lilyproject.org/lily/about/playground/hbaseindexes.html) in order
> to do the edit->hbase operation transformations and the secondary index
> table creation.

This sounds good as a first approach (including the Lily part).

St.Ack |
The MegaStore paper, http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section 3.2.2, lists the secondary indexing options MegaStore provides on top of BigTable. For example, MS allows specifying a secondary index on protobuf cell content, or duplicating data into the secondary index so you have the data to hand to satisfy the first query, and only if the client wants more do you go dig in the primary table. It also talks about how secondary indices can be described using their schema, which might be of use. Might be worth a gander.

St.Ack

On Fri, Feb 25, 2011 at 3:32 PM, Stack <[hidden email]> wrote:
> This sounds good as first approach (including lily part). |
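To illustrate the "duplicate data into the secondary index" option mentioned above, here is a small sketch in plain Python. It is not MegaStore or HBase code: the table contents, `build_index`, `query` and the "value-rowkey" key layout are all assumptions made up for the example.

```python
# Toy primary table: row key -> columns. Contents are invented.
primary = {
    "1": {"name": "Ann", "city": "Oslo", "bio": "long text ..."},
    "2": {"name": "Bob", "city": "Oslo", "bio": "more text ..."},
}


def build_index(table, indexed_col, stored_cols):
    """Index on indexed_col, copying stored_cols into each index entry."""
    index = {}
    for row_key, row in table.items():
        index_key = f"{row[indexed_col]}-{row_key}"
        index[index_key] = {col: row[col] for col in stored_cols}
    return index


def query(index, value):
    """Prefix scan over the index; returns (row key, copied columns)."""
    prefix = f"{value}-"
    return [
        (key[len(prefix):], cols)
        for key, cols in sorted(index.items())
        if key.startswith(prefix)
    ]


city_index = build_index(primary, "city", ["name"])
hits = query(city_index, "Oslo")

# The first query is answered from the index alone. Only when the client
# wants more does it go back to the primary table by row key.
full_row = primary[hits[0][0]]
print(hits, full_row["bio"])
```

The trade-off is the usual one for this kind of index: reads of the copied columns avoid a second lookup, at the cost of extra storage and more data to keep in step with the primary table.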
I've started a wiki page: http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
I gave a basic description of the idea I had and the open questions.

Let's get all our thoughts in there.

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Stack
> Sent: Friday, February 25, 2011 4:07 PM
> To: [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
> The MegaStore paper,
> http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
> 3.2.2, lists secondary indexing options MegaStore provides on top of
> BigTable. |
Hi Jonathan,
Nice wiki. You mention that

> 2. Generate a new, special kind of WALEdit for secondary table update

Is it possible to store these secondary WAL edits as the contents of another HBase table (say, indexTable)? The advantage is that this can then be implemented as a pure layer wrapping the regionserver (via coprocessors and such). The disadvantage is that a non-remote log (for indexTable) might need to be updated. Is there an easy way to enhance HBase to co-locate regions on the same machine?

thanks,
dhruba

On Mon, Feb 28, 2011 at 12:05 PM, Jonathan Gray <[hidden email]> wrote:
> I've started a wiki page:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
>
> I gave a basic description of the idea I had and the open questions.
>
> Let's get all our thoughts in there.

--
Connect to me at http://www.facebook.com/dhruba |
Hi,
What is the reason not to use JD's replication WAL tracking (using ZK) and add another "scope"-like attribute to the coldefs to define the indexing? Just curious, because when I read the questions on the wiki, it mentions issues that I think JD had to solve already, such as "where did we leave off?", "no aggressive reapplying of edits", etc.

Lars

On Tue, Mar 1, 2011 at 6:24 AM, Dhruba Borthakur <[hidden email]> wrote:
> is it possible to store this secondary-wal-edits as the contents on another
> hbase table(say indexTable)? |
> [Lars]
> What is the reason not to use JDs replication WLA tracking (using ZK)
> and add another "scope" like attribute to the coldefs to define the
> indexing?

This is similar to what I proposed in the original description on HBASE-3257 (http://mail-archives.apache.org/mod_mbox/hbase-issues/201011.mbox/%3C25844899.223551290369437343.JavaMail.jira@thor%3E).

Regarding Dhruba's comment, using a table as a log+workqueue of secondary index updates is pretty much what Lily does: http://www.lilyproject.org/lily/about/playground/hbaserowlog.html

Some extra support in the master to colocate secondary index regions with primary table regions is an interesting idea. The logic should be as pluggable as that for (secondary) index key generators. Perhaps a balance/migration calculation with a pluggable term that can contribute some affinity given a list of regions already on the RS.

- Andy |
On Tue, Mar 1, 2011 at 9:14 AM, Andrew Purtell <[hidden email]> wrote:
> Some extra support in the master to colocate secondary index regions with
> primary table regions is an interesting idea.

Google Megastore can embed the index within the primary table. If we can mark index rows, this approach would be feasible. |
Right. Rather than another table, you use another column family in the primary table a la Megastore and Lily.
> -----Original Message-----
> From: Ted Yu [mailto:[hidden email]]
> Sent: Tuesday, March 01, 2011 9:42 AM
> To: [hidden email]; [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
> Google Megastore can embed index within the primary table.
> If we can mark index rows, this approach would be feasible. |
Have you thought of how the update of a secondary index would go?
For example, suppose that for a row with key 1 the value A is currently indexed, so the index holds a row with key "A-1". Then row 1 gets updated to value B. This means that in the index you have to remove the entry "A-1" and insert a new entry "B-1".

The problem I see here is that you have to know the previous value that was indexed for the row, thus A in this case.

This information could maybe be put in the WALEdit, but that would assume that a read is done before the write so that the previous value is known. I think this will also require some consideration of how this will work in case of recovery.

The old values could also be retrieved from the older versions of the cell, but since the update of secondary indexes would be done asynchronously, there's no guarantee that those will still be there.

Another alternative, which we use for the link-index in Lily (but which is rather expensive), is to keep the index in both directions: the real index containing "A-1" and a 'forward index' containing "1-A". An index update first reads the entries from the forward index, then removes them from the real index, then removes them from the forward index, then inserts the new entries in the forward index, and finally in the real index.

On Mon, Feb 28, 2011 at 9:05 PM, Jonathan Gray <[hidden email]> wrote:
> I've started a wiki page:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
>
> I gave a basic description of the idea I had and the open questions.

--
Bruno Dumon
Outerthought
http://outerthought.org/ |
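The five-step forward-index update Bruno describes can be sketched as follows. This is plain Python over two dicts, not Lily or HBase code; `index`, `forward_index` and `update_index` are names invented for the example, and the sketch assumes row keys contain no "-" so that the "rowkey-value" prefix scan is unambiguous.

```python
index = {}          # real index, keys like "A-1" (value-rowkey)
forward_index = {}  # forward index, keys like "1-A" (rowkey-value)


def update_index(row_key, new_values):
    """Replace all index entries for row_key with entries for new_values."""
    prefix = f"{row_key}-"

    # 1. Read the currently indexed values from the forward index.
    old_values = [k[len(prefix):] for k in forward_index if k.startswith(prefix)]

    # 2. Remove the stale entries from the real index.
    for value in old_values:
        index.pop(f"{value}-{row_key}", None)

    # 3. Remove them from the forward index.
    for value in old_values:
        forward_index.pop(f"{row_key}-{value}", None)

    # 4. Insert the new entries into the forward index.
    for value in new_values:
        forward_index[f"{row_key}-{value}"] = True

    # 5. Finally insert the new entries into the real index.
    for value in new_values:
        index[f"{value}-{row_key}"] = True


update_index("1", ["A"])   # row 1 first indexed under A
update_index("1", ["B"])   # row 1 updated to B: "A-1" goes, "B-1" arrives
print(sorted(index), sorted(forward_index))
```

No read of the primary table's old value is needed, which is the point of the scheme. The cost is visible too: one logical index update touches both structures several times, which matches the "rather expensive" remark.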
I was wondering if it wouldn't be simpler to start with the synchronous version of secondary indexes. It's more complex at the time of the Put, but at least you don't have to worry about all the edge cases where things are getting out of sync (or are there still some?). It also seems like it's possible for people to build their own async indexes, while it's very difficult to do the sync version.

I have a feeling that most people on the mailing list who bring up indexes are assuming that they'd be synchronous, because that's how they are in relational databases.

The problems I see for the sync version are the slowdown you would encounter from two-phase commit and the read-before-write required to delete the previous index row. But many people may be perfectly happy to sacrifice performance for such valuable consistency.

As for the read-before-write issue, maybe the API could optionally let the client specify the previous value if it knows it, which is often the case for applications that read a row, modify it, then write the whole thing back.

Matt

On Tue, Mar 1, 2011 at 4:11 PM, Bruno Dumon <[hidden email]> wrote:
> Have you thought of how the update of a secondary index would go? |
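A sketch of what Matt's "let the client specify the previous value" idea could look like. This is plain Python, not a proposed HBase API: `put`, its `previous` parameter, the `_UNKNOWN` sentinel and the `stats` counter are all hypothetical, and two-phase commit is not modelled at all, so the index update simply happens inline before `put` returns.

```python
primary = {}            # row key -> indexed value
index = set()           # entries like "A-1" (value-rowkey)
stats = {"reads": 0}    # counts read-before-write lookups

_UNKNOWN = object()     # sentinel: caller did not supply a previous value


def put(row_key, value, previous=_UNKNOWN):
    """Synchronous put: the index is updated before the call returns."""
    if previous is _UNKNOWN:
        # Caller does not know the old value, so we must read it first.
        stats["reads"] += 1
        previous = primary.get(row_key)
    if previous is not None:
        index.discard(f"{previous}-{row_key}")   # drop the stale entry
    primary[row_key] = value
    index.add(f"{value}-{row_key}")


put("1", "A")                  # first write: pays for one lookup
put("1", "B", previous="A")    # read-modify-write client: no lookup
print(sorted(index), stats["reads"])
```

One thing the sketch makes obvious is that the server has to trust the supplied value. If a client passes a wrong `previous`, the stale entry stays in the index, so a real design would need some way to detect or tolerate that.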
> From: Matt Corgan <[hidden email]>
>
> I was wondering if it wouldn't be simpler to start with the
> synchronous version of secondary indexes. It's more complex at
> the time of the Put, but at least you don't have to worry about
> all the edge cases where things are getting out of sync (or are
> there still some?).

We did this before in 0.20 with the transactional index contrib. It's not really what we want, I think*. 2PC produces poor performance relative to that for basic mutations. Holding RPC threads while doing updates on a secondary table leads to deadlock, as the worker pools are fixed.

* -- I could be wrong about that, but the above are the reasons it was pulled.

- Andy |
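The deadlock Andy mentions can be reproduced in miniature. The sketch below is a deliberate simplification, not HBase code: a single Python thread pool with one worker stands in for a fixed-size RPC handler pool, and `handle_put` and `update_index` are invented names. In the real case the wait would cross region servers, and a timeout is used here only so the example terminates instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# One worker stands in for a fixed-size pool whose handlers are all busy.
pool = ThreadPoolExecutor(max_workers=1)


def update_index():
    """The secondary-table update, which also needs a handler thread."""
    return "index updated"


def handle_put():
    """A handler that holds its thread while waiting on the index update."""
    inner = pool.submit(update_index)
    try:
        return inner.result(timeout=0.2)
    except FutureTimeout:
        # No free handler exists to run update_index while we hold ours.
        return "stalled: no free handler for the index update"


outcome = pool.submit(handle_put).result()
pool.shutdown(wait=True)
print(outcome)
```

Without the timeout, `handle_put` would wait forever on work that can only run once `handle_put` itself has finished. Making the pool larger only raises the number of concurrent puts needed to hit the same state.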