
what's the roadmap of secondary index of hbase?

what's the roadmap of secondary index of hbase?

zhoushuaifeng-2
Hi,

Is there any plan or roadmap for secondary indexing in HBase?

Zhou Shuaifeng(Frank)




This e-mail and its attachments contain confidential information from
HUAWEI, which is intended only for the person or entity whose address is
listed above. Any use of the information contained herein in any way
(including, but not limited to, total or partial disclosure, reproduction,
or dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

Re: what's the roadmap of secondary index of hbase?

Jean-Daniel Cryans
There's this jira: https://issues.apache.org/jira/browse/HBASE-3340

But currently I don't know of anyone actively working on the feature.

J-D

On Thu, Feb 24, 2011 at 11:56 PM, Zhou Shuaifeng
<[hidden email]> wrote:

> Hi,
>
> Is there any plan or roadmap of secondary index in hbase?

Re: what's the roadmap of secondary index of hbase?

Mingjie Lai

Just had a discussion about the new secondary indexing design with Jon
Gray this Tuesday at HUG-12. He said he would write up a wiki page to
describe his design thoughts. In the meantime, we will have some
resources working on it starting next week, so there will be
something happening soon.

Thanks J-D. Yes, HBASE-3340 is the right JIRA to keep track of. The
new secondary indexing will be based on coprocessors. Further discussion
will happen on HBASE-3340. Comments are welcome.

Thanks,
Mingjie


On 02/25/2011 10:36 AM, Jean-Daniel Cryans wrote:

> There's this jira: https://issues.apache.org/jira/browse/HBASE-3340
>
> But currently I don't know of anyone actively working on the feature.
>
> J-D

RE: what's the roadmap of secondary index of hbase?

Jonathan Gray
I've started my write-up. Hopefully I will have it posted by Monday night. There are also some people at FB who may want to work on this.

There are a few different directions secondary indexing can go, so there might be an opportunity to work on a few different mechanisms if many people are interested in working on it.

JG

> -----Original Message-----
> From: Mingjie Lai [mailto:[hidden email]]
> Sent: Friday, February 25, 2011 11:34 AM
> To: [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
>
> Just had a discussion regarding new secondary indexing design with Jon Gray
> this Tuesday at HUG-12. He said he would write up a wiki to describe his
> thoughts regarding design. In the meantime, we will have some
> resources to work on it starting from next week. So there will be something
> happening soon.
>
> Thanks J-D. Yes, HBASE-3340 will be the right JIRA to keep track of. The new
> secondary indexing will be based on coprocessors. Further discussion will be
> at HBASE-3340. Comments are welcome.
>
> Thanks,
> Mingjie

RE: what's the roadmap of secondary index of hbase?

Andrew Purtell
Hi Jon,

We have a team of three in our Nanjing office considering this problem, and the guys you already know in Cupertino can get involved. If you'd like to influence what happens, through a write-up or some shared volunteer effort, be advised that we have something getting underway now.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


--- On Fri, 2/25/11, Jonathan Gray <[hidden email]> wrote:

> From: Jonathan Gray <[hidden email]>
> Subject: RE: what's the roadmap of secondary index of hbase?
> To: "[hidden email]" <[hidden email]>
> Date: Friday, February 25, 2011, 12:25 PM
> I've started my write-up. Hopefully will have it posted by Monday night.
> There's also some people at FB who may want to work on this.
>
> There are a few different ways that secondary indexing can go so there
> might be an opportunity to work on a few different mechanisms if many
> people are interested in working on it.
>
> JG

RE: what's the roadmap of secondary index of hbase?

Jonathan Gray
Cool.  Plans for a design phase that we can collaborate on?

> -----Original Message-----
> From: Andrew Purtell [mailto:[hidden email]]
> Sent: Friday, February 25, 2011 12:35 PM
> To: [hidden email]
> Subject: RE: what's the roadmap of secondary index of hbase?
>
> Hi Jon,
>
> We have a team of three in our Nanjing office considering this problem and
> the guys you already know in Cupertino can get involved. If you'd like to
> influence what happens through write up or some shared volunteer effort,
> be advised we have something getting underway now.
>
> Best regards,
>
>     - Andy

Re: what's the roadmap of secondary index of hbase?

Eugene Koontz
On 2/25/11 12:43 PM, Jonathan Gray wrote:
> Cool.  Plans for a design phase that we can collaborate on?
>
Hi Jon,

I'm thinking that we could use a coprocessor that watches the
Write-Ahead Log (using the WAL-edit operations from
https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors: Extend
server side integration API to include HLog operations"). This
coprocessor would write out these edits, perhaps filtering or
transforming them first, by enqueuing the results in a global queue. A
separate process would be responsible for pulling operations off the
queue and using HBase client operations to insert them into the
secondary index table appropriate for each operation.

Perhaps we could use some of the work that the Lily people have done
with HBase indexing (see
http://www.lilyproject.org/lily/about/playground/hbaseindexes.html) to
do the edit->HBase operation transformations and the secondary index
table creation.
     -Eugene
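Eugene's proposal is asynchronous: a WAL observer turns edits into index operations on a queue, and a separate worker applies them. Below is a minimal Python sketch of that data flow only. `WalEdit`, `observe_wal_edit`, `index_writer`, `run_pipeline`, and the indexed column `info:email` are hypothetical names for illustration; a `queue.Queue` stands in for the global queue and a plain dict for the index table. No real HBase or coprocessor API is used.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical stand-in for one WAL entry (a single cell write)."""
    row: str
    column: str
    value: str


INDEXED_COLUMN = "info:email"  # assumption: the one column being indexed


def observe_wal_edit(edit, work_queue):
    """'Coprocessor' side: filter and transform an edit, then enqueue it."""
    if edit.column != INDEXED_COLUMN:
        return  # filtered out: this column is not indexed
    # Transform: index row key = indexed value + "-" + primary row key.
    work_queue.put((f"{edit.value}-{edit.row}", edit.row))


def index_writer(work_queue, index_table):
    """Separate worker: drain the queue and write into the index table."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel tells the worker to stop
            break
        index_row, primary_row = item
        index_table[index_row] = primary_row


def run_pipeline(edits):
    """Feed edits through the pipeline and return the resulting index."""
    work_queue = queue.Queue()
    index_table = {}
    worker = threading.Thread(target=index_writer, args=(work_queue, index_table))
    worker.start()
    for edit in edits:
        observe_wal_edit(edit, work_queue)
    work_queue.put(None)
    worker.join()
    return index_table


if __name__ == "__main__":
    edits = [
        WalEdit("1", "info:email", "a@x"),
        WalEdit("1", "info:name", "Ann"),
        WalEdit("2", "info:email", "b@x"),
    ]
    print(run_pipeline(edits))  # {'a@x-1': '1', 'b@x-2': '2'}
```

Because the worker runs after the primary write, the index lags the table, and this sketch never removes the entry for a value that was later overwritten.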


Re: what's the roadmap of secondary index of hbase?

stack-3
On Fri, Feb 25, 2011 at 1:47 PM, Eugene Koontz <[hidden email]> wrote:

> I'm thinking that we could use a coprocessor that watches the Write-Ahead
> Log (using the WAL-edit operations
>  https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors: Extend
> server side integration API to include HLog operations"). This coprocessor
> would write these edits, perhaps filtering or transforming them, and
> enqueuing the results in a global queue. A separate process would be
> responsible for pulling operations off the queue and using HBase client
> operations to do the insert into a secondary index table appropriate for
> that operation.
>    Perhaps we could use some of the work that the Lily people have done with
> HBase indexing (see
> http://www.lilyproject.org/lily/about/playground/hbaseindexes.html) in order
> to do the edit->hbase operation transformations and the secondary index
> table creation.

This sounds good as a first approach (including the Lily part).

St.Ack

Re: what's the roadmap of secondary index of hbase?

stack-3
The MegaStore paper,
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
3.2.2, lists the secondary indexing options MegaStore provides on top of
BigTable.  For example, MS allows specifying a secondary index on
protobuf cell content, or duplicating data into the secondary index so you
have the data to hand to satisfy the first query, and only if the client
wants more do you go dig in the primary table.  It also talks about
how secondary indices can be described using their schema, which might
be of use.  Might be worth a gander.

St.Ack
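The "duplicate data into the secondary index" option can be sketched as a covering index: each index entry carries copies of selected columns, so a lookup only touches the primary table when the caller wants a column that was not copied. This is a minimal in-memory Python sketch under stated assumptions. The dict-based tables, the `"<value>-<row>"` key layout, the `_pk` back-pointer, and the function names are all hypothetical; it is not Megastore's or HBase's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory tables: row key -> {column: value}.
Table = Dict[str, Dict[str, str]]


def build_covering_index(primary: Table, indexed_col: str,
                         stored_cols: List[str]) -> Table:
    """Build an index keyed by '<value>-<row>' that also copies stored_cols.

    Assumes indexed values and row keys contain no '-' separator.
    """
    index: Table = {}
    for row_key, cells in primary.items():
        if indexed_col not in cells:
            continue
        entry = {c: cells[c] for c in stored_cols if c in cells}
        entry["_pk"] = row_key  # pointer back to the primary row
        index[f"{cells[indexed_col]}-{row_key}"] = entry
    return index


def lookup(index: Table, primary: Table, value: str,
           wanted: List[str]) -> Tuple[List[Dict[str, str]], int]:
    """Return wanted columns for rows whose indexed column equals value.

    A row is served from the index copy when it holds every wanted column;
    otherwise the primary row is read. Also returns the primary-read count.
    """
    results: List[Dict[str, str]] = []
    primary_reads = 0
    prefix = f"{value}-"
    for key in sorted(index):
        if not key.startswith(prefix):
            continue
        entry = index[key]
        if all(c in entry for c in wanted):
            results.append({c: entry[c] for c in wanted})
        else:
            primary_reads += 1
            row = primary[entry["_pk"]]
            results.append({c: row[c] for c in wanted if c in row})
    return results, primary_reads


if __name__ == "__main__":
    primary = {
        "1": {"city": "Oslo", "name": "Ann", "bio": "long text"},
        "2": {"city": "Oslo", "name": "Bob", "bio": "more text"},
    }
    idx = build_covering_index(primary, "city", ["name"])
    print(lookup(idx, primary, "Oslo", ["name"]))  # ([{'name': 'Ann'}, {'name': 'Bob'}], 0)
```

The trade-off is the usual one: the copied columns make the first query cheaper at the cost of extra storage and extra index writes whenever those columns change.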

On Fri, Feb 25, 2011 at 3:32 PM, Stack <[hidden email]> wrote:

> This sounds good as first approach (including lily part).
>
> St.Ack
>

RE: what's the roadmap of secondary index of hbase?

Jonathan Gray
I've started a wiki page:  http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing

I gave a basic description of the idea I had and the open questions.

Let's get all our thoughts in there.

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Stack
> Sent: Friday, February 25, 2011 4:07 PM
> To: [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
> The MegaStore paper,
> http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
> 3.2.2, lists secondary indexing options MegaStore provides on top of
> BigTable.  For example, MS allows specifying secondary index on protobuf
> cell content or duplicating data into secondary index so you have the data to
> hand to satisfy first query and only if the client wants more do you go dig in
> the primary table.  It also talks about how secondary indices can be described
> using their schema which might be of use.  Might be worth a gander.
>
> St.Ack

Re: what's the roadmap of secondary index of hbase?

Dhruba Borthakur
Hi Jonathan,

Nice wiki. You mention that
> 2. Generate a new, special kind of WALEdit for secondary table update

Is it possible to store these secondary WAL edits as the contents of another
HBase table (say indexTable)? The advantage is that this could then be
implemented as a pure layer wrapping the regionserver (via coprocessors and
such). The disadvantage is that a non-remote log (for indexTable) might need
to be updated. Is there an easy way to enhance HBase to co-locate regions on
the same machine?

thanks,
dhruba
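Storing pending index updates as rows of another table turns that table into a durable work queue. A minimal Python sketch of the idea, assuming a hypothetical `LogTable` class that mimics an HBase table sorted by row key (zero-padded sequence numbers, so key order equals insertion order) and a hypothetical `process_batch` consumer; no HBase client API is used.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (index row key, primary row key)


class LogTable:
    """Hypothetical stand-in for an HBase table used as a log + work queue.

    Row keys are zero-padded sequence numbers, so lexical (HBase-style)
    order is also insertion order and a scan returns the oldest work first.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Entry] = {}
        self._next_seq = 0

    def append(self, index_row: str, primary_row: str) -> str:
        key = f"{self._next_seq:016d}"
        self._next_seq += 1
        self._rows[key] = (index_row, primary_row)
        return key

    def scan(self, limit: int) -> List[Tuple[str, Entry]]:
        return [(k, self._rows[k]) for k in sorted(self._rows)[:limit]]

    def delete(self, key: str) -> None:
        del self._rows[key]

    def __len__(self) -> int:
        return len(self._rows)


def process_batch(log: LogTable, index_table: Dict[str, str], limit: int = 100) -> int:
    """Apply up to `limit` pending entries, deleting each only after applying it.

    If the worker dies between the apply and the delete, the entry is simply
    applied again on the next run; rewriting the same index cell is idempotent.
    """
    applied = 0
    for key, (index_row, primary_row) in log.scan(limit):
        index_table[index_row] = primary_row
        log.delete(key)
        applied += 1
    return applied
```

Deleting only after applying gives at-least-once processing, which is safe here because a repeated index put has the same effect as a single one.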

On Mon, Feb 28, 2011 at 12:05 PM, Jonathan Gray <[hidden email]> wrote:

> I've started a wiki page:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
>
> I gave a basic description of the idea I had and the open questions.
>
> Let's get all our thoughts in there.

--
Connect to me at http://www.facebook.com/dhruba

Re: what's the roadmap of secondary index of hbase?

Lars George-2
Hi,

What is the reason not to use J-D's replication WAL tracking (using ZK)
and add another "scope"-like attribute to the coldefs to define the
indexing? Just curious, because the questions on the wiki mention issues
that I think J-D had to solve already, such as "where did we leave off?",
"no aggressive reapplying of edits", etc.

Lars
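Lars's "where did we leave off?" concern amounts to persisting a read position per WAL and resuming from it. A minimal Python sketch under stated assumptions: `CheckpointStore` is a hypothetical in-memory stand-in for a ZooKeeper node holding the last applied position, a WAL is modelled as a list of `(index_row, primary_row)` pairs, and `tail_wal` is a hypothetical name. This is not how HBase replication actually records its positions, only the general pattern.

```python
from typing import Dict, List, Tuple

Edit = Tuple[str, str]  # (index_row, primary_row); hypothetical WAL payload


class CheckpointStore:
    """Stand-in for a ZooKeeper node holding the last applied WAL position."""

    def __init__(self) -> None:
        self._positions: Dict[str, int] = {}

    def get(self, wal_name: str) -> int:
        return self._positions.get(wal_name, 0)

    def set(self, wal_name: str, position: int) -> None:
        self._positions[wal_name] = position


def tail_wal(wal_name: str, wal: List[Edit], store: CheckpointStore,
             index_table: Dict[str, str], max_edits: int) -> int:
    """Apply up to max_edits edits, starting where the last run left off.

    The checkpoint advances only after an edit has been applied, so a
    restart resumes at the first unapplied edit instead of replaying the
    whole log. (Per-edit checkpointing is a simplification for clarity.)
    """
    start = store.get(wal_name)
    end = min(len(wal), start + max_edits)
    for position in range(start, end):
        index_row, primary_row = wal[position]
        index_table[index_row] = primary_row
        store.set(wal_name, position + 1)
    return end - start
```

A real implementation would checkpoint in batches to avoid one coordination write per edit, accepting that a crash replays at most one batch.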

On Tue, Mar 1, 2011 at 6:24 AM, Dhruba Borthakur <[hidden email]> wrote:

> Hi Jonathan,
>
> Nice wiki. You mention that
>> 2. Generate a new, special kind of WALEdit for secondary table update
>
> is it possible to store this secondary-wal-edits as the contents on another
> hbase table(say indexTable)? The advantage is that then this can be
> implemented as a pure layer wrapping the regionserver (via co-processors and
> such). The disadvantage is that a non-remote log (for indexTable) might need
> to be updated. Is there an easy way to enhance hbase to co-locate regions on
> the same machine?
>
> thanks,
> dhruba

Re: what's the roadmap of secondary index of hbase?

Andrew Purtell
> [Lars]
> What is the reason not to use J-D's replication WAL tracking (using ZK)
> and add another "scope" like attribute to the coldefs to define the
> indexing?

This is similar to what I proposed in the original description on HBASE-3257 (http://mail-archives.apache.org/mod_mbox/hbase-issues/201011.mbox/%3C25844899.223551290369437343.JavaMail.jira@thor%3E)

Regarding Dhruba's comment, using a table as a log+workqueue of secondary index updates is pretty much what Lily does: http://www.lilyproject.org/lily/about/playground/hbaserowlog.html

Some extra support in the master to colocate secondary index regions with primary table regions is an interesting idea. The logic should be as pluggable as that for (secondary) index key generators. Perhaps the balance/migration calculation could include a pluggable term that contributes some affinity, given the list of regions already on the RS.

    - Andy
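Andy's pluggable affinity term can be illustrated with a toy placement function: cost is the server's region count minus a weighted affinity score supplied by a plug-in. Everything here is a hypothetical sketch, not the HBase master's balancer: the `idx:<primary region>` naming convention, `index_affinity`, `choose_server`, and the default `weight` are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical cluster state: server name -> names of regions it hosts.
Assignment = Dict[str, List[str]]
AffinityFn = Callable[[str, List[str]], float]


def index_affinity(region: str, hosted: List[str]) -> float:
    """Example plug-in: reward a server that already hosts the primary region.

    Assumes index regions are named 'idx:<primary region>' (hypothetical).
    """
    prefix = "idx:"
    if region.startswith(prefix) and region[len(prefix):] in hosted:
        return 1.0
    return 0.0


def choose_server(region: str, assignment: Assignment,
                  affinity: AffinityFn, weight: float = 2.0) -> str:
    """Pick the server with the lowest cost = load - weight * affinity."""
    def cost(server: str) -> float:
        hosted = assignment[server]
        return len(hosted) - weight * affinity(region, hosted)

    # min() returns the first minimum, so iterate in sorted order for
    # deterministic tie-breaking.
    return min(sorted(assignment), key=cost)
```

With the affinity term, an index region lands next to its primary region unless that server is overloaded by more than `weight` regions; swapping in `lambda r, h: 0.0` falls back to plain least-loaded placement.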

Re: what's the roadmap of secondary index of hbase?

Ted Yu-3
>> Some extra support in the master to colocate secondary index regions with
>> primary table regions is an interesting idea.
Google Megastore can embed the index within the primary table.
If we can mark index rows, this approach would be feasible.
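One way to "mark index rows" inside the primary table is a reserved marker byte in the row key. A minimal Python sketch, assuming (hypothetically) that row keys are `<group>/<item>` and that the byte `\x00` never appears in real item keys or indexed values. Index rows for a group then sort at the start of that group's key range, so they live next to the data they index, loosely in the spirit of a local index. The names `put`, `data_rows`, and `find_by_value` are illustrative, not any HBase API.

```python
from typing import Dict, List

MARKER = "\x00"  # hypothetical reserved byte marking index rows; sorts first
Table = Dict[str, str]  # row key -> value (a single column, for brevity)


def put(table: Table, group: str, item: str, value: str) -> None:
    """Write a data row and, in the same table, a marked index row.

    Both keys start with the group id, so they share a key range (and,
    barring a region split inside the group, a region). Stale index rows
    for an item's previous value are not cleaned up in this sketch.
    """
    if item.startswith(MARKER) or MARKER in value:
        raise ValueError("reserved marker byte in item key or value")
    table[f"{group}/{item}"] = value
    table[f"{group}/{MARKER}{value}{MARKER}{item}"] = item


def data_rows(table: Table, group: str) -> List[str]:
    """Normal scan of one group that skips the marked index rows."""
    prefix = f"{group}/"
    return [k for k in sorted(table)
            if k.startswith(prefix) and not k[len(prefix):].startswith(MARKER)]


def find_by_value(table: Table, group: str, value: str) -> List[str]:
    """Index lookup: prefix scan over the group's marked rows for one value."""
    prefix = f"{group}/{MARKER}{value}{MARKER}"
    return [table[k] for k in sorted(table) if k.startswith(prefix)]
```

Every ordinary scan must now filter out the marked rows, which is the main cost of embedding the index instead of keeping it in a separate table.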

On Tue, Mar 1, 2011 at 9:14 AM, Andrew Purtell <[hidden email]> wrote:

> > [Lars]
> > What is the reason not to use J-D's replication WAL tracking (using ZK)
> > and add another "scope" like attribute to the coldefs to define the
> > indexing?
>
> This is similar to what I proposed in the original description on
> HBASE-3257 (
> http://mail-archives.apache.org/mod_mbox/hbase-issues/201011.mbox/%3C25844899.223551290369437343.JavaMail.jira@thor%3E
> )
>
> Regarding Dhruba's comment, using a table as a log+workqueue of secondary
> index updates is pretty much what Lily does:
> http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
>
> Some extra support in the master to colocate secondary index regions with
> primary table regions is an interesting idea. The logic should be equally
> pluggable as that for (secondary) index key generators. Perhaps a
> balance/migration calculation with a pluggable term that can contribute some
> affinity given a list of regions already on the RS.
>
>    - Andy

RE: what's the roadmap of secondary index of hbase?

Jonathan Gray
Right.  Rather than another table, you use another column family in the primary table a la Megastore and Lily.

> -----Original Message-----
> From: Ted Yu [mailto:[hidden email]]
> Sent: Tuesday, March 01, 2011 9:42 AM
> To: [hidden email]; [hidden email]
> Subject: Re: what's the roadmap of secondary index of hbase?
>
> >> Some extra support in the master to colocate secondary index regions
> >> with primary table regions is an interesting idea.
> Google Megastore can embed index within the primary table.
> If we can mark index rows, this approach would be feasible.

Re: what's the roadmap of secondary index of hbase?

Bruno Dumon
In reply to this post by Jonathan Gray
Have you thought about how updates to a secondary index would work?

For example, suppose currently for a row with key 1 the value A is indexed,
so in the index there's a row with key "A-1". Then row 1 gets updated to
value B. This means in the index you have to remove the entry "A-1" and
insert a new entry "B-1".

The problem I see here is that you have to know the previous value that was
indexed for the row, thus A in this case.

This information could maybe be put in the WALEdit, but that would assume
that a read is done before the write so that the previous value is known. I
think this would also require some thought about how it works in case of
recovery.

The old values could also be retrieved from the older versions of the cell,
but since the update of secondary indexes would be done asynchronously,
there's no guarantee that those will still be there.

Another alternative, which we use for the link-index in Lily (but which is rather
expensive), is to keep the index in both directions: the real index
containing "A-1" and a 'forward index' containing "1-A". An index update
first reads the entries from the forward index, then removes them from the
real index, then removes them from the forward index, then inserts the new
entries in the forward index, and finally in the index.
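The five-step forward-index update Bruno describes can be sketched with two sets of string keys that mirror his `"A-1"` / `"1-A"` examples. This is a minimal in-memory Python sketch, not Lily's code; `update_indexed_value` is a hypothetical name, and it assumes row keys contain no `-` separator.

```python
from typing import Set


def update_indexed_value(index: Set[str], forward: Set[str],
                         row: str, new_value: str) -> None:
    """Re-index `row` under `new_value` using a forward ("row-value") index.

    Assumes row keys contain no '-' so the forward-index prefix is unambiguous.
    """
    # 1. Read the row's current entries from the forward index.
    prefix = f"{row}-"
    old_values = [e[len(prefix):] for e in sorted(forward) if e.startswith(prefix)]
    # 2. Remove them from the real index ("value-row").
    for v in old_values:
        index.discard(f"{v}-{row}")
    # 3. Remove them from the forward index.
    for v in old_values:
        forward.discard(f"{row}-{v}")
    # 4. Insert the new entry into the forward index.
    forward.add(f"{row}-{new_value}")
    # 5. Finally, insert it into the real index.
    index.add(f"{new_value}-{row}")


if __name__ == "__main__":
    index, forward = set(), set()
    update_indexed_value(index, forward, "1", "A")
    update_indexed_value(index, forward, "1", "B")
    print(index, forward)  # {'B-1'} {'1-B'}
```

One property of this ordering (an observation about the sketch, not something stated in the thread): removals hit the real index before the forward index, and inserts hit the forward index before the real index, so after a crash at any step every real-index entry still has a forward entry. A later update can therefore always find and remove it; at worst the forward index holds a harmless extra entry.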

On Mon, Feb 28, 2011 at 9:05 PM, Jonathan Gray <[hidden email]> wrote:

> I've started a wiki page:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
>
> I gave a basic description of the idea I had and the open questions.
>
> Let's get all our thoughts in there.
>
> > -----Original Message-----
> > From: [hidden email] [mailto:[hidden email]] On Behalf Of
> Stack
> > Sent: Friday, February 25, 2011 4:07 PM
> > To: [hidden email]
> > Subject: Re: what's the roadmap of secondary index of hbase?
> >
> > The MegaStore paper,
> > http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
> > 3.2.2, lists secondary indexing options MegaStore provides on top of
> > BigTable.  For example, MS allows specifying secondary index on protobuf
> > cell content or duplicating data into secondary index so you have the
> data to
> > hand to satisfy first query and only if the client wants more do you go
> dig in
> > the primary table.  It also talks about how secondary indices can be
> described
> > using their schema which might be of use.  Might be worth a gander.
> >
> > St.Ack
> >
> > On Fri, Feb 25, 2011 at 3:32 PM, Stack <[hidden email]> wrote:
> > > On Fri, Feb 25, 2011 at 1:47 PM, Eugene Koontz <[hidden email]>
> > wrote:
> > >> I'm thinking that we could use a coprocessor that watches the
> > >> Write-Ahead Log (using the WAL-edit operations
> > >>  https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors:
> > >> Extend server side integration API to include HLog operations"). This
> > >> coprocessor would write these edits, perhaps filtering or
> > >> transforming them, and enqueuing the results in a global queue. A
> > >> separate process would be responsible for pulling operations off the
> > >> queue and using HBase client operations to do the insert into a
> > >> secondary index table appropriate for that operation.
> > >>    Perhaps we could use some of the work that the Lily people have
> > >> done with HBase indexing (see
> > >> http://www.lilyproject.org/lily/about/playground/hbaseindexes.html)
> > >> in order to do the edit->hbase operation transformations and the
> > >> secondary index table creation.
> > >
> > > This sounds good as first approach (including lily part).
> > >
> > > St.Ack
> > >
>



--
Bruno Dumon
Outerthought
http://outerthought.org/
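Eugene Koontz's proposal quoted above (a WAL observer enqueues edits, and a separate process applies them to an index table) can be sketched in memory as below. This is a hypothetical Python model: `queue.Queue` stands in for the global queue, a thread stands in for the separate process, and `on_wal_edit` is an invented name, not the HBASE-3257 coprocessor API. Because the worker only sees the new value, replaying two puts to the same row leaves the stale entry behind, which is the previous-value problem raised in this thread.

```python
import queue
import threading


def replay_through_async_indexer(edits):
    """Feed (row, value) puts through a hypothetical WAL hook and an async index worker."""
    edit_queue = queue.Queue()  # stands in for the global queue
    index_table = set()         # stands in for the index table; rows look like "<value>-<row>"

    def on_wal_edit(row, value):
        # Hypothetical WAL-observer hook: enqueue and return without touching the index.
        edit_queue.put((row, value))

    def index_worker():
        # Stands in for the separate process that drains the queue.
        while True:
            edit = edit_queue.get()
            if edit is None:  # shutdown sentinel
                return
            row, value = edit
            # Only the new value is known here, so the old entry cannot be removed.
            index_table.add(f"{value}-{row}")

    worker = threading.Thread(target=index_worker)
    worker.start()
    for row, value in edits:
        on_wal_edit(row, value)
    edit_queue.put(None)
    worker.join()
    return index_table


print(sorted(replay_through_async_indexer([("1", "A"), ("1", "B")])))  # ['A-1', 'B-1']
```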

Re: what's the roadmap of secondary index of hbase?

Matt Corgan
I was wondering if it wouldn't be simpler to start with the synchronous
version of secondary indexes.  It's more complex at the time of the Put, but
at least you don't have to worry about all the edge cases where things are
getting out of sync (or are there still some?).

It also seems possible for people to build their own async indexes, whereas
the sync version is very difficult to do yourself.  I have a feeling that
most people on the mailing list who bring up indexes are assuming that
they'd be synchronous, because that's how they are in relational databases.

The problems I see for the sync version are the slowdown you would
encounter from two-phase commit and the read-before-write required to delete
the previous index row.  But many people may be perfectly happy to
sacrifice performance for such valuable consistency.

As for the read-before-write issue, maybe the API could optionally let the
client specify the previous value if it knows it, which is often the case
for applications that read a row, modify it, then write the whole thing
back.
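A minimal sketch of the optional previous-value hint, assuming a synchronous index that is updated in the same call as the primary row (no two-phase commit is modeled). `IndexedTable`, `put` and its `previous` parameter are hypothetical names, not an HBase API; the "<value>-<row>" index row format follows Bruno's example. The `reads` counter shows that the read-before-write is skipped when the client supplies the hint.

```python
from __future__ import annotations

_UNKNOWN = object()  # sentinel: the client did not say what the previous value was


class IndexedTable:
    """Hypothetical primary table with a synchronously maintained value index."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}  # primary table: row key -> value
        self.index: set[str] = set()    # index rows of the form "<value>-<row>"
        self.reads = 0                  # number of read-before-write lookups

    def put(self, row: str, value: str, previous: object = _UNKNOWN) -> None:
        if previous is _UNKNOWN:
            # No hint: read the current value before writing.
            self.reads += 1
            previous = self.rows.get(row)
        # previous is None means the row had no indexed value before.
        if previous is not None:
            self.index.discard(f"{previous}-{row}")
        self.rows[row] = value
        self.index.add(f"{value}-{row}")


t = IndexedTable()
t.put("1", "A")                # no hint, so one read-before-write
t.put("1", "B", previous="A")  # client-supplied hint, no read
print(t.index, t.reads)        # {'B-1'} 1
```

The hint is trusted as given: a stale `previous` value would leave an orphaned index entry, so it only helps when the client's copy of the row is current.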

Matt

On Tue, Mar 1, 2011 at 4:11 PM, Bruno Dumon <[hidden email]> wrote:

> Have you thought of how the update of a secondary index would go?
>
> For example, suppose currently for a row with key 1 the value A is indexed,
> so in the index there's a row with key "A-1". Then row 1 gets updated to
> value B. This means in the index you have to remove the entry "A-1" and
> insert a new entry "B-1".
>
> The problem I see here is that you have to know the previous value that was
> indexed for the row, thus A in this case.
>
> This information could maybe be put in the waledit, but that would assume
> that a read is done before the write so that the previous value is known. I
> think this will also require some consideration of how this will work in
> case of recovery.

Re: what's the roadmap of secondary index of hbase?

Andrew Purtell
> From: Matt Corgan <[hidden email]>
>
> I was wondering if it wouldn't be simpler to start with the
> synchronous version of secondary indexes.  It's more complex at
> the time of the Put, but at least you don't have to worry about
> all the edge cases where things are getting out of sync (or are
> there still some?).

We did this before in 0.20 with the transactional index contrib.

It's not really what we want, I think*.

2PC performance is poor relative to that of basic mutations.

Holding RPC threads while doing updates on a secondary table leads to deadlock, as the worker pools are fixed in size.

* -- I could be wrong about that, but the above are the reasons it was pulled.

    - Andy
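The deadlock argument above can be illustrated with a small simulation, sketched in Python under stated assumptions: two "region servers" are modeled as fixed-size `ThreadPoolExecutor`s, and each put handler synchronously updates an index region hosted on the other server. The function and parameter names are hypothetical, and a timeout stands in for the hang a real deadlock would cause. When every handler on both servers is busy with a put, the index calls can never be scheduled; a spare handler lets them through.

```python
import concurrent.futures as cf
import threading


def run_puts(handlers_per_server: int, puts_per_server: int, timeout_s: float = 0.5) -> bool:
    """Return True if every put finished, False if any put timed out waiting for a handler."""
    if puts_per_server > handlers_per_server:
        raise ValueError("this sketch assumes every put gets its own handler")
    # Each "server" has a fixed-size pool of RPC handler threads.
    server_a = cf.ThreadPoolExecutor(max_workers=handlers_per_server)
    server_b = cf.ThreadPoolExecutor(max_workers=handlers_per_server)
    # Wait until all puts hold a handler before any index call is issued.
    all_in_flight = threading.Barrier(2 * puts_per_server)

    def update_index() -> str:
        return "index updated"

    def handle_put(remote: cf.ThreadPoolExecutor) -> bool:
        all_in_flight.wait(timeout=5)
        # Synchronous index update: this handler blocks until a handler on the
        # other server is free to run the call.
        call = remote.submit(update_index)
        try:
            call.result(timeout=timeout_s)
            return True
        except cf.TimeoutError:
            return False  # a real server would simply hang here

    puts = [server_a.submit(handle_put, server_b) for _ in range(puts_per_server)]
    puts += [server_b.submit(handle_put, server_a) for _ in range(puts_per_server)]
    results = [p.result() for p in puts]
    server_a.shutdown(wait=True)
    server_b.shutdown(wait=True)
    return all(results)


print(run_puts(handlers_per_server=1, puts_per_server=1))  # False: both handlers are stuck
print(run_puts(handlers_per_server=2, puts_per_server=1))  # True: a spare handler serves the index call
```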





