Proposal for HBASE-16415 (Replication in different namespace)

Jan Kunigk
Hi, with regard to the above JIRA, I would like to make the following
proposal.
I am very much looking forward to feedback and comments.

ReplicationSourceWALReaderThread continuously reads the WALEntries to be
replicated from a given WAL via WAL.Reader's next() method and adds them
to WALEntryBatches.

As far as I can see, those WALEntries are copies of the entries originally
persisted in the local WALs. In order to direct these entries to TableNames
different from those on the source, I propose to intercept the copied
WALEntries on the source cluster and probe whether they belong to a
TableName that is to be rewritten.

If such a probe is successful, the WALKey of that WALEntry needs to be
changed accordingly. WALKey provides a getTableName() method, but
currently no setTableName() method; one would simply have to be added
to change the private TableName member.

I propose to intercept the entries via a new method redirectEntry(), which
is invoked immediately after the entry has been filtered by filterEntry()
and shortly before it is added to its WALEntryBatch, like so:

            Entry entry = entryStream.next();
            if (updateSerialReplPos(batch, entry)) {
              batch.lastWalPosition = entryStream.getPosition();
              break;
            }
            entry = filterEntry(entry);
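            // <-- proposed addition; filterEntry() may return null, so
            //     redirectEntry() has to pass a null entry through
            entry = redirectEntry(entry);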
            if (entry != null) {
              WALEdit edit = entry.getEdit();
              if (edit != null && !edit.isEmpty()) {
                long entrySize = getEntrySize(entry);
                batch.addEntry(entry);

redirectEntry() bases its decisions on a 'Map<TableName, TableName>
redirections', where the keys are the source table names and the values the
destination table names. The map would be included in the
ReplicationPeerConfig, which can be obtained from within
ReplicationSourceWALReaderThread via the instance of
ReplicationSourceManager, which is in turn passed as an argument to both
available constructors.

When the TableName of a WALKey from the WALEntryStream matches the key of
one of the entries in the redirections map, that WALKey's TableName is
replaced by the value of that entry.
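
To make the matching rule concrete, here is a minimal sketch of what
redirectEntry() could look like. It deliberately does not use the real HBase
classes: Key and Entry are simplified stand-ins for WALKey and WAL.Entry,
table names are plain "namespace:table" strings instead of TableName objects,
and setTableName() is the setter this proposal would add. All class names in
the sketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: stand-in types, not the HBase WAL classes. */
class RedirectSketch {

  /** Stand-in for WALKey, holding the table name as "namespace:table". */
  static final class Key {
    private String tableName;
    Key(String tableName) { this.tableName = tableName; }
    String getTableName() { return tableName; }
    // The setter the proposal would add to WALKey.
    void setTableName(String tableName) { this.tableName = tableName; }
  }

  /** Stand-in for WAL.Entry; only the key matters for redirection. */
  static final class Entry {
    private final Key key;
    Entry(Key key) { this.key = key; }
    Key getKey() { return key; }
  }

  // Source table name -> destination table name.
  private final Map<String, String> redirections;

  RedirectSketch(Map<String, String> redirections) {
    this.redirections = new HashMap<>(redirections);
  }

  /**
   * Rewrites the table name of the entry's key if a redirection rule
   * exists for it. A null entry (dropped by the filter) is passed through.
   */
  Entry redirectEntry(Entry entry) {
    if (entry == null) {
      return null;
    }
    String target = redirections.get(entry.getKey().getTableName());
    if (target != null) {
      entry.getKey().setTableName(target);
    }
    return entry;
  }

  public static void main(String[] args) {
    Map<String, String> rules = new HashMap<>();
    rules.put("ns_source1:table_source1", "ns_dest1:table_dest1");
    RedirectSketch sketch = new RedirectSketch(rules);

    Entry redirected =
        sketch.redirectEntry(new Entry(new Key("ns_source1:table_source1")));
    Entry untouched =
        sketch.redirectEntry(new Entry(new Key("ns_other:table_other")));

    System.out.println(redirected.getKey().getTableName()); // ns_dest1:table_dest1
    System.out.println(untouched.getKey().getTableName());  // ns_other:table_other
  }
}
```

The sketch mutates the key in place, as the setter-based proposal implies.
Whether that is safe depends on the entries really being private copies, as
assumed earlier in this mail.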

The rationale for intercepting on the sending side is that setup and
peer management are already performed on the source today, and I can see
no mechanism that would carry the redirection rules themselves across.

Similar to the way the hbase shell allows specifying the tables and
column families to be replicated (set_peer_tableCFs), I propose a new
command (also on the sending side), 'set_peer_table_redirections', which
accepts a map of Strings corresponding to the required final specification
of the redirections as TableNames:

set_peer_table_redirections ['ns_source1:table_source1' : 'ns_dest1:table_dest1',
'ns_source2:table_source2' : 'ns_dest2:table_dest2', ...
'ns_sourcen:table_sourcen' : 'ns_destn:table_destn']
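
As an illustration of how such a map of Strings might be turned into the
final redirection rules, here is a small sketch. It is not an existing HBase
API: the class and method names are hypothetical, and table names are kept as
normalized "namespace:table" strings instead of TableName objects. It assumes
that a name without a namespace belongs to HBase's "default" namespace and
that two rules for the same source table should be rejected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: validates and normalizes string redirection rules. */
class RedirectionSpecSketch {

  static final String DEFAULT_NAMESPACE = "default";

  /** Turns "table" into "default:table" and rejects malformed names. */
  static String normalize(String name) {
    String trimmed = name.trim();
    int colon = trimmed.indexOf(':');
    if (colon < 0) {
      if (trimmed.isEmpty()) {
        throw new IllegalArgumentException("empty table name");
      }
      return DEFAULT_NAMESPACE + ":" + trimmed;
    }
    boolean emptyPart = colon == 0 || colon == trimmed.length() - 1;
    boolean secondColon = trimmed.indexOf(':', colon + 1) >= 0;
    if (emptyPart || secondColon) {
      throw new IllegalArgumentException("malformed table name: " + name);
    }
    return trimmed;
  }

  /** Builds the source -> destination map, preserving the given order. */
  static Map<String, String> parse(Map<String, String> raw) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> rule : raw.entrySet()) {
      String source = normalize(rule.getKey());
      String destination = normalize(rule.getValue());
      if (result.putIfAbsent(source, destination) != null) {
        throw new IllegalArgumentException("duplicate rule for " + source);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("ns_source1:table_source1", "ns_dest1:table_dest1");
    raw.put("table_source2", "ns_dest2:table_dest2");
    // Prints each normalized rule as "source -> destination".
    parse(raw).forEach((s, d) -> System.out.println(s + " -> " + d));
  }
}
```

Validating on the sending side, where the command is issued, would let a
malformed rule fail at configuration time rather than during replication.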

Thanks, best, J

Re: Proposal for HBASE-16415 (Replication in different namespace)

Ted Yu-3
Mind putting the proposal below on HBASE-16415?

Thanks

On Thu, Jun 8, 2017 at 3:24 PM, Jan Kunigk <[hidden email]> wrote:

> [original proposal snipped]

Re: Proposal for HBASE-16415 (Replication in different namespace)

Yu Li
FWIW, making use of the NamespaceGroupingStrategy introduced in HBASE-14456
together with multiwal may make the design simpler.

I think the work is valuable and HBASE-16415 is waiting for an owner, so
just go ahead and take it. Further discussion can happen on HBASE-16415, as
Ted suggested.

Thanks.

Best Regards,
Yu

On 9 June 2017 at 06:49, Ted Yu <[hidden email]> wrote:

> [earlier messages snipped]