input split process

Rajeshkumar J
Hi,

I want to write a custom input split for my HBase data. Can anyone tell me
which values are known during the split process: only the rowkey values,
or anything else?

Thanks

Re: input split process

Ted Yu-3
If you look at TableInputFormatBase#getSplits(), you would see that the
following information can be retrieved:

Table
RegionLocator
Admin
StartEndKeys (region boundaries)

You can also take a look at calculateRebalancedSplits() to see how it
rebalances the InputSplits.
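
Since the region boundaries are the main thing available, a custom getSplits()
often comes down to cutting each region's [startKey, endKey) range into smaller
pieces. Below is a minimal sketch of that arithmetic in plain Java. It is not
HBase code: KeyRangeSplitter and boundaries() are hypothetical names, and the
sketch assumes both keys have the same length, which does not hold in general
(the first and last regions of a table have empty boundary keys).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: cuts one key range into sub-ranges. */
final class KeyRangeSplitter {

    /**
     * Returns n + 1 boundary keys that cut [start, end) into n roughly equal
     * sub-ranges. Keys are read as unsigned big-endian numbers.
     * Assumption of this sketch: both keys have the same length.
     */
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (start.length != end.length || n < 1) {
            throw new IllegalArgumentException("need equal-length keys and n >= 1");
        }
        BigInteger lo = new BigInteger(1, start);
        BigInteger hi = new BigInteger(1, end);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> result = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // lo + span * i / n, so the first point is start and the last is end
            BigInteger point = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            result.add(toFixedWidth(point, start.length));
        }
        return result;
    }

    /** Left-pads with zeros (or drops BigInteger's sign byte) to exactly width bytes. */
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

Each consecutive pair of boundaries would then become the start and stop row of
one split. Note that equal-width key ranges only give equal-sized splits if the
rowkeys are evenly distributed within the region.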

FYI

On Tue, May 30, 2017 at 11:53 PM, Rajeshkumar J <[hidden email]> wrote:

> Hi,
>
> I want to write a custom input split for my HBase data. Can anyone tell me
> which values are known during the split process: only the rowkey values,
> or anything else?
>
> Thanks
>

Re: input split process

Rajeshkumar J
So we will know only the start and end rowkey values of each region, and we
can't know the other rowkey values within that region?

On Wed, May 31, 2017 at 2:59 PM, Ted Yu <[hidden email]> wrote:

> If you look at TableInputFormatBase#getSplits(), you would see that the
> following information can be retrieved:
>
> Table
> RegionLocator
> Admin
> StartEndKeys (region boundaries)
>
> You can also take a look at calculateRebalancedSplits() to see how it
> rebalances the InputSplits.
>
> FYI

Re: input split process

William Temperley
Rajesh, have you looked at specifying the region split algorithm at table
creation time?

Personally I use UniformSplit and make sure my keys are pseudo-random,
usually by reversing the bits in a Long value.

https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html
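
One way to read the bit-reversal idea is sketched below in plain Java.
ReversedLongKey, toRowKey() and fromRowKey() are hypothetical names used for
illustration; the only library calls are Long.reverse and ByteBuffer from the
JDK. The sketch assumes the original key is a sequential long ID and that the
rowkey is its 8-byte big-endian encoding.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of HBase: spreads sequential IDs over the key space. */
final class ReversedLongKey {

    /** Encodes an id as an 8-byte big-endian rowkey with its bits reversed. */
    static byte[] toRowKey(long id) {
        // ByteBuffer is big-endian by default, so byte 0 holds the highest bits.
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.reverse(id)).array();
    }

    /** Recovers the original id from a rowkey produced by toRowKey. */
    static long fromRowKey(byte[] rowKey) {
        return Long.reverse(ByteBuffer.wrap(rowKey).getLong());
    }
}
```

Because the lowest bits of a counter change fastest, reversing them moves that
variation into the leading byte of the rowkey: IDs 1, 2 and 3 get first bytes
0x80, 0x40 and 0xC0, so consecutive IDs land far apart in the key space rather
than in one region.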



On 31 May 2017 at 14:05, Rajeshkumar J <[hidden email]> wrote:

> So we will know only start and end rowkey values of a region. We can't know
> the other rowkey values within that region.

Re: input split process

Ted Yu-3
In reply to this post by Rajeshkumar J
Rajesh:
Currently there are no (persisted) statistics on the distribution of rowkey
values within a region.

You can perform sampling to get an approximation.
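
As an illustration of the sampling idea, here is a minimal sketch in plain Java
that keeps a fixed-size uniform sample of rowkeys (reservoir sampling) and then
picks split points at evenly spaced ranks of the sorted sample. It is not HBase
code: RowKeySampler, sample() and splitPoints() are hypothetical names, and for
simplicity the sketch assumes rowkeys are Strings whose natural ordering
matches the table's byte ordering, which real byte[] rowkeys may not satisfy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of HBase: approximates the rowkey distribution. */
final class RowKeySampler {

    /** Reservoir sampling: keeps at most k keys, each stream key equally likely. */
    static List<String> sample(Iterator<String> keys, int k, Random rng) {
        List<String> reservoir = new ArrayList<>();
        long seen = 0;
        while (keys.hasNext()) {
            String key = keys.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(key);                         // fill phase
            } else {
                long j = (long) (rng.nextDouble() * seen);  // uniform in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, key);            // replace with prob. k / seen
                }
            }
        }
        return reservoir;
    }

    /** Picks up to parts - 1 split points at evenly spaced ranks of the sorted sample. */
    static List<String> splitPoints(List<String> sample, int parts) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> points = new ArrayList<>();
        if (sorted.isEmpty()) {
            return points;  // nothing sampled, so no split points
        }
        for (int i = 1; i < parts; i++) {
            points.add(sorted.get(i * sorted.size() / parts));
        }
        return points;
    }
}
```

In practice the iterator would be fed from a scan over the region (for example
one that returns only rowkeys), and the resulting split points are only as good
as the sample is representative.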

Cheers

On Wed, May 31, 2017 at 5:05 AM, Rajeshkumar J <[hidden email]> wrote:

> So we will know only start and end rowkey values of a region. We can't know
> the other rowkey values within that region.