Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Andrew Kettmann
First the version details:

Running HBase/YARN/HDFS under Cloudera Manager 5.12.1.
HBase: Version 1.2.0-cdh5.8.0
HDFS/YARN: Hadoop 2.6.0-cdh5.8.0
hbck and hdfs fsck both report healthy

15 nodes, sized down recently from 30 (requirements from other services, e.g. Solr, were reduced)


The simplest example of the inconsistency is rowcounter. If I run the same MapReduce job twice in a row, I get different counts:

hbase org.apache.hadoop.hbase.mapreduce.Driver rowcounter -Dmapreduce.map.speculative=false TABLENAME --starttime=1485907200000 --endtime=1486058400000

Looking at org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters:
Run 1: 4876683
Run 2: 4866351
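
To compare runs without eyeballing the job output, the counter can be pulled out of each run's captured log. The following is a minimal sketch, assuming the output was saved as text and that the counter shows up as a `ROWS=<n>` line under the RowCounter counter group (an assumption about the log layout; adjust the pattern if your output differs). The names `extract_row_count` and `compare_runs` are hypothetical helpers, not part of HBase.

```python
import re

# Assumed layout of the counter in captured job output:
#   org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
#           ROWS=4876683
ROWS_PATTERN = re.compile(r"^\s*ROWS=(\d+)\s*$", re.MULTILINE)


def extract_row_count(job_output: str) -> int:
    """Return the ROWS counter found in captured rowcounter output."""
    match = ROWS_PATTERN.search(job_output)
    if match is None:
        raise ValueError("no ROWS counter found in job output")
    return int(match.group(1))


def compare_runs(outputs):
    """Return (counts, spread), where spread is max - min across all runs."""
    counts = [extract_row_count(text) for text in outputs]
    return counts, max(counts) - min(counts)


if __name__ == "__main__":
    group = "org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters"
    run1 = f"\t{group}\n\t\tROWS=4876683\n"
    run2 = f"\t{group}\n\t\tROWS=4866351\n"
    counts, spread = compare_runs([run1, run2])
    print(counts, spread)
```

A spread of zero across repeated runs over an unchanged time range is the behaviour one would expect; the two runs above differ by 10332 rows.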

Similarly with exports of the same date/time. Consecutive runs of the export get different results:
hbase org.apache.hadoop.hbase.mapreduce.Export \
-Dmapred.map.tasks.speculative.execution=false \
-Dmapred.reduce.tasks.speculative.execution=false \
TABLENAME \
HDFSPATH 1 1485907200000 1486058400000

From Map Input/output records:
Run 1: 4296778
Run 2: 4297307
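
The start and end arguments in both commands are epoch milliseconds. As a sanity check on the window being queried, the sketch below converts between UTC datetimes and epoch milliseconds; the helper names are hypothetical, and the decoding assumes the values were meant as UTC. Decoded that way, the range runs from 2017-02-01 00:00 UTC to 2017-02-02 18:00 UTC, a 42-hour window.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Convert epoch milliseconds to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    start, end = 1485907200000, 1486058400000
    print(from_epoch_millis(start).isoformat())
    print(from_epoch_millis(end).isoformat())
    print((end - start) / 3_600_000, "hours")
```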

None of the runs show spilled records or failed maps. Sometimes the row count increases, sometimes it decreases. We aren’t using any row filters; we just want to export chunks of the data for a specific time range. The table is actively being read from and written to, but the range I am asking about is in early 2017, so I would have thought that should have no impact. Another point is that the rowcounter job and the export return wildly different numbers. There should be no older versions of rows involved, as we are set to keep only the newest, and I can confirm that there are rows that are consistently missing from the exports. Table definition is below.

hbase(main):001:0> describe 'TABLENAME'
Table TABLENAME is ENABLED
TABLENAME
COLUMN FAMILIES DESCRIPTION
{NAME => 'text', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.2800 seconds

Any advice or suggestions would be greatly appreciated. Are some of my assumptions about import/export wrong? Should the results not be consistent given identical date/time ranges?


Andrew Kettmann
Platform Services Group


RE: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Andrew Kettmann
A simpler question would be this:

Given:


  *   a set timeframe in the past (2-3 days roughly a year ago)
  *   we are NOT removing records from the table at all
  *   We ARE inserting into this table actively

Should I expect two consecutive runs of the rowcounter mapreduce job to return an identical number?


Andrew Kettmann
Consultant, Platform Services Group

From: Andrew Kettmann
Sent: Thursday, February 08, 2018 11:35 AM
To: [hidden email]
Subject: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.


Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Josh Elser-2
Hi Andrew,

Yes. The answer is, of course, that you should see consistent results
from HBase if there are no mutations in flight to that table. Whether
you're reading "current" or "back-in-time", as long as you're not
dealing with raw scans (where compactions may persist delete
tombstones), this should hold just the same.

Are you modifying older cells with newer data when you insert data?
Remember that the maximum number of versions (VERSIONS) for a column
family defaults to 1. Consider the following:

* Timestamps are of the form "tX", and t1 < t2 < t3 < ..
* You are querying the time range [t1, t5].
* You have a cell for "row1" at t3 with value "foo".
* RowCounter over [t1, t5] would return "1".
* Your ingest writes a new cell for "row1" with value "bar" at t6.
* RowCounter over [t1, t5] would return "0" normally, or "1" if you use
RAW scans ***
* A compaction then runs over the region containing "row1".
* RowCounter over [t1, t5] would return "0" (RAW or normal).
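
The sequence above is flagged in this message as recalled from memory, so the following is only a toy model that encodes the behaviour as described here. It is not HBase's scanner implementation, and `ToyColumn` and `row_count` are hypothetical names. The model assumes a normal scan first narrows to the newest `max_versions` cells and then applies the time range, while a raw scan sees every cell still physically stored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Toy model of one column in one row; NOT HBase's real scanner logic."""

    max_versions: int = 1
    cells: dict = field(default_factory=dict)  # timestamp -> value

    def put(self, ts, value):
        self.cells[ts] = value

    def _newest(self):
        # Timestamps of the newest `max_versions` cells.
        return sorted(self.cells, reverse=True)[: self.max_versions]

    def compact(self):
        # Physically drop every cell beyond the newest `max_versions`.
        keep = self._newest()
        self.cells = {ts: self.cells[ts] for ts in keep}

    def scan(self, start, end, raw=False):
        # Raw scans see every stored cell; normal scans only the newest versions.
        visible = self.cells if raw else self._newest()
        return [ts for ts in visible if start <= ts <= end]


def row_count(column, start, end, raw=False):
    # The row counts if at least one visible cell falls inside [start, end].
    return 1 if column.scan(start, end, raw) else 0


if __name__ == "__main__":
    col = ToyColumn(max_versions=1)
    col.put(3, "foo")
    print("before overwrite:", row_count(col, 1, 5))
    col.put(6, "bar")
    print("after overwrite:", row_count(col, 1, 5), "raw:", row_count(col, 1, 5, raw=True))
    col.compact()
    print("after compaction:", row_count(col, 1, 5), "raw:", row_count(col, 1, 5, raw=True))
```

Under these assumptions the count for a fixed past range can drop when newer writes overwrite older cells, which is a different cause from a scanner bug.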

It's also possible that you're hitting some sort of bug around missing
records at query time. I'm not sure what the CDH versions you're using
line up to, but there have certainly been issues in the past around
query-time data loss (e.g. scans on RegionServers stop prematurely
before all of the data is read).

Good luck!

*** Going off of memory here. I think this is how it works, but you
should be able to test easily ;)

On 2/9/18 5:30 PM, Andrew Kettmann wrote:

RE: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Andrew Kettmann
I exported the table involved and re-imported it into the same cluster under a different name. This is now a separate table that is not being modified at all. Two consecutive/concurrent counts were still different:

hbase org.apache.hadoop.hbase.mapreduce.Driver rowcounter -Dmapreduce.map.speculative=false ImportedTableName --starttime=1485907200000 --endtime=1486058400000

Count#1 3508052
Count#2 3584553

Do you happen to have more information regarding some of the query-time data loss issues you were mentioning?

My versioning SHOULD map roughly to version 1.2.0 for HBASE.

Hbase: Version 1.2.0-cdh5.8.0
HDFS/YARN: Hadoop 2.6.0-cdh5.8.0

Andrew Kettmann
Consultant, Platform Services Group

-----Original Message-----
From: Josh Elser [mailto:[hidden email]]
Sent: Monday, February 12, 2018 11:59 AM
To: [hidden email]
Subject: Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.


RE: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Andrew Kettmann
Josh,

Upgrading from CDH 5.8.0 to 5.8.5 seems to have fixed the issue. Three rowcounts in a row on a static table, which were inconsistent before, are now consistent. We are doing some further testing, but it looks like you called it with:

'scans on RegionServers stop prematurely before all of the data is read'

Thanks for the pointer in that direction. I was bashing my face against this for two weeks trying to figure out this inconsistency. I appreciate the clue!

Andrew Kettmann
Consultant, Platform Services Group


Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Ted Yu-3
If you look at
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_58.html#fixed_issues585
you will see the following:

HBASE-15378 - Scanner cannot handle heartbeat message with no results

That fix addresses what you observed in the previous release.

FYI
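
A quick way to flag nodes that still run an affected build is to parse the CDH suffix of the reported version string. This is a minimal sketch that relies only on what this thread states, namely that the fix is listed for CDH 5.8.5 and that 5.8.0 showed the problem. It says nothing about other release lines, and the function names are hypothetical.

```python
import re

# Per this thread, the fix is listed in the CDH 5.8.5 release notes.
FIXED_IN_58_LINE = (5, 8, 5)


def cdh_version(version_string):
    """Extract the CDH (major, minor, patch) tuple from e.g. '1.2.0-cdh5.8.0'."""
    match = re.search(r"cdh(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no CDH suffix in {version_string!r}")
    return tuple(int(part) for part in match.groups())


def predates_fix_in_58_line(version_string):
    """True for CDH 5.8.x builds older than 5.8.5; makes no claim about other lines."""
    version = cdh_version(version_string)
    return version[:2] == (5, 8) and version < FIXED_IN_58_LINE


if __name__ == "__main__":
    for reported in ("1.2.0-cdh5.8.0", "1.2.0-cdh5.8.5"):
        print(reported, predates_fix_in_58_line(reported))
```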

On Tue, Feb 20, 2018 at 9:07 PM, Andrew Kettmann <
[hidden email]> wrote:

Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.

Andrew Kettmann
Unfortunately, without already knowing that this was the cause, it is difficult to get to that point. Container logs and NodeManager logs showed nothing wrong; the only symptom was the inconsistent export/rowcounter results. I had reviewed all the HBase/YARN/HDFS bugs in the list but didn't see one that looked like a smoking gun, just a bunch of possible ones. My ignorance of the inner workings of HBase/YARN likely played a big part in that, though. I do appreciate you pointing out 'the one'!






From: Ted Yu
Sent: Tuesday, February 20, 11:15 PM
Subject: Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.
To: [hidden email]

