I did some more thinking about this while working on HBASE-18135.
A goal of HBASE-16961 was a failsafe for RegionServers that fail to regularly submit size reports for their Regions: if a significant percentage of the Regions are missing a report (more than 5% of Regions by default), the Master will not enforce a quota violation on the table. This is meant to cover Regions stuck in transition and generic bugs or flakiness in RegionServers.
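To make the threshold concrete, here is a rough sketch of that check. This is not the actual HBase code; the class, method, and constant names are made up for illustration, and only the 5% figure comes from the description above.

```java
// Illustrative only: hypothetical names, not the real Master implementation.
final class ReportCoverageSketch {
    // Assumed default from the description above: more than 5% of Regions
    // missing a size report means quota enforcement is skipped.
    static final int MAX_MISSING_PERCENT = 5;

    // Returns true if enough Regions have reported for the Master to act on
    // the table's computed size.
    static boolean mayEnforceQuota(int totalRegions, int reportedRegions) {
        if (totalRegions <= 0) {
            return false; // nothing known about the table, so do not enforce
        }
        long missing = (long) totalRegions - reportedRegions;
        // Integer arithmetic avoids floating-point comparison at the boundary.
        boolean tooManyMissing = missing * 100L > (long) totalRegions * MAX_MISSING_PERCENT;
        return !tooManyMissing;
    }

    public static void main(String[] args) {
        System.out.println(mayEnforceQuota(100, 97)); // prints "true"  (3% missing)
        System.out.println(mayEnforceQuota(100, 90)); // prints "false" (10% missing)
    }
}
```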
This feature is implemented by the Master aging off recorded sizes for Regions after a given amount of time. As long as the size is (re)reported by a RegionServer, the Master continues to acknowledge the size of a Region.
If the FileSystemUtilizationChore is removed, the Master will age off the size reports for Regions which are idle but may still contain data, since an idle Region would no longer have its size re-reported. The result would be that a table which is over quota and not accepting new updates has its violation policy lifted by the Master, even though its usage has not changed. As such, we cannot implement this improvement while also doing the Region size report age-off.
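The interaction can be shown with a small sketch of the age-off bookkeeping. Again, this is a simplified model with hypothetical names and an arbitrary 60 second retention, not the Master's real data structures.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative only: hypothetical model of Master-side size report age-off.
final class RegionSizeAgeOffSketch {
    // Region name -> { size in bytes, time of last report in millis }
    private final Map<String, long[]> reports = new HashMap<>();
    private final long maxAgeMillis;

    RegionSizeAgeOffSketch(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Called whenever a RegionServer (re)reports a Region's size.
    void report(String region, long sizeBytes, long nowMillis) {
        reports.put(region, new long[] {sizeBytes, nowMillis});
    }

    // Drops every report older than maxAgeMillis; returns how many were dropped.
    int ageOff(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, long[]>> it = reports.entrySet().iterator();
        while (it.hasNext()) {
            long lastReported = it.next().getValue()[1];
            if (nowMillis - lastReported > maxAgeMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean hasReport(String region) {
        return reports.containsKey(region);
    }

    public static void main(String[] args) {
        RegionSizeAgeOffSketch master = new RegionSizeAgeOffSketch(60_000L);
        master.report("idle-region", 10_000_000L, 0L);
        master.report("busy-region", 5_000_000L, 0L);
        // Only the busy Region is re-reported; the idle one still holds data
        // on disk but nothing reports its size again.
        master.report("busy-region", 6_000_000L, 50_000L);
        master.ageOff(70_000L);
        System.out.println(master.hasReport("idle-region")); // prints "false"
        System.out.println(master.hasReport("busy-region")); // prints "true"
    }
}
```

With a periodic re-report in place, the idle Region's entry would be refreshed before it expires, which is why the age-off depends on the chore.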
My inclination is to skip the optimization described in this ticket for now and see what real-life usage of the feature brings. We have metrics which will help us understand, at scale, what the impact of this chore is. If computing Region sizes from disk turns out to have a large impact, we can reconsider.
> Re-think if the FileSystemUtilizationChore is still necessary
> Key: HBASE-18134
> URL: https://issues.apache.org/jira/browse/HBASE-18134
> Project: HBase
> Issue Type: Task
> Reporter: Josh Elser
> Assignee: Josh Elser
> On the heels of HBASE-18133, we need to put some thought into whether or not there are cases in which the RegionServer should still report sizes directly from HDFS.
> The cases I have in mind are primarily in the face of RS failure/restart. Ideally, we could get rid of this chore completely.