[jira] [Created] (HBASE-18168) NoSuchElementException when rolling the log


JIRA jira@apache.org
Allan Yang created HBASE-18168:
----------------------------------

             Summary: NoSuchElementException when rolling the log
                 Key: HBASE-18168
                 URL: https://issues.apache.org/jira/browse/HBASE-18168
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.1.11
            Reporter: Allan Yang
            Assignee: Allan Yang


Today, one of our servers aborted with the following log:
{code}
2017-06-06 05:38:47,142 ERROR [regionserver/xxxx.logRoller] regionserver.LogRoller: Log rolling failed
java.util.NoSuchElementException
        at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2224)
        at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2253)
        at java.util.Collections.min(Collections.java:628)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findEligibleMemstoresToFlush(FSHLog.java:861)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findRegionsToForceFlush(FSHLog.java:886)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:728)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:137)
        at java.lang.Thread.run(Thread.java:756)
2017-06-06 05:38:47,142 FATAL [regionserver/xxxx.logRoller] regionserver.HRegionServer: ABORTING region server xxxx: Log rolling failed
java.util.NoSuchElementException
......
{code}

The code is here:
{code}
private byte[][] findEligibleMemstoresToFlush(Map<byte[], Long> regionsSequenceNums) {
    List<byte[]> regionsToFlush = null;
    // Keeping the old behavior of iterating unflushedSeqNums under oldestSeqNumsLock.
    synchronized (regionSequenceIdLock) {
      for (Map.Entry<byte[], Long> e: regionsSequenceNums.entrySet()) {
        ConcurrentMap<byte[], Long> m =
            this.oldestUnflushedStoreSequenceIds.get(e.getKey());
        if (m == null) {
          continue;
        }
        long unFlushedVal = Collections.min(m.values()); //The exception is thrown here
        ......
{code}
The only reason I can think of for the NoSuchElementException is that the map 'm' is empty: Collections.min() calls next() on the iterator of m.values() without first checking that an element exists. I then looked through all the code that updates 'oldestUnflushedStoreSequenceIds'. Every update to 'oldestUnflushedStoreSequenceIds' is guarded by synchronization on 'regionSequenceIdLock', except here:
{code}
private ConcurrentMap<byte[], Long> getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(
      byte[] encodedRegionName) {
    ......
    oldestUnflushedStoreSequenceIdsOfRegion =
        new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
    ConcurrentMap<byte[], Long> alreadyPut =
        oldestUnflushedStoreSequenceIds.putIfAbsent(encodedRegionName,
          oldestUnflushedStoreSequenceIdsOfRegion); // Here, an empty map may be put into 'oldestUnflushedStoreSequenceIds' without synchronization
    return alreadyPut == null ? oldestUnflushedStoreSequenceIdsOfRegion : alreadyPut;
  }
{code}
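
The failure mode described above can be reproduced outside HBase. The following is only a minimal sketch, not the FSHLog code and not a proposed patch: the class EmptyMapMinDemo, the field seqIds, and the methods getOrCreate, unguardedMin, and guardedMin are hypothetical names, and String keys stand in for the byte[] keys and Bytes.BYTES_COMPARATOR used in the real code. It shows that an inner map published by putIfAbsent before it has any entries makes Collections.min() throw, and one possible way a reader could tolerate such a map.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class EmptyMapMinDemo {
    // Hypothetical stand-in for 'oldestUnflushedStoreSequenceIds'
    // (region name -> store name -> sequence id). String keys are an assumption.
    static final ConcurrentMap<String, ConcurrentMap<String, Long>> seqIds =
        new ConcurrentHashMap<>();

    // Same shape as the unsynchronized method quoted above: the empty inner
    // map becomes visible to other threads as soon as putIfAbsent returns.
    static ConcurrentMap<String, Long> getOrCreate(String region) {
        ConcurrentMap<String, Long> fresh = new ConcurrentSkipListMap<>();
        ConcurrentMap<String, Long> alreadyPut = seqIds.putIfAbsent(region, fresh);
        return alreadyPut == null ? fresh : alreadyPut;
    }

    // Mirrors the failing call: throws NoSuchElementException on an empty map.
    static long unguardedMin(String region) {
        return Collections.min(seqIds.get(region).values());
    }

    // One possible guard: compute the minimum in a single pass, so an empty
    // (or concurrently emptied) map yields "no value" instead of an exception.
    static OptionalLong guardedMin(String region) {
        ConcurrentMap<String, Long> m = seqIds.get(region);
        if (m == null) {
            return OptionalLong.empty();
        }
        boolean seen = false;
        long min = Long.MAX_VALUE;
        for (long v : m.values()) {
            seen = true;
            min = Math.min(min, v);
        }
        return seen ? OptionalLong.of(min) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        getOrCreate("region-a"); // inner map exists but has no entries yet
        try {
            unguardedMin("region-a");
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }
        System.out.println("guarded, empty map: " + guardedMin("region-a").isPresent());
        getOrCreate("region-a").put("cf1", 42L);
        System.out.println("guarded, after put: " + guardedMin("region-a").getAsLong());
    }
}
```

The guard iterates once rather than calling isEmpty() followed by Collections.min(), because another thread could empty the map between those two calls. Whether the reader should skip empty maps or the writer should publish the map under 'regionSequenceIdLock' is a design choice this sketch does not settle.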

This should be a very rare bug, but it can lead to a server abort. It exists only in branch-1.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)