Class SSTableReader

  • All Implemented Interfaces:
    RefCounted<SSTableReader>, SelfRefCounted<SSTableReader>
    Direct Known Subclasses:
    BigTableReader

    public abstract class SSTableReader
    extends SSTable
    implements SelfRefCounted<SSTableReader>
    An SSTableReader can be constructed in a number of places, but typically is either read from disk at startup, or constructed from a flushed memtable, or after compaction to replace some existing sstables. However once created, an sstablereader may also be modified. A reader's OpenReason describes its current stage in its lifecycle, as follows:
     
     NORMAL
     From:       None        => Reader has been read from disk, either at startup or from a flushed memtable
                 EARLY       => Reader is the final result of a compaction
                 MOVED_START => Reader WAS being compacted, but this failed and it has been restored to NORMAL status
    
     EARLY
     From:       None        => Reader is a compaction replacement that is either incomplete and has been opened
                                to represent its partial result status, or has been finished but the compaction
                                it is a part of has not yet completed fully
                 EARLY       => Same as from None, only it is not the first time it has been opened
    
     MOVED_START
     From:       NORMAL      => Reader is being compacted. This compaction has not finished, but the compaction result
                                is either partially or fully opened, to either partially or fully replace this reader.
                                This reader's start key has been updated to represent this, so that reads only hit
                                one or the other reader.
    
     METADATA_CHANGE
     From:       NORMAL      => Reader has seen low traffic and the amount of memory available for index summaries is
                                constrained, so its index summary has been downsampled.
                 METADATA_CHANGE => Same
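
    The transitions listed above can be read as a small state machine. The following is a minimal sketch of that reading, not the actual implementation: the class OpenReasonSketch and the method isAllowed are hypothetical names, and a transition "from None" (a newly created reader) is modelled as a null origin.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the OpenReason transitions listed above; not Cassandra code. */
final class OpenReasonSketch {
    enum OpenReason { NORMAL, EARLY, MOVED_START, METADATA_CHANGE }

    // For each target state, the states a reader may arrive from.
    private static final Map<OpenReason, Set<OpenReason>> ALLOWED_FROM = new EnumMap<>(OpenReason.class);

    // States a reader may be created in directly ("From: None" in the list above).
    private static final Set<OpenReason> CREATABLE = EnumSet.of(OpenReason.NORMAL, OpenReason.EARLY);

    static {
        ALLOWED_FROM.put(OpenReason.NORMAL, EnumSet.of(OpenReason.EARLY, OpenReason.MOVED_START));
        ALLOWED_FROM.put(OpenReason.EARLY, EnumSet.of(OpenReason.EARLY));
        ALLOWED_FROM.put(OpenReason.MOVED_START, EnumSet.of(OpenReason.NORMAL));
        ALLOWED_FROM.put(OpenReason.METADATA_CHANGE,
                         EnumSet.of(OpenReason.NORMAL, OpenReason.METADATA_CHANGE));
    }

    /** Returns true if the list above names a transition from {@code from} (null = None) to {@code to}. */
    static boolean isAllowed(OpenReason from, OpenReason to) {
        if (from == null)
            return CREATABLE.contains(to);
        return ALLOWED_FROM.get(to).contains(from);
    }
}
```

    For example, isAllowed(NORMAL, MOVED_START) holds because a NORMAL reader can start being compacted, while isAllowed(null, METADATA_CHANGE) does not, since downsampling only applies to an existing reader.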
      
    Note that in parallel to this, there are two different Descriptor types, TMPLINK and FINAL. The latter corresponds to NORMAL state readers and all readers that replace a NORMAL one. TMPLINK is used for EARLY state readers and no others.

    When a reader is being compacted, if the result is large its replacement may be opened as EARLY before compaction completes in order to present the result to consumers earlier. In this case the reader will itself be changed to a MOVED_START state, where its start no longer represents its on-disk minimum key. This is to permit reads to be directed to only one reader when the two represent the same data. The EARLY file can represent a compaction result that is either partially complete and still in progress, or a complete and immutable sstable that is part of a larger macro compaction action that has not yet fully completed.

    Currently ALL compaction results at least briefly go through an EARLY open state prior to completion, regardless of whether early opening is enabled.

    Since a reader can be created multiple times over the same shared underlying resources, and the exact resources it shares between each instance differ subtly, we track the lifetime of any underlying resource with its own reference count, which each instance takes a Ref to. Each instance then tracks references to itself, and once these all expire it releases its Refs to these underlying resources.

    There is some shared cleanup behaviour needed only once all sstablereaders in a certain stage of their lifecycle (i.e. EARLY or NORMAL opening), and some that must only occur once all readers of any kind over a single logical sstable have expired. These are managed by the TypeTidy and GlobalTidy classes at the bottom, and are effectively managed as another resource each instance tracks its own Ref instance to, to ensure all of these resources are cleaned up safely and can be debugged otherwise.
TODO: fill in details about Tracker and lifecycle interactions for tools, and for compaction strategies
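    The two-level reference counting described above can be illustrated with a simplified sketch. It is an assumption-laden model, not the real Ref or RefCounted API: the classes Counted and Reader are hypothetical, and the real implementation also handles leak detection and the TypeTidy and GlobalTidy stages, which are omitted here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of per-instance and shared-resource reference counting. */
final class RefCountSketch {
    /** A counted resource that runs its tidy action once, when the count reaches zero. */
    static final class Counted {
        private final AtomicInteger count = new AtomicInteger(1); // the creator holds the first reference
        private final Runnable tidy;

        Counted(Runnable tidy) { this.tidy = tidy; }

        /** Takes one more reference; fails if the resource has already been fully released. */
        void ref() {
            while (true) {
                int c = count.get();
                if (c <= 0)
                    throw new IllegalStateException("already released");
                if (count.compareAndSet(c, c + 1))
                    return;
            }
        }

        /** Drops one reference and tidies up when the last one is gone. */
        void release() {
            int c = count.decrementAndGet();
            if (c == 0)
                tidy.run();
            else if (c < 0)
                throw new IllegalStateException("released too many times");
        }
    }

    /** A reader instance: counts references to itself and holds one reference to the shared resource. */
    static final class Reader {
        final Counted self;

        Reader(Counted shared) {
            shared.ref();
            // When the last reference to this instance expires, give back the shared reference.
            this.self = new Counted(shared::release);
        }
    }
}
```

    With two Reader instances over one shared Counted, the shared tidy action runs only after the creator and both readers have released, which mirrors the rule that shared cleanup waits for every instance to expire.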
    • Field Detail

      • maxTimestampDescending

        public static final java.util.Comparator<SSTableReader> maxTimestampDescending
      • maxTimestampAscending

        public static final java.util.Comparator<SSTableReader> maxTimestampAscending
      • sstableComparator

        public static final java.util.Comparator<SSTableReader> sstableComparator
      • generationReverseComparator

        public static final java.util.Comparator<SSTableReader> generationReverseComparator
      • sstableOrdering

        public static final com.google.common.collect.Ordering<SSTableReader> sstableOrdering
      • sizeComparator

        public static final java.util.Comparator<SSTableReader> sizeComparator
      • maxDataAge

        public final long maxDataAge
        maxDataAge is a timestamp in local server time (e.g. System.currentTimeMillis) which represents an upper bound on the newest piece of data stored in the sstable. In other words, this sstable does not contain items created later than maxDataAge. The field is not serialized to disk, so relying on it for more than what truncate does is not advised. When a new sstable is flushed, maxDataAge is set to the time of creation. When an sstable is created from compaction, maxDataAge is set to the max of all merged sstables. The age is in milliseconds since the epoch and is local to this host.
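        The two rules above for assigning maxDataAge can be sketched as follows. The class and method names (MaxDataAgeSketch, forFlush, forCompaction) are hypothetical and only illustrate the arithmetic.

```java
import java.util.List;

/** Hypothetical illustration of how maxDataAge is chosen for new sstables. */
final class MaxDataAgeSketch {
    /** A flushed sstable takes its creation time (local epoch millis) as maxDataAge. */
    static long forFlush(long creationTimeMillis) {
        return creationTimeMillis;
    }

    /** A compaction result takes the largest maxDataAge among the sstables it merged. */
    static long forCompaction(List<Long> inputMaxDataAges) {
        if (inputMaxDataAges.isEmpty())
            throw new IllegalArgumentException("a compaction needs at least one input");
        long max = Long.MIN_VALUE;
        for (long age : inputMaxDataAges)
            max = Math.max(max, age);
        return max;
    }
}
```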
      • isSuspect

        protected final java.util.concurrent.atomic.AtomicBoolean isSuspect
      • sstableMetadata

        protected volatile StatsMetadata sstableMetadata
      • keyCacheHit

        protected final java.util.concurrent.atomic.AtomicLong keyCacheHit
      • keyCacheRequest

        protected final java.util.concurrent.atomic.AtomicLong keyCacheRequest
    • Method Detail

      • getApproximateKeyCount

        public static long getApproximateKeyCount​(java.lang.Iterable<SSTableReader> sstables)
        Calculate approximate key count. If a cardinality estimator is available on all given sstables, then this method uses them to estimate the key count. If not, then this uses index summaries.
        Parameters:
        sstables - SSTables to calculate key count
        Returns:
        estimated key count
      • estimateCompactionGain

        public static double estimateCompactionGain​(java.util.Set<SSTableReader> overlapping)
        Estimates what fraction of the keys would be kept if the sstables were compacted together.
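        One way to picture this estimate is as the ratio of distinct keys after merging to the total key count before merging. The sketch below uses exact key sets as a stand-in for whatever cardinality estimate the real method uses; CompactionGainSketch and estimateGain are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of a "fraction of keys kept" estimate. */
final class CompactionGainSketch {
    /**
     * Distinct keys across all inputs divided by the sum of per-sstable key counts.
     * 1.0 means the inputs share no keys; smaller values mean more overlap.
     */
    static double estimateGain(List<Set<String>> keysPerSSTable) {
        long before = 0;
        Set<String> merged = new HashSet<>();
        for (Set<String> keys : keysPerSSTable) {
            before += keys.size();
            merged.addAll(keys);
        }
        // With no keys at all there is nothing to gain, so report "everything kept".
        return before == 0 ? 1.0 : (double) merged.size() / before;
    }
}
```

        For two sstables holding keys {a, b} and {b, c}, the merged result has 3 distinct keys out of 4 originally stored, so the estimate is 0.75.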
      • openForBatch

        public static SSTableReader openForBatch​(Descriptor descriptor,
                                                 java.util.Set<Component> components,
                                                 TableMetadataRef metadata)
        Open an SSTable reader to be used in batch mode (such as sstableloader).
        Parameters:
        descriptor -
        components -
        metadata -
        Returns:
        opened SSTableReader
        Throws:
        java.io.IOException
      • open

        public static SSTableReader open​(Descriptor descriptor,
                                         java.util.Set<Component> components,
                                         TableMetadataRef metadata,
                                         boolean validate,
                                         boolean isOffline)
        Open an SSTable for reading
        Parameters:
        descriptor - SSTable to open
        components - Components included with this SSTable
        metadata - metadata for this SSTable's CF
        validate - Check SSTable for corruption (limited)
        isOffline - Whether we are opening this SSTable "offline", for example from an external tool or not for inclusion in queries (validations). This stops regenerating BF + Summaries and also disables tracking of hotness for the SSTable.
        Returns:
        SSTableReader
        Throws:
        java.io.IOException
      • verifyCompressionInfoExistenceIfApplicable

        public static void verifyCompressionInfoExistenceIfApplicable​(Descriptor descriptor,
                                                                      java.util.Set<Component> actualComponents)
                                                               throws CorruptSSTableException,
                                                                      FSReadError
        Best-effort check that the expected compression info component exists, according to the TOC file. The verification depends on the existence of the TOC file. If it is absent, the verification is skipped.
        Parameters:
        descriptor -
        actualComponents - actual components listed from the file system.
        Throws:
        CorruptSSTableException
        FSReadError
      • getTotalBytes

        public static long getTotalBytes​(java.lang.Iterable<SSTableReader> sstables)
      • getTotalUncompressedBytes

        public static long getTotalUncompressedBytes​(java.lang.Iterable<SSTableReader> sstables)
      • equals

        public boolean equals​(java.lang.Object that)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • getFilename

        public java.lang.String getFilename()
        Overrides:
        getFilename in class SSTable
      • setupOnline

        public void setupOnline()
      • saveBloomFilter

        public static void saveBloomFilter​(Descriptor descriptor,
                                           IFilter filter)
      • runWithLock

        public <R> R runWithLock​(CheckedFunction<Descriptor,​R,​java.io.IOException> task)
                          throws java.io.IOException
        Execute the provided task while holding the sstable lock, to avoid racing with index summary redistribution; see CASSANDRA-15861.
        Parameters:
        task - to be guarded by sstable lock
        Throws:
        java.io.IOException
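        The pattern behind runWithLock can be sketched as below. This is a simplified illustration under stated assumptions: the CheckedFunction interface is redeclared locally as a stand-in, a String stands in for Descriptor, and a plain monitor stands in for whatever lock the real class uses.

```java
import java.io.IOException;

/** Hypothetical illustration of running a checked task under a per-sstable lock. */
final class RunWithLockSketch {
    /** Stand-in for a function that may throw a checked exception. */
    @FunctionalInterface
    interface CheckedFunction<T, R, E extends Exception> {
        R apply(T input) throws E;
    }

    private final Object lock = new Object();
    private final String descriptor; // stand-in for the sstable's Descriptor

    RunWithLockSketch(String descriptor) {
        this.descriptor = descriptor;
    }

    /** Runs the task while holding the lock and returns its result; exceptions propagate to the caller. */
    <R> R runWithLock(CheckedFunction<String, R, IOException> task) throws IOException {
        synchronized (lock) {
            return task.apply(descriptor);
        }
    }
}
```

        Any other operation that synchronizes on the same lock, such as a summary rebuild in this model, cannot interleave with the task.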
      • setReplaced

        public void setReplaced()
      • isReplaced

        public boolean isReplaced()
      • runOnClose

        public void runOnClose​(java.lang.Runnable runOnClose)
      • cloneAndReplace

        public SSTableReader cloneAndReplace​(IFilter newBloomFilter)
        Clone this reader with the new values and set the clone as replacement.
        Parameters:
        newBloomFilter - for the replacement
        Returns:
        the cloned reader. That reader is set as a replacement by the method.
      • cloneWithNewSummarySamplingLevel

        public SSTableReader cloneWithNewSummarySamplingLevel​(ColumnFamilyStore parent,
                                                              int samplingLevel)
                                                       throws java.io.IOException
        Returns a new SSTableReader with the same properties as this SSTableReader except that a new IndexSummary will be built at the target samplingLevel. This (original) SSTableReader instance will be marked as replaced, have its DeletingTask removed, and have its periodic read-meter sync task cancelled.
        Parameters:
        samplingLevel - the desired sampling level for the index summary on the new SSTableReader
        Returns:
        a new SSTableReader
        Throws:
        java.io.IOException
      • getIndexSummarySamplingLevel

        public int getIndexSummarySamplingLevel()
      • getIndexSummaryOffHeapSize

        public long getIndexSummaryOffHeapSize()
      • getMinIndexInterval

        public int getMinIndexInterval()
      • getEffectiveIndexInterval

        public double getEffectiveIndexInterval()
      • releaseSummary

        public void releaseSummary()
      • getIndexScanPosition

        public long getIndexScanPosition​(PartitionPosition key)
        Gets the position in the index file to start scanning to find the given key (at most indexInterval keys away, modulo downsampling of the index summary). Always returns a value >= 0.
      • getIndexScanPositionFromBinarySearchResult

        public static long getIndexScanPositionFromBinarySearchResult​(int binarySearchResult,
                                                                      IndexSummary referencedIndexSummary)
      • getIndexSummaryIndexFromBinarySearchResult

        public static int getIndexSummaryIndexFromBinarySearchResult​(int binarySearchResult)
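        The two helpers above turn a binary search result into a summary index and then into a scan position. The sketch below shows one plausible reading, assuming the search result follows the java.util.Arrays.binarySearch convention (a negative value encodes the insertion point). The class name, the use of long keys, and the parallel arrays are all illustrative assumptions.

```java
import java.util.Arrays;

/** Hypothetical illustration of mapping a binary search result to an index scan position. */
final class IndexSummarySearchSketch {
    /**
     * Returns the index of the last summary entry that is <= the searched key,
     * or -1 if the key sorts before every entry.
     */
    static int summaryIndexFromBinarySearchResult(int binarySearchResult) {
        if (binarySearchResult >= 0)
            return binarySearchResult;                   // exact match on a summary entry
        int insertionPoint = -(binarySearchResult + 1);  // index of the first entry greater than the key
        return insertionPoint - 1;                       // the entry just before it, or -1 if there is none
    }

    /** Returns the index-file position to start scanning from; never negative. */
    static long scanPosition(long[] summaryKeys, long[] indexFilePositions, long key) {
        int idx = summaryIndexFromBinarySearchResult(Arrays.binarySearch(summaryKeys, key));
        return idx < 0 ? 0L : indexFilePositions[idx];
    }
}
```

        A key that falls between two summary entries starts scanning at the earlier entry, and a key before the first entry starts at position 0, which is consistent with the "always >= 0" guarantee.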
      • getCompressionMetadata

        public CompressionMetadata getCompressionMetadata()
        Returns the compression metadata for this sstable.
        Throws:
        java.lang.IllegalStateException - if the sstable is not compressed
      • getCompressionMetadataOffHeapSize

        public long getCompressionMetadataOffHeapSize()
        Returns the amount of memory in bytes used off heap by the compression meta-data.
        Returns:
        the amount of memory in bytes used off heap by the compression meta-data
      • getBloomFilter

        public IFilter getBloomFilter()
      • getBloomFilterSerializedSize

        public long getBloomFilterSerializedSize()
      • getBloomFilterOffHeapSize

        public long getBloomFilterOffHeapSize()
        Returns the amount of memory in bytes used off heap by the bloom filter.
        Returns:
        the amount of memory in bytes used off heap by the bloom filter
      • estimatedKeys

        public long estimatedKeys()
        Returns:
        An estimate of the number of keys in this SSTable based on the index summary.
      • estimatedKeysForRanges

        public long estimatedKeysForRanges​(java.util.Collection<Range<Token>> ranges)
        Parameters:
        ranges -
        Returns:
        An estimate of the number of keys for given ranges in this SSTable.
      • getIndexSummarySize

        public int getIndexSummarySize()
        Returns the number of entries in the IndexSummary. At full sampling, this is approximately 1/INDEX_INTERVALth of the keys in this SSTable.
      • getMaxIndexSummarySize

        public int getMaxIndexSummarySize()
        Returns the approximate number of entries the IndexSummary would contain if it were at full sampling.
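        The relationship between key count, index interval and sampling level can be worked through numerically. The sketch below is an approximation under stated assumptions: BASE_SAMPLING_LEVEL = 128 is assumed to denote full sampling, downsampling is assumed to keep a proportional share of entries, and all names are illustrative.

```java
/** Hypothetical arithmetic for index summary sizes; constants and formulas are assumptions. */
final class IndexSummarySizeSketch {
    // Assumed sampling level that corresponds to "full sampling".
    static final int BASE_SAMPLING_LEVEL = 128;

    /** Approximate entries at full sampling: one entry per minIndexInterval keys, rounded up. */
    static long maxSummarySize(long keyCount, int minIndexInterval) {
        return (keyCount + minIndexInterval - 1) / minIndexInterval;
    }

    /** Approximate entries after downsampling to samplingLevel out of BASE_SAMPLING_LEVEL. */
    static long summarySize(long keyCount, int minIndexInterval, int samplingLevel) {
        return maxSummarySize(keyCount, minIndexInterval) * samplingLevel / BASE_SAMPLING_LEVEL;
    }

    /** Average number of keys between retained summary entries at the given sampling level. */
    static double effectiveIndexInterval(int minIndexInterval, int samplingLevel) {
        return minIndexInterval * ((double) BASE_SAMPLING_LEVEL / samplingLevel);
    }
}
```

        Under these assumptions, 1,280,000 keys with an interval of 128 give about 10,000 entries at full sampling and about 5,000 at sampling level 64, where the effective interval doubles to 256.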
      • getIndexSummaryKey

        public byte[] getIndexSummaryKey​(int index)
        Returns the key for the index summary entry at `index`.
      • getPositionsForRanges

        public java.util.List<SSTableReader.PartitionPositionBounds> getPositionsForRanges​(java.util.Collection<Range<Token>> ranges)
        Determine the minimal set of sections that can be extracted from this SSTable to cover the given ranges.
        Returns:
        A sorted list of (offset,end) pairs that cover the given ranges in the datafile for this SSTable.
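        The idea of mapping token ranges to data-file sections can be sketched with a sorted map. This is a simplified model, not the real algorithm: tokens are plain longs, each range is assumed to cover (left, right] without wrapping around, the ranges are assumed to be sorted and non-overlapping, and long[] pairs stand in for PartitionPositionBounds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical illustration of turning token ranges into data-file sections. */
final class PositionsForRangesSketch {
    /**
     * tokenToOffset maps each partition's token to its start offset in the data file.
     * Returns {start, end} offset pairs, one per range that contains at least one partition.
     */
    static List<long[]> positionsForRanges(NavigableMap<Long, Long> tokenToOffset,
                                           long dataFileLength,
                                           List<long[]> ranges) {
        List<long[]> sections = new ArrayList<>();
        for (long[] range : ranges) {
            // First partition strictly after the left bound: where the section starts.
            Map.Entry<Long, Long> first = tokenToOffset.higherEntry(range[0]);
            if (first == null)
                continue; // nothing in this sstable after the left bound
            // First partition strictly after the right bound: where the section ends.
            Map.Entry<Long, Long> next = tokenToOffset.higherEntry(range[1]);
            long start = first.getValue();
            long end = next == null ? dataFileLength : next.getValue();
            if (start < end)
                sections.add(new long[] { start, end });
        }
        return sections;
    }
}
```

        A range that contains no partitions yields start == end and is dropped, so only non-empty sections are returned.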
      • isKeyCacheEnabled

        public boolean isKeyCacheEnabled()
      • getPosition

        public final RowIndexEntry getPosition​(PartitionPosition key,
                                               SSTableReader.Operator op)
        Retrieves the position while updating the key cache and the stats.
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed, to permit key selection by token bounds, but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
      • getPosition

        public final RowIndexEntry getPosition​(PartitionPosition key,
                                               SSTableReader.Operator op,
                                               SSTableReadsListener listener)
        Retrieves the position while updating the key cache and the stats.
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed, to permit key selection by token bounds, but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
        listener - the SSTableReadsListener that must handle the notifications.
      • getPosition

        protected abstract RowIndexEntry getPosition​(PartitionPosition key,
                                                     SSTableReader.Operator op,
                                                     boolean updateCacheAndStats,
                                                     boolean permitMatchPastLast,
                                                     SSTableReadsListener listener)
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed, to permit key selection by token bounds, but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
        updateCacheAndStats - true if updating stats and cache
        listener - a listener used to handle internal events
        Returns:
        The index entry corresponding to the key, or null if the key is not present
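        The "nearest key matching the operator wins" rule can be illustrated with a sorted map. In this sketch the Operator values GE and GT are assumed alongside the EQ mentioned above, String keys stand in for PartitionPosition, and a Long position stands in for RowIndexEntry; none of this is the real lookup path.

```java
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical illustration of operator-based key lookup over a sorted index. */
final class GetPositionSketch {
    enum Operator { EQ, GE, GT }

    /** Returns the position of the nearest key satisfying (indexKey op key), or null if there is none. */
    static Long getPosition(NavigableMap<String, Long> index, String key, Operator op) {
        switch (op) {
            case EQ: return index.get(key);                       // exact key only
            case GE: return valueOrNull(index.ceilingEntry(key)); // smallest key >= target
            case GT: return valueOrNull(index.higherEntry(key));  // smallest key > target
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    private static Long valueOrNull(Map.Entry<String, Long> entry) {
        return entry == null ? null : entry.getValue();
    }
}
```

        Because GE and GT accept a target that is not itself stored, a 'fake' key built from a token bound works with them, whereas EQ needs a real key to match.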
      • firstKeyBeyond

        public DecoratedKey firstKeyBeyond​(PartitionPosition token)
        Finds and returns the first key beyond a given token in this SSTable or null if no such key exists.
      • uncompressedLength

        public long uncompressedLength()
        Returns:
        The length in bytes of the data for this SSTable. For compressed files, this is not the same thing as the on disk size (see onDiskLength())
      • onDiskLength

        public long onDiskLength()
        Returns:
        The on-disk size of this SSTable in bytes. For compressed files, this is not the same thing as the data length (see uncompressedLength())
      • getCrcCheckChance

        public double getCrcCheckChance()
      • setCrcCheckChance

        public void setCrcCheckChance​(double crcCheckChance)
        Set the value of CRC check chance. The argument supplied is obtained from the property of the owning CFS. Called when either the SSTR is initialized, or the CFS's property is updated via JMX.
        Parameters:
        crcCheckChance -
      • markObsolete

        public void markObsolete​(java.lang.Runnable tidier)
        Mark the sstable as obsolete, i.e., compacted into newer sstables. When calling this function, the caller must ensure that the SSTableReader is not referenced anywhere except for threads holding a reference. Calling this method multiple times is usually buggy (see exceptions in Tracker.unmarkCompacting and removeOldSSTablesSize).
      • isMarkedCompacted

        public boolean isMarkedCompacted()
      • markSuspect

        public void markSuspect()
      • unmarkSuspect

        public void unmarkSuspect()
      • isMarkedSuspect

        public boolean isMarkedSuspect()
      • getScanner

        public ISSTableScanner getScanner​(Range<Token> range)
        Direct I/O SSTableScanner over a defined range of tokens.
        Parameters:
        range - the range of keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner()
        Direct I/O SSTableScanner over the entirety of the sstable.
        Returns:
        A Scanner over the full content of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner​(java.util.Collection<Range<Token>> ranges)
        Direct I/O SSTableScanner over a defined collection of ranges of tokens.
        Parameters:
        ranges - the range of keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner​(java.util.Iterator<AbstractBounds<PartitionPosition>> rangeIterator)
        Direct I/O SSTableScanner over an iterator of bounds.
        Parameters:
        rangeIterator - the keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner​(ColumnFilter columns,
                                                   DataRange dataRange,
                                                   SSTableReadsListener listener)
        Parameters:
        columns - the columns to return.
        dataRange - filter to use when reading the columns
        listener - a listener used to handle internal read events
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getFileDataInput

        public FileDataInput getFileDataInput​(long position)
      • newSince

        public boolean newSince​(long age)
        Tests whether the sstable contains data newer than the given age parameter (in local server time, as returned by System.currentTimeMillis). This works in conjunction with maxDataAge, which is an upper bound on the creation time of data in this sstable.
        Parameters:
        age - The age to compare against the maxDataAge of this sstable, measured in milliseconds since the epoch on this host
        Returns:
        True iff this sstable contains data that's newer than the given age parameter.
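        Given that maxDataAge is only an upper bound, a comparison of this kind is the natural reading of the check. The sketch below assumes a strict greater-than comparison; the class NewSinceSketch is hypothetical.

```java
/** Hypothetical illustration of an age check against maxDataAge. */
final class NewSinceSketch {
    private final long maxDataAge; // upper bound, in local epoch millis, on the data in this sstable

    NewSinceSketch(long maxDataAge) {
        this.maxDataAge = maxDataAge;
    }

    /** True when maxDataAge is strictly later than the given age (assumed comparison). */
    boolean newSince(long age) {
        return maxDataAge > age;
    }
}
```

        A caller such as truncate can use a false result to conclude that everything in the sstable predates the given time.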
      • createLinks

        public void createLinks​(java.lang.String snapshotDirectoryPath)
      • createLinks

        public void createLinks​(java.lang.String snapshotDirectoryPath,
                                com.google.common.util.concurrent.RateLimiter rateLimiter)
      • createLinks

        public static void createLinks​(Descriptor descriptor,
                                       java.util.Set<Component> components,
                                       java.lang.String snapshotDirectoryPath)
      • createLinks

        public static void createLinks​(Descriptor descriptor,
                                       java.util.Set<Component> components,
                                       java.lang.String snapshotDirectoryPath,
                                       com.google.common.util.concurrent.RateLimiter limiter)
      • isRepaired

        public boolean isRepaired()
      • keyAt

        public DecoratedKey keyAt​(long indexPosition)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • isPendingRepair

        public boolean isPendingRepair()
      • getPendingRepair

        public java.util.UUID getPendingRepair()
      • getRepairedAt

        public long getRepairedAt()
      • isTransient

        public boolean isTransient()
      • intersects

        public boolean intersects​(java.util.Collection<Range<Token>> ranges)
      • getBloomFilterFalsePositiveCount

        public long getBloomFilterFalsePositiveCount()
      • getRecentBloomFilterFalsePositiveCount

        public long getRecentBloomFilterFalsePositiveCount()
      • getBloomFilterTruePositiveCount

        public long getBloomFilterTruePositiveCount()
      • getRecentBloomFilterTruePositiveCount

        public long getRecentBloomFilterTruePositiveCount()
      • getBloomFilterTrueNegativeCount

        public long getBloomFilterTrueNegativeCount()
      • getRecentBloomFilterTrueNegativeCount

        public long getRecentBloomFilterTrueNegativeCount()
      • getEstimatedCellPerPartitionCount

        public EstimatedHistogram getEstimatedCellPerPartitionCount()
      • getEstimatedDroppableTombstoneRatio

        public double getEstimatedDroppableTombstoneRatio​(int gcBefore)
      • getDroppableTombstonesBefore

        public double getDroppableTombstonesBefore​(int gcBefore)
      • getCompressionRatio

        public double getCompressionRatio()
      • getMinTimestamp

        public long getMinTimestamp()
      • getMaxTimestamp

        public long getMaxTimestamp()
      • getMinLocalDeletionTime

        public int getMinLocalDeletionTime()
      • getMaxLocalDeletionTime

        public int getMaxLocalDeletionTime()
      • mayHaveTombstones

        public boolean mayHaveTombstones()
        Whether the sstable may contain tombstones or if it is guaranteed to not contain any.

        Note that having this method return false guarantees the sstable has no tombstones whatsoever (so no cell tombstone, no range tombstone marker and no expiring columns), but having it return true doesn't guarantee it contains any, as it may simply have non-expired cells.
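        One way such a one-sided guarantee can arise is from a single summary statistic. The sketch below assumes the check is based on the minimum local deletion time and that a sentinel larger than any valid deletion time (here Integer.MAX_VALUE) means "nothing is ever deleted"; both the sentinel and the class name are assumptions for illustration.

```java
/** Hypothetical illustration of a conservative tombstone check. */
final class MayHaveTombstonesSketch {
    // Assumed sentinel meaning "no deletion time recorded"; larger than any valid deletion time.
    static final int NO_DELETION_TIME = Integer.MAX_VALUE;

    /**
     * False only when no cell in the sstable recorded a deletion time at all.
     * True covers real tombstones and also expiring cells that have not yet expired.
     */
    static boolean mayHaveTombstones(int minLocalDeletionTime) {
        return minLocalDeletionTime != NO_DELETION_TIME;
    }
}
```

        Under this model an expiring cell records a deletion time even before it expires, which is why a true result does not prove that a tombstone is present.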

      • getMinTTL

        public int getMinTTL()
      • getMaxTTL

        public int getMaxTTL()
      • getTotalColumnsSet

        public long getTotalColumnsSet()
      • getTotalRows

        public long getTotalRows()
      • getAvgColumnSetPerRow

        public int getAvgColumnSetPerRow()
      • getSSTableLevel

        public int getSSTableLevel()
      • mutateLevelAndReload

        public void mutateLevelAndReload​(int newLevel)
                                  throws java.io.IOException
        Mutate sstable level with a lock to avoid racing with entire-sstable-streaming and then reload sstable metadata
        Throws:
        java.io.IOException
      • mutateRepairedAndReload

        public void mutateRepairedAndReload​(long newRepairedAt,
                                            java.util.UUID newPendingRepair,
                                            boolean isTransient)
                                     throws java.io.IOException
        Mutate sstable repair metadata with a lock to avoid racing with entire-sstable-streaming and then reload sstable metadata
        Throws:
        java.io.IOException
      • reloadSSTableMetadata

        public void reloadSSTableMetadata()
                                   throws java.io.IOException
        Reloads the sstable metadata from disk. Called after the level is changed on the sstable, for example if the sstable is dropped to L0. Might be possible to remove in future versions.
        Throws:
        java.io.IOException
      • openDataReader

        public RandomAccessReader openDataReader​(com.google.common.util.concurrent.RateLimiter limiter)
      • getCreationTimeFor

        public long getCreationTimeFor​(Component component)
        Parameters:
        component - component to get timestamp.
        Returns:
        last modified time for given component. 0 if given component does not exist or IO error occurs.
      • getKeyCacheHit

        public long getKeyCacheHit()
        Returns:
        Number of key cache hits
      • getKeyCacheRequest

        public long getKeyCacheRequest()
        Returns:
        Number of key cache requests
      • incrementReadCount

        public void incrementReadCount()
        Increment the total read count and read rate for this SSTable. This should not be incremented for non-query reads, like compaction.
      • overrideReadMeter

        public void overrideReadMeter​(RestorableMeter readMeter)
      • resetTidying

        public static void resetTidying()
      • moveAndOpenSSTable

        public static SSTableReader moveAndOpenSSTable​(ColumnFamilyStore cfs,
                                                       Descriptor oldDescriptor,
                                                       Descriptor newDescriptor,
                                                       java.util.Set<Component> components,
                                                       boolean copyData)
        Moves the sstable in oldDescriptor to a new place (with generation etc.) in newDescriptor. All components given will be moved/renamed.
      • shutdownBlocking

        public static void shutdownBlocking​(long timeout,
                                            java.util.concurrent.TimeUnit unit)
                                     throws java.lang.InterruptedException,
                                            java.util.concurrent.TimeoutException
        Throws:
        java.lang.InterruptedException
        java.util.concurrent.TimeoutException