Class CqlOutputFormat

  • All Implemented Interfaces:
    org.apache.hadoop.mapred.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>

    public class CqlOutputFormat
    extends org.apache.hadoop.mapreduce.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
    implements org.apache.hadoop.mapred.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
    The CqlOutputFormat acts as a Hadoop-specific OutputFormat that allows reduce tasks to store keys (and corresponding bound variable values) as CQL rows (and respective columns) in a given table.
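
    The key and value types in the class signature (a Map<String, ByteBuffer> key and a List<ByteBuffer> value) can be illustrated with a minimal sketch that uses only the JDK. The table layout is hypothetical: a text column named word as the key column and a single bigint bound variable. The class and helper names below are illustrative and are not part of the Cassandra API; the sketch also assumes that the key map is indexed by column name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: builds the key/value pair
// shapes that a reducer would emit to CqlOutputFormat.
final class CqlOutputRecordSketch {

    // Serialize a text value as UTF-8 bytes.
    static ByteBuffer text(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Serialize a bigint value as 8 big-endian bytes.
    static ByteBuffer bigint(long v) {
        ByteBuffer b = ByteBuffer.allocate(Long.BYTES);
        b.putLong(v);
        b.flip(); // make the written bytes readable from position 0
        return b;
    }

    // Output key: column name -> serialized value (assumed layout).
    static Map<String, ByteBuffer> key(String word) {
        Map<String, ByteBuffer> k = new LinkedHashMap<>();
        k.put("word", text(word));
        return k;
    }

    // Output value: bound variable values, in the order the '?' markers
    // appear in the prepared statement.
    static List<ByteBuffer> values(long count) {
        List<ByteBuffer> v = new ArrayList<>();
        v.add(bigint(count));
        return v;
    }
}
```

    A reducer would then emit the pair with something like context.write(CqlOutputRecordSketch.key(word), CqlOutputRecordSketch.values(sum)).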

    As is the case with the org.apache.cassandra.hadoop.ColumnFamilyInputFormat, you need to set the prepared statement in your Hadoop job Configuration. The CqlConfigHelper class provides the CqlConfigHelper.setOutputCql(org.apache.hadoop.conf.Configuration, java.lang.String) method to make this simple. You also need to set the keyspace. The ConfigHelper class provides the ConfigHelper.setOutputColumnFamily(org.apache.hadoop.conf.Configuration, java.lang.String) method for this purpose.
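
    The configuration steps described above might look like the following sketch. It needs the Hadoop and Cassandra Hadoop libraries on the classpath. The keyspace, table, and column names are hypothetical. ConfigHelper.setOutputKeyspace is not named on this page and is assumed to be available in ConfigHelper. The statement omits a WHERE clause on the assumption that the record writer supplies the key columns; verify this against the CqlRecordWriter documentation for your version.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical job setup class; all names are placeholders.
final class CqlOutputJobSetup {

    static Job configure(Configuration base) throws IOException {
        Job job = Job.getInstance(base, "cql-output-example");
        Configuration conf = job.getConfiguration();

        // Route reducer output through CqlOutputFormat.
        job.setOutputFormatClass(CqlOutputFormat.class);
        job.setOutputKeyClass(Map.class);    // Map<String, ByteBuffer>
        job.setOutputValueClass(List.class); // List<ByteBuffer>

        // Keyspace and output table (assumed helper: setOutputKeyspace).
        ConfigHelper.setOutputKeyspace(conf, "example_ks");
        ConfigHelper.setOutputColumnFamily(conf, "example_table");

        // Prepared statement whose '?' markers are bound from the output value.
        CqlConfigHelper.setOutputCql(conf,
                "UPDATE example_ks.example_table SET count = ?");

        // Optional tuning through the constants listed in the Field Summary;
        // the values here are arbitrary examples, not recommendations.
        conf.set(CqlOutputFormat.BATCH_THRESHOLD, "64");
        conf.set(CqlOutputFormat.QUEUE_SIZE, "128");

        return job;
    }
}
```

    Connection settings such as the contact address and partitioner are left out of this sketch and would also have to be configured before the job can run.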

    For the sake of performance, this class employs a lazy write-back caching mechanism: its record writer collects the bound variable values for the prepared statement, built from the reducer's inputs, in a task-specific map, and periodically makes the changes official by sending Cassandra a request to execute the prepared statement.
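
    The general idea of lazy write-back can be sketched with plain JDK classes. This is an illustration of the buffering technique only, not the actual CqlRecordWriter implementation: the class name, the threshold parameter, and the sink callback (which stands in for the request sent to Cassandra) are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative buffer: collects bound-variable lists and hands them to a
// sink in batches instead of sending one request per record.
final class LazyWriteBackSketch implements AutoCloseable {
    private final int batchThreshold;
    private final Consumer<List<List<ByteBuffer>>> sink;
    private final List<List<ByteBuffer>> pending = new ArrayList<>();

    LazyWriteBackSketch(int batchThreshold, Consumer<List<List<ByteBuffer>>> sink) {
        if (batchThreshold < 1) {
            throw new IllegalArgumentException("batchThreshold must be >= 1");
        }
        this.batchThreshold = batchThreshold;
        this.sink = sink;
    }

    // Buffer one record; flush once the threshold is reached.
    void write(List<ByteBuffer> boundValues) {
        pending.add(boundValues);
        if (pending.size() >= batchThreshold) {
            flush();
        }
    }

    // Hand a copy of the buffered records to the sink and clear the buffer.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    // Flush whatever is left when the task finishes.
    @Override
    public void close() {
        flush();
    }
}
```

    Batching in this way trades a small delay before records become visible for fewer round trips; any records still buffered are only written when flush() or close() is called.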

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String BATCH_THRESHOLD  
      static java.lang.String QUEUE_SIZE  
    • Constructor Summary

      Constructors 
      Constructor Description
      CqlOutputFormat()  
    • Method Summary

      Methods 
      Modifier and Type Method Description
      protected void checkOutputSpecs​(org.apache.hadoop.conf.Configuration conf)  
      void checkOutputSpecs​(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job)
      Deprecated.
      void checkOutputSpecs​(org.apache.hadoop.mapreduce.JobContext context)
      Check for validity of the output-specification for the job.
      org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
      The OutputCommitter for this format does not write any data to the DFS.
      org.apache.cassandra.hadoop.cql3.CqlRecordWriter getRecordWriter​(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable progress)
      Deprecated.
      org.apache.cassandra.hadoop.cql3.CqlRecordWriter getRecordWriter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
      Get the RecordWriter for the given task.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CqlOutputFormat

        public CqlOutputFormat()
    • Method Detail

      • checkOutputSpecs

        public void checkOutputSpecs​(org.apache.hadoop.mapreduce.JobContext context)
        Check for validity of the output-specification for the job.
        Specified by:
        checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
        Parameters:
        context - information about the job
      • checkOutputSpecs

        protected void checkOutputSpecs​(org.apache.hadoop.conf.Configuration conf)
      • checkOutputSpecs

        @Deprecated
        public void checkOutputSpecs​(org.apache.hadoop.fs.FileSystem filesystem,
                                     org.apache.hadoop.mapred.JobConf job)
                              throws java.io.IOException
        Deprecated.
        Implements the deprecated org.apache.hadoop.mapred.OutputFormat interface, for use with Hadoop streaming.
        Specified by:
        checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
        Throws:
        java.io.IOException
      • getOutputCommitter

        public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                       throws java.io.IOException,
                                                                              java.lang.InterruptedException
        The OutputCommitter for this format does not write any data to the DFS.
        Specified by:
        getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
        Parameters:
        context - the task context
        Returns:
        an output committer
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • getRecordWriter

        @Deprecated
        public org.apache.cassandra.hadoop.cql3.CqlRecordWriter getRecordWriter​(org.apache.hadoop.fs.FileSystem filesystem,
                                                                                org.apache.hadoop.mapred.JobConf job,
                                                                                java.lang.String name,
                                                                                org.apache.hadoop.util.Progressable progress)
                                                                         throws java.io.IOException
        Deprecated.
        Implements the deprecated org.apache.hadoop.mapred.OutputFormat interface, for use with Hadoop streaming.
        Specified by:
        getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
        Throws:
        java.io.IOException
      • getRecordWriter

        public org.apache.cassandra.hadoop.cql3.CqlRecordWriter getRecordWriter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                         throws java.io.IOException,
                                                                                java.lang.InterruptedException
        Get the RecordWriter for the given task.
        Specified by:
        getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<java.util.Map<java.lang.String,​java.nio.ByteBuffer>,​java.util.List<java.nio.ByteBuffer>>
        Parameters:
        context - the information about the current task.
        Returns:
        a RecordWriter to write the output for the job.
        Throws:
        java.io.IOException
        java.lang.InterruptedException