Class CqlBulkOutputFormat

  • All Implemented Interfaces:
    org.apache.hadoop.mapred.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>

    public class CqlBulkOutputFormat
    extends org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
    implements org.apache.hadoop.mapred.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
    The CqlBulkOutputFormat acts as a Hadoop-specific OutputFormat that allows reduce tasks to store keys (and corresponding bound variable values) as CQL rows (and respective columns) in a given table.

    As is the case with the CqlOutputFormat, you need to set the prepared statement in your Hadoop job Configuration. The CqlConfigHelper class, through its org.apache.cassandra.hadoop.ConfigHelper#setOutputPreparedStatement method, is provided to make this simple. you need to set the Keyspace. The ConfigHelper class, through its ConfigHelper.setOutputColumnFamily(org.apache.hadoop.conf.Configuration, java.lang.String) method, is provided to make this simple.

    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      void checkOutputSpecs​(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job)
      Deprecated.
      void checkOutputSpecs​(org.apache.hadoop.mapreduce.JobContext context)  
      static boolean getDeleteSourceOnSuccess​(org.apache.hadoop.conf.Configuration conf)  
      static java.util.Collection<java.lang.String> getIgnoreHosts​(org.apache.hadoop.conf.Configuration conf)
      Get the hosts to ignore as a collection of strings
      org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)  
      CqlBulkRecordWriter getRecordWriter​(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable progress)
      Deprecated.
      CqlBulkRecordWriter getRecordWriter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
      Get the RecordWriter for the given task.
      static java.lang.String getTableForAlias​(org.apache.hadoop.conf.Configuration conf, java.lang.String alias)  
      static java.lang.String getTableInsertStatement​(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)  
      static java.lang.String getTableSchema​(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)  
      static void setDeleteSourceOnSuccess​(org.apache.hadoop.conf.Configuration conf, boolean deleteSrc)  
      static void setIgnoreHosts​(org.apache.hadoop.conf.Configuration conf, java.lang.String ignoreNodesCsv)
      Set the hosts to ignore as comma delimited values.
      static void setIgnoreHosts​(org.apache.hadoop.conf.Configuration conf, java.lang.String... ignoreNodes)
      Set the hosts to ignore.
      static void setTableAlias​(org.apache.hadoop.conf.Configuration conf, java.lang.String alias, java.lang.String columnFamily)  
      static void setTableInsertStatement​(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String insertStatement)  
      static void setTableSchema​(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String schema)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CqlBulkOutputFormat

        public CqlBulkOutputFormat()
    • Method Detail

      • getRecordWriter

        @Deprecated
        public CqlBulkRecordWriter getRecordWriter​(org.apache.hadoop.fs.FileSystem filesystem,
                                                   org.apache.hadoop.mapred.JobConf job,
                                                   java.lang.String name,
                                                   org.apache.hadoop.util.Progressable progress)
                                            throws java.io.IOException
        Deprecated.
        Fills the deprecated OutputFormat interface for streaming.
        Specified by:
        getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
        Throws:
        java.io.IOException
      • getRecordWriter

        public CqlBulkRecordWriter getRecordWriter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                            throws java.io.IOException,
                                                   java.lang.InterruptedException
        Get the RecordWriter for the given task.
        Specified by:
        getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
        Parameters:
        context - the information about the current task.
        Returns:
        a RecordWriter to write the output for the job.
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • checkOutputSpecs

        public void checkOutputSpecs​(org.apache.hadoop.mapreduce.JobContext context)
        Specified by:
        checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
      • checkOutputSpecs

        @Deprecated
        public void checkOutputSpecs​(org.apache.hadoop.fs.FileSystem filesystem,
                                     org.apache.hadoop.mapred.JobConf job)
                              throws java.io.IOException
        Deprecated.
        Fills the deprecated OutputFormat interface for streaming.
        Specified by:
        checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
        Throws:
        java.io.IOException
      • getOutputCommitter

        public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter​(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                       throws java.io.IOException,
                                                                              java.lang.InterruptedException
        Specified by:
        getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,​java.util.List<java.nio.ByteBuffer>>
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • setTableSchema

        public static void setTableSchema​(org.apache.hadoop.conf.Configuration conf,
                                          java.lang.String columnFamily,
                                          java.lang.String schema)
      • setTableInsertStatement

        public static void setTableInsertStatement​(org.apache.hadoop.conf.Configuration conf,
                                                   java.lang.String columnFamily,
                                                   java.lang.String insertStatement)
      • getTableSchema

        public static java.lang.String getTableSchema​(org.apache.hadoop.conf.Configuration conf,
                                                      java.lang.String columnFamily)
      • getTableInsertStatement

        public static java.lang.String getTableInsertStatement​(org.apache.hadoop.conf.Configuration conf,
                                                               java.lang.String columnFamily)
      • setDeleteSourceOnSuccess

        public static void setDeleteSourceOnSuccess​(org.apache.hadoop.conf.Configuration conf,
                                                    boolean deleteSrc)
      • getDeleteSourceOnSuccess

        public static boolean getDeleteSourceOnSuccess​(org.apache.hadoop.conf.Configuration conf)
      • setTableAlias

        public static void setTableAlias​(org.apache.hadoop.conf.Configuration conf,
                                         java.lang.String alias,
                                         java.lang.String columnFamily)
      • getTableForAlias

        public static java.lang.String getTableForAlias​(org.apache.hadoop.conf.Configuration conf,
                                                        java.lang.String alias)
      • setIgnoreHosts

        public static void setIgnoreHosts​(org.apache.hadoop.conf.Configuration conf,
                                          java.lang.String ignoreNodesCsv)
        Set the hosts to ignore as comma delimited values. Data will not be bulk loaded onto the ignored nodes.
        Parameters:
        conf - job configuration
        ignoreNodesCsv - a comma delimited list of nodes to ignore
      • setIgnoreHosts

        public static void setIgnoreHosts​(org.apache.hadoop.conf.Configuration conf,
                                          java.lang.String... ignoreNodes)
        Set the hosts to ignore. Data will not be bulk loaded onto the ignored nodes.
        Parameters:
        conf - job configuration
        ignoreNodes - the nodes to ignore
      • getIgnoreHosts

        public static java.util.Collection<java.lang.String> getIgnoreHosts​(org.apache.hadoop.conf.Configuration conf)
        Get the hosts to ignore as a collection of strings
        Parameters:
        conf - job configuration
        Returns:
        the nodes to ignore as a collection of stirngs