Class CqlInputFormat

  • All Implemented Interfaces:
    org.apache.hadoop.mapred.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>

    public class CqlInputFormat
    extends org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
    implements org.apache.hadoop.mapred.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
    Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily. At minimum, you need to set the keyspace and column family in your Hadoop job Configuration. The ConfigHelper class is provided to make this simple: ConfigHelper.setInputColumnFamily. You can also configure the number of rows per InputSplit in one of two ways (see the configuration sketch below):
    1. ConfigHelper.setInputSplitSize. The default split size is 64k rows.
    2. ConfigHelper.setInputSplitSizeInMb. The InputSplit size in MB, a newer and more precise method. If no value is provided for InputSplitSizeInMb, InputSplitSize is used instead.
    CqlConfigHelper.setInputCQLPageRowSize sets the LIMIT for the CQL query; the default page row size is 1000. Set it "as big as possible, but no bigger": big enough to minimize network overhead, but not so big that it risks out-of-memory errors. Other native protocol connection parameters can be set via CqlConfigHelper. A mapper sketch appears at the end of this page.
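
    As a minimal, hedged sketch of a job driver using the settings described above: the keyspace/table names and numeric values are placeholders, and the exact ConfigHelper/CqlConfigHelper signatures may differ across Cassandra versions.

        import org.apache.cassandra.hadoop.ConfigHelper;
        import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
        import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class CqlInputJobSetup {
            public static Job configure() throws Exception {
                Job job = Job.getInstance(new Configuration(), "cql-input-example");
                job.setInputFormatClass(CqlInputFormat.class);
                Configuration conf = job.getConfiguration();

                // Required: keyspace and column family ("my_keyspace"/"my_table" are placeholders).
                ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table");

                // Optional: rows per split (default 64k rows) ...
                ConfigHelper.setInputSplitSize(conf, 64 * 1024);
                // ... or the newer, more precise size in MB, which takes precedence when set.
                ConfigHelper.setInputSplitSizeInMb(conf, 64);

                // Optional: CQL page row size, i.e. the LIMIT of the underlying query (default 1000).
                CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");

                // Connection parameters (contact points, port, partitioner, and other native
                // protocol settings) are also configured via ConfigHelper/CqlConfigHelper.
                return job;
            }
        }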
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String MAPRED_TASK_ID  
    • Constructor Summary

      Constructors 
      Constructor Description
      CqlInputFormat()  
    • Method Summary

      Methods 
      Modifier and Type Method Description
      org.apache.hadoop.mapreduce.RecordReader<java.lang.Long,​com.datastax.driver.core.Row> createRecordReader​(org.apache.hadoop.mapreduce.InputSplit arg0, org.apache.hadoop.mapreduce.TaskAttemptContext arg1)  
      org.apache.hadoop.mapred.RecordReader<java.lang.Long,​com.datastax.driver.core.Row> getRecordReader​(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter)  
      org.apache.hadoop.mapred.InputSplit[] getSplits​(org.apache.hadoop.mapred.JobConf jobConf, int numSplits)  
      java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits​(org.apache.hadoop.mapreduce.JobContext context)  
      protected void validateConfiguration​(org.apache.hadoop.conf.Configuration conf)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CqlInputFormat

        public CqlInputFormat()
    • Method Detail

      • getRecordReader

        public org.apache.hadoop.mapred.RecordReader<java.lang.Long,​com.datastax.driver.core.Row> getRecordReader​(org.apache.hadoop.mapred.InputSplit split,
                                                                                                                        org.apache.hadoop.mapred.JobConf jobConf,
                                                                                                                        org.apache.hadoop.mapred.Reporter reporter)
                                                                                                                 throws java.io.IOException
        Specified by:
        getRecordReader in interface org.apache.hadoop.mapred.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
        Throws:
        java.io.IOException
      • createRecordReader

        public org.apache.hadoop.mapreduce.RecordReader<java.lang.Long,​com.datastax.driver.core.Row> createRecordReader​(org.apache.hadoop.mapreduce.InputSplit arg0,
                                                                                                                              org.apache.hadoop.mapreduce.TaskAttemptContext arg1)
                                                                                                                       throws java.io.IOException,
                                                                                                                              java.lang.InterruptedException
        Specified by:
        createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • validateConfiguration

        protected void validateConfiguration​(org.apache.hadoop.conf.Configuration conf)
      • getSplits

        public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits​(org.apache.hadoop.mapreduce.JobContext context)
                                                                         throws java.io.IOException
        Specified by:
        getSplits in class org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
        Throws:
        java.io.IOException
      • getSplits

        public org.apache.hadoop.mapred.InputSplit[] getSplits​(org.apache.hadoop.mapred.JobConf jobConf,
                                                               int numSplits)
                                                        throws java.io.IOException
        Specified by:
        getSplits in interface org.apache.hadoop.mapred.InputFormat<java.lang.Long,​com.datastax.driver.core.Row>
        Throws:
        java.io.IOException
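    • Usage example

      As a hedged illustration of the key/value types above (java.lang.Long keys, com.datastax.driver.core.Row values), a mapper consuming this InputFormat might look like the following. The column name "category" is a hypothetical placeholder, and the Long key should be treated as opaque.

        import java.io.IOException;
        import com.datastax.driver.core.Row;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Counts rows per category; the input key/value types match CqlInputFormat's parameters.
        public class CategoryCountMapper extends Mapper<Long, Row, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text category = new Text();

            @Override
            protected void map(Long key, Row row, Context context)
                    throws IOException, InterruptedException {
                // Read a column from the driver Row by name ("category" is assumed for illustration).
                category.set(row.getString("category"));
                context.write(category, ONE);
            }
        }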