Package org.apache.cassandra.hadoop.cql3
Class CqlInputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
    org.apache.cassandra.hadoop.cql3.CqlInputFormat

All Implemented Interfaces:
    org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
public class CqlInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
implements org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily. At minimum, you need to set the keyspace and column family in your Hadoop job Configuration. The ConfigHelper class is provided to make this simple:

ConfigHelper.setInputColumnFamily

You can also configure the number of rows per InputSplit in one of two ways:
1. ConfigHelper.setInputSplitSize. The default split size is 64k rows.
2. ConfigHelper.setInputSplitSizeInMb. The InputSplit size in MB, a newer and more precise method. If no value is provided for InputSplitSizeInMb, InputSplitSize is used instead.

CqlConfigHelper.setInputCQLPageRowSize sets the page row size; the default is 1000. You should set it to "as big as possible, but no bigger": it sets the LIMIT for the CQL query, so it needs to be big enough to minimize network overhead, but not so big that it causes out-of-memory issues. Other native protocol connection parameters can be set through CqlConfigHelper. A configuration sketch follows below.
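As a minimal sketch of the configuration steps above (the keyspace "ks", table "cf", and host address are placeholders, and the setter signatures should be verified against your Cassandra version):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CassandraInputJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cassandra-input-example");
            Configuration conf = job.getConfiguration();

            // Required: keyspace and column family ("ks" and "cf" are placeholders).
            ConfigHelper.setInputColumnFamily(conf, "ks", "cf");

            // Optional: rows per InputSplit (default 64k rows) ...
            ConfigHelper.setInputSplitSize(conf, 64 * 1024);
            // ... or the newer, more precise split size in MB:
            // ConfigHelper.setInputSplitSizeInMb(conf, 64);

            // Optional: CQL page row size (default 1000); this becomes the
            // LIMIT of the paging CQL query.
            CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");

            // Most setups also point the job at the cluster, e.g.:
            // ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
            // ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");

            job.setInputFormatClass(CqlInputFormat.class);
        }
    }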
Field Summary
Modifier and Type    Field
static java.lang.String    MAPRED_TASK_ID
Constructor Summary
CqlInputFormat()
Method Summary
org.apache.hadoop.mapreduce.RecordReader<java.lang.Long,com.datastax.driver.core.Row>
createRecordReader(org.apache.hadoop.mapreduce.InputSplit arg0, org.apache.hadoop.mapreduce.TaskAttemptContext arg1)
org.apache.hadoop.mapred.RecordReader<java.lang.Long,com.datastax.driver.core.Row>
getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[]
getSplits(org.apache.hadoop.mapred.JobConf jobConf, int numSplits)
java.util.List<org.apache.hadoop.mapreduce.InputSplit>
getSplits(org.apache.hadoop.mapreduce.JobContext context)
protected void
validateConfiguration(org.apache.hadoop.conf.Configuration conf)
Field Detail
MAPRED_TASK_ID
public static final java.lang.String MAPRED_TASK_ID
- See Also:
- Constant Field Values
Method Detail
getRecordReader
public org.apache.hadoop.mapred.RecordReader<java.lang.Long,com.datastax.driver.core.Row> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
- Throws:
java.io.IOException
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<java.lang.Long,com.datastax.driver.core.Row> createRecordReader(org.apache.hadoop.mapreduce.InputSplit arg0, org.apache.hadoop.mapreduce.TaskAttemptContext arg1) throws java.io.IOException, java.lang.InterruptedException
- Specified by:
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
- Throws:
java.io.IOException
java.lang.InterruptedException
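To illustrate the key/value types this RecordReader produces, a hypothetical mapper (the class name RowCountMapper and the column "name" are made up for the example):

    import com.datastax.driver.core.Row;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input key/value types match CqlInputFormat: <Long, Row>.
    public class RowCountMapper extends Mapper<Long, Row, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(Long key, Row row, Context context)
                throws java.io.IOException, InterruptedException {
            // "name" is a placeholder for a text column in your table.
            context.write(new Text(row.getString("name")), ONE);
        }
    }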
validateConfiguration
protected void validateConfiguration(org.apache.hadoop.conf.Configuration conf)
getSplits
public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws java.io.IOException
- Specified by:
getSplits
in class org.apache.hadoop.mapreduce.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
- Throws:
java.io.IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf jobConf, int numSplits) throws java.io.IOException
- Specified by:
getSplits
in interface org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
- Throws:
java.io.IOException