Package org.apache.cassandra.hadoop.cql3
Class CqlBulkOutputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
    org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat

All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
public class CqlBulkOutputFormat extends org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>> implements org.apache.hadoop.mapred.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
The CqlBulkOutputFormat acts as a Hadoop-specific OutputFormat that allows reduce tasks to store keys (and corresponding bound variable values) as CQL rows (and respective columns) in a given table.

As is the case with the CqlOutputFormat, you need to set the prepared statement in your Hadoop job Configuration. The CqlConfigHelper class, through its org.apache.cassandra.hadoop.ConfigHelper#setOutputPreparedStatement method, is provided to make this simple. You also need to set the keyspace; the ConfigHelper class, through its ConfigHelper.setOutputColumnFamily(org.apache.hadoop.conf.Configuration, java.lang.String) method, is provided to make this simple.
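A minimal job-setup sketch, assuming a Hadoop 2.x Job; the keyspace, table, and address values are placeholders, and the exact ConfigHelper overloads may vary across Cassandra versions:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(new Configuration(), "cql-bulk-load");
Configuration conf = job.getConfiguration();

// Route reduce output through CqlBulkOutputFormat; reducers emit
// (Object key, List<ByteBuffer> boundValues) pairs.
job.setOutputFormatClass(CqlBulkOutputFormat.class);

// Target keyspace and table ("my_ks" / "my_table" are placeholders).
ConfigHelper.setOutputColumnFamily(conf, "my_ks", "my_table");
ConfigHelper.setOutputInitialAddress(conf, "127.0.0.1");
ConfigHelper.setOutputPartitioner(conf, "Murmur3Partitioner");

The per-table schema and insert statement used by the bulk writer are registered with setTableSchema and setTableInsertStatement, shown in the Method Detail section below.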
Nested Class Summary

static class
    CqlBulkOutputFormat.NullOutputCommitter

Constructor Summary

CqlBulkOutputFormat()
Method Summary

void
    checkOutputSpecs(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job)
    Deprecated.
void
    checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context)
static boolean
    getDeleteSourceOnSuccess(org.apache.hadoop.conf.Configuration conf)
static java.util.Collection<java.lang.String>
    getIgnoreHosts(org.apache.hadoop.conf.Configuration conf)
    Get the hosts to ignore as a collection of strings.
org.apache.hadoop.mapreduce.OutputCommitter
    getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
CqlBulkRecordWriter
    getRecordWriter(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable progress)
    Deprecated.
CqlBulkRecordWriter
    getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
    Get the RecordWriter for the given task.
static java.lang.String
    getTableForAlias(org.apache.hadoop.conf.Configuration conf, java.lang.String alias)
static java.lang.String
    getTableInsertStatement(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)
static java.lang.String
    getTableSchema(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)
static void
    setDeleteSourceOnSuccess(org.apache.hadoop.conf.Configuration conf, boolean deleteSrc)
static void
    setIgnoreHosts(org.apache.hadoop.conf.Configuration conf, java.lang.String ignoreNodesCsv)
    Set the hosts to ignore as comma-delimited values.
static void
    setIgnoreHosts(org.apache.hadoop.conf.Configuration conf, java.lang.String... ignoreNodes)
    Set the hosts to ignore.
static void
    setTableAlias(org.apache.hadoop.conf.Configuration conf, java.lang.String alias, java.lang.String columnFamily)
static void
    setTableInsertStatement(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String insertStatement)
static void
    setTableSchema(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String schema)
Method Detail
-
getRecordWriter
@Deprecated public CqlBulkRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable progress) throws java.io.IOException
Deprecated. Fills the deprecated OutputFormat interface for streaming.
- Specified by:
getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
- Throws:
java.io.IOException
-
getRecordWriter
public CqlBulkRecordWriter getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
Get the RecordWriter for the given task.
- Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
- Parameters:
context - the information about the current task.
- Returns:
a RecordWriter to write the output for the job.
- Throws:
java.io.IOException
java.lang.InterruptedException
-
checkOutputSpecs
public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context)
- Specified by:
checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
-
checkOutputSpecs
@Deprecated public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job) throws java.io.IOException
Deprecated. Fills the deprecated OutputFormat interface for streaming.
- Specified by:
checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
- Throws:
java.io.IOException
-
getOutputCommitter
public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
- Specified by:
getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<java.lang.Object,java.util.List<java.nio.ByteBuffer>>
- Throws:
java.io.IOException
java.lang.InterruptedException
-
setTableSchema
public static void setTableSchema(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String schema)
-
setTableInsertStatement
public static void setTableInsertStatement(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily, java.lang.String insertStatement)
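Because the bulk writer builds SSTables locally before streaming them (the usual bulk-load path), it needs both the table schema and a matching insert statement. A sketch using the two setters above; the CQL strings and names are placeholders:

// conf is the job Configuration from the setup example above.
// Register the CREATE TABLE statement used to build SSTables locally.
CqlBulkOutputFormat.setTableSchema(conf, "my_table",
        "CREATE TABLE my_ks.my_table (id int PRIMARY KEY, value text)");

// Register the matching INSERT; its bind variables correspond to the
// List<ByteBuffer> values emitted by the reducer.
CqlBulkOutputFormat.setTableInsertStatement(conf, "my_table",
        "INSERT INTO my_ks.my_table (id, value) VALUES (?, ?)");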
-
getTableSchema
public static java.lang.String getTableSchema(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)
-
getTableInsertStatement
public static java.lang.String getTableInsertStatement(org.apache.hadoop.conf.Configuration conf, java.lang.String columnFamily)
-
setDeleteSourceOnSuccess
public static void setDeleteSourceOnSuccess(org.apache.hadoop.conf.Configuration conf, boolean deleteSrc)
-
getDeleteSourceOnSuccess
public static boolean getDeleteSourceOnSuccess(org.apache.hadoop.conf.Configuration conf)
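The delete-source flag is not described in this Javadoc; from the name, it appears to tell the bulk writer whether to remove the locally generated SSTable files once they have been streamed successfully. A minimal sketch under that assumption:

// Assumption: true requests deletion of the locally written SSTables
// after a successful bulk load; verify against your Cassandra version.
CqlBulkOutputFormat.setDeleteSourceOnSuccess(conf, true);
boolean deleteSrc = CqlBulkOutputFormat.getDeleteSourceOnSuccess(conf);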
-
setTableAlias
public static void setTableAlias(org.apache.hadoop.conf.Configuration conf, java.lang.String alias, java.lang.String columnFamily)
-
getTableForAlias
public static java.lang.String getTableForAlias(org.apache.hadoop.conf.Configuration conf, java.lang.String alias)
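The alias accessors are likewise undocumented here; they appear to let a job register a short alias for a table name and resolve it later, for example when one job writes to several tables. A sketch under that assumption, with placeholder names:

// Assumption: the alias recorded in the Configuration resolves back to
// the concrete table name at write time.
CqlBulkOutputFormat.setTableAlias(conf, "events_out", "events");
String table = CqlBulkOutputFormat.getTableForAlias(conf, "events_out"); // "events"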
-
setIgnoreHosts
public static void setIgnoreHosts(org.apache.hadoop.conf.Configuration conf, java.lang.String ignoreNodesCsv)
Set the hosts to ignore as comma-delimited values. Data will not be bulk loaded onto the ignored nodes.
- Parameters:
conf - job configuration
ignoreNodesCsv - a comma-delimited list of nodes to ignore
-
setIgnoreHosts
public static void setIgnoreHosts(org.apache.hadoop.conf.Configuration conf, java.lang.String... ignoreNodes)
Set the hosts to ignore. Data will not be bulk loaded onto the ignored nodes.
- Parameters:
conf - job configuration
ignoreNodes - the nodes to ignore
-
getIgnoreHosts
public static java.util.Collection<java.lang.String> getIgnoreHosts(org.apache.hadoop.conf.Configuration conf)
Get the hosts to ignore as a collection of strings.
- Parameters:
conf - job configuration
- Returns:
the nodes to ignore as a collection of strings
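Taken together, the ignore-hosts accessors round-trip through the job configuration; a brief sketch (addresses are placeholders, and the two setter overloads presumably store the same value):

// conf is the job Configuration from the setup example above.
// Exclude specific nodes from the bulk load.
CqlBulkOutputFormat.setIgnoreHosts(conf, "10.0.0.5,10.0.0.6");    // CSV form
CqlBulkOutputFormat.setIgnoreHosts(conf, "10.0.0.5", "10.0.0.6"); // varargs form

// Later, e.g. inside the record writer, read them back.
java.util.Collection<java.lang.String> ignored =
        CqlBulkOutputFormat.getIgnoreHosts(conf);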