Interface StorageTransportExtension
public interface StorageTransportExtension

The facade interface that defines the contract of the extension for cloud storage data transport. It serves as the integration point for the library consumer to:
- Register callbacks for data transport progress
- Supply the information needed to conduct a successful data transport, e.g. credentials, bucket, etc.
Notes for the interface implementors:
- Not all methods defined in the interface are invoked in both the Spark driver and the executors.
  1. The methods in CommonStorageTransportExtension are invoked in both places.
  2. The methods in ExecutorStorageTransportExtension are invoked in Spark executors only.
  3. The methods in DriverStorageTransportExtension are invoked in the Spark driver only.
- The Analytics library guarantees the following sequence in the Spark driver on initialization:
  1. Create the new StorageTransportExtension instance
  2. Invoke initialize(String, SparkConf, boolean)
  3. Invoke getStorageConfiguration()
  4. Invoke setCredentialChangeListener(CredentialChangeListener)
  5. Invoke setObjectFailureListener(ObjectFailureListener)
  6. Invoke setCoordinationSignalListener(CoordinationSignalListener)
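The driver-side initialization order listed above can be illustrated with a small recording harness. This is a minimal sketch under stated assumptions: ExtensionStandIn, RecordingExtension and DriverInitSequence are hypothetical names, the SparkConf parameter is omitted, and listeners are plain Objects, because the real library types are not reproduced here. It only shows the guaranteed call sequence, not how the library actually drives it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StorageTransportExtension: the real interface,
// SparkConf and listener types come from the Analytics library and Spark.
interface ExtensionStandIn {
    void initialize(String jobId, boolean isOnDriver);
    Object getStorageConfiguration();
    void setCredentialChangeListener(Object listener);
    void setObjectFailureListener(Object listener);
    void setCoordinationSignalListener(Object listener);
}

// Records every call so the invocation order can be inspected afterwards.
class RecordingExtension implements ExtensionStandIn {
    final List<String> calls = new ArrayList<>();

    @Override public void initialize(String jobId, boolean isOnDriver) { calls.add("initialize"); }
    @Override public Object getStorageConfiguration() { calls.add("getStorageConfiguration"); return new Object(); }
    @Override public void setCredentialChangeListener(Object l) { calls.add("setCredentialChangeListener"); }
    @Override public void setObjectFailureListener(Object l) { calls.add("setObjectFailureListener"); }
    @Override public void setCoordinationSignalListener(Object l) { calls.add("setCoordinationSignalListener"); }
}

// Mirrors the documented driver-side initialization sequence (steps 1 to 6).
class DriverInitSequence {
    static RecordingExtension run(String jobId) {
        RecordingExtension ext = new RecordingExtension();   // 1. create the instance
        ext.initialize(jobId, true);                         // 2. initialize on the driver
        ext.getStorageConfiguration();                       // 3. read the storage configuration
        ext.setCredentialChangeListener(new Object());       // 4. credential change listener
        ext.setObjectFailureListener(new Object());          // 5. object failure listener
        ext.setCoordinationSignalListener(new Object());     // 6. coordination signal listener
        return ext;
    }
}
```

An implementation can rely on this order, for example by assuming that initialize has already run when getStorageConfiguration is called on the driver.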
-
Method Summary
Modifier and Type | Method | Description
StorageTransportConfiguration | getStorageConfiguration() | Returns the StorageTransportConfiguration.
void | initialize(java.lang.String jobId, org.apache.spark.SparkConf conf, boolean isOnDriver) | Initializes the instance of this class after it has been created.
void | onAllObjectsPersisted(long objectsCount, long rowCount, long elapsedMillis) | Notifies the extension that all the objects have been persisted to the cloud storage successfully.
void | onImportFailed(java.lang.String clusterId, java.lang.Throwable cause) | Notifies the CoordinatedTransportExtension implementation that importing objects into the cluster has failed.
void | onImportSucceeded(java.lang.String clusterId, long elapsedMillis) | Notifies the CoordinatedTransportExtension implementation that all objects have been imported into the cluster.
void | onJobFailed(long elapsedMillis, java.lang.Throwable throwable) | Notifies the extension that the job has failed with the exception throwable.
void | onJobSucceeded(long elapsedMillis) | Notifies the extension that the job has completed successfully.
void | onObjectApplied(java.lang.String bucket, java.lang.String key, long sizeInBytes, long elapsedMillis) | Notifies the extension that the object identified by the bucket and key has been applied, meaning the SSTables included in the object are imported into Cassandra and satisfy the desired consistency level.
void | onObjectPersisted(java.lang.String bucket, java.lang.String key, long sizeInBytes) | Notifies the extension that the object identified by the bucket and key has been successfully persisted to the blob store.
void | onStageFailed(java.lang.String clusterId, java.lang.Throwable cause) | Notifies the CoordinatedTransportExtension implementation that staging objects on the cluster has failed.
void | onStageSucceeded(java.lang.String clusterId, long elapsedMillis) | Notifies the CoordinatedTransportExtension implementation that all objects have been staged on the cluster.
void | onTransportStart(long elapsedMillis) | Notifies the extension that data transport has been started.
void | setCoordinationSignalListener(CoordinationSignalListener listener) | Sets the CoordinationSignalListener to receive coordination signals from the CoordinatedTransportExtension implementation.
void | setCredentialChangeListener(CredentialChangeListener credentialChangeListener) | Sets the CredentialChangeListener to listen for token changes.
void | setObjectFailureListener(ObjectFailureListener objectFailureListener) | Sets the ObjectFailureListener to listen for object failures.
-
Method Detail
-
initialize
void initialize(java.lang.String jobId, org.apache.spark.SparkConf conf, boolean isOnDriver)
Initializes the instance of this class after it has been created. The initialization logic may differ depending on whether it is running on the Spark driver or on an executor.
Parameters:
jobId - the unique identifier for the job. It is either supplied by the customer with WriterOptions.JOB_ID or, if no jobId is supplied, a unique id string generated by the job on startup.
conf - the Spark configuration
isOnDriver - indicates whether the runtime role is the Spark driver or an executor
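A role-aware initialize can be sketched as follows. The sketch assumes a plain Map standing in for org.apache.spark.SparkConf, and the "example.bucket" key, the default bucket name and the credential-refresher flag are all hypothetical illustrations, not documented library options.

```java
import java.util.Map;

// Sketch of an initialize(...) that branches on the runtime role.
// Assumption: a Map stands in for SparkConf; "example.bucket" is hypothetical.
class RoleAwareExtension {
    private String jobId;
    private boolean onDriver;
    private String bucket;
    private boolean credentialRefresherStarted;

    void initialize(String jobId, Map<String, String> conf, boolean isOnDriver) {
        this.jobId = jobId;          // supplied via WriterOptions.JOB_ID or generated by the job
        this.onDriver = isOnDriver;
        // Settings needed on both the driver and the executors are read unconditionally.
        this.bucket = conf.getOrDefault("example.bucket", "default-bucket");
        if (isOnDriver) {
            // Driver-only setup, e.g. a hypothetical component that renews credentials
            // and later reports them through the credential change listener.
            credentialRefresherStarted = true;
        }
        // Executors skip driver-only work and keep only what executor-side
        // callbacks (such as onObjectPersisted) need.
    }

    String jobId() { return jobId; }
    boolean isOnDriver() { return onDriver; }
    String bucket() { return bucket; }
    boolean credentialRefresherStarted() { return credentialRefresherStarted; }
}
```

Keeping driver-only resources behind the isOnDriver check avoids starting them once per executor.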
-
getStorageConfiguration
StorageTransportConfiguration getStorageConfiguration()
Returns the StorageTransportConfiguration.
Returns:
the StorageTransportConfiguration
-
onObjectPersisted
void onObjectPersisted(java.lang.String bucket, java.lang.String key, long sizeInBytes)
Notifies the extension that the object identified by the bucket and key has been successfully persisted to the blob store. This method is called from each task during the job execution.
Parameters:
bucket - the bucket to which the object was written
key - the key of the object written
sizeInBytes - the size of the object, in bytes
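Because onObjectPersisted is called from each task, an implementation that aggregates per-object data should be thread-safe. The sketch below assumes, as a hypothetical, that one extension instance may be shared by tasks running concurrently in the same executor JVM; PersistedObjectTracker is an illustrative name.

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of executor-side bookkeeping for onObjectPersisted.
// Assumption: several tasks may call this concurrently on one instance,
// so the counters use LongAdder, which tolerates contended updates.
class PersistedObjectTracker {
    private final LongAdder objects = new LongAdder();
    private final LongAdder bytes = new LongAdder();

    void onObjectPersisted(String bucket, String key, long sizeInBytes) {
        objects.increment();
        bytes.add(sizeInBytes);
        // A real implementation might also log bucket/key or emit a metric here.
    }

    long objectCount() { return objects.sum(); }
    long totalBytes() { return bytes.sum(); }
}
```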
-
onTransportStart
void onTransportStart(long elapsedMillis)
Notifies the extension that data transport has been started. This method is called from the driver.
Parameters:
elapsedMillis - the elapsed time, in milliseconds, from the start of the bulk write run until this step
-
setCredentialChangeListener
void setCredentialChangeListener(CredentialChangeListener credentialChangeListener)
Sets the CredentialChangeListener to listen for token changes. This method is called from the driver.
Parameters:
credentialChangeListener - an implementation of the CredentialChangeListener
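One way to use the listener supplied here is to keep a reference to it and push refreshed credentials whenever the extension's own token source rotates. In this sketch CredentialListenerStandIn and its onCredentialsChanged(String) method are hypothetical stand-ins; the real CredentialChangeListener's method name and credential type are not shown on this page and may differ. The rotate hook is likewise illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for CredentialChangeListener.
interface CredentialListenerStandIn {
    void onCredentialsChanged(String newToken);
}

// Sketch: stores the listener provided by the library and notifies it on rotation.
class CredentialRotatingExtension {
    private final AtomicReference<CredentialListenerStandIn> listener = new AtomicReference<>();

    void setCredentialChangeListener(CredentialListenerStandIn credentialChangeListener) {
        listener.set(credentialChangeListener);
    }

    // Hypothetical hook, e.g. invoked by a timer that renews a session token.
    void rotate(String newToken) {
        CredentialListenerStandIn l = listener.get();
        if (l != null) {
            l.onCredentialsChanged(newToken);
        }
    }
}
```

The AtomicReference lets a background renewal thread read the listener safely after the driver has set it.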
-
setObjectFailureListener
void setObjectFailureListener(ObjectFailureListener objectFailureListener)
Sets the ObjectFailureListener to listen for object failures. This method is called from the driver.
Parameters:
objectFailureListener - an implementation of the ObjectFailureListener
-
onAllObjectsPersisted
void onAllObjectsPersisted(long objectsCount, long rowCount, long elapsedMillis)
Notifies the extension that all the objects have been persisted to the cloud storage successfully. This method is called from the driver when all executor tasks complete.
Parameters:
objectsCount - the total count of objects persisted
rowCount - the total count of rows persisted
elapsedMillis - the elapsed time, in milliseconds, from the start of the bulk write run until this step
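Since onAllObjectsPersisted receives totals together with the elapsed time since the start of the run, it is a natural place to derive run-level statistics. A minimal sketch; PersistSummary is a hypothetical name, and the rate is an average over the whole run up to this step, not the upload rate alone.

```java
// Sketch: derive run-level statistics when all objects have been persisted.
// elapsedMillis counts from the start of the bulk write run, so the rate
// below averages over everything that happened before this step.
class PersistSummary {
    long objects;
    long rows;
    double rowsPerSecond;

    void onAllObjectsPersisted(long objectsCount, long rowCount, long elapsedMillis) {
        this.objects = objectsCount;
        this.rows = rowCount;
        // Guard against a zero duration to avoid dividing by zero.
        this.rowsPerSecond = elapsedMillis > 0 ? rowCount * 1000.0 / elapsedMillis : 0.0;
    }
}
```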
-
onObjectApplied
void onObjectApplied(java.lang.String bucket, java.lang.String key, long sizeInBytes, long elapsedMillis)
Notifies the extension that the object identified by the bucket and key has been applied, meaning the SSTables included in the object are imported into Cassandra and satisfy the desired consistency level.
The notification is emitted only once per object, as soon as the consistency level is satisfied.
Parameters:
bucket - the bucket the object belongs to
key - the object key
sizeInBytes - the size of the object, in bytes
elapsedMillis - the elapsed time, in milliseconds, from the start of the bulk write run until this step
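A progress tracker for onObjectApplied can key each object by bucket and key. The library states that the notification is emitted once per object; the concurrent set in this sketch is only a defensive guard, a design choice rather than a requirement. AppliedObjectTracker is a hypothetical name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count applied objects and bytes, ignoring any duplicate notification.
class AppliedObjectTracker {
    private final Set<String> applied = ConcurrentHashMap.newKeySet();
    private final AtomicLong appliedBytes = new AtomicLong();

    void onObjectApplied(String bucket, String key, long sizeInBytes, long elapsedMillis) {
        // Set.add returns false if this bucket/key was already recorded.
        if (applied.add(bucket + "/" + key)) {
            appliedBytes.addAndGet(sizeInBytes);
            // The object now satisfies the desired consistency level;
            // an implementation could record progress here.
        }
    }

    int appliedCount() { return applied.size(); }
    long appliedBytes() { return appliedBytes.get(); }
}
```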
-
onJobSucceeded
void onJobSucceeded(long elapsedMillis)
Notifies the extension that the job has completed successfully. This method is called from the driver at the end of the Spark Bulk Writer execution when the job succeeds.
Parameters:
elapsedMillis - the elapsed time, in milliseconds, from the start of the bulk write run until this step
-
onJobFailed
void onJobFailed(long elapsedMillis, java.lang.Throwable throwable)
Notifies the extension that the job has failed with the exception throwable. This method is called from the driver at the end of the Spark Bulk Writer execution when the job fails.
Parameters:
elapsedMillis - the elapsed time, in milliseconds, from the start of the bulk write run until this step
throwable - the exception encountered by the job
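onJobSucceeded and onJobFailed both mark the end of the run on the driver, so they are a natural place to release resources. The sketch assumes, based on the descriptions above, that only one of them is expected per run, and makes the cleanup idempotent with a compare-and-set flag in case both were ever called. JobOutcomeHandler and its fields are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: handle the two terminal driver callbacks with at-most-once cleanup.
class JobOutcomeHandler {
    private final AtomicBoolean finished = new AtomicBoolean(false);
    volatile String outcome = "RUNNING";
    volatile long durationMillis = -1;
    int cleanups = 0;

    void onJobSucceeded(long elapsedMillis) {
        finish("SUCCEEDED", elapsedMillis);
    }

    void onJobFailed(long elapsedMillis, Throwable throwable) {
        finish("FAILED: " + throwable.getMessage(), elapsedMillis);
    }

    private void finish(String result, long elapsedMillis) {
        // compareAndSet ensures only the first terminal callback takes effect.
        if (finished.compareAndSet(false, true)) {
            outcome = result;
            durationMillis = elapsedMillis;
            cleanups++; // e.g. stop a credential refresher, close storage clients
        }
    }
}
```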
-
onStageSucceeded
void onStageSucceeded(java.lang.String clusterId, long elapsedMillis)
Notifies the CoordinatedTransportExtension implementation that all objects have been staged on the cluster. The callback should only be invoked once per cluster.
Parameters:
clusterId - identifies a Cassandra cluster
elapsedMillis - the elapsed time from the start of the bulk write run, in milliseconds
-
onStageFailed
void onStageFailed(java.lang.String clusterId, java.lang.Throwable cause)
Notifies the CoordinatedTransportExtension implementation that staging objects on the cluster has failed. The callback should only be invoked once per cluster.
Parameters:
clusterId - identifies a Cassandra cluster
cause - the cause of the failure
-
onImportSucceeded
void onImportSucceeded(java.lang.String clusterId, long elapsedMillis)
Notifies the CoordinatedTransportExtension implementation that all objects have been imported into the cluster. The callback should only be invoked once per cluster.
Parameters:
clusterId - identifies a Cassandra cluster
elapsedMillis - the elapsed time from the start of the bulk write run, in milliseconds
-
onImportFailed
void onImportFailed(java.lang.String clusterId, java.lang.Throwable cause)
Notifies the CoordinatedTransportExtension implementation that importing objects into the cluster has failed. The callback should only be invoked once per cluster.
Parameters:
clusterId - identifies a Cassandra cluster
cause - the cause of the failure
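The four per-cluster callbacks (onStageSucceeded, onStageFailed, onImportSucceeded, onImportFailed) can be tracked together in a coordinated write. This is a sketch only: ClusterPhaseTracker, its Phase enum and the allImported helper are hypothetical, and the cluster ids in any usage are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: per-cluster phase bookkeeping for a CoordinatedTransportExtension.
// Each callback is documented to be invoked once per cluster; the map keeps
// the latest reported phase for every clusterId.
class ClusterPhaseTracker {
    enum Phase { STAGED, STAGE_FAILED, IMPORTED, IMPORT_FAILED }

    private final Map<String, Phase> phases = new ConcurrentHashMap<>();

    void onStageSucceeded(String clusterId, long elapsedMillis) { phases.put(clusterId, Phase.STAGED); }
    void onStageFailed(String clusterId, Throwable cause) { phases.put(clusterId, Phase.STAGE_FAILED); }
    void onImportSucceeded(String clusterId, long elapsedMillis) { phases.put(clusterId, Phase.IMPORTED); }
    void onImportFailed(String clusterId, Throwable cause) { phases.put(clusterId, Phase.IMPORT_FAILED); }

    Phase phaseOf(String clusterId) { return phases.get(clusterId); }

    // True only when every expected cluster has reported a successful import.
    boolean allImported(Iterable<String> clusterIds) {
        for (String id : clusterIds) {
            if (phases.get(id) != Phase.IMPORTED) {
                return false;
            }
        }
        return true;
    }
}
```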
-
setCoordinationSignalListener
void setCoordinationSignalListener(CoordinationSignalListener listener)
Sets the CoordinationSignalListener to receive coordination signals from the CoordinatedTransportExtension implementation.
Note to CoordinatedTransportExtension implementors: this method is called during setup of CassandraBulkSourceRelation, and a CoordinationSignalListener instance is provided.
Parameters:
listener - receives coordination signals
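A CoordinatedTransportExtension implementation typically keeps the listener provided during setup and uses it to signal the library. The methods of the real CoordinationSignalListener are not documented on this page, so SignalListenerStandIn and its onSignal(String, String) method are hypothetical, as are the signal names and the signal(...) trigger in this sketch.

```java
// Hypothetical stand-in for CoordinationSignalListener; the real methods may differ.
interface SignalListenerStandIn {
    void onSignal(String clusterId, String signal);
}

// Sketch: store the listener from setup and forward coordination signals to it.
class SignalingExtension {
    private volatile SignalListenerStandIn listener;

    void setCoordinationSignalListener(SignalListenerStandIn listener) {
        this.listener = listener;
    }

    // Hypothetical trigger, e.g. called when an external coordinator approves a step.
    boolean signal(String clusterId, String signal) {
        SignalListenerStandIn l = listener;
        if (l == null) {
            return false; // setup has not happened yet: nothing to notify
        }
        l.onSignal(clusterId, signal);
        return true;
    }
}
```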