Class BroadcastableJobInfo
- java.lang.Object
- org.apache.cassandra.spark.bulkwriter.BroadcastableJobInfo
- All Implemented Interfaces:
java.io.Serializable
public final class BroadcastableJobInfo extends java.lang.Object implements java.io.Serializable
Broadcastable wrapper for job information with ZERO transient fields to optimize Spark broadcasting. Only essential fields are broadcast; executors reconstruct CassandraJobInfo to rebuild TokenPartitioner.
Why ZERO transient fields matters:
Spark's SizeEstimator uses reflection to estimate object sizes before broadcasting. Each transient field forces SizeEstimator to inspect the field's type hierarchy, which is expensive. Logger references are particularly costly due to their deep object graphs (appenders, layouts, contexts). By eliminating ALL transient fields and Logger references, we:
- Minimize SizeEstimator reflection overhead during broadcast preparation
- Reduce broadcast variable serialization size
- Avoid accidental serialization of non-serializable objects
- See Also:
- Serialized Form
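The zero-transient-fields rule described above can be guarded with a small reflection check, for example in a unit test. The following is a minimal sketch, not part of this API: LeanPayload, HeavyPayload, and TransientFieldCheck are hypothetical names invented for illustration, and the check only inspects declared field modifiers. It does not reproduce what SizeEstimator does.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical broadcast-friendly class: plain data, no transient fields, no Logger.
final class LeanPayload implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String keyspace;
    private final int partitionCount;

    LeanPayload(String keyspace, int partitionCount) {
        this.keyspace = keyspace;
        this.partitionCount = partitionCount;
    }
}

// Hypothetical counter-example: declares a transient field that the check should flag.
final class HeavyPayload implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String keyspace = "ks";
    private transient Object cachedState;
}

final class TransientFieldCheck {
    private TransientFieldCheck() {
    }

    // Collects the non-static transient fields declared on the class and its superclasses.
    static List<String> transientFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isTransient(mods) && !Modifier.isStatic(mods)) {
                    names.add(c.getSimpleName() + "." + f.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(transientFields(LeanPayload.class));  // prints []
        System.out.println(transientFields(HeavyPayload.class)); // prints [HeavyPayload.cachedState]
    }
}
```

A test that asserts an empty result for a broadcastable class would fail as soon as someone adds a transient field, which keeps the constraint from eroding silently.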
Method Summary
Modifier and Type | Method | Description
static BroadcastableJobInfo | from(JobInfo source, BulkSparkConf conf) | Creates a BroadcastableJobInfo from a source JobInfo.
BroadcastableTokenPartitioner | getBroadcastableTokenPartitioner() |
BulkSparkConf | getConf() |
MultiClusterContainer<java.util.UUID> | getRestoreJobIds() |
Method Detail
-
from
public static BroadcastableJobInfo from(@NotNull JobInfo source, @NotNull BulkSparkConf conf)
Creates a BroadcastableJobInfo from a source JobInfo. Extracts partition mappings from TokenPartitioner to avoid broadcasting Logger.
- Parameters:
source - the source JobInfo (typically CassandraJobInfo)
conf - the BulkSparkConf needed for executors
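The general shape of such a factory, copying only plain data out of a Logger-holding source, can be sketched with stand-in types. Everything below is hypothetical: DriverJobState, JobSnapshot, and SnapshotDemo are invented for illustration and are not the real JobInfo or BroadcastableJobInfo classes. Plain Java serialization is used here only as a stand-in for shipping the object to executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical driver-side object: holds a Logger and is not meant to be broadcast.
final class DriverJobState {
    private final Logger logger = Logger.getLogger(DriverJobState.class.getName());
    private final UUID restoreJobId;
    private final int partitionCount;

    DriverJobState(UUID restoreJobId, int partitionCount) {
        this.restoreJobId = restoreJobId;
        this.partitionCount = partitionCount;
    }

    UUID restoreJobId() {
        return restoreJobId;
    }

    int partitionCount() {
        return partitionCount;
    }
}

// Hypothetical broadcastable snapshot: serializable data only, no Logger, no transient fields.
final class JobSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    private final UUID restoreJobId;
    private final int partitionCount;

    private JobSnapshot(UUID restoreJobId, int partitionCount) {
        this.restoreJobId = restoreJobId;
        this.partitionCount = partitionCount;
    }

    // Static factory: copies the essential values and leaves the Logger behind.
    static JobSnapshot from(DriverJobState source) {
        Objects.requireNonNull(source, "source");
        return new JobSnapshot(source.restoreJobId(), source.partitionCount());
    }

    UUID getRestoreJobId() {
        return restoreJobId;
    }

    int getPartitionCount() {
        return partitionCount;
    }
}

final class SnapshotDemo {
    private SnapshotDemo() {
    }

    // Serializes and deserializes the snapshot, standing in for the trip to an executor.
    static JobSnapshot roundTrip(JobSnapshot snapshot) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(snapshot);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (JobSnapshot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Snapshot did not survive serialization", e);
        }
    }

    public static void main(String[] args) {
        DriverJobState state = new DriverJobState(UUID.randomUUID(), 16);
        JobSnapshot copy = roundTrip(JobSnapshot.from(state));
        System.out.println(copy.getRestoreJobId().equals(state.restoreJobId())); // prints true
        System.out.println(copy.getPartitionCount());                            // prints 16
    }
}
```

In this sketch the receiving side would rebuild any heavier state (loggers, caches) locally from the snapshot's values, mirroring the description above of executors reconstructing CassandraJobInfo.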
-
getConf
public BulkSparkConf getConf()
-
getRestoreJobIds
public MultiClusterContainer<java.util.UUID> getRestoreJobIds()
-
getBroadcastableTokenPartitioner
public BroadcastableTokenPartitioner getBroadcastableTokenPartitioner()