Class

io.smartdatalake.workflow.action

SparkSubFeedAction

Related Doc: package action

Permalink

abstract class SparkSubFeedAction extends SparkAction

Linear Supertypes
SparkAction, Action, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any
Known Subclasses
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. SparkSubFeedAction
  2. SparkAction
  3. Action
  4. SmartDataLakeLogger
  5. DAGNode
  6. ParsableFromConfig
  7. SdlConfigObject
  8. AnyRef
  9. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new SparkSubFeedAction()

    Permalink

Abstract Value Members

  1. abstract def breakDataFrameLineage: Boolean

    Permalink

    Stop propagating input DataFrame through action and instead get a new DataFrame from DataObject.

    Stop propagating input DataFrame through action and instead get a new DataFrame from DataObject. This can help to save memory and performance if the input DataFrame includes many transformations from previous Actions. The new DataFrame will be initialized according to the SubFeed's partitionValues.

    Definition Classes
    SparkAction
  2. abstract def executionMode: Option[ExecutionMode]

    Permalink

    execution mode for this action.

    execution mode for this action.

    Definition Classes
    SparkAction
  3. abstract def factory: FromConfigFactory[Action]

    Permalink

    Returns the factory that can parse this type (that is, type CO).

    Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    ParsableFromConfig
  4. abstract val id: ActionObjectId

    Permalink

    A unique identifier for this instance.

    A unique identifier for this instance.

    Definition Classes
    Action → SdlConfigObject
  5. abstract def input: DataObject with CanCreateDataFrame

    Permalink

    Input DataObject which can CanCreateDataFrame

  6. abstract def inputs: Seq[DataObject]

    Permalink

    Input DataObjects To be implemented by subclasses

    Input DataObjects To be implemented by subclasses

    Definition Classes
    Action
  7. abstract def metadata: Option[ActionMetadata]

    Permalink

    Additional metadata for the Action

    Additional metadata for the Action

    Definition Classes
    Action
  8. abstract def metricsFailCondition: Option[String]

    Permalink

    Spark SQL condition evaluated as where-clause against dataframe of metrics.

    Spark SQL condition evaluated as where-clause against dataframe of metrics. Available columns are dataObjectId, key, value. If there are any rows passing the where clause, a MetricCheckFailed exception is thrown.

    Definition Classes
    Action
  9. abstract def output: DataObject with CanWriteDataFrame

    Permalink

    Output DataObject which can CanWriteDataFrame

  10. abstract def outputs: Seq[DataObject]

    Permalink

    Output DataObjects To be implemented by subclasses

    Output DataObjects To be implemented by subclasses

    Definition Classes
    Action
  11. abstract def persist: Boolean

    Permalink

    Force persisting DataFrame on Disk.

    Force persisting DataFrame on Disk. This helps to reduce memory needed for caching the DataFrame content and can serve as a recovery point in case an task get's lost.

    Definition Classes
    SparkAction
  12. abstract def transform(subFeed: SparkSubFeed)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink

    Transform a SparkSubFeed.

    Transform a SparkSubFeed. To be implemented by subclasses.

    subFeed

    SparkSubFeed to be transformed

    returns

    transformed SparkSubFeed

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq()): Unit

    Permalink

    Adds an action event

    Adds an action event

    Definition Classes
    Action
  5. def applyAdditional(subFeed: SparkSubFeed, additional: (SparkSubFeed, Option[DataFrame], Seq[String], LocalDateTime) ⇒ SparkSubFeed, output: TableDataObject)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink

    applies an optional additional transformation

    applies an optional additional transformation

    Definition Classes
    SparkAction
  6. def applyBlackWhitelists(subFeed: SparkSubFeed, columnBlacklist: Option[Seq[String]], columnWhitelist: Option[Seq[String]]): SparkSubFeed

    Permalink

    applies columnBlackList and columnWhitelist

    applies columnBlackList and columnWhitelist

    Definition Classes
    SparkAction
  7. def applyCastDecimal2IntegralFloat(subFeed: SparkSubFeed): SparkSubFeed

    Permalink

    applies type casting decimal -> integral/float

    applies type casting decimal -> integral/float

    Definition Classes
    SparkAction
  8. def applyCustomTransformation(inputSubFeed: SparkSubFeed, transformer: Option[CustomDfTransformerConfig])(implicit session: SparkSession): SparkSubFeed

    Permalink

    applies the transformers

    applies the transformers

    Definition Classes
    SparkAction
  9. def applyFilter(subFeed: SparkSubFeed, filterClauseExpr: Option[Column]): SparkSubFeed

    Permalink

    applies filterClauseExpr

    applies filterClauseExpr

    Definition Classes
    SparkAction
  10. def applyTransformations(inputSubFeed: SparkSubFeed, transformer: Option[CustomDfTransformerConfig], columnBlacklist: Option[Seq[String]], columnWhitelist: Option[Seq[String]], standardizeDatatypes: Boolean, output: DataObject, additional: Option[(SparkSubFeed, Option[DataFrame], Seq[String], LocalDateTime) ⇒ SparkSubFeed], filterClauseExpr: Option[Column] = None)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink

    applies all the transformations above

    applies all the transformations above

    Definition Classes
    SparkAction
  11. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  12. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. def enableRuntimeMetrics(): Unit

    Permalink

    Runtime metrics

    Runtime metrics

    Note: runtime metrics are disabled by default, because they are only collected when running Actions from an ActionDAG. This is not the case for Tests or other use cases. If enabled exceptions are thrown if metrics are not found.

    Definition Classes
    Action
  14. def enrichSubFeedDataFrame(input: DataObject with CanCreateDataFrame, subFeed: SparkSubFeed, phase: ExecutionPhase)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink

    Enriches SparkSubFeed with DataFrame if not existing

    Enriches SparkSubFeed with DataFrame if not existing

    input

    input data object.

    subFeed

    input SubFeed.

    Definition Classes
    SparkAction
  15. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  17. final def exec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Permalink

    Action.exec implementation

    Action.exec implementation

    subFeeds

    SparkSubFeed's to be processed

    returns

    processed SparkSubFeed's

    Definition Classes
    SparkSubFeedAction → Action
  18. def filterDataFrame(df: DataFrame, partitionValues: Seq[PartitionValues], genericFilter: Option[Column]): DataFrame

    Permalink

    Filter DataFrame with given partition values

    Filter DataFrame with given partition values

    df

    DataFrame to filter

    partitionValues

    partition values to use as filter condition

    genericFilter

    filter expression to apply

    returns

    filtered DataFrame

    Definition Classes
    SparkAction
  19. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. def getAllLatestMetrics: Map[DataObjectId, Option[ActionMetrics]]

    Permalink
    Definition Classes
    Action
  21. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  22. def getFinalMetrics(dataObjectId: DataObjectId): Option[ActionMetrics]

    Permalink
    Definition Classes
    Action
  23. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Permalink
    Attributes
    protected
    Definition Classes
    Action
  24. def getLatestMetrics(dataObjectId: DataObjectId): Option[ActionMetrics]

    Permalink
    Definition Classes
    Action
  25. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Permalink
    Attributes
    protected
    Definition Classes
    Action
  26. def getRuntimeInfo: Option[RuntimeInfo]

    Permalink

    get latest runtime information for this action

    get latest runtime information for this action

    Definition Classes
    Action
  27. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  28. final def init(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Permalink

    Action.init implementation

    Action.init implementation

    subFeeds

    SparkSubFeed's to be processed

    returns

    processed SparkSubFeed's

    Definition Classes
    SparkSubFeedAction → Action
  29. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  30. lazy val logger: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  31. def multiTransformSubfeed(subFeed: SparkSubFeed, transformers: Seq[(DataFrame) ⇒ DataFrame]): SparkSubFeed

    Permalink

    applies multiple transformations to a sequence of subfeeds

    applies multiple transformations to a sequence of subfeeds

    Definition Classes
    SparkAction
  32. def multiTransformSubfeeds(subFeeds: Seq[SparkSubFeed], transformers: Seq[(DataFrame) ⇒ DataFrame]): Seq[SparkSubFeed]

    Permalink

    applies multiple transformations to a sequence of subfeeds

    applies multiple transformations to a sequence of subfeeds

    Definition Classes
    SparkAction
  33. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  34. def nodeId: String

    Permalink

    provide an implementation of the DAG node id

    provide an implementation of the DAG node id

    Definition Classes
    Action → DAGNode
  35. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  36. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  37. def onRuntimeMetrics(dataObjectId: Option[DataObjectId], metrics: ActionMetrics): Unit

    Permalink
    Definition Classes
    Action
  38. final def postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Permalink

    Executes operations needed after executing an action.

    Executes operations needed after executing an action. In this step any phase on Input- or Output-DataObjects needed after the main task is executed, e.g. JdbcTableDataObjects postWriteSql or CopyActions deleteInputData.

    Definition Classes
    SparkSubFeedAction → Action
  39. def postExecSubFeed(inputSubFeed: SubFeed, outputSubFeed: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Permalink
  40. def preExec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Permalink

    Executes operations needed before executing an action.

    Executes operations needed before executing an action. In this step any phase on Input- or Output-DataObjects needed before the main task is executed, e.g. JdbcTableDataObjects preWriteSql

    Definition Classes
    Action
  41. def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Permalink

    Prepare DataObjects prerequisites.

    Prepare DataObjects prerequisites. In this step preconditions are prepared & tested: - connections can be created - needed structures exist, e.g Kafka topic or Jdbc table

    This runs during the "prepare" phase of the DAG.

    Definition Classes
    SparkAction → Action
  42. def prepareInputSubFeed(subFeed: SparkSubFeed, input: DataObject with CanCreateDataFrame)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink

    Applies changes to a SubFeed from a previous action in order to be used as input for this actions transformation.

    Applies changes to a SubFeed from a previous action in order to be used as input for this actions transformation.

    Definition Classes
    SparkAction
  43. def recursiveInputs: Seq[DataObject with CanCreateDataFrame]

    Permalink

    Recursive Inputs are not supported on SparkSubFeedAction (only on SparkSubFeedsAction) so set to empty Seq

    Recursive Inputs are not supported on SparkSubFeedAction (only on SparkSubFeedsAction) so set to empty Seq

    Definition Classes
    SparkSubFeedAction → Action
  44. def reset(): Unit

    Permalink

    Resets the runtime state of this Action This is mainly used for testing

    Resets the runtime state of this Action This is mainly used for testing

    Definition Classes
    Action
  45. def setSparkJobMetadata(operation: Option[String] = None)(implicit session: SparkSession): Unit

    Permalink

    Sets the util job description for better traceability in the Spark UI

    Sets the util job description for better traceability in the Spark UI

    Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.

    operation

    phase description (be short...)

    Definition Classes
    Action
  46. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  47. final def toString(): String

    Permalink

    This is displayed in ascii graph visualization

    This is displayed in ascii graph visualization

    Definition Classes
    Action → AnyRef → Any
  48. def toStringMedium: String

    Permalink
    Definition Classes
    Action
  49. def toStringShort: String

    Permalink
    Definition Classes
    Action
  50. def transformSubfeeds(subFeeds: Seq[SparkSubFeed], transformer: (DataFrame) ⇒ DataFrame): Seq[SparkSubFeed]

    Permalink

    transform sequence of subfeeds

    transform sequence of subfeeds

    Definition Classes
    SparkAction
  51. def updateSubFeedAfterWrite(subFeed: SparkSubFeed)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Permalink
    Definition Classes
    SparkAction
  52. def validateAndUpdateSubFeedPartitionValues(output: DataObject, subFeed: SparkSubFeed)(implicit session: SparkSession): SparkSubFeed

    Permalink

    Updates the partition values of a SubFeed to the partition columns of an output, removing not existing columns from the partition values.

    Updates the partition values of a SubFeed to the partition columns of an output, removing not existing columns from the partition values. Further the transformed DataFrame is validated to have the output's partition columns included.

    output

    output DataObject

    subFeed

    SubFeed with transformed DataFrame

    returns

    SubFeed with updated partition values.

    Definition Classes
    SparkAction
  53. def validateDataFrameContainsCols(df: DataFrame, columns: Seq[String], debugName: String): Unit

    Permalink

    Validate that DataFrame contains a given list of columns, throwing an exception otherwise.

    Validate that DataFrame contains a given list of columns, throwing an exception otherwise.

    df

    DataFrame to validate

    columns

    Columns that must exist in DataFrame

    debugName

    name to mention in exception

    Definition Classes
    SparkAction
  54. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  56. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. def writeSubFeed(subFeed: SparkSubFeed, output: DataObject with CanWriteDataFrame)(implicit session: SparkSession): Boolean

    Permalink

    writes subfeed to output respecting given execution mode

    writes subfeed to output respecting given execution mode

    returns

    true if no data was transfered, otherwise false

    Definition Classes
    SparkAction

Inherited from SparkAction

Inherited from Action

Inherited from SmartDataLakeLogger

Inherited from DAGNode

Inherited from ParsableFromConfig[Action]

Inherited from SdlConfigObject

Inherited from AnyRef

Inherited from Any

Ungrouped