Packages

package sources

Type Members

  1. class ConsoleWrite extends StreamingWrite with Logging

    Common methods used to create writes for the console sink

  2. class ContinuousMemoryStream[A] extends MemoryStreamBase[A] with ContinuousStream

    The overall strategy here is:

    • ContinuousMemoryStream maintains a list of records for each partition. addData() will distribute records evenly-ish across partitions.
    • RecordEndpoint is set up as an endpoint for executor-side ContinuousMemoryStreamInputPartitionReader instances to poll. It returns the record at the specified offset within the list, or null if that offset doesn't yet have a record.
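
    The strategy described for ContinuousMemoryStream can be sketched as a small single-JVM model. This is only an illustration under stated assumptions: PartitionedRecordBuffer and recordAt are hypothetical names, round-robin assignment is one possible reading of "evenly-ish", and the sketch returns None where the endpoint is described as returning null. It does not reproduce the RPC endpoint or the executor-side readers.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, single-JVM model of the strategy described above.
// `PartitionedRecordBuffer` is a hypothetical name, not a Spark class.
final class PartitionedRecordBuffer[A](numPartitions: Int) {
  require(numPartitions > 0, "numPartitions must be positive")

  // One append-only list of records per partition.
  private val records = Vector.fill(numPartitions)(ArrayBuffer.empty[A])

  // Total number of records added so far; drives round-robin assignment.
  private var added = 0L

  // Distribute new records across partitions in round-robin order
  // (an assumption; the documentation only says "evenly-ish").
  def addData(data: Seq[A]): Unit = synchronized {
    data.foreach { item =>
      records((added % numPartitions).toInt) += item
      added += 1
    }
  }

  // Return the record at `offset` within `partition`, or None if that
  // offset does not have a record yet (the real endpoint is described
  // as returning null in that case).
  def recordAt(partition: Int, offset: Int): Option[A] = synchronized {
    val buf = records(partition)
    if (offset >= 0 && offset < buf.length) Some(buf(offset)) else None
  }
}
```

    A reader for a given partition would poll recordAt with its current offset and advance only when a record is returned.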

  3. case class ContinuousMemoryStreamInputPartition(driverEndpointName: String, partition: Int, startOffset: Int) extends InputPartition with Product with Serializable

    An input partition for continuous memory stream.

  4. case class ContinuousMemoryStreamOffset(partitionNums: Map[Int, Int]) extends connector.read.streaming.Offset with Product with Serializable
  5. class ContinuousMemoryStreamPartitionReader extends ContinuousPartitionReader[InternalRow]

    An input partition reader for continuous memory stream.

    Polls the driver endpoint for new records.

  6. class ForeachBatchSink[T] extends Sink
  7. class ForeachDataWriter[T] extends DataWriter[InternalRow]

    A DataWriter which writes data in this partition to a ForeachWriter.

    T

    The type expected by the writer.

  8. class ForeachWrite[T] extends Write
  9. case class ForeachWriterFactory[T](writer: ForeachWriter[T], rowConverter: (InternalRow) => T) extends StreamingDataWriterFactory with Product with Serializable
  10. case class ForeachWriterTable[T](writer: ForeachWriter[T], converter: Either[ExpressionEncoder[T], (InternalRow) => T]) extends Table with SupportsWrite with Product with Serializable

    A write-only table for forwarding data into the specified ForeachWriter.

    T

    The expected type of the sink.

    writer

    The ForeachWriter to process all data.

    converter

    An object to convert internal rows to the target type T. It can be either an ExpressionEncoder or a direct converter function.

  11. class MemoryDataWriter extends DataWriter[InternalRow] with Logging
  12. case class MemoryPlan(sink: MemorySink, output: Seq[Attribute]) extends LogicalPlan with LeafNode with Product with Serializable

    Used to query the data that has been written into a MemorySink.

  13. class MemorySink extends Table with SupportsWrite with Logging

    A sink that stores the results in memory. This org.apache.spark.sql.execution.streaming.Sink is primarily intended for use in unit tests and does not provide durability.

  14. class MemoryStreamingWrite extends StreamingWrite
  15. class MemoryWrite extends Write
  16. case class MemoryWriterCommitMessage(partition: Int, data: Seq[Row]) extends WriterCommitMessage with Product with Serializable
  17. case class MemoryWriterFactory(schema: StructType) extends DataWriterFactory with StreamingDataWriterFactory with Product with Serializable
  18. class MicroBatchWrite extends BatchWrite

    A BatchWrite used to hook V2 stream writers into a microbatch plan. It implements the non-streaming interface, forwarding the epoch ID determined at construction to a wrapped streaming write support.
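
    The forwarding behaviour described for MicroBatchWrite is an adapter pattern, which the following sketch illustrates. The traits SimpleStreamingWrite and SimpleBatchWrite, the class EpochBoundWrite, and the use of String commit messages are all hypothetical simplifications; Spark's real BatchWrite and StreamingWrite interfaces have more methods and different message types.

```scala
// Simplified stand-ins for the streaming and batch write interfaces.
// These traits are hypothetical and much smaller than Spark's real ones.
trait SimpleStreamingWrite {
  def commit(epochId: Long, messages: Seq[String]): Unit
  def abort(epochId: Long, messages: Seq[String]): Unit
}

trait SimpleBatchWrite {
  def commit(messages: Seq[String]): Unit
  def abort(messages: Seq[String]): Unit
}

// Adapter: exposes the non-streaming interface and forwards every call
// to the wrapped streaming write with the epoch ID fixed at construction.
final class EpochBoundWrite(epochId: Long, underlying: SimpleStreamingWrite)
    extends SimpleBatchWrite {
  override def commit(messages: Seq[String]): Unit =
    underlying.commit(epochId, messages)

  override def abort(messages: Seq[String]): Unit =
    underlying.abort(epochId, messages)
}

// Recording implementation used only to observe the forwarded calls.
final class RecordingStreamingWrite extends SimpleStreamingWrite {
  var committed: Vector[(Long, Seq[String])] = Vector.empty
  var aborted: Vector[(Long, Seq[String])] = Vector.empty

  override def commit(epochId: Long, messages: Seq[String]): Unit =
    committed = committed :+ (epochId -> messages)

  override def abort(epochId: Long, messages: Seq[String]): Unit =
    aborted = aborted :+ (epochId -> messages)
}
```

    In this sketch one adapter is created per micro-batch, so the batch-side caller never needs to know the epoch ID.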

  19. class MicroBatchWriterFactory extends DataWriterFactory
  20. case class PackedRowCommitMessage(rows: Array[InternalRow]) extends WriterCommitMessage with Product with Serializable

    Commit message for a PackedRowDataWriter, containing all the rows written in the most recent interval.

  21. class PackedRowDataWriter extends DataWriter[InternalRow] with Logging

    A simple DataWriter that just sends all the rows it's received as a commit message.
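
    The idea of packing every received row into the commit message can be sketched as follows. PackedMessage and PackingWriter are hypothetical names, the row type is generic rather than InternalRow, and clearing the buffer on commit is an assumption drawn from the phrase "most recent interval" in the PackedRowCommitMessage description.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical commit message carrying all rows written since the last commit.
final case class PackedMessage[R](rows: Vector[R])

// Hypothetical writer that buffers rows and hands them over on commit.
final class PackingWriter[R] {
  private val buffer = ArrayBuffer.empty[R]

  // Buffer the row; nothing is sent anywhere yet.
  def write(row: R): Unit = buffer += row

  // Pack everything buffered so far into a message and start a new interval
  // (resetting the buffer here is an assumption of this sketch).
  def commit(): PackedMessage[R] = {
    val message = PackedMessage(buffer.toVector)
    buffer.clear()
    message
  }

  // Discard rows buffered in the current interval.
  def abort(): Unit = buffer.clear()
}
```

    Because every row travels inside the commit message, the receiver's memory use grows with the data volume, which is why this approach suits tests rather than production sinks.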

  22. trait PythonForeachBatchFunction extends AnyRef

    Interface that is meant to be extended by Python classes via Py4J. Py4J allows Python classes to implement Java interfaces so that the JVM can call back Python objects. In this case, this allows the user-defined Python foreachBatch function to be called from JVM when the query is active.

  23. class RatePerMicroBatchProvider extends SimpleTableProvider with DataSourceRegister

    A source that generates incrementing long values with timestamps. Each generated row has two columns: a timestamp column for the generated time and an auto-incrementing long column starting with 0L.

    This source supports the following options:

    • rowsPerBatch (e.g. 100): How many rows should be generated per micro-batch.
    • numPartitions (e.g. 10, default: Spark's default parallelism): The number of partitions for the generated rows.
    • startTimestamp (e.g. 1000, default: 0): starting value of generated time
    • advanceMillisPerBatch (e.g. 1000, default: 1000): the amount of time being advanced in generated time on each micro-batch.

    Unlike the rate data source, this data source provides a consistent set of input rows per micro-batch regardless of query execution (trigger configuration, a lagging query, etc.). For example, batch 0 will produce 0~999, batch 1 will produce 1000~1999, and so on. The same applies to the generated time.

    As the name suggests, this data source only supports micro-batch reads.
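
    The per-batch consistency described for this source can be illustrated with a small model in which the value range and the generated time depend only on the batch ID and the options. BatchRange and RatePerMicroBatchModel are hypothetical names, and treating the generated time as a single value per batch (startTimestamp plus batch ID times advanceMillisPerBatch) is an assumption of this sketch, not a statement about the actual implementation.

```scala
// Hypothetical description of what one micro-batch produces in this model.
final case class BatchRange(
    startValue: Long,          // first generated value (inclusive)
    endValueExclusive: Long,   // one past the last generated value
    timestampMs: Long)         // generated time assigned to the batch

// Hypothetical helper modelling the behaviour described above; not Spark code.
object RatePerMicroBatchModel {
  // The result depends only on the batch ID and the options, never on
  // wall-clock time or on how late the query is running.
  def rangeFor(
      batchId: Long,
      rowsPerBatch: Long,
      startTimestamp: Long = 0L,
      advanceMillisPerBatch: Long = 1000L): BatchRange = {
    require(batchId >= 0, "batchId must be non-negative")
    require(rowsPerBatch > 0, "rowsPerBatch must be positive")
    BatchRange(
      startValue = batchId * rowsPerBatch,
      endValueExclusive = (batchId + 1) * rowsPerBatch,
      timestampMs = startTimestamp + batchId * advanceMillisPerBatch)
  }
}
```

    With rowsPerBatch set to 1000, this model gives values 0 to 999 for batch 0 and 1000 to 1999 for batch 1, matching the ranges quoted in the description.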

  24. class RatePerMicroBatchStream extends SupportsTriggerAvailableNow with MicroBatchStream with Logging
  25. case class RatePerMicroBatchStreamInputPartition(partitionId: Int, numPartitions: Int, startOffset: Long, startTimestamp: Long, endOffset: Long, endTimestamp: Long) extends InputPartition with Product with Serializable
  26. case class RatePerMicroBatchStreamOffset(offset: Long, timestamp: Long) extends connector.read.streaming.Offset with Product with Serializable
  27. class RatePerMicroBatchStreamPartitionReader extends PartitionReader[InternalRow]
  28. class RatePerMicroBatchTable extends Table with SupportsRead
  29. case class RateStreamMicroBatchInputPartition(partitionId: Int, numPartitions: Int, rangeStart: Long, rangeEnd: Long, localStartTimeMs: Long, relativeMsPerValue: Double) extends InputPartition with Product with Serializable
  30. class RateStreamMicroBatchPartitionReader extends PartitionReader[InternalRow]
  31. class RateStreamMicroBatchStream extends MicroBatchStream with Logging
  32. class RateStreamProvider extends SimpleTableProvider with DataSourceRegister

    A source that generates incrementing long values with timestamps. Each generated row has two columns: a timestamp column for the generated time and an auto-incrementing long column starting with 0L.

    This source supports the following options:

    • rowsPerSecond (e.g. 100, default: 1): How many rows should be generated per second.
    • rampUpTime (e.g. 5s, default: 0s): How long to ramp up before the generating speed becomes rowsPerSecond. Values with a granularity finer than seconds are truncated to whole seconds.
    • numPartitions (e.g. 10, default: Spark's default parallelism): The number of partitions for the generated rows. The source will try its best to reach rowsPerSecond, but the query may be resource constrained, and numPartitions can be tweaked to help reach the desired speed.
  33. class RateStreamTable extends Table with SupportsRead
  34. case class TextSocketInputPartition(slice: ListBuffer[(UTF8String, Long)]) extends InputPartition with Product with Serializable
  35. class TextSocketMicroBatchStream extends MicroBatchStream with Logging

    A MicroBatchReadSupport that reads text lines through a TCP socket, designed only for tutorials and debugging. This MicroBatchReadSupport will *not* work in production applications due to multiple reasons, including no support for fault recovery.

  36. class TextSocketSourceProvider extends SimpleTableProvider with DataSourceRegister with Logging
  37. class TextSocketTable extends Table with SupportsRead
  38. case class WriteToMicroBatchDataSource(relation: Option[DataSourceV2Relation], table: SupportsWrite, query: LogicalPlan, queryId: String, writeOptions: Map[String, String], outputMode: OutputMode, batchId: Option[Long] = None) extends LogicalPlan with UnaryNode with Product with Serializable

    The logical plan for writing data to a micro-batch stream.

    Note that this logical plan does not have a corresponding physical plan, as it will be converted to WriteToDataSourceV2 with MicroBatchWrite before execution.

  39. case class WriteToMicroBatchDataSourceV1(catalogTable: Option[CatalogTable], sink: Sink, query: LogicalPlan, queryId: String, writeOptions: Map[String, String], outputMode: OutputMode, batchId: Option[Long] = None) extends LogicalPlan with UnaryNode with Product with Serializable

    Marker node to represent a DSv1 sink on a streaming query.

    Although this is expected to be the top node, it should behave as a pass-through, since the DSv1 code path in micro-batch execution handles the sink operation separately.

    This node is eliminated in the streaming-specific optimization phase, which means there is no matching physical node.

Value Members

  1. object ContinuousMemoryStream
  2. object ContinuousMemoryStreamReaderFactory extends ContinuousPartitionReaderFactory
  3. case object ForeachWriterCommitMessage extends WriterCommitMessage with Product with Serializable

    An empty WriterCommitMessage. ForeachWriter implementations have no global coordination.

  4. object ForeachWriterTable extends Serializable
  5. case object PackedRowWriterFactory extends StreamingDataWriterFactory with Product with Serializable

    A simple org.apache.spark.sql.connector.write.DataWriterFactory whose tasks just pack rows into the commit message for delivery to an org.apache.spark.sql.connector.write.BatchWrite on the driver.

    Note that, because it sends all rows to the driver, this factory will generally be unsuitable for production-quality sinks. It's intended for use in tests.

  6. object PythonForeachBatchHelper
  7. object RatePerMicroBatchProvider
  8. object RatePerMicroBatchStreamOffset extends Serializable
  9. object RatePerMicroBatchStreamReaderFactory extends PartitionReaderFactory
  10. object RateStreamMicroBatchReaderFactory extends PartitionReaderFactory
  11. object RateStreamProvider
  12. object TextSocketReader
