package shuffle
Type Members
- trait BlockFetchingListener extends BlockTransferListener
- trait BlockPushingListener extends BlockTransferListener
Callback to handle block push success and failure. This interface and BlockFetchingListener are unified under BlockTransferListener to allow code reuse for handling block push and fetch retry.
- abstract class BlockStoreClient extends Closeable
Provides an interface for reading both shuffle files and RDD blocks, either from an Executor or external service.
- trait BlockTransferListener extends EventListener
This interface unifies both BlockFetchingListener and BlockPushingListener under a single interface to allow code reuse, while also keeping the existing public interface to facilitate backward compatibility.
- class Constants extends AnyRef
- trait DownloadFile extends AnyRef
A handle on the file used when fetching remote data to disk. Used to ensure the lifecycle of writing the data, reading it back, and then cleaning it up is followed. Specific implementations may also handle encryption. The data can be read only via DownloadFileWritableChannel, which ensures data is not read until after the writer is closed.
- trait DownloadFileManager extends AnyRef
A manager that creates temporary block files when fetching remote data, to reduce memory usage. It cleans up the files once they are no longer used.
- trait DownloadFileWritableChannel extends WritableByteChannel
A channel for writing data which is fetched to disk, which allows access to the written data only after the writer has been closed. Used with DownloadFile and DownloadFileManager.
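The write, close, read, clean-up lifecycle described for DownloadFile and DownloadFileWritableChannel can be illustrated with a small stand-alone sketch. The class DownloadLifecycleSketch and all of its methods are hypothetical names invented for this illustration and are not Spark's API; the only behavior taken from the descriptions above is that written data becomes readable only after the writer is closed, and that the temporary file is removed afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for a download file together with its writable channel.
final class DownloadLifecycleSketch {
  private final Path path;
  private final FileChannel writer;

  DownloadLifecycleSketch() throws IOException {
    path = Files.createTempFile("fetch-sketch", ".tmp");
    writer = FileChannel.open(path, StandardOpenOption.WRITE);
  }

  // Appends fetched bytes to the temporary file.
  void write(byte[] data) throws IOException {
    ByteBuffer buf = ByteBuffer.wrap(data);
    while (buf.hasRemaining()) {
      writer.write(buf);
    }
  }

  // Closing an already closed channel has no effect.
  void closeWriter() throws IOException {
    writer.close();
  }

  // Refuses to expose the data while the writer is still open.
  byte[] read() throws IOException {
    if (writer.isOpen()) {
      throw new IllegalStateException("writer must be closed before reading");
    }
    return Files.readAllBytes(path);
  }

  // Clean-up step; returns true if the file existed and was deleted.
  boolean delete() throws IOException {
    return Files.deleteIfExists(path);
  }

  // Runs write -> close -> read -> delete and returns what was read back.
  static String roundTrip(String payload) {
    try {
      DownloadLifecycleSketch file = new DownloadLifecycleSketch();
      try {
        file.write(payload.getBytes(StandardCharsets.UTF_8));
        file.closeWriter();
        return new String(file.read(), StandardCharsets.UTF_8);
      } finally {
        file.closeWriter();
        file.delete();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if reading before the writer is closed is rejected.
  static boolean readBeforeCloseIsRejected() {
    try {
      DownloadLifecycleSketch file = new DownloadLifecycleSketch();
      try {
        file.write(new byte[] {1, 2, 3});
        file.read();
        return false;
      } catch (IllegalStateException expected) {
        return true;
      } finally {
        file.closeWriter();
        file.delete();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

An implementation that handles encryption would presumably wrap the same lifecycle around encrypting and decrypting streams; that part is not sketched here.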
- trait ErrorHandler extends AnyRef
Plugs into RetryingBlockTransferor to further control when an exception should be retried and logged. Note: RetryingBlockTransferor will delegate the exception to this handler only when: 1. remaining retries < max retries. 2. exception is an IOException.
- Annotations
- @Evolving()
- Since
3.1.0
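The delegation rule described for ErrorHandler can be sketched as a single predicate. The names SketchErrorHandler, shouldRetryError, and RetryDecisionSketch are hypothetical and chosen for this illustration. The sketch also assumes that the first condition means "the retry budget is not yet exhausted", which is one reading of the note above rather than a confirmed detail of the implementation.

```java
import java.io.IOException;

// Hypothetical handler interface, modeled loosely on the ErrorHandler description.
interface SketchErrorHandler {
  boolean shouldRetryError(Throwable error);
}

final class RetryDecisionSketch {
  // Returns true when a failed transfer should be attempted again.
  static boolean shouldRetry(Throwable error, int retriesUsed, int maxRetries,
                             SketchErrorHandler handler) {
    boolean isIoException = error instanceof IOException;
    // Assumption: retries remain while fewer than maxRetries have been used.
    boolean budgetLeft = retriesUsed < maxRetries;
    // Short-circuit evaluation: the handler is consulted only if both preconditions hold.
    return isIoException && budgetLeft && handler.shouldRetryError(error);
  }
}
```

Because of the short-circuit evaluation, a handler in this sketch never sees non-IOException failures or failures that arrive after the budget is spent, so it only needs to decide among errors that were otherwise eligible for a retry.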
- class ExecutorDiskUtils extends AnyRef
- class ExternalBlockHandler extends RpcHandler with MergedBlockMetaReqHandler
RPC Handler for a server which can serve both RDD blocks and shuffle blocks from outside of an Executor process.
Handles registering executors and opening shuffle or disk persisted RDD blocks from them. Blocks are registered with the "one-for-one" strategy, meaning each Transport-layer Chunk is equivalent to one block.
- class ExternalBlockStoreClient extends BlockStoreClient
Client for reading both RDD blocks and shuffle blocks which points to an external (outside of executor) server. This is instead of reading blocks directly from other executors (via BlockTransferService), which has the downside of losing the data if we lose the executors.
- class ExternalShuffleBlockResolver extends AnyRef
Manages converting shuffle BlockIds into physical segments of local files, from a process outside of Executors. Each Executor must register its own configuration about where it stores its files (local dirs) and how (shuffle manager). The logic for retrieval of individual files is replicated from Spark's IndexShuffleBlockResolver.
- trait MergeFinalizerListener extends EventListener
:: DeveloperApi ::
Listener providing a callback function to invoke when the driver receives the response to the finalize shuffle merge request sent to the remote shuffle service.
- Since
3.1.0
- class MergedBlockMeta extends AnyRef
Contains meta information for a merged block. Currently this information constitutes: 1. Number of chunks in a merged shuffle block. 2. Bitmaps for each chunk in the merged block. A chunk bitmap contains all the mapIds that were merged to that merged block chunk.
- Since
3.1.0
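To make the two pieces of merged-block metadata concrete, the sketch below keeps one bitmap per chunk and answers which chunks contain a given mapId. It is only an illustration: MergedBlockMetaSketch and its methods are hypothetical names, and java.util.BitSet is used as a stand-in for whatever bitmap type the real class uses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of merged-block metadata: one bitmap of mapIds per chunk.
final class MergedBlockMetaSketch {
  private final List<BitSet> chunkBitmaps = new ArrayList<>();

  // Records a chunk and the mapIds merged into it; returns the chunk's index.
  int addChunk(int... mapIds) {
    BitSet bitmap = new BitSet();
    for (int mapId : mapIds) {
      bitmap.set(mapId);
    }
    chunkBitmaps.add(bitmap);
    return chunkBitmaps.size() - 1;
  }

  // Number of chunks in the merged shuffle block.
  int numChunks() {
    return chunkBitmaps.size();
  }

  // Lists the indices of the chunks whose bitmap contains the given mapId.
  List<Integer> chunksContaining(int mapId) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < chunkBitmaps.size(); i++) {
      if (chunkBitmaps.get(i).get(mapId)) {
        result.add(i);
      }
    }
    return result;
  }
}
```

A reader of such metadata could use the bitmaps to tell which map outputs are already covered by a merged chunk and which are not.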
- trait MergedBlocksMetaListener extends EventListener
Listener for receiving success or failure events when fetching meta of merged blocks.
- Since
3.2.0
- trait MergedShuffleFileManager extends AnyRef
The MergedShuffleFileManager is used to process push-based shuffle when enabled. It works alongside ExternalBlockHandler and serves as an RPC handler for org.apache.spark.network.server.RpcHandler#receiveStream, where it processes the remotely pushed streams of shuffle blocks to merge them into merged shuffle files. Right now, support for push-based shuffle is only implemented for the external shuffle service in YARN mode.
- Annotations
- @Evolving()
- Since
3.1.0
- class NoOpMergedShuffleFileManager extends MergedShuffleFileManager
Dummy implementation of merged shuffle file manager. Suitable for when push-based shuffle is not enabled.
- Since
3.1.0
- class OneForOneBlockFetcher extends AnyRef
Simple wrapper on top of a TransportClient which interprets each chunk as a whole block, and invokes the BlockFetchingListener appropriately. This class is agnostic to the actual RPC handler, as long as there is a single "open blocks" message which returns a ShuffleStreamHandle, and Java serialization is used.
Note that this typically corresponds to a org.apache.spark.network.server.OneForOneStreamManager on the server side.
- class OneForOneBlockPusher extends AnyRef
Similar to OneForOneBlockFetcher, but for pushing blocks to a remote shuffle service to be merged instead of fetching them from remote shuffle services. This is used by ShuffleWriter when the block push process is initiated. The supplied BlockFetchingListener is used to handle the success or failure in pushing each block.
- Since
3.1.0
- class RemoteBlockPushResolver extends MergedShuffleFileManager
An implementation of MergedShuffleFileManager that provides the most essential shuffle service processing logic to support push-based shuffle.
- Since
3.1.0
- class RetryingBlockTransferor extends AnyRef
Wraps another BlockFetcher or BlockPusher with the ability to automatically retry block transfers which fail due to IOExceptions, which we hope are due to transient network conditions.
This transferor provides stronger guarantees regarding the parent BlockTransferListener. In particular, the listener will be invoked exactly once per blockId, with a success or failure.
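The retry behavior and the exactly-once listener guarantee described above can be sketched with a simplified, synchronous loop. Everything here is hypothetical: SketchTransfer, SketchTransferListener, and RetryingTransferSketch are names made up for the illustration, and the sketch assumes that maxRetries counts additional attempts after the first one. It is not a description of how the real transferor is implemented.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical listener: notified once per block with either success or failure.
interface SketchTransferListener {
  void onSuccess(String blockId);
  void onFailure(String blockId, Throwable error);
}

// Hypothetical single-block transfer that may fail with an IOException.
interface SketchTransfer {
  void transfer(String blockId) throws IOException;
}

final class RetryingTransferSketch {
  // Transfers each block, retrying IOExceptions, and reports exactly one outcome per block.
  static void transferAll(List<String> blockIds, SketchTransfer transfer,
                          int maxRetries, SketchTransferListener listener) {
    for (String blockId : blockIds) {
      Throwable failure = attempt(blockId, transfer, maxRetries);
      if (failure == null) {
        listener.onSuccess(blockId);
      } else {
        listener.onFailure(blockId, failure);
      }
    }
  }

  // Returns null on success, otherwise the error that ended the attempts.
  private static Throwable attempt(String blockId, SketchTransfer transfer, int maxRetries) {
    for (int retriesUsed = 0; ; retriesUsed++) {
      try {
        transfer.transfer(blockId);
        return null;
      } catch (IOException e) {
        if (retriesUsed >= maxRetries) {
          return e; // retry budget exhausted
        }
      } catch (RuntimeException e) {
        return e; // not an IOException, so it is never retried
      }
    }
  }
}
```

Keeping the listener call outside the retry loop is what gives the one-outcome-per-block property in this sketch: the loop only decides between success and a final error.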
- class ShuffleIndexInformation extends AnyRef
Keeps the index information for a particular map output as an in-memory LongBuffer.
- class ShuffleIndexRecord extends AnyRef
Contains offset and length of the shuffle block data.
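How an in-memory LongBuffer of index data could yield an offset and length per block is sketched below. This is an illustration under a stated assumption: the index is taken to hold cumulative byte offsets, one entry per block plus a final end offset, which the descriptions above do not spell out. ShuffleIndexSketch and its methods are hypothetical names.

```java
import java.nio.LongBuffer;

// Hypothetical index over one map output: cumulative offsets held in a LongBuffer.
final class ShuffleIndexSketch {
  private final LongBuffer offsets;

  // Assumption: cumulativeOffsets has one entry per block plus a trailing end offset.
  ShuffleIndexSketch(long[] cumulativeOffsets) {
    this.offsets = LongBuffer.wrap(cumulativeOffsets.clone());
  }

  // Number of blocks described by the index.
  int numBlocks() {
    return offsets.capacity() - 1;
  }

  // Returns {offset, length} of the block with the given index.
  long[] record(int blockIndex) {
    long start = offsets.get(blockIndex);
    long end = offsets.get(blockIndex + 1);
    return new long[] {start, end - start};
  }
}
```

With offsets 0, 10, 10, 25, for example, the second block starts at byte 10 and has length 0, which is how an empty block would appear under this layout.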
- class SimpleDownloadFile extends DownloadFile
A DownloadFile that does not take any encryption settings into account for reading and writing data.
This does *not* mean the data in the file is unencrypted -- it could be that the data is already encrypted when it's written, and a subsequent layer is responsible for decrypting it.