Class StreamWriter
java.lang.Object
com.google.cloud.bigquery.storage.v1.StreamWriter
- All Implemented Interfaces:
AutoCloseable
A BigQuery Stream Writer that can be used to write data into a BigQuery table.
TODO: Support batching.
-
Nested Class Summary
Nested Classes:
- StreamWriter.Builder
Method Summary
- com.google.api.core.ApiFuture<AppendRowsResponse> append(ArrowRecordBatch recordBatch) - Schedules the writing of an Arrow record batch at the end of the current stream.
- com.google.api.core.ApiFuture<AppendRowsResponse> append(ArrowRecordBatch recordBatch, long offset) - Schedules the writing of an Arrow record batch at the given offset.
- com.google.api.core.ApiFuture<AppendRowsResponse> append(ProtoRows rows) - Schedules the writing of rows at the end of the current stream.
- com.google.api.core.ApiFuture<AppendRowsResponse> append(ProtoRows rows, long offset) - Schedules the writing of rows at the given offset.
- com.google.api.core.ApiFuture<AppendRowsResponse> append(org.apache.arrow.vector.ipc.message.ArrowRecordBatch recordBatch) - Schedules the writing of an Arrow record batch at the end of the current stream.
- com.google.api.core.ApiFuture<AppendRowsResponse> append(org.apache.arrow.vector.ipc.message.ArrowRecordBatch recordBatch, long offset) - Schedules the writing of an Arrow record batch at the given offset.
- void close() - Close the stream writer.
- static long getApiMaxRequestBytes() - The maximum size of one request.
- getArrowSchema() - Returns the passed-in Arrow user schema.
- static String getDefaultStreamName(TableName tableName) - Returns the default stream name associated with the given table.
- long getInflightWaitSeconds() - Returns how long the last sent request waited on the client side before being sent to the server.
- getProtoSchema() - Returns the passed-in Proto user schema.
- getUpdatedSchema() - Thread-safe getter of the updated TableSchema.
- boolean isClosed() - Returns whether the stream writer can no longer be used for writing.
- boolean isUserClosed() - Returns whether the user explicitly closed the writer.
- static StreamWriter.Builder newBuilder(String streamName) - Constructs a new StreamWriter.Builder using the given stream.
- static StreamWriter.Builder newBuilder(String streamName, BigQueryWriteClient client) - Constructs a new StreamWriter.Builder using the given stream and client.
- static void setMaxRequestCallbackWaitTime(Duration waitTime) - Sets the maximum time a request is allowed to wait in the request waiting queue.
-
Method Details
-
getApiMaxRequestBytes
public static long getApiMaxRequestBytes()
The maximum size of one request, as defined by the API. -
append
Schedules the writing of an Arrow record batch at the end of the current stream. Because the StreamWriter does not know how many rows a serialized batch contains, the OpenTelemetry row count metric reports 0 rows for this append; use the overload that accepts org.apache.arrow.vector.ipc.message.ArrowRecordBatch if the OpenTelemetry row count is required. The StreamWriter must have an Arrow schema set to use this method.
- Parameters:
recordBatch - the Arrow record batch in serialized format to write to BigQuery. Since the serialized Arrow record batch does not contain a schema, the StreamWriter must have been created with an Arrow schema.
- Returns:
- the append response wrapped in a future.
-
append
Schedules the writing of rows at the end of the current stream.
- Parameters:
rows - the rows in serialized format to write to BigQuery.
- Returns:
- the append response wrapped in a future.
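A minimal sketch of this append path, assuming a hypothetical MyRowProto protobuf message, hypothetical project/dataset/table names, and valid BigQuery credentials (not runnable as-is):

```java
// Sketch: build a writer for the table's default stream and append one row.
// MyRowProto and the table names below are hypothetical placeholders.
String streamName =
    StreamWriter.getDefaultStreamName(TableName.of("my-project", "my_dataset", "my_table"));

StreamWriter writer =
    StreamWriter.newBuilder(streamName)
        // The writer needs the row schema; ProtoSchemaConverter derives it
        // from the protobuf descriptor.
        .setWriterSchema(ProtoSchemaConverter.convert(MyRowProto.getDescriptor()))
        .build();

ProtoRows rows =
    ProtoRows.newBuilder()
        .addSerializedRows(MyRowProto.newBuilder().setName("a").build().toByteString())
        .build();

ApiFuture<AppendRowsResponse> future = writer.append(rows);
AppendRowsResponse response = future.get(); // blocking; prefer ApiFutures.addCallback
```

In production code, prefer the non-blocking callback style shown in the offset-based append example below rather than calling get() on the future.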
-
append
Schedules the writing of rows at the given offset. Example of writing rows with a specific offset:

    ApiFuture<AppendRowsResponse> future = writer.append(rows, 0);
    ApiFutures.addCallback(future, new ApiFutureCallback<AppendRowsResponse>() {
      public void onSuccess(AppendRowsResponse response) {
        if (!response.hasError()) {
          System.out.println("written with offset: " + response.getAppendResult().getOffset());
        } else {
          System.out.println("received an in stream error: " + response.getError().toString());
        }
      }

      public void onFailure(Throwable t) {
        System.out.println("failed to write: " + t);
      }
    }, MoreExecutors.directExecutor());

- Parameters:
rows - the rows in serialized format to write to BigQuery.
offset - the offset of the first row. Provide -1 to write at the current end of stream.
- Returns:
- the append response wrapped in a future.
-
append
public com.google.api.core.ApiFuture<AppendRowsResponse> append(ArrowRecordBatch recordBatch, long offset)
Schedules the writing of an Arrow record batch at the given offset. Because the StreamWriter does not know how many rows a serialized batch contains, the OpenTelemetry row count metric reports 0 rows for this append; use the overload that accepts org.apache.arrow.vector.ipc.message.ArrowRecordBatch if the OpenTelemetry row count is required. The StreamWriter must have an Arrow schema set to use this method. Example of writing an Arrow record batch with a specific offset:

    ApiFuture<AppendRowsResponse> future = writer.append(recordBatch, 0);
    ApiFutures.addCallback(future, new ApiFutureCallback<AppendRowsResponse>() {
      public void onSuccess(AppendRowsResponse response) {
        if (!response.hasError()) {
          System.out.println("written with offset: " + response.getAppendResult().getOffset());
        } else {
          System.out.println("received an in stream error: " + response.getError().toString());
        }
      }

      public void onFailure(Throwable t) {
        System.out.println("failed to write: " + t);
      }
    }, MoreExecutors.directExecutor());

- Parameters:
recordBatch - the ArrowRecordBatch in serialized format to write to BigQuery.
offset - the offset of the first row. Provide -1 to write at the current end of stream.
- Returns:
- the append response wrapped in a future.
-
append
public com.google.api.core.ApiFuture<AppendRowsResponse> append(org.apache.arrow.vector.ipc.message.ArrowRecordBatch recordBatch)
Schedules the writing of an Arrow record batch at the end of the current stream. The StreamWriter must have an Arrow schema set to use this method.
- Parameters:
recordBatch - the Arrow record batch to write to BigQuery. Since the serialized Arrow record batch does not contain a schema, the StreamWriter must have been created with an Arrow schema. The ArrowRecordBatch will be closed after it is serialized.
- Returns:
- the append response wrapped in a future.
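One way to obtain such a batch is to unload a populated Arrow VectorSchemaRoot; a sketch, assuming a hypothetical `root` and an already-built `writer` whose Arrow schema matches the root:

```java
// Sketch: convert a VectorSchemaRoot into an ArrowRecordBatch and append it.
// `root` and `writer` are hypothetical; the batch is closed by the writer
// after it is serialized, so it must not be reused afterwards.
org.apache.arrow.vector.ipc.message.ArrowRecordBatch batch =
    new org.apache.arrow.vector.VectorUnloader(root).getRecordBatch();
ApiFuture<AppendRowsResponse> future = writer.append(batch);
```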
-
append
public com.google.api.core.ApiFuture<AppendRowsResponse> append(org.apache.arrow.vector.ipc.message.ArrowRecordBatch recordBatch, long offset)
Schedules the writing of an Arrow record batch at the given offset. The StreamWriter must have an Arrow schema set to use this method.
- Parameters:
recordBatch - the Arrow record batch to write to BigQuery. The ArrowRecordBatch will be closed after it is serialized.
offset - the offset of the first row. Provide -1 to write at the current end of stream.
- Returns:
- the append response wrapped in a future.
-
getInflightWaitSeconds
public long getInflightWaitSeconds()
Returns how long the last sent request waited on the client side before being sent to the server. A request can wait on the client because it reached the client-side inflight request limit (adjustable when constructing the StreamWriter). A consistently high wait value indicates a need for more throughput: create a new stream to increase throughput in the exclusive-stream case, or create a new writer in the default-stream case. -
getWriterId
- Returns:
- a unique Id for the writer.
-
getStreamName
- Returns:
- name of the Stream that this writer is working on.
-
getProtoSchema
Returns the passed-in Proto user schema.
- Returns:
- the passed-in Proto user schema
-
getArrowSchema
Returns the passed-in Arrow user schema.
- Returns:
- the passed-in Arrow user schema
-
getLocation
- Returns:
- the location of the destination.
-
getMissingValueInterpretationMap
- Returns:
- the missing value interpretation map used for the writer.
-
isClosed
public boolean isClosed()
- Returns:
- whether the stream writer can no longer be used for writing, either because it was explicitly closed or because the underlying connection broke while no connection pool was in use. The client should recreate the StreamWriter in this case.
-
isUserClosed
public boolean isUserClosed()
- Returns:
- whether the user explicitly closed the writer.
-
close
public void close()
Close the stream writer. Shuts down all resources.
- Specified by:
close in interface AutoCloseable
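Because StreamWriter implements AutoCloseable, it can be managed with try-with-resources so close() runs even when an append throws; a sketch, assuming hypothetical `streamName`, `schema`, and `rows` values:

```java
// Sketch: close() is invoked automatically when the try block exits.
try (StreamWriter writer =
        StreamWriter.newBuilder(streamName).setWriterSchema(schema).build()) {
  ApiFuture<AppendRowsResponse> future = writer.append(rows);
  future.get();
}
```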
-
newBuilder
Constructs a new StreamWriter.Builder using the given stream and client. -
newBuilder
Constructs a new StreamWriter.Builder using the given stream. -
getUpdatedSchema
Thread-safe getter of the updated TableSchema. This returns the updated schema only when this writer's creation timestamp is older than the timestamp of the updated schema.
-
setMaxRequestCallbackWaitTime
Sets the maximum time a request is allowed to wait in the request waiting queue. In rare cases, an append request can wait indefinitely for its callback when the Google networking SDK does not detect a network breakage. The default timeout is 15 minutes. We are investigating the root cause of callbacks not being triggered by the networking SDK. -
getDefaultStreamName
- Returns:
- the default stream name associated with tableName
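The default stream is addressed as the full table resource path followed by the `_default` stream id. A self-contained sketch of that expected shape (in real code, call getDefaultStreamName(TableName) rather than formatting the string yourself):

```java
// Illustrates the resource-name format of a table's default write stream:
// projects/{project}/datasets/{dataset}/tables/{table}/_default
public class DefaultStreamName {
    static String defaultStreamName(String project, String dataset, String table) {
        return String.format(
            "projects/%s/datasets/%s/tables/%s/_default", project, dataset, table);
    }

    public static void main(String[] args) {
        System.out.println(defaultStreamName("my-project", "my_dataset", "my_table"));
    }
}
```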
-