Package org.apache.parquet.hadoop
Class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>>
- java.lang.Object
- org.apache.parquet.hadoop.ParquetWriter.Builder<T,SELF>
- Type Parameters:
T - The type of objects written by the constructed ParquetWriter.
SELF - The type of this builder that is returned by builder methods.
- Direct Known Subclasses:
ExampleParquetWriter.Builder
- Enclosing class:
- ParquetWriter<T>
public abstract static class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> extends Object
An abstract builder class for ParquetWriter instances. Object models should extend this builder to provide writer configuration options.
-
-
Constructor Summary
Constructors
Modifier Constructor - Description
protected Builder(org.apache.hadoop.fs.Path path)
protected Builder(OutputFile path)
-
Method Summary
Modifier and Type Method - Description
ParquetWriter<T> build() - Build a ParquetWriter with the accumulated configuration.
SELF config(String property, String value) - Set a property that will be available to the read path.
SELF enableDictionaryEncoding() - Enables dictionary encoding for the constructed writer.
SELF enablePageWriteChecksum() - Enables writing page-level checksums for the constructed writer.
SELF enableValidation() - Enables validation for the constructed writer.
protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
protected abstract SELF self()
SELF withBloomFilterEnabled(boolean enabled) - Enables or disables writing bloom filters.
SELF withBloomFilterEnabled(String columnPath, boolean enabled) - Enables or disables the bloom filter for the specified column.
SELF withBloomFilterNDV(String columnPath, long ndv) - Sets the NDV (number of distinct values) for the specified column.
SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)
SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName) - Set the compression codec used by the constructed writer.
SELF withConf(org.apache.hadoop.conf.Configuration conf) - Set the Configuration used by the constructed writer.
SELF withDictionaryEncoding(boolean enableDictionary) - Enable or disable dictionary encoding for the constructed writer.
SELF withDictionaryEncoding(String columnPath, boolean enableDictionary) - Enable or disable dictionary encoding of the specified column for the constructed writer.
SELF withDictionaryPageSize(int dictionaryPageSize) - Set the Parquet format dictionary page size used by the constructed writer.
SELF withEncryption(FileEncryptionProperties encryptionProperties) - Set the file encryption properties used by the constructed writer.
SELF withMaxPaddingSize(int maxPaddingSize) - Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem.
SELF withPageRowCountLimit(int rowCount) - Sets the Parquet format page row count limit used by the constructed writer.
SELF withPageSize(int pageSize) - Set the Parquet format page size used by the constructed writer.
SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum) - Enable or disable writing page-level checksums for the constructed writer.
SELF withRowGroupSize(int rowGroupSize) - Deprecated. Use withRowGroupSize(long) instead.
SELF withRowGroupSize(long rowGroupSize) - Set the Parquet format row group size used by the constructed writer.
SELF withValidation(boolean enableValidation) - Enable or disable validation for the constructed writer.
SELF withWriteMode(ParquetFileWriter.Mode mode) - Set the write mode used when creating the backing file for this writer.
SELF withWriterVersion(ParquetProperties.WriterVersion version) - Set the format version used by the constructed writer.
-
-
-
Constructor Detail
-
Builder
protected Builder(org.apache.hadoop.fs.Path path)
-
Builder
protected Builder(OutputFile path)
-
-
Method Detail
-
self
protected abstract SELF self()
- Returns:
- this as the correct subclass of ParquetWriter.Builder.
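The SELF type parameter and the abstract self() method follow the self-typed builder idiom: setters declared on the base class return SELF, so a chain that starts on a subclass keeps the subclass type. The sketch below illustrates the idiom only. SinkBuilder, TextSinkBuilder, withCharset and the settings map are hypothetical stand-ins rather than Parquet classes, and the T parameter is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Demo entry point: chains a base-class setter and a subclass setter in one expression. */
class SelfTypedBuilderDemo {
    public static void main(String[] args) {
        Map<String, String> settings = new TextSinkBuilder("out.txt")
                .withPageSize(1024)    // declared on the base builder, still returns TextSinkBuilder
                .withCharset("UTF-8")  // declared only on the subclass
                .build();
        System.out.println(settings);
    }
}

/** Hypothetical base builder standing in for ParquetWriter.Builder. */
abstract class SinkBuilder<SELF extends SinkBuilder<SELF>> {
    protected final Map<String, String> settings = new LinkedHashMap<>();

    protected SinkBuilder(String path) {
        settings.put("path", path);
    }

    /** Subclasses return this, which gives base-class setters the subclass type. */
    protected abstract SELF self();

    public SELF withPageSize(int pageSize) {
        settings.put("pageSize", Integer.toString(pageSize));
        return self();
    }

    /** Returns a copy of the accumulated settings (a real builder would create a writer here). */
    public Map<String, String> build() {
        return new LinkedHashMap<>(settings);
    }
}

/** Hypothetical concrete builder, playing the role of a subclass such as ExampleParquetWriter.Builder. */
final class TextSinkBuilder extends SinkBuilder<TextSinkBuilder> {
    TextSinkBuilder(String path) {
        super(path);
    }

    @Override
    protected TextSinkBuilder self() {
        return this;
    }

    public TextSinkBuilder withCharset(String charset) {
        settings.put("charset", charset);
        return this;
    }
}
```

Without self(), withPageSize would have to return the base type, and the call to withCharset after it would not compile.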
-
getWriteSupport
protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
- Parameters:
conf - a configuration
- Returns:
- an appropriate WriteSupport for the object model.
-
withConf
public SELF withConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration used by the constructed writer.
- Parameters:
conf - a Configuration
- Returns:
- this builder for method chaining.
-
withWriteMode
public SELF withWriteMode(ParquetFileWriter.Mode mode)
Set the write mode used when creating the backing file for this writer.
- Parameters:
mode - a ParquetFileWriter.Mode
- Returns:
- this builder for method chaining.
-
withCompressionCodec
public SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
Set the compression codec used by the constructed writer.
- Parameters:
codecName - a CompressionCodecName
- Returns:
- this builder for method chaining.
-
withEncryption
public SELF withEncryption(FileEncryptionProperties encryptionProperties)
Set the file encryption properties used by the constructed writer.
- Parameters:
encryptionProperties - a FileEncryptionProperties
- Returns:
- this builder for method chaining.
-
withRowGroupSize
@Deprecated public SELF withRowGroupSize(int rowGroupSize)
Deprecated. Use withRowGroupSize(long) instead.
Set the Parquet format row group size used by the constructed writer.
- Parameters:
rowGroupSize - an integer size in bytes
- Returns:
- this builder for method chaining.
-
withRowGroupSize
public SELF withRowGroupSize(long rowGroupSize)
Set the Parquet format row group size used by the constructed writer.
- Parameters:
rowGroupSize - a long size in bytes
- Returns:
- this builder for method chaining.
-
withPageSize
public SELF withPageSize(int pageSize)
Set the Parquet format page size used by the constructed writer.
- Parameters:
pageSize - an integer size in bytes
- Returns:
- this builder for method chaining.
-
withPageRowCountLimit
public SELF withPageRowCountLimit(int rowCount)
Sets the Parquet format page row count limit used by the constructed writer.
- Parameters:
rowCount - limit for the number of rows stored in a page
- Returns:
- this builder for method chaining.
-
withDictionaryPageSize
public SELF withDictionaryPageSize(int dictionaryPageSize)
Set the Parquet format dictionary page size used by the constructed writer.
- Parameters:
dictionaryPageSize - an integer size in bytes
- Returns:
- this builder for method chaining.
-
withMaxPaddingSize
public SELF withMaxPaddingSize(int maxPaddingSize)
Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect.
- Parameters:
maxPaddingSize - an integer size in bytes
- Returns:
- this builder for method chaining.
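One plausible reading of the padding rule, as a sketch rather than the library's implementation: before starting a row group, look at how many bytes remain in the current filesystem block, and pad to the block boundary only when that remainder is no larger than maxPaddingSize. The class RowGroupPadding and the method paddingBefore are hypothetical names, and the exact decision rule is an assumption.

```java
/** Hypothetical helper illustrating a block-alignment padding decision. */
final class RowGroupPadding {
    private RowGroupPadding() {}

    /**
     * Returns the number of padding bytes to write at the given file position so that
     * the next row group starts on a block boundary, or 0 when no padding is written.
     * Assumed rule: pad only if the bytes left in the current block fit within maxPaddingSize.
     */
    static long paddingBefore(long position, long blockSize, int maxPaddingSize) {
        if (position < 0 || blockSize <= 0 || maxPaddingSize < 0) {
            throw new IllegalArgumentException("position and maxPaddingSize must be >= 0, blockSize > 0");
        }
        long offsetInBlock = position % blockSize;
        if (offsetInBlock == 0) {
            return 0L; // already aligned with a block boundary
        }
        long remaining = blockSize - offsetInBlock;
        return remaining <= maxPaddingSize ? remaining : 0L;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // example block size: 128 MiB
        int maxPadding = 8 * 1024 * 1024;     // example padding budget: 8 MiB
        long nearBoundary = blockSize - 4L * 1024 * 1024; // 4 MiB left in the block
        long midBlock = 100L * 1024 * 1024;               // 28 MiB left in the block
        System.out.println(paddingBefore(nearBoundary, blockSize, maxPadding));
        System.out.println(paddingBefore(midBlock, blockSize, maxPadding));
    }
}
```

Under this rule a larger maxPaddingSize trades wasted bytes for more row groups that begin at a block boundary.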
-
enableDictionaryEncoding
public SELF enableDictionaryEncoding()
Enables dictionary encoding for the constructed writer.
- Returns:
- this builder for method chaining.
-
withDictionaryEncoding
public SELF withDictionaryEncoding(boolean enableDictionary)
Enable or disable dictionary encoding for the constructed writer.
- Parameters:
enableDictionary - whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
-
withByteStreamSplitEncoding
public SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)
-
withDictionaryEncoding
public SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)
Enable or disable dictionary encoding of the specified column for the constructed writer.
- Parameters:
columnPath - the path of the column (dot-string)
enableDictionary - whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
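A "dot-string" column path is taken here to mean the field names from the schema root down to the leaf column, joined with dots. The sketch below only shows how such a string could be assembled. ColumnPaths and toDotString are hypothetical helpers written for this example, and the field names are made up.

```java
/** Hypothetical helper that builds dot-string column paths from field names. */
final class ColumnPaths {
    private ColumnPaths() {}

    /** Joins field names, ordered from the schema root to the leaf, with '.' separators. */
    static String toDotString(String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        for (String name : fieldNames) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("field names must be non-empty");
            }
        }
        return String.join(".", fieldNames);
    }

    public static void main(String[] args) {
        System.out.println(toDotString("user_id"));         // a top-level column
        System.out.println(toDotString("address", "city")); // a nested column
    }
}
```

The resulting string is the kind of value the columnPath parameter of the per-column methods expects, for example withDictionaryEncoding(columnPath, false).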
-
enableValidation
public SELF enableValidation()
Enables validation for the constructed writer.
- Returns:
- this builder for method chaining.
-
withValidation
public SELF withValidation(boolean enableValidation)
Enable or disable validation for the constructed writer.
- Parameters:
enableValidation - whether validation should be enabled
- Returns:
- this builder for method chaining.
-
withWriterVersion
public SELF withWriterVersion(ParquetProperties.WriterVersion version)
Set the format version used by the constructed writer.
- Parameters:
version - a WriterVersion
- Returns:
- this builder for method chaining.
-
enablePageWriteChecksum
public SELF enablePageWriteChecksum()
Enables writing page-level checksums for the constructed writer.
- Returns:
- this builder for method chaining.
-
withPageWriteChecksumEnabled
public SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)
Enable or disable writing page-level checksums for the constructed writer.
- Parameters:
enablePageWriteChecksum - whether page checksums should be written out
- Returns:
- this builder for method chaining.
-
withBloomFilterNDV
public SELF withBloomFilterNDV(String columnPath, long ndv)
Sets the NDV (number of distinct values) for the specified column.
- Parameters:
columnPath - the path of the column (dot-string)
ndv - the NDV of the column
- Returns:
- this builder for method chaining.
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(boolean enabled)
Enables or disables writing bloom filters.
- Parameters:
enabled - whether to write bloom filters
- Returns:
- this builder for method chaining.
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(String columnPath, boolean enabled)
Enables or disables the bloom filter for the specified column. If not set for the column specifically, the default enabled/disabled state applies. See withBloomFilterEnabled(boolean).
- Parameters:
columnPath - the path of the column (dot-string)
enabled - whether to write a bloom filter for the column
- Returns:
- this builder for method chaining.
-
config
public SELF config(String property, String value)
Set a property that will be available to the read path. For writers that use a Hadoop configuration, this is the recommended way to add configuration values.
- Parameters:
property - a String property name
value - a String property value
- Returns:
- this builder for method chaining.
-
build
public ParquetWriter<T> build() throws IOException
Build a ParquetWriter with the accumulated configuration.
- Returns:
- a configured ParquetWriter instance.
- Throws:
IOException - if there is an error while creating the writer
-
-