Package org.apache.parquet.hadoop
Class ParquetWriter<T>
- java.lang.Object
  - org.apache.parquet.hadoop.ParquetWriter<T>
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
ExampleParquetWriter
public class ParquetWriter<T> extends Object implements Closeable
Write records to a Parquet file.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> | An abstract builder class for ParquetWriter instances.
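The SELF type parameter in the Builder signature above is a self-bounded generic: each concrete builder passes its own type as SELF, so inherited fluent methods can return the subclass type instead of the abstract base. The sketch below illustrates only that pattern. SketchBuilder, TextBuilder, and all method names in it are hypothetical stand-ins and are not part of the Parquet API.

```java
// Stand-in types for illustration only; none of these are Parquet classes.
abstract class SketchBuilder<T, SELF extends SketchBuilder<T, SELF>> {
    protected int pageSize = 1024;        // arbitrary placeholder default
    protected boolean dictionary = true;  // arbitrary placeholder default

    // Each subclass returns itself so chained calls keep the subclass type.
    protected abstract SELF self();

    // Each subclass decides what it builds.
    public abstract T build();

    public SELF withPageSize(int size) {
        this.pageSize = size;
        return self();
    }

    public SELF withDictionary(boolean enabled) {
        this.dictionary = enabled;
        return self();
    }
}

final class TextBuilder extends SketchBuilder<String, TextBuilder> {
    private String label = "unnamed";

    @Override
    protected TextBuilder self() {
        return this;
    }

    // Subclass-only option; still reachable after inherited calls thanks to SELF.
    public TextBuilder withLabel(String label) {
        this.label = label;
        return this;
    }

    @Override
    public String build() {
        return label + "[pageSize=" + pageSize + ", dictionary=" + dictionary + "]";
    }
}

class SelfTypedBuilderSketch {
    static String demo() {
        return new TextBuilder()
            .withPageSize(4096)   // inherited method, yet it returns TextBuilder
            .withLabel("events")  // so the subclass method can follow it
            .withDictionary(false)
            .build();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the SELF parameter, withPageSize would return the base type and the chained call to withLabel would not compile.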
-
Field Summary
Fields
Modifier and Type | Field
static int | DEFAULT_BLOCK_SIZE
static org.apache.parquet.hadoop.metadata.CompressionCodecName | DEFAULT_COMPRESSION_CODEC_NAME
static boolean | DEFAULT_IS_DICTIONARY_ENABLED
static boolean | DEFAULT_IS_VALIDATING_ENABLED
static int | DEFAULT_PAGE_SIZE
static ParquetProperties.WriterVersion | DEFAULT_WRITER_VERSION
static int | MAX_PADDING_SIZE_DEFAULT
static String | OBJECT_MODEL_NAME_PROP
-
Constructor Summary
Constructors
Constructor | Description
ParquetWriter(org.apache.hadoop.fs.Path file, org.apache.hadoop.conf.Configuration conf, WriteSupport<T> writeSupport) | Deprecated.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, boolean enableDictionary, boolean validating) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) | Deprecated. will be removed in 2.0.0
ParquetWriter(org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) | Deprecated. will be removed in 2.0.0
-
Method Summary
Methods
Modifier and Type | Method
void | close()
long | getDataSize()
ParquetMetadata | getFooter()
void | write(T object)
-
-
-
Field Detail
-
DEFAULT_BLOCK_SIZE
public static final int DEFAULT_BLOCK_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_PAGE_SIZE
public static final int DEFAULT_PAGE_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_COMPRESSION_CODEC_NAME
public static final org.apache.parquet.hadoop.metadata.CompressionCodecName DEFAULT_COMPRESSION_CODEC_NAME
-
DEFAULT_IS_DICTIONARY_ENABLED
public static final boolean DEFAULT_IS_DICTIONARY_ENABLED
- See Also:
- Constant Field Values
-
DEFAULT_IS_VALIDATING_ENABLED
public static final boolean DEFAULT_IS_VALIDATING_ENABLED
- See Also:
- Constant Field Values
-
DEFAULT_WRITER_VERSION
public static final ParquetProperties.WriterVersion DEFAULT_WRITER_VERSION
-
OBJECT_MODEL_NAME_PROP
public static final String OBJECT_MODEL_NAME_PROP
- See Also:
- Constant Field Values
-
MAX_PADDING_SIZE_DEFAULT
public static final int MAX_PADDING_SIZE_DEFAULT
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter (with dictionary encoding enabled and validation off).
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, boolean enableDictionary, boolean validating) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold (both data and dictionary)
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter. Directly instantiates a Hadoop Configuration, which reads configuration from the classpath.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
conf - Hadoop configuration to use while accessing the filesystem
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter.
- Parameters:
file - the file to create
mode - file creation mode
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
conf - Hadoop configuration to use while accessing the filesystem
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport) throws IOException
Deprecated. will be removed in 2.0.0
Create a new ParquetWriter. The default block size is 50 MB. The default page size is 1 MB. Default compression is no compression. Dictionary encoding is disabled.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, org.apache.hadoop.conf.Configuration conf, WriteSupport<T> writeSupport) throws IOException
Deprecated.
- Throws:
IOException
-
-
Method Detail
-
write
public void write(T object) throws IOException
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
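Because the class implements Closeable and AutoCloseable, a writer can be managed with try-with-resources, which calls close() even when a write fails. The sketch below shows that shape using only the JDK. BufferingWriter is a hypothetical stand-in that buffers records and flushes them on close; it is not a Parquet class and makes no claim about how ParquetWriter buffers internally.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer that implements Closeable; not a Parquet class.
class BufferingWriter<T> implements Closeable {
    private final List<T> buffered = new ArrayList<>();
    private final List<T> sink;
    private boolean closed = false;

    BufferingWriter(List<T> sink) {
        this.sink = sink;
    }

    void write(T record) throws IOException {
        if (closed) {
            throw new IOException("writer is closed");
        }
        buffered.add(record);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // Closeable specifies that a repeated close() has no effect
        }
        sink.addAll(buffered); // flush everything that was buffered
        buffered.clear();
        closed = true;
    }
}

class TryWithResourcesSketch {
    static List<String> demo() {
        List<String> sink = new ArrayList<>();
        try (BufferingWriter<String> writer = new BufferingWriter<>(sink)) {
            writer.write("a");
            writer.write("b");
            // At this point the stand-in has not flushed anything to the sink.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // close() has run by now, so the sink holds both records.
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```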
-
getFooter
public ParquetMetadata getFooter()
- Returns:
- the ParquetMetadata written to the (closed) file.
-
getDataSize
public long getDataSize()
- Returns:
- the total size of data written to the file and buffered in memory
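One possible use of a running size like the one getDataSize() reports is to roll over to a new output file once a size threshold is reached. The following is a minimal sketch of that idea under stated assumptions: SizedWriter, recordCount(), and roll() are hypothetical names, the character count of each record stands in for its encoded size, and nothing here reflects how Parquet itself measures data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with a size counter; not a Parquet class.
class SizedWriter {
    private long dataSize = 0;
    private int records = 0;

    void write(String record) {
        dataSize += record.length(); // character count stands in for encoded bytes
        records++;
    }

    long getDataSize() {
        return dataSize;
    }

    int recordCount() {
        return records;
    }
}

class RollingWriterSketch {
    // Returns how many records ended up in each simulated file.
    static List<Integer> roll(List<String> records, long maxBytes) {
        List<Integer> countsPerFile = new ArrayList<>();
        SizedWriter current = new SizedWriter();
        for (String record : records) {
            current.write(record);
            // Check the size after each write and start a new writer once it is reached.
            if (current.getDataSize() >= maxBytes) {
                countsPerFile.add(current.recordCount());
                current = new SizedWriter();
            }
        }
        // Keep the last, partially filled writer if it holds any records.
        if (current.recordCount() > 0) {
            countsPerFile.add(current.recordCount());
        }
        return countsPerFile;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("aaaa");
        input.add("bbbb");
        input.add("cccc");
        System.out.println(roll(input, 8));
    }
}
```

The check runs after the write, so a file can exceed the threshold by up to one record; a stricter limit would need the check before writing.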
-
-