Class RewriteOptions.Builder
java.lang.Object
    org.apache.parquet.hadoop.rewrite.RewriteOptions.Builder

Enclosing class:
    RewriteOptions

public static class RewriteOptions.Builder extends Object

Constructor Summary
Constructor / Description

Builder(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> inputFiles, org.apache.hadoop.fs.Path outputFile)
    Create a builder for RewriteOptions that reads from a list of input files.
Builder(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inputFile, org.apache.hadoop.fs.Path outputFile)
    Create a builder for RewriteOptions that reads from a single input file.

Method Summary
Modifier and Type / Method / Description

RewriteOptions.Builder addInputFile(org.apache.hadoop.fs.Path path)
    Add an input file to read from.
RewriteOptions build()
    Build the RewriteOptions.
RewriteOptions.Builder encrypt(List<String> encryptColumns)
    Set the columns to encrypt.
RewriteOptions.Builder encryptionProperties(FileEncryptionProperties fileEncryptionProperties)
    Set the encryption properties to use for the output file.
RewriteOptions.Builder mask(Map<String,MaskMode> maskColumns)
    Set the columns to mask.
RewriteOptions.Builder prune(List<String> columns)
    Set the columns to prune.
RewriteOptions.Builder transform(org.apache.parquet.hadoop.metadata.CompressionCodecName newCodecName)
    Set the compression codec to use for the output file.

Constructor Detail

Builder
public Builder(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inputFile, org.apache.hadoop.fs.Path outputFile)
Create a builder for RewriteOptions that reads from a single input file.
Parameters:
    conf - configuration for reading from the input file and writing to the output file
    inputFile - input file path to read from
    outputFile - output file path to rewrite to
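
A minimal sketch of how this constructor might be used, assuming parquet-hadoop and hadoop-common are on the classpath. The class name `SingleFileRewriteExample` and its helper methods are hypothetical. `ParquetRewriter` is not described on this page; its use here (constructor taking a `RewriteOptions`, then `processBlocks()` and `close()`) is an assumption about the companion class in the same package.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.hadoop.rewrite.ParquetRewriter;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class SingleFileRewriteExample {

  // Builds options that rewrite one input file into one output file,
  // re-compressing the data with ZSTD. Building the options does not
  // touch the file system.
  static RewriteOptions buildOptions(Configuration conf, Path inputFile, Path outputFile) {
    return new RewriteOptions.Builder(conf, inputFile, outputFile)
        .transform(CompressionCodecName.ZSTD)
        .build();
  }

  // Executes the rewrite described by the options.
  // Assumption: ParquetRewriter is the consumer of RewriteOptions.
  static void rewrite(RewriteOptions options) throws IOException {
    ParquetRewriter rewriter = new ParquetRewriter(options);
    try {
      rewriter.processBlocks();
    } finally {
      rewriter.close();
    }
  }
}
```

A caller would pass real paths, for example `rewrite(buildOptions(new Configuration(), new Path("/tmp/input.parquet"), new Path("/tmp/output.parquet")))`; both paths are placeholders.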

Builder
public Builder(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> inputFiles, org.apache.hadoop.fs.Path outputFile)
Create a builder for RewriteOptions that reads from a list of input files.
Note that when merging more than one file, all input files must have the same schema; otherwise, the rewrite will fail.
The rewrite keeps the original row groups from all input files. This may not be optimal if the row groups are very small, and it does not solve the small-file problem. Instead, it makes the problem worse by producing a large file footer in the output file. TODO: support rewriting by record to break the original row groups into reasonably sized ones.
Parameters:
    conf - configuration for reading from the input files and writing to the output file
    inputFiles - list of input file paths to read from
    outputFile - output file path to rewrite to
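
The merge case described above could look roughly like the following sketch, assuming parquet-hadoop and hadoop-common are on the classpath. The class and method names are hypothetical, and the caller is responsible for making sure every input file has the same schema.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class MergeFilesExample {

  // Builds options that merge several input files into one output file.
  // The original row groups are kept as they are, so the output file
  // ends up with as many row groups as all inputs combined.
  static RewriteOptions mergeOptions(Configuration conf, List<Path> inputFiles, Path outputFile) {
    return new RewriteOptions.Builder(conf, inputFiles, outputFile).build();
  }
}
```

For example, merging two files that hold three row groups each yields an output file with six row groups, which is why this approach does not shrink the footer.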
Method Detail

prune
public RewriteOptions.Builder prune(List<String> columns)
Set the columns to prune. By default, all columns are kept.
Parameters:
    columns - list of columns to prune
Returns:
    self

transform
public RewriteOptions.Builder transform(org.apache.parquet.hadoop.metadata.CompressionCodecName newCodecName)
Set the compression codec to use for the output file. By default, the codec is the same as that of the input file.
Parameters:
    newCodecName - compression codec to use
Returns:
    self

mask
public RewriteOptions.Builder mask(Map<String,MaskMode> maskColumns)
Set the columns to mask. By default, no columns are masked.
Parameters:
    maskColumns - map from each column to mask to its masking mode
Returns:
    self
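
As an illustration of combining prune and mask, here is a minimal sketch, assuming parquet-hadoop and hadoop-common are on the classpath. The column names `debug_info` and `email` are made up, and `MaskMode.NULLIFY` is assumed to be one of the available masking modes. The example deliberately prunes and masks different columns.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.rewrite.MaskMode;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class PruneAndMaskExample {

  // Builds options that drop one column entirely and mask another.
  static RewriteOptions pruneAndMask(Configuration conf, Path inputFile, Path outputFile) {
    // Column name -> masking mode. "email" is a hypothetical column.
    Map<String, MaskMode> maskColumns = new HashMap<>();
    maskColumns.put("email", MaskMode.NULLIFY);

    return new RewriteOptions.Builder(conf, inputFile, outputFile)
        .prune(Arrays.asList("debug_info")) // hypothetical column to remove
        .mask(maskColumns)
        .build();
  }
}
```

Columns that are neither pruned nor masked are carried over unchanged, following the defaults stated for each method.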

encrypt
public RewriteOptions.Builder encrypt(List<String> encryptColumns)
Set the columns to encrypt. By default, no columns are encrypted.
Parameters:
    encryptColumns - list of columns to encrypt
Returns:
    self

encryptionProperties
public RewriteOptions.Builder encryptionProperties(FileEncryptionProperties fileEncryptionProperties)
Set the encryption properties to use for the output file. This is required if the list of columns to encrypt is not empty.
Parameters:
    fileEncryptionProperties - encryption properties to use
Returns:
    self
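
Because encrypt and encryptionProperties have to be set together, a combined sketch may help. It assumes parquet-hadoop and hadoop-common are on the classpath and uses the `org.apache.parquet.crypto` builders to create the properties. The column name `ssn` is made up, and how the keys are obtained is outside the scope of this page; the keys are assumed to be 16, 24, or 32 bytes long.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.crypto.ColumnEncryptionProperties;
import org.apache.parquet.crypto.FileEncryptionProperties;
import org.apache.parquet.hadoop.metadata.ColumnPath;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class EncryptColumnsExample {

  // Builds options that encrypt one column of the output file.
  static RewriteOptions encryptOptions(Configuration conf, Path inputFile, Path outputFile,
                                       byte[] footerKey, byte[] columnKey) {
    // "ssn" is a hypothetical column name.
    ColumnPath ssn = ColumnPath.fromDotString("ssn");
    Map<ColumnPath, ColumnEncryptionProperties> columnProperties = new HashMap<>();
    columnProperties.put(ssn,
        ColumnEncryptionProperties.builder(ssn).withKey(columnKey).build());

    FileEncryptionProperties fileEncryptionProperties =
        FileEncryptionProperties.builder(footerKey)
            .withEncryptedColumns(columnProperties)
            .build();

    // The encrypted column list and the encryption properties are set together.
    return new RewriteOptions.Builder(conf, inputFile, outputFile)
        .encrypt(Arrays.asList("ssn"))
        .encryptionProperties(fileEncryptionProperties)
        .build();
  }
}
```

In this sketch the same column appears both in the list passed to encrypt and in the column map of the encryption properties, so the two settings stay consistent.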

addInputFile
public RewriteOptions.Builder addInputFile(org.apache.hadoop.fs.Path path)
Add an input file to read from.
Parameters:
    path - input file path to read from
Returns:
    self
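
When the input files are discovered one at a time, addInputFile can be called repeatedly before build(). The sketch below assumes parquet-hadoop and hadoop-common are on the classpath; the class and method names are hypothetical. It starts from a mutable `ArrayList` as a precaution, since this page does not say whether the builder copies the list it is given.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class AddInputFileExample {

  // Starts with one input file and appends the rest one by one.
  static RewriteOptions optionsFor(Configuration conf, Path first, List<Path> others,
                                   Path outputFile) {
    // Mutable list, in case the builder appends to the list it was given.
    List<Path> initial = new ArrayList<>();
    initial.add(first);

    RewriteOptions.Builder builder = new RewriteOptions.Builder(conf, initial, outputFile);
    for (Path path : others) {
      builder.addInputFile(path);
    }
    return builder.build();
  }
}
```

As with the list constructor, all files added this way are expected to share the same schema.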

build
public RewriteOptions build()
Build the RewriteOptions.
Returns:
    a RewriteOptions