Class RewriteOptions.Builder

    • Constructor Detail

      • Builder

        public Builder(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path inputFile,
                       org.apache.hadoop.fs.Path outputFile)
        Create a builder for a RewriteOptions that reads a single input file.
        Parameters:
        conf - configuration for reading from input files and writing to output file
        inputFile - input file path to read from
        outputFile - output file path to rewrite to
      • Builder

        public Builder(org.apache.hadoop.conf.Configuration conf,
                       List<org.apache.hadoop.fs.Path> inputFiles,
                       org.apache.hadoop.fs.Path outputFile)
        Create a builder for a RewriteOptions that reads one or more input files.

        Please note that if merging more than one file, the schema of all files must be the same. Otherwise, the rewrite will fail.

        The rewrite keeps the original row groups from all input files. If those row groups are very small, this is not optimal and does not solve the small-file problem: the output file simply carries all of the small row groups and ends up with a large file footer, which makes matters worse. TODO: support rewriting by record to break the original row groups into reasonably sized ones.

        Parameters:
        conf - configuration for reading from input files and writing to output file
        inputFiles - list of input file paths to read from
        outputFile - output file path to rewrite to
    • Method Detail

      • prune

        public RewriteOptions.Builder prune(List<String> columns)
        Set the columns to prune, that is, to drop from the output file.

        By default, all columns are kept.

        Parameters:
        columns - list of columns to prune
        Returns:
        self
      • transform

        public RewriteOptions.Builder transform(org.apache.parquet.hadoop.metadata.CompressionCodecName newCodecName)
        Set the compression codec to use for the output file.

        By default, the codec is the same as that of the input file.

        Parameters:
        newCodecName - compression codec to use
        Returns:
        self
      • mask

        public RewriteOptions.Builder mask(Map<String,MaskMode> maskColumns)
        Set the columns to mask.

        By default, no columns are masked.

        Parameters:
        maskColumns - map from each column to mask to its masking mode
        Returns:
        self
      • encrypt

        public RewriteOptions.Builder encrypt(List<String> encryptColumns)
        Set the columns to encrypt.

        By default, no columns are encrypted.

        Parameters:
        encryptColumns - list of columns to encrypt
        Returns:
        self
      • encryptionProperties

        public RewriteOptions.Builder encryptionProperties(FileEncryptionProperties fileEncryptionProperties)
        Set the encryption properties to use for the output file.

        This is required if the list of columns to encrypt is not empty.

        Parameters:
        fileEncryptionProperties - encryption properties to use
        Returns:
        self
      • addInputFile

        public RewriteOptions.Builder addInputFile(org.apache.hadoop.fs.Path path)
        Add an input file to read from.
        Parameters:
        path - input file path to read from
        Returns:
        self
      • build

        public RewriteOptions build()
        Build the RewriteOptions.
        Returns:
        a RewriteOptions
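
Example (illustrative only). The setters above all return the builder itself, so calls can be chained and finished with build(). The sketch below is a simplified stand-in written against the JDK alone, not the Parquet class: OptionsSketch and its fields are hypothetical names, file paths are plain strings instead of org.apache.hadoop.fs.Path, the Configuration argument is omitted, and a String stands in for FileEncryptionProperties. It models only the chaining and the documented rule that encryption properties are required when the list of columns to encrypt is not empty. Throwing IllegalArgumentException from build() is an assumption of this sketch, not something stated by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in, NOT org.apache.parquet.hadoop.rewrite.RewriteOptions.
// It mirrors the builder's shape with JDK types only.
final class OptionsSketch {
    final List<String> inputFiles;
    final String outputFile;
    final List<String> pruneColumns;
    final List<String> encryptColumns;
    final String encryptionProperties; // stand-in for FileEncryptionProperties

    private OptionsSketch(Builder b) {
        this.inputFiles = Collections.unmodifiableList(new ArrayList<>(b.inputFiles));
        this.outputFile = b.outputFile;
        this.pruneColumns = b.pruneColumns;
        this.encryptColumns = b.encryptColumns;
        this.encryptionProperties = b.encryptionProperties;
    }

    static final class Builder {
        private final List<String> inputFiles = new ArrayList<>();
        private final String outputFile;
        private List<String> pruneColumns = Collections.emptyList();   // default: keep all columns
        private List<String> encryptColumns = Collections.emptyList(); // default: encrypt nothing
        private String encryptionProperties;                           // null until set

        Builder(String inputFile, String outputFile) {
            this.inputFiles.add(inputFile);
            this.outputFile = outputFile;
        }

        // Each setter returns this builder so that calls can be chained.
        Builder addInputFile(String path) {
            inputFiles.add(path);
            return this;
        }

        Builder prune(List<String> columns) {
            pruneColumns = Collections.unmodifiableList(new ArrayList<>(columns));
            return this;
        }

        Builder encrypt(List<String> columns) {
            encryptColumns = Collections.unmodifiableList(new ArrayList<>(columns));
            return this;
        }

        Builder encryptionProperties(String properties) {
            encryptionProperties = properties;
            return this;
        }

        OptionsSketch build() {
            // Documented rule: properties are required when there are columns to encrypt.
            // The exception type is an assumption made for this sketch.
            if (!encryptColumns.isEmpty() && encryptionProperties == null) {
                throw new IllegalArgumentException(
                        "encryption properties are required when columns are encrypted");
            }
            return new OptionsSketch(this);
        }
    }

    public static void main(String[] args) {
        // Merge two inputs (which must share a schema) and drop one column.
        OptionsSketch options = new Builder("in-1.parquet", "out.parquet")
                .addInputFile("in-2.parquet")
                .prune(Arrays.asList("debug_info"))
                .build();
        System.out.println(options.inputFiles + " -> " + options.outputFile);
    }
}
```

With the actual library, the same chain would start from one of the constructors documented above, passing a Configuration and Path values, and the resulting RewriteOptions would then be handed to the rewriter.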