Class UnmaterializableRecordCounter


  • public class UnmaterializableRecordCounter
    extends Object
Tracks the number of records that cannot be materialized and throws ParquetDecodingException if the error rate crosses a configured limit.

    These errors are meant to be recoverable record-conversion errors, such as a union missing a value or a schema mismatch. This mechanism is not meant to recover from corruption in the Parquet columns themselves. The intent is to skip over very rare file corruption, or bugs where the write path allowed invalid records into the file, while still catching large numbers of failures. Not enabled by default (by default, no errors are tolerated).

    • Constructor Detail

      • UnmaterializableRecordCounter

        public UnmaterializableRecordCounter(org.apache.hadoop.conf.Configuration conf,
                                             long totalNumRecords)
      • UnmaterializableRecordCounter

        public UnmaterializableRecordCounter(ParquetReadOptions options,
                                             long totalNumRecords)
      • UnmaterializableRecordCounter

        public UnmaterializableRecordCounter(double errorThreshold,
                                             long totalNumRecords)
    • Method Detail

      • incErrors

        public void incErrors(org.apache.parquet.io.api.RecordMaterializer.RecordMaterializationException cause)
                       throws org.apache.parquet.io.ParquetDecodingException
        Throws:
        org.apache.parquet.io.ParquetDecodingException
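
    The threshold check described above can be sketched in plain Java. This is a simplified, self-contained illustration, not the real Parquet class: the name ErrorRateCounterSketch is hypothetical, and a plain RuntimeException stands in for ParquetDecodingException so the sketch has no Parquet dependency.

    ```java
    // Hedged sketch of a thresholded error counter, in the spirit of
    // UnmaterializableRecordCounter. Not the real implementation.
    class ErrorRateCounterSketch {
        private final double errorThreshold; // tolerated fraction of bad records
        private final long totalNumRecords;  // expected record count for this read
        private long numErrors = 0;          // conversion failures seen so far

        ErrorRateCounterSketch(double errorThreshold, long totalNumRecords) {
            this.errorThreshold = errorThreshold;
            this.totalNumRecords = totalNumRecords;
        }

        // Records one conversion failure; throws once the running error rate
        // exceeds the threshold. (The real class throws ParquetDecodingException.)
        void incErrors(Exception cause) {
            numErrors++;
            double errRate = (double) numErrors / totalNumRecords;
            if (errRate > errorThreshold) {
                throw new RuntimeException(
                    "too many record conversion failures: " + numErrors
                    + " of " + totalNumRecords + " records (limit " + errorThreshold + ")",
                    cause);
            }
        }
    }
    ```

    With a threshold of 0.02 over 100 records, the first two failures are skipped silently; the third pushes the rate to 0.03 and aborts the read, matching the documented intent of tolerating rare bad records while catching widespread failure.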