class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBase[AnyRef]
A specialized RecordReader that reads into InternalRows or ColumnarBatches directly using the Parquet column APIs. This is somewhat based on parquet-mr's ColumnReader.
TODO: support decimals requiring more than 8 bytes, INT96, and schema mismatches. All of these can be handled efficiently and easily with codegen.
This class can return either InternalRows or ColumnarBatches. With whole-stage codegen enabled, it returns ColumnarBatches, which offer significant performance gains. TODO: make this always return ColumnarBatches.
Instance Constructors
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def close(): Unit
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader → Closeable → AutoCloseable
- Annotations
- @Override()
- def enableReturningBatches(): Unit
Can be called before any rows are returned to enable returning columnar batches directly.
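The effect of this switch on the RecordReader-style calls can be sketched with a toy model. Everything below is hypothetical (ToyReader, its fields, and the use of Vector[Int] as a stand-in for a ColumnarBatch); it only assumes that, once batches are enabled, getCurrentValue() hands back a whole batch per nextKeyValue() call rather than a single row.

```scala
// Toy model only: ToyReader is a hypothetical stand-in, not a Spark class.
object ReturnModeSketch {
  // A "batch" here is a chunk of Ints standing in for a ColumnarBatch.
  final class ToyReader(data: Vector[Int], batchSize: Int) {
    private var returnBatches = false
    private var batch: Vector[Int] = Vector.empty
    private var offset = 0 // start of the next batch within `data`
    private var rowIdx = 0 // rows of the current batch already handed out

    // Mirrors the documented contract: call before any rows are returned.
    def enableReturningBatches(): Unit = returnBatches = true

    def nextBatch(): Boolean = {
      if (offset >= data.length) return false
      batch = data.slice(offset, offset + batchSize)
      offset += batch.length
      rowIdx = 0
      true
    }

    def nextKeyValue(): Boolean =
      if (returnBatches) nextBatch()
      else if (rowIdx >= batch.length && !nextBatch()) false
      else { rowIdx += 1; true }

    // AnyRef result, as in the documented signature: one row or one batch.
    def getCurrentValue(): AnyRef =
      if (returnBatches) batch else Int.box(batch(rowIdx - 1))
  }

  // Collect everything the reader yields through the key/value calls.
  def drain(reader: ToyReader): List[AnyRef] = {
    val out = List.newBuilder[AnyRef]
    while (reader.nextKeyValue()) out += reader.getCurrentValue()
    out.result()
  }
}
```

In this model a caller that enables batches must cast each value to the batch type, which is why the switch has to happen before the first row is consumed.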
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCurrentKey(): Void
- Definition Classes
- SpecificParquetRecordReaderBase → RecordReader
- Annotations
- @Override()
- def getCurrentValue(): AnyRef
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
- def getProgress(): Float
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initBatch(partitionColumns: StructType, partitionValues: InternalRow): Unit
- def initialize(fileSchema: MessageType, requestedSchema: MessageType, rowGroupReader: ParquetRowGroupReader, totalRowCount: Int): Unit
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
- Annotations
- @VisibleForTesting() @Override()
- def initialize(path: String, columns: List[String]): Unit
Utility API that will read all the data in path. This circumvents the need to create Hadoop objects to use this class.
columns can contain the list of columns to project.
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
- Annotations
- @Override()
- def initialize(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext, fileFooter: Option[ParquetMetadata]): Unit
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
- Annotations
- @Override()
- def initialize(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext): Unit
Implementation of RecordReader API.
- Definition Classes
- VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader
- Annotations
- @Override()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def nextBatch(): Boolean
Advances to the next batch of rows. Returns false if there are no more.
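A typical way to drive a method with this contract is a while loop that stops on the first false. The sketch below is not tied to Spark: BatchSource, FixedSource, and currentNumRows are hypothetical names introduced only to make the loop runnable in isolation.

```scala
object NextBatchLoopSketch {
  // Hypothetical stand-in exposing only the two calls the loop needs.
  trait BatchSource {
    def nextBatch(): Boolean // false once no batches remain
    def currentNumRows: Int  // rows in the batch loaded by the last nextBatch()
  }

  // In-memory source that yields batches of the given sizes.
  final class FixedSource(sizes: List[Int]) extends BatchSource {
    private var remaining = sizes
    private var current = 0

    def nextBatch(): Boolean = remaining match {
      case head :: tail =>
        current = head
        remaining = tail
        true
      case Nil => false
    }

    def currentNumRows: Int = current
  }

  // Drive the source until nextBatch() reports exhaustion.
  def countRows(source: BatchSource): Long = {
    var total = 0L
    while (source.nextBatch()) total += source.currentNumRows
    total
  }
}
```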
- def nextKeyValue(): Boolean
- Definition Classes
- VectorizedParquetRecordReader → RecordReader
- Annotations
- @Override()
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def resultBatch(): ColumnarBatch
Returns the ColumnarBatch object that will be used for all rows returned by this reader. This object is reused. Calling this enables the vectorized reader. This should be called before any calls to nextKeyValue/nextBatch.
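Because the returned object is reused, a caller that wants to keep data across batches has to copy it out before advancing. The following sketch illustrates that hazard with a hypothetical ReusingReader whose "batch" is a mutable ArrayBuffer[Int]; none of these names come from Spark, and the real ColumnarBatch has a different API.

```scala
import scala.collection.mutable.ArrayBuffer

object ReusedBatchSketch {
  // Hypothetical reader: resultBatch() always returns the same mutable buffer.
  final class ReusingReader(chunks: List[Array[Int]]) {
    private val batch = ArrayBuffer.empty[Int]
    private var remaining = chunks

    def resultBatch(): ArrayBuffer[Int] = batch

    def nextBatch(): Boolean = remaining match {
      case head :: tail =>
        batch.clear()
        batch ++= head // overwrite the shared buffer in place
        remaining = tail
        true
      case Nil => false
    }
  }

  // Pitfall: every stored element aliases the one shared buffer.
  def collectAliased(r: ReusingReader): List[collection.Seq[Int]] = {
    val batch = r.resultBatch() // obtained once, before any nextBatch() call
    val out = List.newBuilder[collection.Seq[Int]]
    while (r.nextBatch()) out += batch
    out.result()
  }

  // Safer: snapshot the contents before advancing to the next batch.
  def collectCopied(r: ReusingReader): List[Vector[Int]] = {
    val batch = r.resultBatch()
    val out = List.newBuilder[Vector[Int]]
    while (r.nextBatch()) out += batch.toVector
    out.result()
  }
}
```

In this model collectAliased ends up with several references to whatever the buffer held last, while collectCopied preserves each batch's contents.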
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()