package catalyst
Catalyst is a library for manipulating relational query plans. All classes in catalyst are considered an internal API to Spark SQL and are subject to change between minor releases.
Package Members
- package analysis
Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.
- package catalog
- package csv
- package dsl
A collection of implicit conversions that create a DSL for constructing catalyst data structures.

```scala
scala> import org.apache.spark.sql.catalyst.dsl.expressions._

// Standard operators are added to expressions.
scala> import org.apache.spark.sql.catalyst.expressions.Literal
scala> Literal(1) + Literal(1)
res0: org.apache.spark.sql.catalyst.expressions.Add = (1 + 1)

// There is a conversion from 'symbols to unresolved attributes.
scala> 'a.attr
res1: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

// These unresolved attributes can be used to create more complicated expressions.
scala> 'a === 'b
res2: org.apache.spark.sql.catalyst.expressions.EqualTo = ('a = 'b)

// SQL verbs can be used to construct logical query plans.
scala> import org.apache.spark.sql.catalyst.plans.logical._
scala> import org.apache.spark.sql.catalyst.dsl.plans._
scala> LocalRelation($"key".int, $"value".string).where('key === 1).select('value).analyze
res3: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [value#3]
 Filter (key#2 = 1)
  LocalRelation [key#2,value#3], []
```
- package encoders
- package expressions
A set of classes that can be used to represent trees of relational expressions. A key goal of the expression library is to hide the details of naming and scoping from developers who want to manipulate trees of relational operators. As such, the library defines a special type of expression, a NamedExpression, in addition to the standard collection of expressions.
Standard Expressions
A library of standard expressions (e.g., Add, EqualTo), aggregates (e.g., SUM, COUNT), and other computations (e.g. UDFs). Each expression type is capable of determining its output schema as a function of its children's output schema.
Named Expressions
Some expressions are named and thus can be referenced by later operators in the dataflow graph. The two types of named expressions are AttributeReferences and Aliases. AttributeReferences refer to attributes of the input tuple for a given operator and form the leaves of some expression trees. Aliases assign a name to intermediate computations. For example, in the SQL statement SELECT a+b AS c FROM ..., the expressions a and b would be represented by AttributeReferences and c would be represented by an Alias.

During analysis, all named expressions are assigned a globally unique expression id, which can be used for equality comparisons. While the original names are kept around for debugging purposes, they should never be used to check if two attributes refer to the same value, as plan transformations can result in the introduction of naming ambiguity. For example, consider a plan that contains subqueries, both of which are reading from the same table. If an optimization removes the subqueries, scoping information would be destroyed, eliminating the ability to reason about which subquery produced a given attribute.
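The role of globally unique expression ids can be illustrated with a minimal, self-contained sketch (the names ExprId and Attribute here are illustrative; Spark's actual classes carry much more structure):

```scala
// Hypothetical sketch: attribute equality by globally unique id, not by name.
import java.util.concurrent.atomic.AtomicLong

object ExprId {
  private val curId = new AtomicLong(0L)
  // Each call hands out a fresh, never-reused id.
  def newId(): Long = curId.getAndIncrement()
}

// Case-class equality includes the id, so two attributes with the same
// name but different origins (e.g., two subqueries over the same table)
// never compare equal.
final case class Attribute(name: String, id: Long = ExprId.newId())

val fromSubquery1 = Attribute("a")
val fromSubquery2 = Attribute("a")
// Names collide, ids do not:
assert(fromSubquery1.name == fromSubquery2.name)
assert(fromSubquery1.id != fromSubquery2.id)
assert(fromSubquery1 != fromSubquery2)
```

This is why the docs above warn against comparing attributes by name: after a transformation merges two subqueries, only the ids still distinguish their attributes.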
Evaluation
The result of expressions can be evaluated using the Expression.apply(Row) method.
- package json
- package optimizer
- package parser
- package planning
Contains classes for enumerating possible physical plans for a given logical query plan.
- package plans
A collection of common abstractions for query plans as well as a base logical plan representation.
- package rules
A framework for applying batches of rewrite rules to trees, possibly to a fixed point.
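The fixed-point execution model can be sketched in a few lines of self-contained Scala (the FixedPoint name and signature here are illustrative, not Spark's actual Rule/RuleExecutor API):

```scala
// Minimal sketch: apply a batch of rewrite rules repeatedly until the
// tree stops changing or an iteration limit is reached.
object FixedPoint {
  type Rule[T] = T => T

  def execute[T](plan: T, batch: Seq[Rule[T]], maxIterations: Int = 100): T = {
    var current = plan
    var iteration = 0
    var changed = true
    while (changed && iteration < maxIterations) {
      // One pass: thread the plan through every rule in the batch.
      val next = batch.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      iteration += 1
    }
    current
  }
}

// Toy "plan": an Int; toy rule: halve even numbers until none remain even.
val result = FixedPoint.execute[Int](40, Seq(n => if (n % 2 == 0) n / 2 else n))
// 40 -> 20 -> 10 -> 5, then 5 is a fixed point.
assert(result == 5)
```

The iteration cap mirrors the guard a real rule executor needs, since a misbehaving rule batch might otherwise oscillate forever.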
- package streaming
- package trees
A library for easily manipulating trees of operators. Operators that extend TreeNode are granted the following interface:
- Scala-collection-like methods (foreach, map, flatMap, collect, etc.)
- transform - accepts a partial function that is used to generate a new tree. When the partial function can be applied to a given tree segment, that segment is replaced with the result. After attempting to apply the partial function to a given node, the transform function recursively attempts to apply the function to that node's children.
- debugging support - pretty printing, easy splicing of trees, etc.
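The transform behavior described above can be sketched with a tiny self-contained tree (illustrative only; Spark's real TreeNode is considerably richer):

```scala
// A minimal TreeNode-style tree with a partial-function transform.
sealed trait Node {
  def children: Seq[Node]
  def withChildren(newChildren: Seq[Node]): Node

  // If the rule matches this node, replace it; then recurse into the
  // (possibly replaced) node's children, mirroring the description above.
  def transform(rule: PartialFunction[Node, Node]): Node = {
    val afterRule = if (rule.isDefinedAt(this)) rule(this) else this
    afterRule.withChildren(afterRule.children.map(_.transform(rule)))
  }
}

final case class Leaf(value: Int) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(c: Seq[Node]): Node = this
}

final case class Add(left: Node, right: Node) extends Node {
  def children: Seq[Node] = Seq(left, right)
  def withChildren(c: Seq[Node]): Node = Add(c(0), c(1))
}

// Segments where the partial function applies are replaced; others recurse.
val tree = Add(Leaf(1), Add(Leaf(2), Leaf(3)))
val scaled = tree.transform { case Leaf(n) => Leaf(n * 10) }
assert(scaled == Add(Leaf(10), Add(Leaf(20), Leaf(30))))
```

Because the rule is a partial function, callers only pattern-match the node shapes they care about and the traversal handles the rest of the tree.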
- package types
- package util
Type Members
- case class AliasIdentifier(name: String, qualifier: Seq[String]) extends Product with Serializable
Encapsulates an identifier that is either an alias name or an identifier that has a table name and a qualifier. The SubqueryAlias node keeps track of the qualifier using the information in this structure.
- name: an alias name or a table name
- qualifier: the qualifier
- sealed trait CatalystIdentifier extends AnyRef
An identifier that optionally specifies a database.
Format (unquoted): "name" or "db.name"
Format (quoted): "`name`" or "`db`.`name`"
- trait DataSourceOptions extends AnyRef
An interface that defines the following methods for a data source:
- register a new option name
- retrieve all registered option names
- validate a given option name
- get the alternative option name, if any
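The four methods listed above can be sketched as a small self-contained registry (the OptionRegistry name and method names are hypothetical, not Spark's actual DataSourceOptions trait):

```scala
// Hypothetical sketch of an option registry with the four operations above.
trait OptionRegistry {
  // Maps a registered (lower-cased) option name to its alternative name, if any.
  private val options = scala.collection.mutable.Map[String, Option[String]]()

  // Register a new option name, optionally with an alternative name.
  def registerOption(name: String, alternative: Option[String] = None): Unit =
    options += (name.toLowerCase -> alternative)

  // Retrieve all registered option names.
  def registeredOptions: Set[String] = options.keySet.toSet

  // Validate a given option name (case-insensitive here, as an assumption).
  def isValidOption(name: String): Boolean = options.contains(name.toLowerCase)

  // Get the alternative option name, if any.
  def alternativeOption(name: String): Option[String] =
    options.getOrElse(name.toLowerCase, None)
}

object MyFormatOptions extends OptionRegistry
MyFormatOptions.registerOption("path")
MyFormatOptions.registerOption("compression", Some("codec"))
assert(MyFormatOptions.isValidOption("PATH"))
assert(MyFormatOptions.alternativeOption("compression") == Some("codec"))
```

Case-insensitive lookup is an assumption of this sketch; the point is only to show how the four operations relate to one shared registry.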
- trait DefinedByConstructorParams extends AnyRef
A helper trait to create org.apache.spark.sql.catalyst.encoders.ExpressionEncoders for classes whose fields are entirely defined by constructor params but should not be case classes.
- class FileSourceOptions extends Serializable
Common options for the file-based data source.
- case class FunctionIdentifier(funcName: String, database: Option[String], catalog: Option[String]) extends CatalystIdentifier with Product with Serializable
Identifies a function in a database. If database is not defined, the current database is used.
- abstract class InternalRow extends SpecializedGetters with Serializable
An abstract class for row used internally in Spark SQL, which only contains the columns as internal types.
- class NoopFilters extends StructFilters
- class OrderedFilters extends StructFilters
An instance of the class compiles filters to predicates and sorts them in an order that allows applying the predicates to an internal row with partially initialized values, for instance one converted from parsed CSV fields.
- case class ProjectingInternalRow(schema: StructType, colOrdinals: Seq[Int]) extends InternalRow with Product with Serializable
An InternalRow that projects particular columns from another InternalRow without copying the underlying data.
- case class QualifiedTableName(database: String, name: String) extends Product with Serializable
A fully qualified identifier for a table (i.e., database.tableName)
- class QueryPlanningTracker extends AnyRef
- trait SQLConfHelper extends AnyRef
Trait for getting the active SQLConf.
- trait ScalaReflection extends Logging
Support for generating catalyst schemas for Scala objects. Note that unlike its companion object, this trait is able to work in both the runtime and the compile-time (macro) universes.
- abstract class StructFilters extends AnyRef
The class provides an API for applying pushed-down filters to partially or fully set internal rows that have the struct schema.
StructFilters assumes that reset() is called before any skipRow() calls for a new row.
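The reset()/skipRow() calling convention can be sketched as follows (the RowFilters class and its method shapes are illustrative assumptions, not Spark's StructFilters API):

```scala
// Hypothetical sketch of the reset()/skipRow() contract: reset() is called
// once per row, then skipRow(row, ordinal) after each column is set, so a
// row can be rejected as soon as enough columns are available.
abstract class RowFilters {
  // Must be called before the first skipRow() for each new row.
  def reset(): Unit
  // Returns true once the partially filled row is known to fail a predicate.
  def skipRow(row: Array[Any], ordinal: Int): Boolean
}

// Example filter: skip any row whose column 0 is negative.
class NonNegativeFirstColumn extends RowFilters {
  private var skipped = false
  def reset(): Unit = skipped = false
  def skipRow(row: Array[Any], ordinal: Int): Boolean = {
    if (ordinal == 0) skipped = row(0).asInstanceOf[Int] < 0
    skipped // once true, stays true until the next reset()
  }
}

val filter = new NonNegativeFirstColumn
filter.reset()
assert(filter.skipRow(Array[Any](-1, "x"), 0)) // rejected after column 0
filter.reset()
assert(!filter.skipRow(Array[Any](5, "x"), 0)) // passes so far
```

The per-row reset is what makes the early-exit state safe to reuse across rows, which is the invariant the documentation above calls out.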
- case class TableIdentifier(table: String, database: Option[String], catalog: Option[String]) extends CatalystIdentifier with Product with Serializable
Identifies a table in a database. If database is not defined, the current database is used. When we register a permanent function in the FunctionRegistry, we use unquotedString as the function name.
- case class WalkedTypePath(walkedPaths: Seq[String] = Nil) extends Product with Serializable
This class records the paths the serializer and deserializer walk through to reach the current path. Note that this class prepends new paths to the recorded paths, so it maintains the paths in reverse order.
Value Members
- object AliasIdentifier extends Serializable
- object CatalystTypeConverters
Functions to convert Scala types to Catalyst types and vice versa.
- object CurrentUserContext
- object DeserializerBuildHelper
- object FileSourceOptions extends Serializable
- object FunctionIdentifier extends Serializable
- object InternalRow extends Serializable
- object JavaTypeInference
Type-inference utilities for POJOs and Java collections.
- object QueryPlanningTracker
A simple utility for tracking runtime and associated stats in query planning.
There are two separate concepts we track:
1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis, optimization and physical planning (just planning).
2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also track the number of invocations and effective invocations.
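The phase-level half of this tracking can be sketched in self-contained Scala (the PhaseTracker name and methods are illustrative; the real QueryPlanningTracker also tracks per-rule invocation counts):

```scala
// Hypothetical sketch: accumulate wall-clock time per planning phase.
class PhaseTracker {
  private val phases = scala.collection.mutable.Map[String, Long]()

  // Run `f`, charging its elapsed nanoseconds to `phase`.
  // Repeated calls for the same phase accumulate.
  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    try f
    finally {
      val elapsed = System.nanoTime() - start
      phases(phase) = phases.getOrElse(phase, 0L) + elapsed
    }
  }

  def phaseTimesNs: Map[String, Long] = phases.toMap
}

val tracker = new PhaseTracker
val result = tracker.measurePhase("analysis") { 1 + 1 }
assert(result == 2)
assert(tracker.phaseTimesNs.contains("analysis"))
```

Using a by-name parameter lets callers wrap an arbitrary phase body without changing it, and the try/finally ensures time is recorded even if the phase throws.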
- object ScalaReflection extends ScalaReflection
A default version of ScalaReflection that uses the runtime universe.
- object SerializerBuildHelper
- object StructFilters
- object TableIdentifier extends Serializable