Commit 14c54f1
[SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators
The following description is quoted from JIRA:
When I issue an HQL query against a HiveContext where my predicate uses a column of string type with one of the LT, LTE, GT, or GTE operators, I get the following error:
scala.MatchError: StringType (of class org.apache.spark.sql.catalyst.types.StringType$)
Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType is absent from the corresponding functions for creating these filters.
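The failure mode described above can be illustrated with a minimal, self-contained Scala model. It assumes, as the description suggests, that the filter-building code dispatches on the column's data type with a pattern match and has no case for StringType. The `DataType` and `Op` objects, `comparatorBefore`, `comparatorAfter`, and `makeFilter` are hypothetical local stand-ins, not Spark's or Parquet's API, and this is not the actual patch. Using `String.compareTo` for string columns is also an assumption of the sketch; a real Parquet-level filter would compare the column's stored binary values.

```scala
// Simplified, self-contained model of the bug; not the actual ParquetFilters code.
object ParquetFilterSketch {
  // Local stand-ins for Catalyst's data types (names mirror them for readability).
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType

  // The four comparison operators named in the issue.
  sealed trait Op
  case object LT extends Op
  case object LTE extends Op
  case object GT extends Op
  case object GTE extends Op

  // Turns a compare() result into the boolean outcome for each operator.
  private def holds(op: Op, cmp: Int): Boolean = op match {
    case LT  => cmp < 0
    case LTE => cmp <= 0
    case GT  => cmp > 0
    case GTE => cmp >= 0
  }

  // "Before": the match has no StringType case, so building a comparator
  // for a string column throws scala.MatchError, like the reported error.
  def comparatorBefore(dataType: DataType): (Any, Any) => Int = dataType match {
    case IntegerType => (a: Any, b: Any) => Integer.compare(a.asInstanceOf[Int], b.asInstanceOf[Int])
    case LongType    => (a: Any, b: Any) => java.lang.Long.compare(a.asInstanceOf[Long], b.asInstanceOf[Long])
    case DoubleType  => (a: Any, b: Any) => java.lang.Double.compare(a.asInstanceOf[Double], b.asInstanceOf[Double])
  }

  // "After": StringType is handled with lexicographic String comparison;
  // every other type falls back to the existing cases.
  def comparatorAfter(dataType: DataType): (Any, Any) => Int = dataType match {
    case StringType => (a: Any, b: Any) => a.asInstanceOf[String].compareTo(b.asInstanceOf[String])
    case other      => comparatorBefore(other)
  }

  // Builds a per-value predicate "value <op> literal" for one column.
  def makeFilter(op: Op, compare: (Any, Any) => Int, literal: Any): Any => Boolean =
    (value: Any) => holds(op, compare(value, literal))

  def main(args: Array[String]): Unit = {
    val eventGte = makeFilter(GTE, comparatorAfter(StringType), "2011-12-01")
    println(eventGte("2012-01-01")) // true
    println(eventGte("2011-06-18")) // false

    // Reproduces the reported failure mode in the "before" version.
    try comparatorBefore(StringType)
    catch { case e: MatchError => println("before the fix: " + e) }
  }
}
```

In this model the missing case fails when the filter is built, not when individual values are tested, and adding a single StringType branch covers all four operators because they share one comparator.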
To reproduce, in a Hive 0.13.1 shell, I created the following table (in a specified database):
create table sparkbug (
id int,
event string
) stored as parquet;
Insert some sample data:
insert into table sparkbug select 1, '2011-06-18' from <some table> limit 1;
insert into table sparkbug select 2, '2012-01-01' from <some table> limit 1;
Launch a Spark shell and create a HiveContext connected to the metastore where the table above is located.
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.setConf("spark.sql.shuffle.partitions", "10")
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
hc.setConf("spark.sql.parquet.compression.codec", "snappy")
import hc._
hc.hql("select * from <db>.sparkbug where event >= '2011-12-01'")
A scala.MatchError will appear in the output.
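For reference, the result the query should return once string comparisons are supported can be worked out by evaluating the same predicate on the two sample rows in memory. This sketch does not use Spark; `SparkbugRow` and `matching` are hypothetical names. It relies on the event values being zero-padded yyyy-MM-dd strings, so lexicographic order matches chronological order.

```scala
// Evaluates the repro's predicate on the two sample rows, without Spark.
object ExpectedResultSketch {
  // Hypothetical in-memory stand-in for the rows of the sparkbug table.
  final case class SparkbugRow(id: Int, event: String)

  val rows: Seq[SparkbugRow] =
    Seq(SparkbugRow(1, "2011-06-18"), SparkbugRow(2, "2012-01-01"))

  // Equivalent of `where event >= threshold` using plain string ordering.
  // Zero-padded yyyy-MM-dd values sort lexicographically in date order.
  def matching(threshold: String): Seq[SparkbugRow] =
    rows.filter(r => r.event.compareTo(threshold) >= 0)

  def main(args: Array[String]): Unit =
    matching("2011-12-01").foreach(println) // only the row with id 2 is kept
}
```

With the threshold '2011-12-01', '2011-06-18' differs first at the month's leading digit ('0' < '1') and is excluded, while '2012-01-01' differs at the year ('2' > '1') and is kept.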
Author: Kousuke Saruta
Closes #3083 from sarutak/SPARK-4213 and squashes the following commits:
4ab6e56 [Kousuke Saruta] WIP
b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4213
9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings
1 parent 68609c5
2 files changed (+364, -11 lines):
- sql/core/src/main/scala/org/apache/spark/sql/parquet
- sql/core/src/test/scala/org/apache/spark/sql/parquet