fix the bug in unittest
chenghao-intel committed Mar 5, 2015
commit 47db754364ecd64e9fa73effc9e753d72e49647a
@@ -198,6 +198,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
resolveGetField(e, fieldName, resolver)
}

// TODO the alias name is quite tricky to me, set it to _col1, _col2.. ?
// Set it as original attribute name like "a.b.c" seems still confusing,
// and we may never reference this column by its name (with "."), except
// people write SQL like: SELECT a.b.c as newCol FROM nestedTable, which
// explicitly specifying the alias name for the output column
Some(Alias(fieldExprs, name)())
Member

The Python test failure is caused by replacing aliasName with name here. Is that okay? SELECT a.b.c FROM table would now produce an attribute named a.b.c, where it was c before.

Contributor Author

Thank you @viirya, I've updated the Python code.

Member

I meant: should it be that? In Hive, wouldn't it be c instead of a.b.c?

Contributor Author

I am not sure how Hive handles that, but it cannot be c; otherwise it may cause ambiguous references in the parent logical plan.

e.g.
Assume we have a table tbl with schema Struct<a: Struct<b: Int, c: Int>, b: Int>

SELECT b FROM (SELECT a.b, b FROM tbl)
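
To make the ambiguity in the query above concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's actual resolver: the object AliasAmbiguity and its functions lastPart, fullPath, outputNames and resolve are hypothetical names introduced only for illustration. It compares naming an un-aliased nested field by its last component against naming it by its full dotted path.

```scala
// Hypothetical model, not Spark's resolver: a projected column is
// represented only by the dotted path it was selected with.
object AliasAmbiguity {
  // Two candidate naming policies for an un-aliased nested field reference.
  def lastPart(path: Seq[String]): String = path.last          // a.b -> "b"
  def fullPath(path: Seq[String]): String = path.mkString(".") // a.b -> "a.b"

  // Output column names of a subquery under the given policy.
  def outputNames(select: Seq[Seq[String]],
                  naming: Seq[String] => String): Seq[String] =
    select.map(naming)

  // Resolve a name in the outer query: Right(index) when exactly one
  // column matches, Left(message) when none or several match.
  def resolve(name: String, columns: Seq[String]): Either[String, Int] =
    columns.zipWithIndex.collect { case (c, i) if c == name => i } match {
      case Seq(i) => Right(i)
      case Seq()  => Left(s"cannot resolve '$name'")
      case many   => Left(s"ambiguous reference '$name' (${many.size} matches)")
    }

  // Select list of the inner query: SELECT a.b, b FROM tbl
  val subquery: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("b"))
}
```

Under lastPart both inner columns are named b, so the outer SELECT b matches two columns and resolve returns a Left; under fullPath the columns are a.b and b, and b resolves to the second one.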

Contributor

I don't think we can change the default alias when extracting nested fields. I believe we match Hive's behavior now, and this would break existing queries.

Contributor Author

Yes, I agree we shouldn't break the existing logic, but I believe this is a bug in Hive.

hive>create table struct1 as select named_struct("a",key, "b", value) as a, key as b from src limit 1;
hive>select a.b, b from struct1; -- Works
hive>create table struct2 as select a.b, b from struct1;
FAILED: SemanticException [Error 10036]: Duplicate column name: b

I am wondering if we can break from Hive's naming rule for nested data type references, which always causes ambiguity.
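
The failing CREATE TABLE ... AS SELECT can be read as a schema check on the select list's output names. The sketch below is a rough illustration in plain Scala, assuming (from the session above) that an un-aliased nested reference defaults to its last path component. CtasSchemaCheck, Item, defaultName and duplicates are hypothetical names, not Hive or Spark APIs.

```scala
// Hypothetical model of a CTAS schema check; not Hive's or Spark's code.
object CtasSchemaCheck {
  // A select-list item: the dotted path plus an optional explicit alias.
  final case class Item(path: Seq[String], alias: Option[String] = None)

  // Assumed default naming: the explicit alias if present, otherwise the
  // last path component (a.b -> "b").
  def defaultName(item: Item): String = item.alias.getOrElse(item.path.last)

  // Column names occurring more than once; a non-empty result means the
  // select list cannot be used as a table schema.
  def duplicates(names: Seq[String]): Set[String] =
    names.groupBy(identity).collect { case (n, xs) if xs.size > 1 => n }.toSet
}
```

With the items a.b and b, the default names are b and b, so duplicates reports b, mirroring the "Duplicate column name: b" error. Giving the nested reference an explicit alias (for example a_b, a name chosen here only for illustration) leaves no duplicates under this model.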

Contributor

How is this a bug? These are pretty contrived examples. How often do you actually have nested structures where the outside name is the same as the inside name?

Contributor Author

Yeah, I shouldn't have said "always", but rather "possibly"; it may happen quite often with joins.


// No matches.
@@ -68,7 +68,7 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
"""
|select attribute, sum(cnt)
|from (
-| select nested.attribute, count(*) as cnt
+| select nested.attribute as attribute, count(*) as cnt
| from rows
| group by nested.attribute) a
|group by attribute
@@ -48,6 +48,15 @@ class SQLQuerySuite extends QueryTest {
Row(1) :: Row(2) :: Row(3) :: Nil)
}

test("SPARK-6145 insert into table by selecting data from a nested table") {
jsonRDD(sparkContext.parallelize(
"""{"a": {"a": {"a": 1}}, "c": 1}""" :: Nil)).registerTempTable("nestedOrder")

sql("CREATE TABLE gen_tmp_6145 (key Int)")
sql("INSERT INTO table gen_tmp_6145 SELECT a.a.a from nestedOrder")
sql("DROP TABLE gen_tmp_6145")
}

test("SPARK-4512 Fix attribute reference resolution error when using SORT BY") {
checkAnswer(
sql("SELECT * FROM (SELECT key + key AS a FROM src SORT BY value) t ORDER BY t.a"),