Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Apr 11, 2018

What changes were proposed in this pull request?

The PR adds the SQL function array_position. The behavior of the function is based on Presto's one.

The function returns the position of the first occurrence of the element in array x (or 0 if not found) using 1-based index as BigInt.

How was this patch tested?

Added UTs

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89176 has finished for PR 21037 at commit 535e511.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 11, 2018

retest this please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since :)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89188 has finished for PR 21037 at commit 535e511.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89200 has finished for PR 21037 at commit 908472c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89218 has finished for PR 21037 at commit ce5df02.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayPosition(str: Expression, substr: Expression)

@bersprockets
Copy link
Contributor

Is array_position a string function? The function array_contains, for example, works on arrays.

Also, this new function seems very similar as the existing instr ("Locate the position of the first occurrence of substr column in the given string"), except the return type is BigDecimal rather than Integer.

For example, I ran a query using your branch:

scala> val df = Seq((Seq(1, 2, 3), "this is a test"), (Seq(7, 8, 9, 3), "yeah this")).toDF("a", "b")
df: org.apache.spark.sql.DataFrame = [a: array, b: string]

scala> df.select(array_contains('a, 2), array_position('b, "this"), instr('b, "this")).show
+--------------------+-----------------------+--------------+
|array_contains(a, 2)|array_position(b, this)|instr(b, this)|
+--------------------+-----------------------+--------------+
|                true|                      1|             1|
|               false|                      6|             6|
+--------------------+-----------------------+--------------+


scala> 
scala> df.select(array_position('a, 3)).show
:35: error: type mismatch;
 found   : Int(3)
 required: String
       df.select(array_position('a, 3)).show
                                    ^

scala> df.select(array_contains('a, 3)).show
+--------------------+
|array_contains(a, 3)|
+--------------------+
|                true|
|                true|
+--------------------+


scala> 

@kiszk
Copy link
Member Author

kiszk commented Apr 12, 2018

@bersprockets You are absolutely right. I made mistake. I will completely reimplement this soon.

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89238 has finished for PR 21037 at commit 3d687a0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayPosition(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89255 has finished for PR 21037 at commit 6a464a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 12, 2018

retest this please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a broken sentence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior when left contains null element?

Copy link
Member Author

@kiszk kiszk Apr 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, if an array in left contains a null element, as you can see a UT, it works as an usual element.

val left = Literal.create(Seq[String](null, ""), ArrayType(StringType)) // contains null
checkEvaluation(ArrayPosition(left, Literal(""), 2L)
checkEvaluation(ArrayPosition(left, Literal.create(null, StringType)), null)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can't know the position of null in the array even if the array contains null?

Copy link
Member Author

@kiszk kiszk Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to these UTs in Presto, you are right.

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89268 has finished for PR 21037 at commit 6a464a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89314 has finished for PR 21037 at commit 16ae59c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 13, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89322 has finished for PR 21037 at commit 16ae59c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripMargin is missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, sorry again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: $pos

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess left.dataType.asInstanceOf[ArrayType].containsNull is not related to the nullability of this function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you are right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can't know the position of null in the array even if the array contains null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to check null for each array element?

E.g. for the test Array Position in CollectionExpressionsSuite, if a0 is as follows:

val a0 = Literal.create(Seq(1, null, 2, 3), ArrayType(IntegerType))

checkEvaluation(ArrayPosition(a0, Literal(0)), 0L) will fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with you. I added one test case for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove an extra line.

@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89411 has finished for PR 21037 at commit aac6f64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 18, 2018

@ueshin could you please review again?

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some nits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns 0 if substr could not be found in str -> Returns 0 if the value could not be found in the array or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns 0 if substr could not be found in str -> Returns 0 if the value could not be found in the array or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this here because this is the same as the one in BinaryExpression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove that?
The first character in str has index 1. -> The first element in the array has index 1. or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary import?

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89506 has finished for PR 21037 at commit 7362b1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Apr 19, 2018

Thanks! merging to master.

@asfgit asfgit closed this in d5bec48 Apr 19, 2018
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too.

> SELECT _FUNC_(array(3, 2, 1), 1);
3
""",
since = "2.4.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to note that we can use note here too:

note = """
Use RLIKE to match with standard regular expressions.
""")

I am mentioning this because we are adding many functions now :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants