Commit e9f1d4a

HyukjinKwon authored and Felix Cheung committed
[MINOR][DOCUMENTATION] Fix some minor descriptions in functions consistently with expressions
## What changes were proposed in this pull request?

This PR proposes to improve the documentation and fix some descriptions, equivalent to several minor fixes identified in #15677.

It also suggests changing `Note:` and `NOTE:` to `.. note::`, consistently with the other docstrings, which renders more nicely.

## How was this patch tested?

Jenkins tests and manually.

For PySpark, changing `Note:` and `NOTE:` to `.. note::` renders the documentation as below:

**From**

![2016-11-04 6 53 35](https://cloud.githubusercontent.com/assets/6477701/20002648/42989922-a2c5-11e6-8a32-b73eda49e8c3.png)
![2016-11-04 6 53 45](https://cloud.githubusercontent.com/assets/6477701/20002650/429fb310-a2c5-11e6-926b-e030d7eb0185.png)
![2016-11-04 6 54 11](https://cloud.githubusercontent.com/assets/6477701/20002649/429d570a-a2c5-11e6-9e7e-44090f337e32.png)
![2016-11-04 6 53 51](https://cloud.githubusercontent.com/assets/6477701/20002647/4297fc74-a2c5-11e6-801a-b89fbcbfca44.png)
![2016-11-04 6 53 51](https://cloud.githubusercontent.com/assets/6477701/20002697/749f5780-a2c5-11e6-835f-022e1f2f82e3.png)

**To**

![2016-11-04 7 03 48](https://cloud.githubusercontent.com/assets/6477701/20002659/4961b504-a2c5-11e6-9ee0-ef0751482f47.png)
![2016-11-04 7 04 03](https://cloud.githubusercontent.com/assets/6477701/20002660/49871d3a-a2c5-11e6-85ea-d9a5d11efeff.png)
![2016-11-04 7 04 28](https://cloud.githubusercontent.com/assets/6477701/20002662/498e0f14-a2c5-11e6-803d-c0c5aeda4153.png)
![2016-11-04 7 33 39](https://cloud.githubusercontent.com/assets/6477701/20002731/a76e30d2-a2c5-11e6-993b-0481b8342d6b.png)

Author: hyukjinkwon <[email protected]>

Closes #15765 from HyukjinKwon/minor-function-doc.

(cherry picked from commit 15d3926)
Signed-off-by: Felix Cheung <[email protected]>
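For context, `.. note::` is the reStructuredText admonition directive that Sphinx renders as a highlighted callout box, whereas a literal `Note:` prefix is just ordinary paragraph text. A minimal sketch of the target form (the wording is taken from one of the docstrings touched here; continuation lines must be indented under the directive):

```rst
.. note:: The position is not zero based, but 1 based index. Returns 0 if substr
    could not be found in str.
```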
1 parent dcbc426 commit e9f1d4a

File tree

3 files changed (+51, -36 lines)


R/pkg/R/functions.R (13 additions, 9 deletions)

```diff
@@ -2317,7 +2317,8 @@ setMethod("date_format", signature(y = "Column", x = "character"),
 
 #' from_utc_timestamp
 #'
-#' Assumes given timestamp is UTC and converts to given timezone.
+#' Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp
+#' that corresponds to the same time of day in the given timezone.
 #'
 #' @param y Column to compute on.
 #' @param x time zone to use.
@@ -2340,7 +2341,7 @@ setMethod("from_utc_timestamp", signature(y = "Column", x = "character"),
 #' Locate the position of the first occurrence of substr column in the given string.
 #' Returns null if either of the arguments are null.
 #'
-#' NOTE: The position is not zero based, but 1 based index, returns 0 if substr
+#' NOTE: The position is not zero based, but 1 based index. Returns 0 if substr
 #' could not be found in str.
 #'
 #' @param y column to check
@@ -2391,7 +2392,8 @@ setMethod("next_day", signature(y = "Column", x = "character"),
 
 #' to_utc_timestamp
 #'
-#' Assumes given timestamp is in given timezone and converts to UTC.
+#' Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
+#' another timestamp that corresponds to the same time of day in UTC.
 #'
 #' @param y Column to compute on
 #' @param x timezone to use
@@ -2539,7 +2541,7 @@ setMethod("shiftLeft", signature(y = "Column", x = "numeric"),
 
 #' shiftRight
 #'
-#' Shift the given value numBits right. If the given value is a long value, it will return
+#' (Signed) shift the given value numBits right. If the given value is a long value, it will return
 #' a long value else it will return an integer value.
 #'
 #' @param y column to compute on.
@@ -2777,7 +2779,7 @@ setMethod("window", signature(x = "Column"),
 #' locate
 #'
 #' Locate the position of the first occurrence of substr.
-#' NOTE: The position is not zero based, but 1 based index, returns 0 if substr
+#' NOTE: The position is not zero based, but 1 based index. Returns 0 if substr
 #' could not be found in str.
 #'
 #' @param substr a character string to be matched.
@@ -2823,7 +2825,8 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
 
 #' rand
 #'
-#' Generate a random column with i.i.d. samples from U[0.0, 1.0].
+#' Generate a random column with independent and identically distributed (i.i.d.) samples
+#' from U[0.0, 1.0].
 #'
 #' @param seed a random seed. Can be missing.
 #' @family normal_funcs
@@ -2852,7 +2855,8 @@ setMethod("rand", signature(seed = "numeric"),
 
 #' randn
 #'
-#' Generate a column with i.i.d. samples from the standard normal distribution.
+#' Generate a column with independent and identically distributed (i.i.d.) samples from
+#' the standard normal distribution.
 #'
 #' @param seed a random seed. Can be missing.
 #' @family normal_funcs
@@ -3442,8 +3446,8 @@ setMethod("size",
 
 #' sort_array
 #'
-#' Sorts the input array for the given column in ascending order,
-#' according to the natural ordering of the array elements.
+#' Sorts the input array in ascending or descending order according
+#' to the natural ordering of the array elements.
 #'
 #' @param x A Column to sort
 #' @param asc A logical flag indicating the sorting order.
```
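The reworded `from_utc_timestamp`/`to_utc_timestamp` descriptions can be modeled with the Python standard library. This is a sketch of the documented semantics, not Spark code: the function bodies and names are illustrative, and `America/Los_Angeles` stands in for the `"PST"` alias used in the doctests.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def from_utc_timestamp(naive_ts: datetime, tz: str) -> datetime:
    # Interpret the naive timestamp as a time of day in UTC, then return
    # the same instant rendered as wall-clock time in the given zone.
    return naive_ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz)).replace(tzinfo=None)

def to_utc_timestamp(naive_ts: datetime, tz: str) -> datetime:
    # Interpret the naive timestamp as wall-clock time in the given zone,
    # then return the same instant rendered as a time of day in UTC.
    return naive_ts.replace(tzinfo=ZoneInfo(tz)).astimezone(timezone.utc).replace(tzinfo=None)

t = datetime(1997, 2, 28, 10, 30)
print(from_utc_timestamp(t, "America/Los_Angeles"))  # 1997-02-28 02:30:00
print(to_utc_timestamp(t, "America/Los_Angeles"))    # 1997-02-28 18:30:00
```

This mirrors the doctest values in the PySpark half of the change: 10:30 UTC is 02:30 Pacific standard time, and 10:30 Pacific is 18:30 UTC.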

python/pyspark/sql/functions.py (20 additions, 15 deletions)

```diff
@@ -359,8 +359,8 @@ def grouping_id(*cols):
 
     (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
 
-    Note: the list of columns should match with grouping columns exactly, or empty (means all the
-    grouping columns).
+    .. note:: the list of columns should match with grouping columns exactly, or empty (means all
+        the grouping columns).
 
     >>> df.cube("name").agg(grouping_id(), sum("age")).orderBy("name").show()
     +-----+-------------+--------+
@@ -457,7 +457,8 @@ def nanvl(col1, col2):
 
 @since(1.4)
 def rand(seed=None):
-    """Generates a random column with i.i.d. samples from U[0.0, 1.0].
+    """Generates a random column with independent and identically distributed (i.i.d.) samples
+    from U[0.0, 1.0].
     """
     sc = SparkContext._active_spark_context
     if seed is not None:
@@ -469,7 +470,8 @@ def rand(seed=None):
 
 @since(1.4)
 def randn(seed=None):
-    """Generates a column with i.i.d. samples from the standard normal distribution.
+    """Generates a column with independent and identically distributed (i.i.d.) samples from
+    the standard normal distribution.
     """
     sc = SparkContext._active_spark_context
     if seed is not None:
@@ -518,7 +520,7 @@ def shiftLeft(col, numBits):
 
 @since(1.5)
 def shiftRight(col, numBits):
-    """Shift the given value numBits right.
+    """(Signed) shift the given value numBits right.
 
     >>> spark.createDataFrame([(42,)], ['a']).select(shiftRight('a', 1).alias('r')).collect()
     [Row(r=21)]
@@ -777,8 +779,8 @@ def date_format(date, format):
     A pattern could be for instance `dd.MM.yyyy` and could return a string like '18.03.1993'. All
     pattern letters of the Java class `java.text.SimpleDateFormat` can be used.
 
-    NOTE: Use when ever possible specialized functions like `year`. These benefit from a
-    specialized implementation.
+    .. note:: Use when ever possible specialized functions like `year`. These benefit from a
+        specialized implementation.
 
     >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
     >>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
@@ -1059,7 +1061,8 @@ def unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss'):
 @since(1.5)
 def from_utc_timestamp(timestamp, tz):
     """
-    Assumes given timestamp is UTC and converts to given timezone.
+    Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp
+    that corresponds to the same time of day in the given timezone.
 
     >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
     >>> df.select(from_utc_timestamp(df.t, "PST").alias('t')).collect()
@@ -1072,7 +1075,8 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Assumes given timestamp is in given timezone and converts to UTC.
+    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
+    another timestamp that corresponds to the same time of day in UTC.
 
     >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
     >>> df.select(to_utc_timestamp(df.t, "PST").alias('t')).collect()
@@ -1314,8 +1318,8 @@ def instr(str, substr):
     Locate the position of the first occurrence of substr column in the given string.
     Returns null if either of the arguments are null.
 
-    NOTE: The position is not zero based, but 1 based index, returns 0 if substr
-    could not be found in str.
+    .. note:: The position is not zero based, but 1 based index. Returns 0 if substr
+        could not be found in str.
 
     >>> df = spark.createDataFrame([('abcd',)], ['s',])
     >>> df.select(instr(df.s, 'b').alias('s')).collect()
@@ -1379,8 +1383,8 @@ def locate(substr, str, pos=1):
     """
     Locate the position of the first occurrence of substr in a string column, after position pos.
 
-    NOTE: The position is not zero based, but 1 based index. returns 0 if substr
-    could not be found in str.
+    .. note:: The position is not zero based, but 1 based index. Returns 0 if substr
+        could not be found in str.
 
     :param substr: a string
     :param str: a Column of :class:`pyspark.sql.types.StringType`
@@ -1442,7 +1446,7 @@ def split(str, pattern):
     """
     Splits str around pattern (pattern is a regular expression).
 
-    NOTE: pattern is a string represent the regular expression.
+    .. note:: pattern is a string represent the regular expression.
 
     >>> df = spark.createDataFrame([('ab12cd',)], ['s',])
     >>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
@@ -1785,7 +1789,8 @@ def size(col):
 @since(1.5)
 def sort_array(col, asc=True):
     """
-    Collection function: sorts the input array for the given column in ascending order.
+    Collection function: sorts the input array in ascending or descending order according
+    to the natural ordering of the array elements.
 
     :param col: name of column or expression
```
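The 1-based indexing contract clarified in the `instr`/`locate` docstrings, and the signed shift clarified in `shiftRight`, can be illustrated in plain Python. This is a sketch, not PySpark; `instr_like` is a made-up helper name used only for illustration.

```python
def instr_like(s: str, sub: str) -> int:
    # Spark's instr is 1-based and returns 0 when sub is absent;
    # str.find is 0-based and returns -1, so shifting by one matches both rules.
    return s.find(sub) + 1

print(instr_like("abcd", "b"))  # 2: 'b' is the 2nd character, 1-based
print(instr_like("abcd", "z"))  # 0: not found yields 0, not -1

# Python's >> is an arithmetic (signed) right shift, the behavior the
# "(Signed) shift" wording describes: the sign is preserved.
print(42 >> 1)   # 21, matching the shiftRight doctest
print(-42 >> 1)  # -21
```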

sql/core/src/main/scala/org/apache/spark/sql/functions.scala (18 additions, 12 deletions)

```diff
@@ -1117,7 +1117,8 @@ object functions {
   def not(e: Column): Column = !e
 
   /**
-   * Generate a random column with i.i.d. samples from U[0.0, 1.0].
+   * Generate a random column with independent and identically distributed (i.i.d.) samples
+   * from U[0.0, 1.0].
    *
    * Note that this is indeterministic when data partitions are not fixed.
    *
@@ -1127,15 +1128,17 @@ object functions {
   def rand(seed: Long): Column = withExpr { Rand(seed) }
 
   /**
-   * Generate a random column with i.i.d. samples from U[0.0, 1.0].
+   * Generate a random column with independent and identically distributed (i.i.d.) samples
+   * from U[0.0, 1.0].
    *
    * @group normal_funcs
    * @since 1.4.0
    */
   def rand(): Column = rand(Utils.random.nextLong)
 
   /**
-   * Generate a column with i.i.d. samples from the standard normal distribution.
+   * Generate a column with independent and identically distributed (i.i.d.) samples from
+   * the standard normal distribution.
    *
    * Note that this is indeterministic when data partitions are not fixed.
    *
@@ -1145,15 +1148,16 @@ object functions {
   def randn(seed: Long): Column = withExpr { Randn(seed) }
 
   /**
-   * Generate a column with i.i.d. samples from the standard normal distribution.
+   * Generate a column with independent and identically distributed (i.i.d.) samples from
+   * the standard normal distribution.
    *
    * @group normal_funcs
    * @since 1.4.0
    */
   def randn(): Column = randn(Utils.random.nextLong)
 
   /**
-   * Partition ID of the Spark task.
+   * Partition ID.
    *
    * Note that this is indeterministic because it depends on data partitioning and task scheduling.
    *
@@ -1877,8 +1881,8 @@ object functions {
   def shiftLeft(e: Column, numBits: Int): Column = withExpr { ShiftLeft(e.expr, lit(numBits).expr) }
 
   /**
-   * Shift the given value numBits right. If the given value is a long value, it will return
-   * a long value else it will return an integer value.
+   * (Signed) shift the given value numBits right. If the given value is a long value, it will
+   * return a long value else it will return an integer value.
    *
    * @group math_funcs
    * @since 1.5.0
@@ -2203,7 +2207,7 @@ object functions {
    * Locate the position of the first occurrence of substr column in the given string.
    * Returns null if either of the arguments are null.
    *
-   * NOTE: The position is not zero based, but 1 based index, returns 0 if substr
+   * NOTE: The position is not zero based, but 1 based index. Returns 0 if substr
    * could not be found in str.
    *
    * @group string_funcs
@@ -2238,7 +2242,7 @@ object functions {
 
   /**
    * Locate the position of the first occurrence of substr.
-   * NOTE: The position is not zero based, but 1 based index, returns 0 if substr
+   * NOTE: The position is not zero based, but 1 based index. Returns 0 if substr
    * could not be found in str.
    *
    * @group string_funcs
@@ -2666,7 +2670,8 @@ object functions {
   }
 
   /**
-   * Assumes given timestamp is UTC and converts to given timezone.
+   * Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp
+   * that corresponds to the same time of day in the given timezone.
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2675,7 +2680,8 @@ object functions {
   }
 
   /**
-   * Assumes given timestamp is in given timezone and converts to UTC.
+   * Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
+   * another timestamp that corresponds to the same time of day in UTC.
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2996,7 +3002,7 @@ object functions {
   def sort_array(e: Column): Column = sort_array(e, asc = true)
 
   /**
-   * Sorts the input array for the given column in ascending / descending order,
+   * Sorts the input array for the given column in ascending or descending order,
    * according to the natural ordering of the array elements.
    *
    * @group collection_funcs
```

0 commit comments