diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 5d18c48879f9..0ded2654719c 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -346,6 +346,9 @@ For example:
+## Scalar Functions
+(to be filled soon)
+
 ## Aggregations
 The [built-in DataFrames functions](api/scala/index.html#org.apache.spark.sql.functions$) provide common
diff --git a/docs/sql-ref-syntax-ddl-create-function.md b/docs/sql-ref-syntax-ddl-create-function.md
index f95a9eba42c2..4c09ebafb1f5 100644
--- a/docs/sql-ref-syntax-ddl-create-function.md
+++ b/docs/sql-ref-syntax-ddl-create-function.md
@@ -19,4 +19,153 @@ license: |
   limitations under the License.
 ---
-**This page is under construction**
+### Description
+The `CREATE FUNCTION` statement is used to create a temporary or permanent function
+in Spark. Temporary functions are scoped at a session level, whereas permanent
+functions are created in the persistent catalog and are made available to
+all sessions. The resources specified in the `USING` clause are made available
+to all executors when they are executed for the first time. In addition to the
+SQL interface, Spark allows users to create custom user-defined scalar and
+aggregate functions using Scala, Python and Java APIs. Please refer to
+[scalar functions](sql-getting-started.html#scalar-functions) and
+[aggregate functions](sql-getting-started.html#aggregations) for more information.
+
+### Syntax
+{% highlight sql %}
+CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]
+    function_name AS class_name [ resource_locations ]
+{% endhighlight %}
+
+### Parameters
+
+
OR REPLACE
+
+ If specified, the resources for the function are reloaded. This is mainly useful + to pick up any changes made to the implementation of the function. This + parameter is mutually exclusive with IF NOT EXISTS; the two cannot + be specified together. +
+
TEMPORARY
+
+ Indicates the scope of the function being created. When TEMPORARY is specified, the + created function is valid and visible only in the current session. No persistent + entry is made in the catalog for this kind of function. +
+
IF NOT EXISTS
+
+ If specified, creates the function only when it does not exist. The creation + of the function succeeds (no error is thrown) if the specified function already + exists in the system. This parameter is mutually exclusive with OR REPLACE; + the two cannot be specified together. +
+
function_name
+
+ Specifies the name of the function to be created. The function name may be + optionally qualified with a database name.

+ Syntax: + + [database_name.]function_name + +
+
class_name
+
+ Specifies the name of the class that provides the implementation for the function to be created. + The implementing class should extend one of the base classes as follows: + +
+
resource_locations
+
+ Specifies the list of resources that contain the implementation of the function + along with its dependencies.

+ Syntax: + + USING { { (JAR | FILE ) resource_uri} , ...} + +
+
+
+### Examples
+{% highlight sql %}
+-- 1. Create a simple UDF `SimpleUdf` that increments the supplied integral value by 10.
+--    import org.apache.hadoop.hive.ql.exec.UDF;
+--    public class SimpleUdf extends UDF {
+--      public int evaluate(int value) {
+--        return value + 10;
+--      }
+--    }
+-- 2. Compile and place it in a JAR file called `SimpleUdf.jar` in /tmp.
+
+-- Create a table called `test` and insert two rows.
+CREATE TABLE test(c1 INT);
+INSERT INTO test VALUES (1), (2);
+
+-- Create a permanent function called `simple_udf`.
+CREATE FUNCTION simple_udf AS 'SimpleUdf'
+  USING JAR '/tmp/SimpleUdf.jar';
+
+-- Verify that the function is in the registry.
+SHOW USER FUNCTIONS;
+  +------------------+
+  |          function|
+  +------------------+
+  |default.simple_udf|
+  +------------------+
+
+-- Invoke the function. Every selected value should be incremented by 10.
+SELECT simple_udf(c1) AS function_return_value FROM test;
+  +---------------------+
+  |function_return_value|
+  +---------------------+
+  |                   11|
+  |                   12|
+  +---------------------+
+
+-- Create a temporary function.
+CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'
+  USING JAR '/tmp/SimpleUdf.jar';
+
+-- Verify that the newly created temporary function is in the registry.
+-- Please note that the temporary function does not have a qualified
+-- database associated with it.
+SHOW USER FUNCTIONS;
+  +------------------+
+  |          function|
+  +------------------+
+  |default.simple_udf|
+  |   simple_temp_udf|
+  +------------------+
+
+-- 1. Modify `SimpleUdf`'s implementation to increment the supplied integral value by 20.
+--    import org.apache.hadoop.hive.ql.exec.UDF;
+
+--    public class SimpleUdfR extends UDF {
+--      public int evaluate(int value) {
+--        return value + 20;
+--      }
+--    }
+-- 2. Compile and place it in a JAR file called `SimpleUdfR.jar` in /tmp.
+
+-- Replace the implementation of `simple_udf`.
+CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR'
+  USING JAR '/tmp/SimpleUdfR.jar';
+
+-- Invoke the function. Every selected value should be incremented by 20.
+SELECT simple_udf(c1) AS function_return_value FROM test;
++---------------------+
+|function_return_value|
++---------------------+
+|                   21|
+|                   22|
++---------------------+
+
+{% endhighlight %}
+
+### Related statements
+- [SHOW FUNCTIONS](sql-ref-syntax-aux-show-functions.html)
+- [DESCRIBE FUNCTION](sql-ref-syntax-aux-describe-function.html)
+- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html)
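
The two UDF bodies used in the examples can be checked locally before they are compiled into JARs. The sketch below is only a stand-in: `SimpleUdfLogic`, `simpleUdf` and `simpleUdfR` are hypothetical names that do not appear in the patch, and the class deliberately does not extend Hive's `UDF`, so it runs without any Hive or Spark dependency. It only mirrors the arithmetic of `SimpleUdf.evaluate` and `SimpleUdfR.evaluate` for the two rows inserted into `test`.

```java
// Hypothetical stand-in for the UDF bodies shown in the examples.
// Assumption: plain Java only, no Hive `UDF` base class, so this checks the
// arithmetic but says nothing about how Spark loads or registers the function.
final class SimpleUdfLogic {

    // Mirrors SimpleUdf.evaluate: increments the supplied value by 10.
    static int simpleUdf(int value) {
        return value + 10;
    }

    // Mirrors SimpleUdfR.evaluate: increments the supplied value by 20.
    static int simpleUdfR(int value) {
        return value + 20;
    }

    public static void main(String[] args) {
        int[] rows = {1, 2}; // the two rows inserted into the `test` table
        for (int c1 : rows) {
            System.out.println(
                "c1=" + c1
                + " simple_udf=" + simpleUdf(c1)
                + " replaced simple_udf=" + simpleUdfR(c1));
        }
    }
}
```

The values computed for the inputs 1 and 2 should line up with the `function_return_value` columns shown before and after the `CREATE OR REPLACE FUNCTION` statement.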