Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/.gitignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
generated-*.html
_generated_function_html/
281 changes: 102 additions & 179 deletions docs/sql-ref-functions-builtin.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,202 +17,125 @@ license: |
limitations under the License.
---

{% for static_file in site.static_files %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was my bad. we still need this if-else because of SKIP_API=1 bundle exec jekyll build. It fails as below

Configuration file: /.../workspace/forked/spark/docs/_config.yml
            Source: /.../workspace/forked/spark/docs
       Destination: /.../workspace/forked/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
  Liquid Exception: Could not locate the included file '_generated_function_html/agg-funcs-table.html' in any of ["/.../workspace/forked/spark/docs/"]. Ensure it exists in one of those directories and, if it is a symlink, does not point outside your site source. in sql-ref-functions-builtin.md
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me revert this for now (as it technically breaks the documentation build)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's strange. Does the PR build not build documentation? Because the build was passing as of a9ae743 (the latest commit in the PR), and the merge into master at e4b5977 also had a passing build.

Here's the documentation build after the merge: https://github.com/apache/spark/actions/runs/7294850973/job/19880408450

What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that the canonical bundle exec jekyll build would pass because it will trigger the full build of API. If SKIP_API=1 is specified, it will skip the API build (which will skip SQL API generation, then the HTML files would not exist).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the branch was removed so I can't reopen the PR 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I will reopen the PR.

I will also tweak things, if possible, so that SKIP_API=1 will work even if the SQL API docs have never been generated before, without requiring all of the conditional includes, because they really clutter things up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #44627 to address the build issue so we are not caught off guard like this again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

@HyukjinKwon - Actually, thinking about this a bit more , is this really the behavior we want to have?

It's technically a type of silent failure if Jekyll successfully builds our docs but can't find some file that was referenced via include or include_relative. The way docs/sql-ref-functions-builtin.md is currently setup makes it so that Jekyll will never detect these missing includes.

Isn't that actually a bad thing? Using this pattern, how will developers know that the documentation built correctly without manually reviewing every rendered site where they wrote a conditional include?

It seems like we need a way to ignore missing includes only when SKIP_API=1. Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah .. as long as those options work (e.g., SKIP_API=1.), it should be fine.

{% if static_file.name == 'generated-agg-funcs-table.html' %}
* Table of contents
{:toc}

### Aggregate Functions
{% include_relative generated-agg-funcs-table.html %}
#### Examples
{% include_relative generated-agg-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-window-funcs-table.html' %}
{% include_relative _generated_function_html/agg-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/agg-funcs-examples.html %}

### Window Functions
{% include_relative generated-window-funcs-table.html %}
#### Examples
{% include_relative generated-window-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-array-funcs-table.html' %}
{% include_relative _generated_function_html/window-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/window-funcs-examples.html %}

### Array Functions
{% include_relative generated-array-funcs-table.html %}
#### Examples
{% include_relative generated-array-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-collection-funcs-table.html' %}
{% include_relative _generated_function_html/array-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/array-funcs-examples.html %}

### Collection Functions
{% include_relative generated-collection-funcs-table.html %}
#### Examples
{% include_relative generated-collection-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-struct-funcs-table.html' %}
{% include_relative _generated_function_html/collection-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/collection-funcs-examples.html %}

### STRUCT Functions
{% include_relative generated-struct-funcs-table.html %}
#### Examples
{% include_relative generated-struct-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-map-funcs-table.html' %}
{% include_relative _generated_function_html/struct-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/struct-funcs-examples.html %}

### Map Functions
{% include_relative generated-map-funcs-table.html %}
#### Examples
{% include_relative generated-map-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-datetime-funcs-table.html' %}
{% include_relative _generated_function_html/map-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/map-funcs-examples.html %}

### Date and Timestamp Functions
{% include_relative generated-datetime-funcs-table.html %}
#### Examples
{% include_relative generated-datetime-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-math-funcs-table.html' %}
{% include_relative _generated_function_html/datetime-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/datetime-funcs-examples.html %}

### Mathematical Functions
{% include_relative generated-math-funcs-table.html %}
#### Examples
{% include_relative generated-math-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-string-funcs-table.html' %}
{% include_relative _generated_function_html/math-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/math-funcs-examples.html %}

### String Functions
{% include_relative generated-string-funcs-table.html %}
#### Examples
{% include_relative generated-string-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-conditional-funcs-table.html' %}
{% include_relative _generated_function_html/string-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/string-funcs-examples.html %}

### Conditional Functions
{% include_relative generated-conditional-funcs-table.html %}
#### Examples
{% include_relative generated-conditional-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-hash-funcs-table.html' %}
{% include_relative _generated_function_html/conditional-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/conditional-funcs-examples.html %}

### Hash Functions
{% include_relative generated-hash-funcs-table.html %}
#### Examples
{% include_relative generated-hash-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-csv-funcs-table.html' %}
{% include_relative _generated_function_html/hash-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/hash-funcs-examples.html %}

### CSV Functions
{% include_relative generated-csv-funcs-table.html %}
#### Examples
{% include_relative generated-csv-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-json-funcs-table.html' %}
{% include_relative _generated_function_html/csv-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/csv-funcs-examples.html %}

### JSON Functions
{% include_relative generated-json-funcs-table.html %}
#### Examples
{% include_relative generated-json-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-xml-funcs-table.html' %}
{% include_relative _generated_function_html/json-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/json-funcs-examples.html %}

### XML Functions
{% include_relative generated-xml-funcs-table.html %}
#### Examples
{% include_relative generated-xml-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-url-funcs-table.html' %}
{% include_relative _generated_function_html/xml-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/xml-funcs-examples.html %}

### URL Functions
{% include_relative generated-url-funcs-table.html %}
#### Examples
{% include_relative generated-url-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-bitwise-funcs-table.html' %}
{% include_relative _generated_function_html/url-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/url-funcs-examples.html %}

### Bitwise Functions
{% include_relative generated-bitwise-funcs-table.html %}
#### Examples
{% include_relative generated-bitwise-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-conversion-funcs-table.html' %}
{% include_relative _generated_function_html/bitwise-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/bitwise-funcs-examples.html %}

### Conversion Functions
{% include_relative generated-conversion-funcs-table.html %}
#### Examples
{% include_relative generated-conversion-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-predicate-funcs-table.html' %}
{% include_relative _generated_function_html/conversion-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/conversion-funcs-examples.html %}

### Predicate Functions
{% include_relative generated-predicate-funcs-table.html %}
#### Examples
{% include_relative generated-predicate-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-misc-funcs-table.html' %}
{% include_relative _generated_function_html/predicate-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/predicate-funcs-examples.html %}

### Misc Functions
{% include_relative generated-misc-funcs-table.html %}
#### Examples
{% include_relative generated-misc-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-generator-funcs-table.html' %}
{% include_relative _generated_function_html/misc-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/misc-funcs-examples.html %}

### Generator Functions
{% include_relative generated-generator-funcs-table.html %}
#### Examples
{% include_relative generated-generator-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/generator-funcs-table.html %}

**Examples**
{% include_relative _generated_function_html/generator-funcs-examples.html %}
2 changes: 1 addition & 1 deletion docs/sql-ref-functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ license: |
---

Spark SQL provides two function features to meet a wide range of user needs: built-in functions and user-defined functions (UDFs).
Built-in functions are commonly used routines that Spark SQL predefines and a complete list of the functions can be found in the [Built-in Functions](api/sql/) API document. UDFs allow users to define their own functions when the system’s built-in functions are not enough to perform the desired task.
Built-in functions are commonly used routines that Spark SQL predefines and a complete list of the functions can be found in the [Built-in Functions](api/sql/index.html) API document. UDFs allow users to define their own functions when the system’s built-in functions are not enough to perform the desired task.

### Built-in Functions

Expand Down
15 changes: 9 additions & 6 deletions sql/gen-sql-functions-docs.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,19 @@
#

import itertools
import os
import re
import shutil
from collections import namedtuple
from pathlib import Path

# To avoid adding a new direct dependency, we import markdown from within mkdocs.
from mkdocs.structure.pages import markdown

from pyspark.java_gateway import launch_gateway


SPARK_PROJECT_ROOT = Path(__file__).parents[1]

ExpressionInfo = namedtuple("ExpressionInfo", "name usage examples group")

groups = {
Expand Down Expand Up @@ -210,7 +213,7 @@ def generate_functions_table_html(jvm, html_output_dir):
for key, infos in _list_grouped_function_infos(jvm):
function_table = _make_pretty_usage(infos)
key = key.replace("_", "-")
with open("%s/generated-%s-table.html" % (html_output_dir, key), 'w') as table_html:
with open(html_output_dir / f"{key}-table.html", 'w') as table_html:
table_html.write(function_table)


Expand All @@ -233,16 +236,16 @@ def generate_functions_examples_html(jvm, jspark, html_output_dir):
examples = _make_pretty_examples(jspark, infos)
key = key.replace("_", "-")
if examples is not None:
with open("%s/generated-%s-examples.html" % (
html_output_dir, key), 'w') as examples_html:
with open(html_output_dir / f"{key}-examples.html", 'w') as examples_html:
examples_html.write(examples)


if __name__ == "__main__":
jvm = launch_gateway().jvm
jspark = jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate()
jspark.sparkContext().setLogLevel("ERROR") # Make it less noisy.
spark_root_dir = os.path.dirname(os.path.dirname(__file__))
html_output_dir = os.path.join(spark_root_dir, "docs")
html_output_dir = SPARK_PROJECT_ROOT / "docs" / "_generated_function_html"
shutil.rmtree(html_output_dir, ignore_errors=True)
html_output_dir.mkdir()
generate_functions_table_html(jvm, html_output_dir)
generate_functions_examples_html(jvm, jspark, html_output_dir)