56 changes: 14 additions & 42 deletions docs/sql-programming-guide.md
@@ -132,7 +132,7 @@ from a Hive table, or from [Spark data sources](#data-sources).

As an example, the following creates a DataFrame based on the content of a JSON file:

{% include_example create_DataFrames r/RSparkSQLExample.R %}
{% include_example create_df r/RSparkSQLExample.R %}

</div>
</div>
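
A rough Python sketch of the same idea, assuming the standard `examples/src/main/resources/people.json` sample file (the path and columns are assumptions, not part of the included example):

{% highlight python %}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("create_df sketch").getOrCreate()

# Create a DataFrame from a JSON file (one JSON object per line)
df = spark.read.json("examples/src/main/resources/people.json")

# Display the content of the DataFrame to stdout
df.show()
{% endhighlight %}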
@@ -180,7 +180,7 @@ In addition to simple column references and expressions, DataFrames also have a

<div data-lang="r" markdown="1">

{% include_example dataframe_operations r/RSparkSQLExample.R %}
{% include_example untyped_ops r/RSparkSQLExample.R %}

For a complete list of the types of operations that can be performed on a DataFrame refer to the [API Documentation](api/R/index.html).

@@ -214,7 +214,7 @@ The `sql` function on a `SparkSession` enables applications to run SQL queries p
<div data-lang="r" markdown="1">
The `sql` function enables applications to run SQL queries programmatically and returns the result as a `SparkDataFrame`.

{% include_example sql_query r/RSparkSQLExample.R %}
{% include_example run_sql r/RSparkSQLExample.R %}

</div>
</div>
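
A minimal Python sketch of the same pattern; the `people` temporary view and the column names are assumed for illustration:

{% highlight python %}
# Register the DataFrame as a temporary view, then query it with SQL
df.createOrReplaceTempView("people")

sqlDF = spark.sql("SELECT name, age FROM people WHERE age > 20")
sqlDF.show()
{% endhighlight %}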
@@ -377,7 +377,7 @@ In the simplest form, the default data source (`parquet` unless otherwise config

<div data-lang="r" markdown="1">

{% include_example source_parquet r/RSparkSQLExample.R %}
{% include_example generic_load_save_functions r/RSparkSQLExample.R %}

</div>
</div>
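
A rough Python sketch of the generic load/save calls, assuming the standard `users.parquet` sample file; the output path is a placeholder:

{% highlight python %}
# With no format specified, load()/save() use the default data source
# (spark.sql.sources.default, parquet unless otherwise configured)
df = spark.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
{% endhighlight %}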
@@ -400,13 +400,11 @@ using this syntax.
</div>

<div data-lang="python" markdown="1">

{% include_example manual_load_options python/sql/datasource.py %}
</div>
<div data-lang="r" markdown="1">

{% include_example source_json r/RSparkSQLExample.R %}

<div data-lang="r" markdown="1">
{% include_example manual_load_options r/RSparkSQLExample.R %}
</div>
</div>
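
As an illustrative Python sketch (the paths are placeholders), manually specifying the data source format looks roughly like this:

{% highlight python %}
# Data sources can be specified by fully qualified name
# (e.g. org.apache.spark.sql.parquet) or a short name (json, parquet, jdbc, ...)
df = spark.read.load("examples/src/main/resources/people.json", format="json")
df.select("name", "age").write.save("namesAndAges.parquet", format="parquet")
{% endhighlight %}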

@@ -425,13 +423,11 @@ file directly with SQL.
</div>

<div data-lang="python" markdown="1">

{% include_example direct_sql python/sql/datasource.py %}
</div>

<div data-lang="r" markdown="1">

{% include_example direct_query r/RSparkSQLExample.R %}
{% include_example direct_sql r/RSparkSQLExample.R %}

</div>
</div>
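
A hedged Python sketch of querying a file directly with SQL; the parquet path is a placeholder:

{% highlight python %}
# Run SQL on a file directly instead of loading it into a DataFrame first
df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
df.show()
{% endhighlight %}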
@@ -523,7 +519,7 @@ Using the data from the above example:

<div data-lang="r" markdown="1">

{% include_example load_programmatically r/RSparkSQLExample.R %}
{% include_example basic_parquet_example r/RSparkSQLExample.R %}

</div>
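
A rough Python sketch of the same Parquet round trip, assuming a `peopleDF` DataFrame like the one read from the JSON example earlier:

{% highlight python %}
# DataFrames can be saved as Parquet files, maintaining the schema information
peopleDF.write.parquet("people.parquet")

# Read in the Parquet file created above; the result is also a DataFrame
parquetFile = spark.read.parquet("people.parquet")

parquetFile.createOrReplaceTempView("parquetFile")
teenagers = spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
teenagers.show()
{% endhighlight %}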

@@ -839,7 +835,7 @@ Note that the file that is offered as _a json file_ is not a typical JSON file.
line must contain a separate, self-contained valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail.

{% include_example load_json_file r/RSparkSQLExample.R %}
{% include_example json_dataset r/RSparkSQLExample.R %}

</div>
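
A minimal Python sketch of reading such a line-delimited JSON file, assuming the standard `people.json` sample data:

{% highlight python %}
# Each line is a separate, self-contained JSON object, e.g.:
#   {"name": "Michael"}
#   {"name": "Andy", "age": 30}
#   {"name": "Justin", "age": 19}
peopleDF = spark.read.json("examples/src/main/resources/people.json")

# The inferred schema can be visualized using the printSchema() method
peopleDF.printSchema()
{% endhighlight %}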

@@ -925,7 +921,7 @@ You may need to grant write privilege to the user who starts the spark applicati
When working with Hive one must instantiate `SparkSession` with Hive support. This
adds support for finding tables in the MetaStore and writing queries using HiveQL.

{% include_example hive_table r/RSparkSQLExample.R %}
{% include_example spark_hive r/RSparkSQLExample.R %}

</div>
</div>
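
A rough Python sketch of creating a Hive-enabled session; the warehouse directory and table definition are placeholders:

{% highlight python %}
from pyspark.sql import SparkSession

# enableHiveSupport() adds support for the Hive MetaStore and HiveQL
spark = SparkSession \
    .builder \
    .appName("Hive support sketch") \
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
spark.sql("SELECT * FROM src").show()
{% endhighlight %}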
@@ -1067,43 +1063,19 @@ the Data Sources API. The following options are supported:
<div class="codetabs">

<div data-lang="scala" markdown="1">

{% highlight scala %}
val jdbcDF = spark.read.format("jdbc").options(
Map("url" -> "jdbc:postgresql:dbserver",
"dbtable" -> "schema.tablename")).load()
{% endhighlight %}

{% include_example jdbc_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
</div>

<div data-lang="java" markdown="1">

{% highlight java %}

Map<String, String> options = new HashMap<>();
options.put("url", "jdbc:postgresql:dbserver");
options.put("dbtable", "schema.tablename");

Dataset<Row> jdbcDF = spark.read().format("jdbc"). options(options).load();
{% endhighlight %}


{% include_example jdbc_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">

{% highlight python %}

df = spark.read.format('jdbc').options(url='jdbc:postgresql:dbserver', dbtable='schema.tablename').load()

{% endhighlight %}

{% include_example jdbc_dataset python/sql/datasource.py %}
</div>

<div data-lang="r" markdown="1">

{% include_example jdbc r/RSparkSQLExample.R %}

{% include_example jdbc_dataset r/RSparkSQLExample.R %}
</div>

<div data-lang="sql" markdown="1">
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -25,7 +25,6 @@
// $example on:basic_parquet_example$
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Encoders;
// import org.apache.spark.sql.Encoders;
// $example on:schema_merging$
// $example on:json_dataset$
import org.apache.spark.sql.Dataset;
@@ -92,14 +91,15 @@ public void setCube(int cube) {
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL Data Sources Example")
.appName("Java Spark SQL data sources example")
.config("spark.some.config.option", "some-value")
.getOrCreate();

runBasicDataSourceExample(spark);
runBasicParquetExample(spark);
runParquetSchemaMergingExample(spark);
runJsonDatasetExample(spark);
runJdbcDatasetExample(spark);

spark.stop();
}
@@ -183,10 +183,10 @@ private static void runParquetSchemaMergingExample(SparkSession spark) {
// The final schema consists of all 3 columns in the Parquet files together
// with the partitioning column appeared in the partition directory paths
// root
// |-- value: int (nullable = true)
// |-- square: int (nullable = true)
// |-- cube: int (nullable = true)
// |-- key : int (nullable = true)
// |-- value: int (nullable = true)
// |-- square: int (nullable = true)
// |-- cube: int (nullable = true)
// |-- key: int (nullable = true)
// $example off:schema_merging$
}

@@ -216,4 +216,15 @@ private static void runJsonDatasetExample(SparkSession spark) {
// $example off:json_dataset$
}

private static void runJdbcDatasetExample(SparkSession spark) {
// $example on:jdbc_dataset$
Dataset<Row> jdbcDF = spark.read()
.format("jdbc")
.option("url", "jdbc:postgresql:dbserver")
.option("dbtable", "schema.tablename")
.option("user", "username")
.option("password", "password")
.load();
// $example off:jdbc_dataset$
}
}
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
@@ -88,7 +88,7 @@ public static void main(String[] args) {
// $example on:init_session$
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL Example")
.appName("Java Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.getOrCreate();
// $example off:init_session$
2 changes: 1 addition & 1 deletion examples/src/main/python/sql/basic.py
@@ -182,7 +182,7 @@ def programmatic_schema_example(spark):
# $example on:init_session$
spark = SparkSession \
.builder \
.appName("PythonSQL") \
.appName("Python Spark SQL basic example") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()
# $example off:init_session$
32 changes: 23 additions & 9 deletions examples/src/main/python/sql/datasource.py
@@ -92,14 +92,14 @@ def parquet_schema_merging_example(spark):
# The final schema consists of all 3 columns in the Parquet files together
# with the partitioning column appeared in the partition directory paths.
# root
# |-- double: long (nullable = true)
# |-- single: long (nullable = true)
# |-- triple: long (nullable = true)
# |-- key: integer (nullable = true)
# |-- double: long (nullable = true)
# |-- single: long (nullable = true)
# |-- triple: long (nullable = true)
# |-- key: integer (nullable = true)
# $example off:schema_merging$


def json_dataset_examplg(spark):
def json_dataset_example(spark):
# $example on:json_dataset$
# spark is from the previous example.
sc = spark.sparkContext
@@ -112,8 +112,8 @@ def json_dataset_examplg(spark):
# The inferred schema can be visualized using the printSchema() method
peopleDF.printSchema()
# root
# |-- age: long (nullable = true)
# |-- name: string (nullable = true)
# |-- age: long (nullable = true)
# |-- name: string (nullable = true)

# Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
@@ -140,15 +140,29 @@ def json_dataset_examplg(spark):
# +---------------+----+
# $example off:json_dataset$


def jdbc_dataset_example(spark):
# $example on:jdbc_dataset$
jdbcDF = spark.read \
.format("jdbc") \
.option("url", "jdbc:postgresql:dbserver") \
.option("dbtable", "schema.tablename") \
.option("user", "username") \
.option("password", "password") \
.load()
# $example off:jdbc_dataset$


if __name__ == "__main__":
spark = SparkSession \
.builder \
.appName("PythonSQL") \
.appName("Python Spark SQL data source example") \
.getOrCreate()

basic_datasource_example(spark)
parquet_example(spark)
parquet_schema_merging_example(spark)
json_dataset_examplg(spark)
json_dataset_example(spark)
jdbc_dataset_example(spark)

spark.stop()
2 changes: 1 addition & 1 deletion examples/src/main/python/sql/hive.py
@@ -38,7 +38,7 @@

spark = SparkSession \
.builder \
.appName("PythonSQL") \
.appName("Python Spark SQL Hive integration example") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.enableHiveSupport() \
.getOrCreate()