[SPARK-40528] Support dockerfile template #12
Closed
HyukjinKwon
approved these changes
Oct 17, 2022
martin-g
reviewed
Oct 17, 2022
Member
Author
Just a rebase, will merge soon.
Member
Author
@HyukjinKwon @martin-g Thanks, merged.
dongjoon-hyun
added a commit
that referenced
this pull request
Jun 23, 2025
…87)

### What changes were proposed in this pull request?

This PR aims to use `/nonexistent` explicitly instead of nonexistent `/home/spark` because the current status is misleading. Please note that SPARK-40528 introduced `useradd --system` which created `spark` user with a non-existent `/home/spark` directory from the beginning of this repository, `spark-docker`.

- #12

https://github.com/apache/spark-docker/blob/c264d48dc510018095700ed33e700ccc34268bf2/Dockerfile.template#L21-L22

**Rejected Alternatives**

- We can set `HOME` to `/opt/spark` like Apache Spark behavior. However, it's also different from `WORKDIR` (`/opt/spark/work-dir`).
- We can create `/home/spark`, but it could be more vulnerable than AS-IS status.

For `system` account, `/nonexistent` is frequently used as the security practice to prevent any side effects of `HOME` directory.

```
$ docker run -it --rm apache/spark:4.0.0 cat /etc/passwd | grep /nonexistent
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
```

### Why are the changes needed?

**Apache Spark 3.3.3**

```
$ docker run -it --rm apache/spark:3.3.3 /opt/spark/bin/spark-sql
...
25/06/20 20:15:41 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
```

```
$ docker run -it --rm -uroot apache/spark:3.3.3 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.3.3 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.4.4**

```
$ docker run -it --rm -uroot apache/spark:3.4.4 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.4.4 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.5.6**

```
$ docker run -it --rm -uroot apache/spark:3.5.6 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.5.6 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 4.0.0**

```
$ docker run -it --rm -uroot apache/spark:4.0.0 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:4.0.0 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```

### Does this PR introduce _any_ user-facing change?

No behavior change because it doesn't exist already.

### How was this patch tested?

Manual review.
What changes were proposed in this pull request?
This patch:
- Adds `Dockerfile.template`, which contains 3 vars: `BASE_IMAGE` for the base image name, `HAVE_PY` for adding Python support, and `HAVE_R` for adding SparkR support.
- Adds `add-dockerfiles.sh`; you can run `./add-dockerfiles.sh 3.3.0`.
- Adds `template.py` to help generate the Dockerfile from the Jinja template.
Why are the changes needed?
Generate the dockerfiles to make life easier.
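To illustrate how the three variables (`BASE_IMAGE`, `HAVE_PY`, `HAVE_R`) might drive generation, here is a minimal standard-library sketch. It is not the PR's Jinja template or its helper script: `render_dockerfile` is a hypothetical function, the `RUN` steps are placeholders, and the base image name is just an example value.

```python
def render_dockerfile(base_image: str, have_py: bool = False, have_r: bool = False) -> str:
    """Build Dockerfile text from the three template variables.

    Stand-in for Jinja conditionals: each flag toggles one optional section.
    The RUN commands are placeholders, not the contents of Dockerfile.template.
    """
    lines = [f"FROM {base_image}", ""]
    if have_py:
        lines += [
            "# Python support (placeholder step)",
            "RUN apt-get update && apt-get install -y python3",
            "",
        ]
    if have_r:
        lines += [
            "# SparkR support (placeholder step)",
            "RUN apt-get update && apt-get install -y r-base",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Example value only; the real base image is whatever the template is given.
    base = "example/base-image:latest"
    # One Dockerfile per combination of the two optional features.
    for have_py in (False, True):
        for have_r in (False, True):
            print(f"--- HAVE_PY={have_py} HAVE_R={have_r} ---")
            print(render_dockerfile(base, have_py=have_py, have_r=have_r))
```

Keeping the optional sections behind boolean flags means a single source can produce every image variant, which appears to be the motivation for the template approach.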
Does this PR introduce any user-facing change?
No, dev only.
How was this patch tested?
lint: