Yikun commented Oct 17, 2022

What changes were proposed in this pull request?

This patch:

  • Adds a Dockerfile template: Dockerfile.template takes three variables: BASE_IMAGE for the base image name, HAVE_PY to add Python support, and HAVE_R to add SparkR support.
  • Adds a script, add-dockerfiles.sh, which generates the Dockerfiles for a given version, e.g. ./add-dockerfiles.sh 3.3.0
  • Adds a tool, template.py, to help generate a Dockerfile from the Jinja template.
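
As a rough illustration of how the three variables could drive generation, here is a minimal sketch. It deliberately uses only the Python standard library (`string.Template` plus ordinary `if` statements) as a stand-in for Jinja's conditional blocks, so it is not the actual `template.py`. The function name `render_dockerfile`, the fragment contents, and the base image name are all hypothetical.

```python
from string import Template

# Hypothetical fragments for illustration only. The real Dockerfile.template
# is a Jinja template and its contents are not reproduced here.
BASE_SECTION = Template("FROM $BASE_IMAGE\n")
PY_SECTION = "RUN apt-get update && apt-get install -y python3 python3-pip\n"
R_SECTION = "RUN apt-get update && apt-get install -y r-base r-base-dev\n"


def render_dockerfile(base_image: str, have_py: bool = False, have_r: bool = False) -> str:
    """Assemble a Dockerfile string from the base image and two feature flags."""
    parts = [BASE_SECTION.substitute(BASE_IMAGE=base_image)]
    if have_py:  # plays the role of a Jinja block guarded by HAVE_PY
        parts.append(PY_SECTION)
    if have_r:  # plays the role of a Jinja block guarded by HAVE_R
        parts.append(R_SECTION)
    return "".join(parts)


if __name__ == "__main__":
    # "example/base:latest" is a placeholder image name.
    print(render_dockerfile("example/base:latest", have_py=True))
```

Each flag combination yields one Dockerfile variant, which is the kind of fan-out a script such as add-dockerfiles.sh would presumably perform per version.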

Why are the changes needed?

Generating the Dockerfiles from a template makes them easier to add and maintain.

Does this PR introduce any user-facing change?

No, dev only.

How was this patch tested?

# Prepare a new env (activate it before installing, so packages land in the venv)
python3 -m venv ~/xxx
source ~/xxx/bin/activate
pip install -r ./tools/requirements.txt

# Generate 3.3.0
./add-dockerfiles.sh 3.3.0

# no diff
git diff

lint:

$ flake8 ./tools/template.py
$ black ./tools/template.py
All done! ✨ 🍰 ✨
1 file left unchanged.

Yikun marked this pull request as ready for review October 17, 2022 08:16

Yikun commented Oct 17, 2022

(venv) ➜  spark-docker git:(SPARK-40528) flake8 ./tools/template.py
(venv) ➜  spark-docker git:(SPARK-40528) black ./tools/template.py
All done! ✨ 🍰 ✨
1 file left unchanged.
(venv) ➜  spark-docker git:(SPARK-40528) ./add-dockerfiles.sh 3.3.0
(venv) ➜  spark-docker git:(SPARK-40528) git --no-pager diff

Just a rebase, will merge soon

Yikun closed this in 6459e3d Oct 17, 2022
Yikun commented Oct 17, 2022

@HyukjinKwon @martin-g Thanks, merged.

dongjoon-hyun added a commit that referenced this pull request Jun 23, 2025
…87)

### What changes were proposed in this pull request?

This PR aims to use `/nonexistent` explicitly as the `spark` user's home directory instead of the non-existent `/home/spark`, because the current state is misleading.

Please note that SPARK-40528 introduced `useradd --system`, which has created the `spark` user with a non-existent `/home/spark` home directory since the beginning of this repository, `spark-docker`.

- #12 

  https://github.com/apache/spark-docker/blob/c264d48dc510018095700ed33e700ccc34268bf2/Dockerfile.template#L21-L22

**Rejected Alternatives**

- We could set `HOME` to `/opt/spark`, as Apache Spark does. However, that would also differ from `WORKDIR` (`/opt/spark/work-dir`).
- We could create `/home/spark`, but that could be more vulnerable than the current state. For a `system` account, `/nonexistent` is frequently used as a security practice to prevent any side effects of a `HOME` directory.

```
$ docker run -it --rm apache/spark:4.0.0 cat /etc/passwd | grep /nonexistent
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
```
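
To make the `/etc/passwd` output easier to read, the following is a minimal Python sketch that splits passwd lines into their seven colon-separated fields and lists the accounts using a given home directory. The names `PasswdEntry`, `parse_passwd_line`, and `accounts_with_home` are hypothetical helpers, and the sample text simply combines the two lines shown above with the `spark` entry reported for the images in this PR.

```python
from typing import List, NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd line: name:password:uid:gid:gecos:home:shell."""
    name, _password, uid, gid, gecos, home, shell = line.strip().split(":")
    return PasswdEntry(name, int(uid), int(gid), gecos, home, shell)


def accounts_with_home(passwd_text: str, home: str) -> List[str]:
    """Return the account names whose home field equals `home`."""
    entries = (parse_passwd_line(l) for l in passwd_text.splitlines() if l.strip())
    return [e.name for e in entries if e.home == home]


# Sample assembled from the command output quoted in this PR description.
SAMPLE = """\
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
spark:x:185:185::/home/spark:/bin/sh
"""

if __name__ == "__main__":
    print(accounts_with_home(SAMPLE, "/nonexistent"))  # ['nobody', '_apt']
```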

### Why are the changes needed?

**Apache Spark 3.3.3**

```
$ docker run -it --rm apache/spark:3.3.3 /opt/spark/bin/spark-sql
...
25/06/20 20:15:41 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist.   History will not be available during this session.
```

```
$ docker run -it --rm -uroot apache/spark:3.3.3 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.3.3 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.4.4**

```
$ docker run -it --rm -uroot apache/spark:3.4.4 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.4.4 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.5.6**

```
$ docker run -it --rm -uroot apache/spark:3.5.6 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.5.6 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 4.0.0**

```
$ docker run -it --rm -uroot apache/spark:4.0.0 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:4.0.0 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```
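
The manual `tail`/`ls` checks above could also be automated. The sketch below uses Python's standard `pwd` and `os` modules (Unix only) to list every account whose declared home directory is missing. The function name `users_with_missing_home` is hypothetical, and running it inside the images is an assumption, since it requires a Python interpreter to be available there.

```python
import os
import pwd
from typing import List, Tuple


def users_with_missing_home() -> List[Tuple[str, str]]:
    """Return (name, home) pairs for accounts whose home directory does not exist."""
    return [
        (entry.pw_name, entry.pw_dir)
        for entry in pwd.getpwall()
        if not os.path.isdir(entry.pw_dir)
    ]


if __name__ == "__main__":
    for name, home in users_with_missing_home():
        print(f"{name}: {home} does not exist")
```

On the images listed above, the `spark` entry would be expected to appear in the result, alongside system accounts such as `nobody` that intentionally point at `/nonexistent`.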

### Does this PR introduce _any_ user-facing change?

No behavior change, because `/home/spark` already does not exist.

### How was this patch tested?

Manual review.