
Conversation


@Yikun Yikun commented Oct 11, 2022

What changes were proposed in this pull request?

This patch makes the workflow work in the apache repo:

  • Switch the event from pull_request to pull_request_target to make sure the PR has permission to push the test image to ghcr.io/apache/spark-docker/spark:TAG. This real image push will help us do K8s E2E tests in the future.
  • Add .github/workflows/build_3.3.0.yaml and 3.3.0/** to the trigger paths.
  • Add packages write permissions.
  • Change apache/spark-docker:TAG to ghcr.io/apache/spark-docker/spark:TAG.
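The switched trigger and added permission could look roughly like this in the workflow file (a sketch only, not the exact diff from this PR; the job layout and the reusable-workflow call are illustrative):

```yaml
# .github/workflows/build_3.3.0.yaml (sketch)
name: "Build and Test (3.3.0)"

on:
  # pull_request_target runs in the context of the base repository,
  # so the job has the secrets/permissions needed to push to ghcr.io
  pull_request_target:
    paths:
      - '.github/workflows/build_3.3.0.yaml'
      - '3.3.0/**'

jobs:
  run-build:
    # needed to push the test image to ghcr.io/apache/spark-docker/spark:TAG
    permissions:
      packages: write
    uses: ./.github/workflows/main.yml
```

Note the trade-off: pull_request_target grants real credentials, which is also why the CI change only takes effect after merge.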

Why are the changes needed?

To make the workflow work well in the apache repo.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The CI will only work after this patch is merged; it won't be triggered in this PR (due to pull_request_target).

We can't do a complete test before the merge.

@Yikun Yikun force-pushed the SPARK-40516-followup branch from 0a89805 to 91ad0ea Compare October 11, 2022 09:12
@Yikun Yikun changed the title [SPARK-40516][INFRA][FOLLOWUP] Switch to pull_request_target and fix workflow [SPARK-40746][INFRA] Switch to pull_request_target and fix workflow Oct 11, 2022
@Yikun Yikun marked this pull request as ready for review October 11, 2022 09:23
Member Author

Yikun commented Oct 11, 2022

cc @HyukjinKwon @martin-g

```yaml
jobs:
  run-build:
    permissions:
      packages: write
```
Member
Is this needed? This workflow just delegates to main.yml, where this is also specified.
It does not do any harm either!

Member Author

It's needed; see also: apache/spark#38145

Co-authored-by: Martin Grigorov <[email protected]>
```diff
  context: ${{ env.IMAGE_PATH }}
  push: true
- tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }}
+ tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ env.UNIQUE_IMAGE_TAG }}
```
Member Author

Emm, this might have some issues when multiple concurrent jobs push the same tag. Of course, we could use Actions concurrency or add a random hash tag to protect against it.

But after some consideration, I think what we need to do in this PR is just remove the `push: true`.

And change it to a local build in the minikube Docker daemon (no real push) when we add the K8s test.
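For reference, the local-build approach mentioned here could be sketched as follows (assuming minikube is already running; the image tag and build path are illustrative):

```shell
# Point the local docker CLI at minikube's internal Docker daemon,
# so the image built below is visible to pods without any registry push
eval $(minikube docker-env)

# Build the image locally instead of pushing to ghcr.io
docker build -t spark-docker-test:local ./3.3.0/scala2.12-java11-ubuntu
```

Since the image lives in minikube's daemon, the K8s test just references the tag with `imagePullPolicy: Never` and no registry credentials are needed.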

Member Author

This PR is replaced by #7

@Yikun Yikun closed this in c116698 Oct 12, 2022
Yikun added a commit that referenced this pull request Oct 17, 2022
### What changes were proposed in this pull request?
This patch:
- Adds the spark uid/gid in the Dockerfile (useradd and groupadd), used in the entrypoint. This approach is also used by [other DOIs](https://github.com/search?p=2&q=org%3Adocker-library+useradd&type=Code) and Apache DOIs (such as [zookeeper](https://github.com/31z4/zookeeper-docker/blob/master/3.8.0/Dockerfile#L17-L21), [solr](https://github.com/apache/solr-docker/blob/a20477ed123cd1a72132aebcc0742cee46b5f976/9.0/Dockerfile#L108-L110), [flink](https://github.com/apache/flink-docker/blob/master/1.15/scala_2.12-java11-ubuntu/Dockerfile#L55-L56)).
- Uses the `spark` user in `entrypoint.sh` rather than in the Dockerfile (makes sure the spark process is executed as a non-root user).
- Removes the `USER` setting in the Dockerfile (makes sure images built on top of this one have permission to extend the Dockerfile, e.g. to execute `apt update`).
- Chowns scripts to `spark:spark` instead of `root:root` (avoids permission issues such as in standalone mode).
- Adds the `gosu` dependency, a `sudo` replacement recommended by [docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user) and the [docker official images](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency), and also used by other DOI images.

This change also follows the rules of docker official images; see [consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency) and the [Dockerfile best practices about USER](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user).
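A minimal sketch of the pattern described above (not the exact patch from this PR; the base image, uid value, and paths are illustrative):

```dockerfile
FROM eclipse-temurin:11-jre-focal

# Create a dedicated spark user/group so processes can run as non-root
ARG spark_uid=185
RUN groupadd --system --gid=${spark_uid} spark && \
    useradd --system --home-dir /opt/spark --uid=${spark_uid} --gid=spark spark

# gosu is a sudo replacement the entrypoint uses to drop privileges
RUN apt-get update && \
    apt-get install -y gosu && \
    rm -rf /var/lib/apt/lists/*

# Scripts owned by spark:spark to avoid permission issues (e.g. standalone mode)
COPY --chown=spark:spark entrypoint.sh /opt/entrypoint.sh

# Note: no USER instruction here; entrypoint.sh switches users with
# `exec gosu spark "$@"`, so child images can still run `apt update` as root.
ENTRYPOINT ["/opt/entrypoint.sh"]
```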

### Why are the changes needed?

Below are the issues I have found so far.

1. **Irregular login username**
  The Docker image's username is not very standard; running docker with the `185` username is a little bit weird.

  ```
  $ docker run -ti apache/spark bash
  185d88a24357413:/opt/spark/work-dir$
  ```

2. **Permission issue of spark sbin**
  There are also some permission issues when running spark scripts, such as in standalone mode:

  ```
  $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh

  mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
  chown: cannot access '/opt/spark/logs': No such file or directory
  starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  /opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory
  failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080
  tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory
  full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  ```


3. **spark as a base image is not supported well**
  Due to the static `USER` set in the Dockerfile:
  ```
  $ cat Dockerfile
  FROM apache/spark
  RUN apt update

  $  docker build -t spark-test:1015 .
  // ...
  ------
   > [2/2] RUN apt update:
  #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
  #5 0.405 E: Unable to lock directory /var/lib/apt/lists/
  ------
  executor failed running [/bin/sh -c apt update]: exit code: 100

  ```

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
- CI passed: all K8s tests

- Regression test:
```
# Username is set to spark rather than 185
docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark27bbfca0a581:/opt/spark/work-dir$
```
```
# start-master.sh no permission issue
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash

spark8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
```
```
# Image as parent case
$ cat Dockerfile
FROM spark:scala2.12-java11-python3-r-ubuntu
RUN apt update
$ docker build -t spark-test:1015 .
[+] Building 7.8s (6/6) FINISHED
 => [1/2] FROM docker.io/library/spark:scala2.12-java11-python3-r-ubuntu                                                                                                              0.0s
 => [2/2] RUN apt update                                                                                                                                                              7.7s
```

- Other test:
```
# Test on pyspark
$ cd spark-docker/3.3.0/scala2.12-java11-python3-r-ubuntu
$ docker build -t spark:scala2.12-java11-python3-r-ubuntu .
$ docker run -p 4040:4040 -ti spark:scala2.12-java11-python3-r-ubuntu /opt/spark/bin/pyspark
```

```
# A simple test for `start-master.sh` (standalone mode)
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
```

Closes #11 from Yikun/spark-user.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
Yikun added a commit that referenced this pull request May 25, 2023
### What changes were proposed in this pull request?
- This patch changes the `build-args` approach to a "patch in test" approach in the build and publish workflows, because docker official images do not support **parameterized FROM** values. docker-library/official-images#13089 (comment)
- It also refactors the publish workflow:
![image](https://user-images.githubusercontent.com/1736354/236613626-96f8fbf6-7df7-4d10-b4fb-be4d57c56dce.png)
### Why are the changes needed?
The same change as the build workflow refactor, to avoid publish issues like:
```
#5 [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu
#5 ERROR: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
------
 > [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu:
------
Dockerfile:18
--------------------
  16 |     #
  17 |     ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
  18 | >>> FROM $BASE_IMAGE
  19 |
  20 |     RUN set -ex && \
--------------------
ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Error: buildx failed with: ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
```
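To illustrate the "patch in test" idea (a hypothetical sketch, not the actual workflow code; the file path and image names are made up), the test step can rewrite the fixed FROM line instead of passing a build-arg:

```shell
# Create a sample Dockerfile with a fixed FROM, as DOI requires
printf 'FROM spark:3.4.0-scala2.12-java11-ubuntu\nRUN echo hello\n' > /tmp/Dockerfile.patched

# Patch the FROM line to point at the locally built / test-registry image
sed -i 's|^FROM .*|FROM ghcr.io/example/spark-docker/spark:3.4.0-test|' /tmp/Dockerfile.patched

# The published Dockerfile keeps its fixed FROM; only the test copy is rewritten
head -n1 /tmp/Dockerfile.patched
# prints: FROM ghcr.io/example/spark-docker/spark:3.4.0-test
```

This keeps the committed Dockerfile DOI-compliant (a literal FROM) while still letting CI test against an image that was just built.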

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Publish test in my local fork:
- https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759: Skips the local base build and uses the [published base](https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759#step:11:135) image:

![image](https://user-images.githubusercontent.com/1736354/236612540-2b454c14-e194-4d73-b859-0df001570d27.png)

```
#3 [linux/amd64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#3 DONE 0.9s

#4 [linux/arm64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#4 DONE 0.9s
```

- CI passed: does the local base build first, then builds on top of the local build

Closes #39 from Yikun/publish-build.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>