@peteski22 peteski22 commented Mar 21, 2025

What's changing

A more narrowly scoped version of #1262

  • Pins (all) dependencies in job requirements (s3fs wasn't versioned before)
  • Removes requests-mock from requirements.txt
  • Updates the notebook walkthrough to wait for the dataset to be uploaded after the job completes (bug raised, referenced below)
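
The notebook change in the last bullet amounts to waiting until the dataset is available after the job reports completion. Below is a minimal sketch of such a wait, not the code from this PR. The `wait_for` helper and the `fetch` callable are hypothetical names: `fetch` stands in for whatever call the notebook uses to look up the dataset, and is assumed to return `None` until the upload has finished.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `fetch` until it returns a non-None value or the timeout expires.

    `fetch` is a hypothetical stand-in for the dataset lookup; it should
    return None while the dataset is not yet available.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"resource not available after {timeout_s} seconds")
        # Back off before the next attempt so the backend is not hammered.
        sleep(interval_s)
```

Polling with an explicit timeout keeps the walkthrough from hanging forever if the upload never happens, while still tolerating the gap between job completion and dataset availability.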

Refs: #1262
Refs: #1263

How to test it

Steps to test the changes:

Additional notes for reviewers

Here's an example of what seemed to be going wrong inside of Ray (causing jobs to fail):

2025-03-20 11:21:10,881	INFO job_manager.py:530 -- Runtime env is setting up.
Traceback (most recent call last):
  File "/tmp/ray/session_2025-03-20_11-19-23_851423_11/runtime_resources/working_dir_files/_ray_pkg_c9f99c39b3b6b463/inference.py", line 9, in <module>
    from dataset import create_dataloader
  File "/tmp/ray/session_2025-03-20_11-19-23_851423_11/runtime_resources/working_dir_files/_ray_pkg_c9f99c39b3b6b463/dataset.py", line 1, in <module>
    from datasets import Dataset
  File "/tmp/ray_pip_cache/d34a54b93f0976e9f981644a123931d08eddb8c6/virtualenv/lib/python3.11/site-packages/datasets/__init__.py", line 17, in <module>
    from .arrow_dataset import Dataset
  File "/tmp/ray_pip_cache/d34a54b93f0976e9f981644a123931d08eddb8c6/virtualenv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 59, in <module>
    import pandas as pd
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/home/ray/anaconda3/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
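
This kind of `ValueError` generally means a compiled extension (here pandas) was built against a different NumPy ABI than the NumPy that is imported at runtime, which can happen when an unpinned dependency pulls in a newer or older release. As a rough guard, a check like the following could flag requirement lines that are not pinned with `==`. It is only a sketch: `unpinned_requirements` is a hypothetical helper, the package versions in the usage example are illustrative rather than taken from this PR, and the regular expression covers simple `name==version` lines only.

```python
import re

# Matches "name==version" or "name[extra]==version" at the start of a line.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Illustrative input only; these versions are assumptions, not the PR's pins.
example = """
s3fs
numpy==1.26.4
datasets>=2.0  # a lower bound is not a pin
"""
print(unpinned_requirements(example))
```

Run against the job's requirements file, a non-empty result would point at entries (such as the previously unversioned s3fs) that can drift between builds.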

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@peteski22 peteski22 merged commit 06f16a9 into main Mar 21, 2025
21 checks passed
@peteski22 peteski22 deleted the peteski22/gha/fixes/deps-and-notebook branch March 21, 2025 12:35