Skip to content

Conversation

@mattdangerw
Copy link
Member

Move from DTensor to the keras.distribution API in Keras 3.

The overall plan is that we will have a default layout map that will work with most of backbones (because most of our backbones share the same weights structure). A backbone that needs a separate layout map should override create_layout_map.

Note that this will not pass on CI for a while, because we need to tests agains the latest keras-nightlies for this to work. That is blocked on the keras-nightly being available and tensorflow-text-nightly no longer being a month+ old. Both in route I think.

@mattdangerw mattdangerw requested a review from qlzh727 October 5, 2023 22:05
@mattdangerw mattdangerw force-pushed the model-parallel-distribution branch from 18bff4f to 0770022 Compare October 5, 2023 22:05
@mattdangerw
Copy link
Member Author

@qlzh727 any thoughts on how to best test this (once we have a nightly environment to test)?

Previously we were just making a 1x1 mesh and passing it in to exercise the code path, which is not great but at least it is something. Is there more we can do here?

@qlzh727
Copy link
Member

qlzh727 commented Oct 5, 2023

For single worker testing, I think you can take approaches like https://github.com/keras-team/keras/blob/c8a5a8969a8712a9a1939937ce34158e04cfc09d/keras/distribution/distribution_lib_test.py#L464.

This will run with 8 virtual CPUs locally, and you can do some e2e verification.

The config of 8 local virtual CPUs can be found in https://github.com/keras-team/keras/blob/c8a5a8969a8712a9a1939937ce34158e04cfc09d/keras/distribution/distribution_lib_test.py#L17.

mattdangerw added a commit to mattdangerw/keras-hub that referenced this pull request Oct 9, 2023
We will replace this with the work on
keras-team#1267
But we have no coverage for that PR till we run tests against Keras 3,
which will probably still be about a week.

For now, let's just remove this usage, which is no longer needed and
will break a Keras 3 install.
@mattdangerw mattdangerw mentioned this pull request Oct 9, 2023
mattdangerw added a commit that referenced this pull request Oct 9, 2023
We will replace this with the work on
#1267
But we have no coverage for that PR till we run tests against Keras 3,
which will probably still be about a week.

For now, let's just remove this usage, which is no longer needed and
will break a Keras 3 install.
mattdangerw added a commit that referenced this pull request Nov 7, 2023
We will replace this with the work on
#1267
But we have no coverage for that PR till we run tests against Keras 3,
which will probably still be about a week.

For now, let's just remove this usage, which is no longer needed and
will break a Keras 3 install.
I813AWt29W added a commit to I813AWt29W/keras-nlp that referenced this pull request Aug 26, 2024
We will replace this with the work on
keras-team/keras-hub#1267
But we have no coverage for that PR till we run tests against Keras 3,
which will probably still be about a week.

For now, let's just remove this usage, which is no longer needed and
will break a Keras 3 install.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants