-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
ref(ecosystem): refactor link_all_repos to bulk update repositories #95494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look mostly good, just a question on partial failure handling.
lifecycle.record_halt(str(LinkAllReposHaltReason.RATE_LIMITED)) | ||
return | ||
|
||
metrics.incr(f"{integration_key}.link_all_repos.api_error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any dashboards we need to update that use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of any monitoring we have on this, plus we should be using the lifecycle metrics anyway
repo_config=config, organization=rpc_org | ||
) | ||
success = True | ||
repo_configs.append(get_repo_config(repo, integration_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏🏼 Much cleaner
lifecycle.record_halt( | ||
str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | ||
{"missing_repos": missing_repos, "integration_id": integration_id}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other partial failures cases we need to worry about?
What does integration_repo_provider.create_repositories
do if any of these repo creations fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only case is where some fail and some succeed. Previously we would actually mark it as success if any repo was created, but here we only mark it as success if ALL repos are created/updated successfully.
If repo creation fails, we now raise an exception with the failed repos and mark it as a halt
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## master #95494 +/- ##
==========================================
+ Coverage 77.79% 82.71% +4.91%
==========================================
Files 10535 10535
Lines 607265 607335 +70
Branches 23794 23794
==========================================
+ Hits 472431 502333 +29902
+ Misses 134449 104617 -29832
Partials 385 385 |
d77e54a
to
57e95a4
Compare
57e95a4
to
c06c9c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Duplicate Halt Events in Repo Linking
The link_all_repos
task can record a halt event twice for the same execution. This occurs when create_repositories()
raises a RepoExistsError
and the missing_repos
list is also non-empty. The RepoExistsError
handler records a halt but continues execution, allowing a second halt to be recorded by the missing_repos
check. This leads to duplicate lifecycle events and inaccurate metrics.
src/sentry/integrations/github/tasks/link_all_repos.py#L92-L107
sentry/src/sentry/integrations/github/tasks/link_all_repos.py
Lines 92 to 107 in c06c9c7
try: | |
integration_repo_provider.create_repositories( | |
configs=repo_configs, organization=rpc_org | |
) | |
except RepoExistsError as e: | |
lifecycle.record_halt( | |
str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | |
{"missing_repos": e.repos, "integration_id": integration_id}, | |
) | |
if missing_repos: | |
lifecycle.record_halt( | |
str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | |
{"missing_repos": missing_repos, "integration_id": integration_id}, | |
) |
Bug: Repository Update Fails to Set Provider
The _update_repository
method fails to update the provider
field for existing repositories. Unlike the previous create_repository
logic which explicitly set provider: self.id
during updates, the new method only applies fields from RepositoryConfig
, which does not include provider
. This omission causes inconsistencies and prevents the "migration" of repositories to a new provider.
src/sentry/plugins/providers/integration_repository.py#L206-L212
sentry/src/sentry/plugins/providers/integration_repository.py
Lines 206 to 212 in c06c9c7
def _update_repository(self, repo: RpcRepository, config: RepositoryConfig): | |
repo.status = ObjectStatus.ACTIVE | |
for field_name, field_value in config.items(): | |
setattr(repo, field_name, field_value) | |
return repo |
Was this report helpful? Give feedback by reacting with 👍 or 👎
We are hitting the processing deadline of 60 seconds for
link_all_repos
. We can improve the performance by making more bulk queries.Changes
RepositoryConfig
typeIntegrationRepositoryProvider.create_repository
) with possibly many RPC calls insideUnchanged
Repository
individually. This is because we may need to take actions for eachRepository
that fails creation. Django'sbulk_create
will fail and rollback all changes if there is an error. Alternatively, we can passignore_conflicts=True
tobulk_create
, but then we wouldn't be able to see which ones failed.Fixes ECO-801