Enhance retry error handling with detailed context and VLM-CLOUD-PROD-MA detection #109

devin-ai-integration · 2025-07-25T04:15:15Z

Fix retry error handling to throw errors gracefully for VLM-CLOUD-PROD-MA

Summary

This PR enhances retry error handling across the VLM platform to provide more specific and actionable error information, particularly for the "VLM-CLOUD-PROD-MA" error scenario. The changes replace generic "unexpected-error" codes with specific error categorization and add detailed context to retry failures.

Key Changes:

vlm-lab: Added _categorize_error() function to identify Modal infrastructure, network, timeout, and VLM-CLOUD-PROD-MA specific errors
vlm-lab: Enhanced callback retry error handling with detailed error context and retry attempt tracking
vlmrun-python-sdk: Added retry attempt counts to error messages and specific VLM-CLOUD-PROD-MA error detection
Both repos: Maintained backward compatibility while providing more actionable debugging information
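
For reviewers who want a concrete picture of the categorization described above, here is a minimal sketch, not the code in this PR. The function name `categorize_error`, the `timeout-error` and `network-error` codes, and the module-name check used in place of `isinstance(exc, modal.exception.ModalException)` are assumptions made so the example runs without Modal installed. Only `modal-infrastructure-error`, `vlm-cloud-prod-ma-error`, and `unexpected-error` appear in this PR.

```python
import socket


def categorize_error(exc: BaseException) -> str:
    """Map an exception to an error code (illustrative sketch only)."""
    # String match first, since the PR detects this case via str(exception).
    if "VLM-CLOUD-PROD-MA" in str(exc):
        return "vlm-cloud-prod-ma-error"

    # Assumption: stand-in for an isinstance check against Modal's exception
    # base class, so this sketch does not need the modal package.
    module = type(exc).__module__ or ""
    if module == "modal" or module.startswith("modal."):
        return "modal-infrastructure-error"

    # TimeoutError must be tested before the broader network check.
    if isinstance(exc, TimeoutError):
        return "timeout-error"  # hypothetical code name
    if isinstance(exc, (ConnectionError, socket.gaierror)):
        return "network-error"  # hypothetical code name

    # Fall back to the generic code used before this change.
    return "unexpected-error"
```

Checking the string match first means a Modal exception whose message contains "VLM-CLOUD-PROD-MA" is reported under the more specific code; whether that ordering matches the PR's implementation is not confirmed here.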

Review & Testing Checklist for Human

Verify VLM-CLOUD-PROD-MA error detection works with real scenarios - The string matching logic "VLM-CLOUD-PROD-MA" in str(exception) needs validation with actual production errors
Test error categorization with live Modal infrastructure failures - The _categorize_error() function categorizes modal.exception.ModalException but needs verification with real Modal errors
Confirm error logging/monitoring systems still work - Changes from "unexpected-error" to specific error codes could break downstream alerting or analytics
Test retry behavior under various failure conditions - Ensure enhanced error handling doesn't interfere with the actual retry mechanisms in production
Verify callback retry improvements work end-to-end - Test the _get_callback_error_details() function with real webhook callback failures
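
To make the last checklist item easier to exercise locally, the following is a rough sketch of a callback retry loop that records per-attempt error details. It is an assumption-laden illustration: `callback_error_details`, `call_with_retry`, `CallbackRetryError`, and the dictionary keys are hypothetical names, not the signatures of `_get_callback_error_details()` or `call_callback_with_retry()` in this PR, and backoff between attempts is omitted for brevity.

```python
from typing import Any, Callable


class CallbackRetryError(Exception):
    """Raised when every attempt fails; carries per-attempt details."""

    def __init__(self, details: list[dict[str, Any]]) -> None:
        super().__init__(f"callback failed after {len(details)} attempts")
        self.details = details


def callback_error_details(
    exc: BaseException, url: str, attempt: int, max_attempts: int
) -> dict[str, Any]:
    # Hypothetical shape of the "detailed error context" for one failure.
    return {
        "callback_url": url,
        "attempt": attempt,
        "max_attempts": max_attempts,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
    }


def call_with_retry(
    send: Callable[[str], None], url: str, max_attempts: int = 3
) -> list[dict[str, Any]]:
    """Call send(url) up to max_attempts times.

    Returns the details of any failed attempts that preceded a success,
    or raises CallbackRetryError if no attempt succeeds.
    """
    failures: list[dict[str, Any]] = []
    for attempt in range(1, max_attempts + 1):
        try:
            send(url)
            return failures
        except Exception as exc:  # sketch: a real handler would be narrower
            failures.append(
                callback_error_details(exc, url, attempt, max_attempts)
            )
    raise CallbackRetryError(failures)
```

A test can pass a `send` function that raises on the first calls and then succeeds, and assert on the recorded attempt numbers without touching a real webhook.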

Recommended Test Plan:

Trigger Modal infrastructure errors in a test environment and verify they get categorized correctly
Test callback retry failures and confirm detailed error information is logged
Monitor error dashboards to ensure new error codes are properly captured
Simulate VLM-CLOUD-PROD-MA scenarios if possible to validate detection logic

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    A["vlm/infra/cloud/_modal.py"]:::major-edit --> B["handle_request_errors()"]:::major-edit
    A --> C["call_callback_with_retry()"]:::major-edit
    B --> D["_categorize_error()"]:::major-edit
    C --> E["_get_callback_error_details()"]:::major-edit
    
    F["vlmrun/client/base_requestor.py"]:::major-edit --> G["_handle_retry_error()"]:::major-edit
    
    H["Modal Infrastructure"]:::context --> D
    I["Callback URLs"]:::context --> E
    J["RetryError Exceptions"]:::context --> G
    
    D --> K["Specific Error Codes<br/>(modal-infrastructure-error,<br/>vlm-cloud-prod-ma-error, etc.)"]:::major-edit
    E --> L["Detailed Callback<br/>Error Context"]:::major-edit
    G --> M["Enhanced Retry<br/>Error Messages"]:::major-edit

    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit  
        L3[Context/No Edit]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

Environment Issues: vlm-lab tests failed due to missing vlm module setup, so changes weren't fully tested in that repository
String-based Detection: VLM-CLOUD-PROD-MA errors are detected via string matching since no concrete examples were found in the codebase
Backward Compatibility: Error code changes maintain the same database schema but modify the actual values stored
Session Info: Requested by [email protected] - https://app.devin.ai/sessions/ada923c8d06549e4bf8241bbab8c3fb5

…-MA detection
- Add retry attempt count to error messages for better debugging
- Detect VLM-CLOUD-PROD-MA errors and provide specific error messages
- Use getattr() for safe access to attempt_number to handle test mocks
- Maintain backward compatibility while improving error context
Co-Authored-By: [email protected] <[email protected]>
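
The `getattr()` approach mentioned in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's `_handle_retry_error()`: the function name `describe_retry_failure`, its parameters, and the message wording are hypothetical, and the error object is assumed to expose `last_attempt.attempt_number` in the way a tenacity `RetryError` does.

```python
from types import SimpleNamespace


def describe_retry_failure(retry_error: object, cause: BaseException) -> str:
    """Build an error message that includes the attempt count when known."""
    # getattr with a default keeps this safe for mocks or stand-in objects
    # that lack last_attempt or attempt_number.
    last_attempt = getattr(retry_error, "last_attempt", None)
    attempts = getattr(last_attempt, "attempt_number", None)
    suffix = f" after {attempts} attempts" if isinstance(attempts, int) else ""

    # String-based detection, as described in the PR notes.
    if "VLM-CLOUD-PROD-MA" in str(cause):
        return f"VLM-CLOUD-PROD-MA error{suffix}: {cause}"
    return f"Request failed{suffix}: {cause}"


# Usage with a stand-in for a retry error object.
err = SimpleNamespace(last_attempt=SimpleNamespace(attempt_number=3))
print(describe_retry_failure(err, RuntimeError("boom")))
# prints: Request failed after 3 attempts: boom
```

The `isinstance(attempts, int)` check is a design choice for this sketch: a mock object can return another mock for any attribute, so the count is only included when it is a real integer.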

devin-ai-integration · 2025-07-25T04:15:17Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

devin-ai-integration · 2025-08-02T16:20:38Z

Closing due to inactivity for more than 7 days.

devin-ai-integration bot temporarily deployed to dev July 25, 2025 04:15 Inactive

dineshreddy91 requested a review from shahrear33 July 28, 2025 19:42

devin-ai-integration bot closed this Aug 2, 2025
