A production-ready Streamlit application that converts legacy SAS code to PySpark or Databricks SQL using Databricks Foundation Models (Claude Sonnet 4.5).
- Multi-format Output: Generate PySpark DataFrame API or pure Databricks SQL
- Smart Conversion: 12 production-validated rules prevent common migration errors
- Interactive UI: Streamlit-based interface with real-time previews
- Foundation Models: Powered by Claude Sonnet 4.5 for optimal conversion quality
- Healthcare Focused: Optimized for healthcare payer analytics (HEDIS, risk adjustment, claims)
- Production Ready: Includes comprehensive LLM instructions (1,700+ lines)
- Databricks workspace with Unity Catalog enabled
- Databricks CLI installed and configured (Installation Guide)
- Python 3.10+ (for local development only)
- Workspace permissions: Ability to create catalogs, schemas, and volumes
Best for: Automated bulk conversions with a user-friendly UI
1️⃣ Clone the Repository
git clone <your-repo-url>
cd SAS-work
2️⃣ Configure Databricks CLI
# If not already configured
databricks configure
# Or set your profile (optional)
export DATABRICKS_PROFILE="your_profile_name"
3️⃣ Run Full Setup (First Time Only)
This creates demo catalogs, data, volume, and uploads SAS files:
./setup_databricks_assistant.sh
Time: ~10 minutes | Creates: 2 catalogs, 29K rows of data, 6 SAS files
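If you want to sanity-check what the setup created, here is a minimal sketch to run in a Databricks notebook; it assumes the demo defaults used throughout this README (catalog `payer_dev`, schema `sas_migration`, volume `legacy_sas`):

```python
# Optional sanity check after setup; spark and dbutils are notebook globals.
spark.sql("SHOW VOLUMES IN payer_dev.sas_migration").show()

# List the uploaded sample SAS files in the Unity Catalog volume
for f in dbutils.fs.ls("/Volumes/payer_dev/sas_migration/legacy_sas"):
    print(f.path)
```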
4️⃣ Deploy the Streamlit App
./deploy_streamlit_app.sh
Time: ~30 seconds | Deploys: Streamlit app to Databricks Apps
The script will show you a big reminder box at the end:
╔════════════════════════════════════════════════════════════════╗
║ ⚠️ ACTION REQUIRED: Grant App Permissions (One-Time) ║
╚════════════════════════════════════════════════════════════════╝
5️⃣ Grant App Permissions (One-Time)
The deploy script outputs SQL commands. Copy them to Databricks SQL Editor and run:
GRANT USE CATALOG ON CATALOG payer_dev TO `<auto-detected-uuid>`;
GRANT USE SCHEMA ON SCHEMA payer_dev.sas_migration TO `<auto-detected-uuid>`;
GRANT READ VOLUME ON VOLUME payer_dev.sas_migration.legacy_sas TO `<auto-detected-uuid>`;
GRANT SELECT ON CATALOG payer_dev TO `<auto-detected-uuid>`;
💡 Tip: Run ./show_grant_commands.sh anytime to regenerate these commands with your app's UUID.
6️⃣ Access Your App
Open the app URL shown in the deploy output:
https://sas-converter-<workspace-id>.azuredatabricksapps.com
✅ You're Done! Start converting SAS code to PySpark/SQL.
Already deployed? Just need to update the app or instructions?
# Update app + instructions only (no data reload)
./deploy_streamlit_app.sh  # ~30 seconds
Best for: Interactive conversions directly in notebooks with AI assistance
Run the same setup script as Option 1:
./setup_databricks_assistant.sh
This deploys .assistant_instructions.md to your workspace at:
/Workspace/Users/your.email/.assistant_instructions.md
- Open any Databricks notebook
- Click the Assistant icon
- Paste your SAS code
- Ask: "Convert this to PySpark"
- Assistant uses your custom instructions automatically
Benefits:
- Context-aware conversions based on your patterns
- Interactive Q&A about SAS → PySpark/SQL
- Works across all notebooks in your workspace
Learn more about custom instructions
# Install dependencies
pip install -r dashboard/requirements.txt
# Run locally
cd dashboard
streamlit run sas_converter_app.py
SAS-work/
├── dashboard/ # Main application
│ ├── sas_converter_app.py # Entry point (1,318 lines)
│ ├── file_handler.py # File I/O utilities
│ ├── utils.py # Validation & formatting
│ ├── app.yaml # Databricks app config
│ └── config/
│ └── .assistant_instructions.md # LLM instructions (1,706 lines)
│
├── config/
│ └── .assistant_instructions.md # Source of truth (edit here!)
│
├── notebooks/ # Demo notebooks
│ ├── 00c_setup_with_inline_data.py # Setup script
│ ├── 01-09_*.py # Example conversions
│ └── README.md
│
├── legacy_sas/ # Sample SAS files
│ └── *.sas
│
├── databricks.yml # Databricks Asset Bundle config
├── setup_databricks_assistant.sh # Setup Databricks Assistant with instructions
└── deploy_streamlit_app.sh # Deploy Streamlit app to Databricks Apps
Use the Streamlit App when:
- You want a self-service UI for business users
- You need batch conversions of multiple SAS files
- You want consistent, automated output
- You're demoing to customers or stakeholders
Use Databricks Assistant when:
- You're actively developing in notebooks
- You want AI help while coding (iterative approach)
- You need to ask questions about SAS patterns
- You prefer manual control over each conversion step
Use both! Many teams deploy the Streamlit app for business users while developers use Databricks Assistant for hands-on work.
The converter follows 12 critical rules validated over 3 days of production testing:
- Always cast aggregations (`.cast("double")` / `.cast("long")`)
- Always use `overwriteSchema=true` on table writes
- Always import both `F` and `Window` for PySpark
- Never use the `SELECT *, col` pattern (causes `COLUMN_ALREADY_EXISTS` errors)
- Always check for division by zero
- Always use the 3-level namespace (`catalog.schema.table`)
- Plus 6 more... (the sketch below shows several of these rules in practice)
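A minimal sketch of what several of these rules look like in generated PySpark; the table and column names below are illustrative, not part of the demo data:

```python
from pyspark.sql import functions as F     # always import F ...
from pyspark.sql.window import Window      # ... and Window

# Always use the 3-level namespace (illustrative table)
claims = spark.table("payer_dev.sas_migration.claims")

# Window usage: keep the most recent claim per member
w = Window.partitionBy("member_id").orderBy(F.col("service_date").desc())
latest_claims = claims.withColumn("rn", F.row_number().over(w)).filter(F.col("rn") == 1)

summary = (
    claims.groupBy("member_id")
    .agg(
        F.sum("paid_amount").cast("double").alias("total_paid"),   # cast aggregations
        F.count("claim_id").cast("long").alias("claim_count"),
    )
    .withColumn(
        "avg_paid",                                                 # guard division by zero
        F.when(F.col("claim_count") > 0,
               F.col("total_paid") / F.col("claim_count")).otherwise(F.lit(0.0)),
    )
)

# Always set overwriteSchema=true on table writes
(summary.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("payer_dev.sas_migration.member_claim_summary"))
```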
PySpark Mode:
- Pure DataFrame API
- Best for complex transformations
- Auto-overwrites duplicate columns
SQL Mode:
- Pure Databricks SQL
- Familiar syntax for SQL users
- Includes warnings about common pitfalls (illustrated below)
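To make the duplicate-column pitfall concrete, here is a hedged sketch with illustrative table and column names; the SQL-mode fix recommended in the troubleshooting section is the `SELECT * REPLACE` pattern, while the PySpark equivalent simply replaces the column in place:

```python
# Anti-pattern the SQL mode warns about: "SELECT *, expr AS existing_col" yields two
# columns named paid_amount, and saving the result fails with COLUMN_ALREADY_EXISTS.
# bad = spark.sql("""
#     SELECT *, CAST(paid_amount AS DOUBLE) AS paid_amount
#     FROM payer_dev.sas_migration.claims
# """)

# PySpark-mode equivalent that avoids the duplicate: withColumn replaces in place.
from pyspark.sql import functions as F

claims = spark.table("payer_dev.sas_migration.claims")  # illustrative table
claims_typed = claims.withColumn("paid_amount", F.col("paid_amount").cast("double"))
```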
Uses Claude Sonnet 4.5 configured in dashboard/app.yaml:
- databricks-claude-sonnet-4-5 # Best quality for SAS conversions
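For reference, one way an application can call this endpoint is through the OpenAI-compatible Foundation Model API. This is a minimal sketch, not the app's actual implementation; it assumes `DATABRICKS_HOST` and a `DATABRICKS_TOKEN` personal access token are set in the environment:

```python
import os
from openai import OpenAI

# Databricks Foundation Model APIs expose an OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url=f"https://{os.environ['DATABRICKS_HOST']}/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-claude-sonnet-4-5",
    max_tokens=8000,
    messages=[
        {"role": "system", "content": "Convert SAS code to PySpark following the provided rules."},
        {"role": "user", "content": "proc sort data=claims; by member_id; run;"},
    ],
)
print(response.choices[0].message.content)
```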
This framework includes comprehensive custom instructions for Databricks Assistant:
Setup:
./setup_databricks_assistant.sh
This deploys .assistant_instructions.md to your workspace folder (/Workspace/Users/your.email/.assistant_instructions.md).
Usage in Notebooks:
- Open any Databricks notebook
- Use Databricks Assistant (AI helper)
- Paste your SAS code and ask: "Convert this to PySpark"
- Assistant uses your custom instructions automatically
Benefits:
- Context-aware conversions based on your patterns
- Interactive Q&A about SAS → PySpark/SQL
- Learns from your custom rules and domain knowledge
- Works across all notebooks in your workspace
Learn more about custom instructions
- ARCHITECTURE.md - System design and data flow
- DEMO_SCRIPT.md - Demo walkthrough
- DEPLOYMENT_GUIDE.md - Detailed deployment instructions
- AUTOMATED_TESTING_GUIDE.md - Testing procedures
- Upload SAS File: Click "Browse files" or paste code
- Configure: Set source/target catalogs and schemas
- Select Mode: Choose PySpark or SQL output
- Convert: Click "Convert to Databricks"
- Download: Save as a `.py` notebook file
- Deploy: Upload to your Databricks workspace
Critical: Always edit the source file, then deploy!
# 1. Edit source
vim config/.assistant_instructions.md
# 2a. Deploy to Streamlit app (auto-syncs to dashboard/config/)
./deploy_streamlit_app.sh
# OR
# 2b. Deploy to workspace for Databricks Assistant
./setup_databricks_assistant.sh
# Done! Instructions deployed
Choose based on your use case:
# Option 1: Deploy Streamlit App (~30 seconds)
./deploy_streamlit_app.sh
# Use for: Automated UI-based conversions
# Option 2: Setup Databricks Assistant (~10 minutes)
./setup_databricks_assistant.sh
# Use for: Manual notebook conversions with AI Assistant
After deploying the Streamlit app, you need to grant it access to Unity Catalog volumes. The deployment script will remind you to do this.
Step 1: Get the grant commands (with your app's UUID auto-detected)
./show_grant_commands.sh
Step 2: Copy the SQL output and run it in the Databricks SQL Editor
Example output (your UUID will be different):
-- Grant permissions to app service principal
GRANT USE CATALOG ON CATALOG payer_dev TO `e9dfbf80-9204-43f4-9758-41b204defc1c`;
GRANT USE SCHEMA ON SCHEMA payer_dev.sas_migration TO `e9dfbf80-9204-43f4-9758-41b204defc1c`;
GRANT READ VOLUME ON VOLUME payer_dev.sas_migration.legacy_sas TO `e9dfbf80-9204-43f4-9758-41b204defc1c`;
GRANT SELECT ON CATALOG payer_dev TO `e9dfbf80-9204-43f4-9758-41b204defc1c`;
Why manual? Unity Catalog permission grants require specific admin privileges that may not be available to automation scripts. This one-time step (takes 30 seconds) ensures the app can access volumes containing SAS files.
Verify: Check Catalog Explorer → payer_dev.sas_migration.legacy_sas → Permissions tab to confirm the app has access.
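You can also confirm the grants from a notebook; a minimal sketch (catalog and volume names match the demo defaults):

```python
# Show which principals (including the app's service principal) hold grants
spark.sql("SHOW GRANTS ON CATALOG payer_dev").show(truncate=False)
spark.sql("SHOW GRANTS ON VOLUME payer_dev.sas_migration.legacy_sas").show(truncate=False)
```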
# For Streamlit app
databricks bundle deploy --profile DEFAULT_azure
databricks apps deploy sas-converter --profile DEFAULT_azure
# For Databricks Assistant
databricks workspace import --overwrite \
--file config/.assistant_instructions.md \
/Workspace/Users/[email protected]/.assistant_instructions.md
env:
  - name: "DATABRICKS_HOST"
    value: "your-workspace.azuredatabricks.net"
  - name: "SERVING_ENDPOINT_NAME"
    value: "databricks-claude-sonnet-4-5"
  - name: "MAX_TOKENS"
    value: "8000"
See config/.assistant_instructions.md for:
- Conversion patterns
- Healthcare payer-specific rules
- SAS function mappings
- Error handling strategies
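To give a flavor of what those mappings cover, here is an illustrative SAS data-step pattern and one possible PySpark rendering; it is a sketch, not an excerpt from the instructions file, and the table and column names are hypothetical:

```python
# SAS:
#   data high_cost;
#     set claims;
#     if paid_amount > 10000 then cost_flag = "HIGH";
#     else cost_flag = "LOW";
#     total = sum(paid_amount, copay);   /* SAS sum() ignores missing values */
#   run;

from pyspark.sql import functions as F

claims = spark.table("payer_dev.sas_migration.claims")  # illustrative table

high_cost = (
    claims
    .withColumn("cost_flag",
                F.when(F.col("paid_amount") > 10000, F.lit("HIGH")).otherwise(F.lit("LOW")))
    # SAS sum() skips missing values, so coalesce nulls before adding
    .withColumn("total",
                F.coalesce(F.col("paid_amount"), F.lit(0)) + F.coalesce(F.col("copay"), F.lit(0)))
)
```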
After deployment:
- Test PySpark conversion (should include `overwriteSchema=true`; a quick automated check is sketched after this list)
- Test SQL conversion (should show the warning about duplicate columns)
- Verify imports work (`from file_handler import ...`)
- Check notebooks run successfully
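The first two checks can be automated with a simple assertion over the generated code; this is a hypothetical sketch (the `generated_code` variable stands in for however you capture the converter's output):

```python
def check_generated_pyspark(code: str) -> None:
    """Illustrative smoke test for converter output."""
    assert "overwriteSchema" in code, "missing overwriteSchema on table writes"
    assert "from pyspark.sql import functions as F" in code, "missing functions import"
    assert "SELECT *," not in code, "found the SELECT *, col anti-pattern"

# Usage:
# check_generated_pyspark(generated_code)
```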
- Import errors after restructure: Verify `file_handler.py` and `utils.py` are in the `dashboard/` root
- Rate limit errors: Check Foundation Model API quotas in your Databricks workspace (a simple retry sketch follows this list)
- COLUMN_ALREADY_EXISTS errors: Read the SQL notebook warning and use `SELECT * REPLACE`
- DELTA_FAILED_TO_MERGE_FIELDS: Add `overwriteSchema=true` to table writes
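For rate-limit errors specifically, a simple retry-with-backoff wrapper around the model call can help; this is an illustrative sketch, not part of the app:

```python
import time

def call_with_backoff(fn, retries=5, base_delay=2.0):
    """Retry a callable with exponential backoff (for 429 / transient errors)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:  # in practice, catch the specific rate-limit exception
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Transient error ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Usage:
# call_with_backoff(lambda: client.chat.completions.create(...))
```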
# Check app logs
databricks apps logs sas-converter --profile DEFAULT_azure
# Check bundle status
databricks bundle validate
This project uses comprehensive instruction files for optimal LLM conversion. You can:
- Use as-is: The included `.assistant_instructions.md` files work out of the box
- Customize: Copy `.assistant_instructions.md` into your own environment tracking
- Extend: Add your own industry-specific rules and patterns
Note: The authors use centralized configuration management for personal development, but this is optional.
This is production code from a 3-day development sprint with comprehensive lessons learned. Key improvements validated:
- 12 critical conversion rules
- Claude Sonnet 4.5 integration
- Comprehensive error prevention
- Healthcare payer-specific patterns
Contributions welcome! Please test thoroughly before submitting PRs.
MIT License - See LICENSE file for details
For issues or questions:
- Review the comprehensive `.assistant_instructions.md` for conversion patterns
- Check the troubleshooting section above for common issues
- Test with `./deploy_streamlit_app.sh` for quick app iterations
- Or use `./setup_databricks_assistant.sh` for notebook-based conversions
- Open an issue on GitHub with details and error logs
- v2.4 (Current): Pure SQL mode with warnings, Claude Sonnet 4.5 integration
- v2.3: Hybrid mode implementation
- v2.2: Comprehensive validation rules
- v2.1: Initial production release
Built with Databricks Foundation Models | Optimized for Healthcare Payers