84 changes: 3 additions & 81 deletions .github/workflows/release.yaml
@@ -17,103 +17,25 @@ jobs:
  with:
  fetch-depth: 0

- - name: Checkout release
- run: |
- git fetch origin release
- git checkout release
-
- - name: Determine Version
- id: get-version
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- run: |
- VERSION="${GITHUB_REF#refs/tags/}"
- echo "version=$VERSION" >> "$GITHUB_OUTPUT"
-
- - name: Determine If Tag Is From Release
- id: is-release
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- run: |
- TAG_COMMIT=$(git rev-parse ${{ github.ref }})
- BRANCHES=$(git branch -r --contains "$TAG_COMMIT")
-
- if echo "$BRANCHES" | grep -q 'origin/release'; then
- echo "should_continue=true" >> "$GITHUB_OUTPUT"
- else
- echo "Exiting Workflow: Tag is not from release branch"
- echo "IF intending to create a release from tag follow steps under 'Creating A Release' in the README file"
- echo "should_continue=false" >> "$GITHUB_OUTPUT"
- fi
-
  - name: Setup Python
- if: steps.is-release.outputs.should_continue == 'true'
  uses: actions/setup-python@v5
  with:
  python-version: '3.10'

- - name: Install Poetry
- if: steps.is-release.outputs.should_continue == 'true'
+ - name: Install poetry
  run: |
- curl -sSL https://install.python-poetry.org | python3 -
- poetry --version
+ pip install "poetry>=2.0"

  - name: Install Dependencies
- if: steps.is-release.outputs.should_continue == 'true'
  run: |
- poetry install --no-root --with dev
-
- - name: Update pyproject.toml Version
- if: steps.is-release.outputs.should_continue == 'true'
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- run: |
- VERSION=${{ steps.get-version.outputs.version }}
-
- sed -i.bak -E "s/^\s*version\s*=\s*\"[^\"]+\"/version = \"${VERSION}\"/" pyproject.toml
-
- - name: Update Change Log
- if: steps.is-release.outputs.should_continue == 'true'
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- run: |
- VERSION=${{ steps.get-version.outputs.version }}
- REPO_URL="https://github.com/${{ github.repository }}"
- RELEASE_URL="$REPO_URL/releases/tag/$VERSION"
-
- awk -v tag="## [$VERSION]" -v url="$RELEASE_URL" '
- !done && /---/ {
- print $0
- print ""
- print tag
- print url
- done = 1
- next
- }
- { print $0 }
- ' changelog.md > temp.md && mv temp.md changelog.md
+ poetry install --with dev

  - name: Build Wheel
- if: steps.is-release.outputs.should_continue == 'true'
  run: |
  poetry build

  - name: Create Release
- if: steps.is-release.outputs.should_continue == 'true'
  env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
  gh release create ${{ github.ref }} dist/* --generate-notes --latest
-
- - name: Commit Change Log and pyproject.toml Updates
- if: steps.is-release.outputs.should_continue == 'true'
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- run: |
- VERSION=${{ steps.get-version.outputs.version }}
-
- git config user.name "github-actions"
- git config user.email "[email protected]"
- git add pyproject.toml changelog.md
- git commit -m "Update Version to ${{ steps.get-version.outputs.version }}"
- git push origin release
16 changes: 15 additions & 1 deletion changelog.md
@@ -8,7 +8,21 @@ The versioning pattern is `YYYY.MM.DD.micro(a/b/{none if release})

  ---

- ## [2025.12.10.0a]
+ ## [2025.12.10.0]
+
+ ### Summary
+
+ **First Official Release!** 🎉
+
+ This release marks a significant milestone for the CFA DataOps project, providing a robust foundation for data cataloging, ETL pipelines, and reporting. It consolidates months of development into a unified, versioned package ready for broader adoption.
+
+ **Key Highlights:**
+ - **Unified Data Access**: A single `datacat` interface for accessing datasets across catalogs.
+ - **Automated Reporting**: `reportcat` for generating interactive HTML reports from Jupyter notebooks.
+ - **Robust CLI Tools**: Manage catalogs, datasets, stages, and versions from the command line.
+ - **Flexible Data Loading**: Advanced version filtering for Pandas and Polars DataFrames.
+
+ See the [Release Notes](docs/release_notes/v2025.12.10.md) for more details.

  ### Updated

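The changelog header describes versions as `YYYY.MM.DD.micro(a/b/{none if release})`. This PR's move from `2025.12.10.0a` to `2025.12.10.0` therefore marks the first version without a pre-release suffix. As an illustration only, the stdlib sketch below parses that pattern and orders pre-releases before the release they lead up to. It assumes `a` and `b` mark alpha and beta pre-releases, and it accepts month and day with or without zero padding. `parse_version` and `is_release` are hypothetical helpers, not part of `cfa.dataops`.

```python
import re

# Hypothetical parser for YYYY.MM.DD.micro(a/b/{none if release}).
# Assumptions: "a" = alpha, "b" = beta, no suffix = release; MM/DD may be unpadded.
VERSION_RE = re.compile(r"(\d{4})\.(\d{1,2})\.(\d{1,2})\.(\d+)([ab]?)")

# Rank pre-releases below the release that shares their date and micro number.
STAGE_RANK = {"a": 0, "b": 1, "": 2}


def parse_version(text: str) -> tuple[int, int, int, int, int]:
    """Return a sortable key: (year, month, day, micro, stage_rank)."""
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY.MM.DD.micro[a|b] version: {text!r}")
    year, month, day, micro, stage = match.groups()
    return (int(year), int(month), int(day), int(micro), STAGE_RANK[stage])


def is_release(text: str) -> bool:
    """True when the version carries no a/b pre-release suffix."""
    return parse_version(text)[4] == STAGE_RANK[""]


if __name__ == "__main__":
    versions = ["2025.12.10.0", "2025.12.10.0a", "2025.11.03.1b"]
    print(sorted(versions, key=parse_version))  # ['2025.11.03.1b', '2025.12.10.0a', '2025.12.10.0']
    print(is_release("2025.12.10.0a"), is_release("2025.12.10.0"))  # False True
```

PEP 440 tools such as pip read `2025.12.10.0a` as the pre-release `2025.12.10.0a0` and likewise sort it before `2025.12.10.0`.
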
91 changes: 91 additions & 0 deletions docs/release_notes/v2025.12.10.md
@@ -0,0 +1,91 @@
# Release Notes - v2025.12.10

We're thrilled to announce the first official release of the **CFA DataOps** project! This milestone version `2025.12.10.0` brings a comprehensive suite of tools for data cataloging, ETL pipelines, and reporting, designed to streamline data operations within the CFA environment.

## 🚀 Highlights

* **Unified Data Access**: Access datasets across multiple catalogs through a single `datacat` interface.
* **Automated Reporting**: Generate interactive, client-side-rendered HTML reports from Jupyter notebooks with `reportcat`.
* **Robust CLI Tools**: Manage catalogs, datasets, stages, and versions directly from the command line.
* **Flexible Data Loading**: Load data into Pandas or Polars DataFrames with advanced version filtering and selection.
* **Azure Blob Storage Integration**: Built-in support for reading/writing raw and transformed data to Azure Blob Storage.
* **Schema Validation**: Ensure data quality with rigorous schema validation for both raw and transformed datasets.

## 🏗️ Architecture

```
┌────────────────────────────────────────────────────────────────┐
│                   CFA Data Science Ecosystem                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐   │
│   │  Cloud OPS   │────>│   Data OPS   │────>│   Catalogs   │   │
│   │              │     │              │     │              │   │
│   │ • Compute    │     │ • Datacat    │     │ • Public     │   │
│   │ • BLOB       │     │ • Reportcat  │     │ • Private    │   │
│   │ • Key Vault  │     │ • Ledger     │     │ • Team-spec. │   │
│   │ •            │     │ • Cat init   │     │ • workflows  │   │
│   │              │     │ •            │     │ • reports    │   │
│   └──────────────┘     └──────────────┘     └──────────────┘   │
│           │                    │                    │          │
│           └────────────────────┼────────────────────┘          │
│                                V                               │
│                      ┌──────────────────┐                      │
│                      │ Data Scientists  │                      │
│                      │  & Applications  │                      │
│                      └──────────────────┘                      │
└────────────────────────────────────────────────────────────────┘
```

## 📋 Key Features

### Data Management
* **Catalog Creation**: Initialize new dataset catalog libraries with standardized structures using `dataops_catalog_init`.
* **Multi-Catalog Support**: Install and manage multiple catalog libraries in the same Python environment.
* **Configuration-Driven ETL**: Define ETL pipelines using simple TOML configuration files.

### Data Access & Versioning
* **Version Control**: Retrieve specific data versions using timestamp-based versioning.
* **Advanced Filtering**: Select data versions with filter expressions such as `>2024.12.01,<2025.08`, `latest`, or `~=2024/11`.
* **Local Download**: Download dataset versions to your local filesystem for offline analysis.
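
The exact filter grammar lives in `cfa.dataops` and is not spelled out here. Purely as an illustration, the sketch below shows one way a comma-separated expression such as `>2024.12.01,<2025.08` could be applied to dotted, timestamp-style version strings. Every clause must hold, versions are compared component by component, and `latest` picks the newest. The sketch assumes versions look like `YYYY.MM.DD` and does not handle `~=`. `filter_versions` is a hypothetical name, not the package's API.

```python
import operator
import re

# Hypothetical sketch of comma-separated version filters such as ">2024.12.01,<2025.08".
# Assumptions: versions are dot-separated integers (e.g. "2025.01.15"); all clauses
# must hold (logical AND); "~=" is intentionally not handled.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt, "<": operator.lt}
_CLAUSE_RE = re.compile(r"(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)")


def _as_tuple(version: str) -> tuple[int, ...]:
    # "2025.08" -> (2025, 8). A shorter tuple sorts before any longer tuple that
    # starts with it, so "<2025.08" excludes every 2025.08.xx version.
    return tuple(int(part) for part in version.split("."))


def filter_versions(versions: list[str], spec: str) -> list[str]:
    """Return the versions matching spec, ordered oldest to newest."""
    ordered = sorted(versions, key=_as_tuple)
    if spec.strip() == "latest":
        return ordered[-1:]
    checks = []
    for clause in spec.split(","):
        match = _CLAUSE_RE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        checks.append((_OPS[op], _as_tuple(bound)))
    return [v for v in ordered if all(cmp(_as_tuple(v), bound) for cmp, bound in checks)]


if __name__ == "__main__":
    available = ["2024.11.20", "2024.12.01", "2025.01.15", "2025.07.31", "2025.08.02"]
    print(filter_versions(available, ">2024.12.01,<2025.08"))  # ['2025.01.15', '2025.07.31']
    print(filter_versions(available, "latest"))  # ['2025.08.02']
```

Comparing integer tuples rather than raw strings treats `2025.8` and `2025.08` as equal and avoids lexicographic surprises.
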

### Reporting & Visualization
* **Jupyter Integration**: Author reports as Jupyter notebooks and convert them to interactive HTML.
* **Visualization Utilities**: Includes plotting functions for lines, points, and intervals, plus PDF report generation.

## 🗃️ Existing and Growing Catalogs

* **Public**: [https://github.com/CDCgov/cfa-catalog-pub](https://github.com/CDCgov/cfa-catalog-pub)
* **Private**: [https://github.com/cdcent/cfa-catalog-private](https://github.com/cdcent/cfa-catalog-private)

## 🛠️ Usage Examples

**List Available Datasets:**
```python
from cfa.dataops import datacat
print(datacat.__namespace_list__)
```

**Load a Dataframe with Version Filtering:**
```python
from cfa.dataops import datacat
df = datacat.public.my_dataset.load.get_dataframe(version=">2024.12.01,<2025.08")
```

**Generate a Report:**
```python
from cfa.dataops.reporting import reportcat
reportcat.examples.dataset_report_ipynb.nb_to_html_file('report.html')
```

## 📚 Documentation

For more detail, see the documentation:
* [Data User Guide](https://cdcgov.github.io/cfa-dataops/data_user_guide/)
* [Data Developer Guide](https://cdcgov.github.io/cfa-dataops/data_developer_guide/)
* [Managing Catalogs](https://cdcgov.github.io/cfa-dataops/managing_catalogs/)
* [Report Generation](https://cdcgov.github.io/cfa-dataops/report_generation/)
* [CLI Tools](https://cdcgov.github.io/cfa-dataops/cli_tools/)

---
*Thank you to all the contributors who made this release possible!*
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "cfa.dataops"
- version = "2025.12.10.0a"
+ version = "2025.12.10.0"
  description = "Data cataloging, ETL, modeling, verification, and validation for CFA"
  authors = [
  { name = "Phil Rogers", email = "[email protected]" },