Commit 11b4ab0

feat: Add community provider plugin registry (google#182)
Add a validated markdown table for community plugins with CI checks. Enforces alphabetical sorting, LangExtract issue links, and proper formatting.
1 parent 8b54f7f commit 11b4ab0

5 files changed: +306 -1 lines changed
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Copyright 2025 Google LLC.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Validate Community Providers
+
+on:
+  pull_request:
+    paths:
+      - 'COMMUNITY_PROVIDERS.md'
+      - 'scripts/validate_community_providers.py'
+
+permissions:
+  contents: read
+  pull-requests: read
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Validate table format
+        run: |
+          python scripts/validate_community_providers.py COMMUNITY_PROVIDERS.md

COMMUNITY_PROVIDERS.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Community Provider Plugins
+
+Community-developed provider plugins that extend LangExtract with additional model backends.
+
+**Supporting the Community:** Star plugin repositories you find useful and add 👍 reactions to their tracking issues to support maintainers' efforts.
+
+**⚠️ Important:** These are community-maintained packages. Please review the [safety guidelines](#safety-disclaimer) before use.
+
+## Plugin Registry
+
+| Plugin Name | PyPI Package | Maintainer | GitHub Repo | Description | Issue Link |
+|-------------|--------------|------------|-------------|-------------|------------|
+| Example Provider | `langextract-provider-example` | [@google](https://github.com/google) | [google/langextract](https://github.com/google/langextract) | Reference implementation for creating custom providers | [#123](https://github.com/google/langextract/issues/123) |
+
+<!-- ADD NEW PLUGINS ABOVE THIS LINE -->
+
+## How to Add Your Plugin (PR Checklist)
+
+Copy this row template, replace placeholders, and insert **above** the marker line:
+
+```markdown
+| Your Plugin | `langextract-provider-yourname` | [@yourhandle](https://github.com/yourhandle) | [yourorg/yourrepo](https://github.com/yourorg/yourrepo) | Brief description (min 10 chars) | [#456](https://github.com/google/langextract/issues/456) |
+```
+
+**Before submitting your PR:**
+- [ ] Plugin name follows `langextract-provider-<name>` convention
+- [ ] PyPI package is published (or will be soon) and listed in backticks
+- [ ] Maintainer(s) listed as GitHub profile links (comma-separated if multiple)
+- [ ] Repository link points to public GitHub repo
+- [ ] Description clearly explains what your provider does
+- [ ] Issue Link points to a tracking issue in the LangExtract repository
+- [ ] Entries are sorted alphabetically by Plugin Name
+
+## Documentation
+
+For detailed plugin development instructions, see the [Custom Provider Plugin Example](examples/custom_provider_plugin/README.md).
+
+## Safety Disclaimer
+
+Community plugins are independently developed and maintained. While we encourage community contributions, the LangExtract team cannot guarantee the safety, security, or functionality of third-party packages.
+
+**Before installing any plugin, we recommend:**
+
+- **Review the code** - Examine the source code and dependencies on GitHub
+- **Check community feedback** - Read issues and discussions for user experiences
+- **Verify the maintainer** - Look for active maintenance and responsive support
+- **Test safely** - Try plugins in isolated environments before production use
+- **Assess security needs** - Consider your specific security requirements
+
+Community plugins are used at your own discretion. When in doubt, reach out to the community through the plugin's issue tracker or the main LangExtract discussions.
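
The row template in the checklist maps onto six pipe-separated columns. As a rough sketch of that mapping, the snippet below assembles a row from its parts. The helper name `format_registry_row` and its parameters are hypothetical and not part of this commit; the sample values are the placeholders from the template.

```python
def format_registry_row(
    plugin_name: str,
    pypi_package: str,
    maintainers: list[str],
    repo: str,
    description: str,
    issue_number: int,
) -> str:
    """Build one registry table row (hypothetical helper, not in the commit)."""
    # Maintainers become comma-separated GitHub profile links.
    maint = ', '.join(f'[@{h}](https://github.com/{h})' for h in maintainers)
    cells = [
        plugin_name,
        f'`{pypi_package}`',  # PyPI package must be wrapped in backticks
        maint,
        f'[{repo}](https://github.com/{repo})',
        description,
        # The issue link must point at the LangExtract issue tracker.
        f'[#{issue_number}]'
        f'(https://github.com/google/langextract/issues/{issue_number})',
    ]
    return '| ' + ' | '.join(cells) + ' |'


row = format_registry_row(
    'Your Plugin',
    'langextract-provider-yourname',
    ['yourhandle'],
    'yourorg/yourrepo',
    'Brief description (min 10 chars)',
    456,
)
print(row)
```

With the placeholder values, the printed row should match the template shown in the checklist, so it can be pasted above the marker line and then kept in alphabetical order by Plugin Name.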

README.md

Lines changed: 7 additions & 0 deletions
@@ -24,6 +24,7 @@
 - [*Romeo and Juliet* Full Text Extraction](#romeo-and-juliet-full-text-extraction)
 - [Medication Extraction](#medication-extraction)
 - [Radiology Report Structuring: RadExtract](#radiology-report-structuring-radextract)
+- [Community Providers](#community-providers)
 - [Contributing](#contributing)
 - [Testing](#testing)
 - [Disclaimer](#disclaimer)
@@ -338,6 +339,12 @@ Explore RadExtract, a live interactive demo on HuggingFace Spaces that shows how
 
 **[View RadExtract Demo →](https://huggingface.co/spaces/google/radextract)**
 
+## Community Providers
+
+Extend LangExtract with custom model providers! Check out our [Community Provider Plugins](COMMUNITY_PROVIDERS.md) registry to discover providers created by the community or add your own.
+
+For detailed instructions on creating a provider plugin, see the [Custom Provider Plugin Example](examples/custom_provider_plugin/).
+
 ## Contributing
 
 Contributions are welcome! See [CONTRIBUTING.md](https://github.com/google/langextract/blob/main/CONTRIBUTING.md) to get started

examples/custom_provider_plugin/README.md

Lines changed: 1 addition & 1 deletion
@@ -199,8 +199,8 @@ twine upload dist/*
 ```
 
 **Share with the community:**
+- Submit a PR to add your provider to the [Community Providers Registry](../../COMMUNITY_PROVIDERS.md)
 - Open an issue on [LangExtract GitHub](https://github.com/google/langextract/issues) to announce your provider and get feedback
-- Consider submitting a PR to add your provider to the community providers list (coming soon)
 
 ## Common Pitfalls to Avoid
 
Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
+# Copyright 2025 Google LLC.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#!/usr/bin/env python3
+"""Validation for COMMUNITY_PROVIDERS.md plugin registry table."""
+
+import os
+from pathlib import Path
+import re
+import re as regex_module
+import sys
+from typing import Dict, List, Tuple
+
+HEADER_ANCHOR = '| Plugin Name | PyPI Package |'
+END_MARKER = '<!-- ADD NEW PLUGINS ABOVE THIS LINE -->'
+
+# GitHub username/org and repo patterns
+GH_NAME = r'[-a-zA-Z0-9]+'  # usernames/orgs allow hyphens
+GH_REPO = r'[-a-zA-Z0-9._]+'  # repos allow ., _
+GH_USER_LINK = rf'\[@{GH_NAME}\]\(https://github\.com/{GH_NAME}\)'
+GH_MULTI_USER = rf'^{GH_USER_LINK}(,\s*{GH_USER_LINK})*$'
+
+# Markdown link to a GitHub repo
+GH_REPO_LINK = rf'^\[[^\]]+\]\(https://github\.com/{GH_NAME}/{GH_REPO}\)$'
+
+# Issue link must point to LangExtract repository (issues only)
+LANGEXTRACT_ISSUE_LINK = (
+    r'^\[[^\]]+\]\(https://github\.com/google/langextract/issues/\d+\)$'
+)
+
+# PEP 503-ish normalized name (loose): lowercase letters/digits with - _ . separators
+PYPI_NORMALIZED = r'`[a-z0-9]([\-_.]?[a-z0-9]+)*`'
+
+MIN_DESC_LEN = 10
+
+
+def normalize_pypi(name: str) -> str:
+  """PEP 503 normalization for PyPI package names."""
+  return regex_module.sub(r'[-_.]+', '-', name.strip().lower())
+
+
+def find_table_bounds(lines: List[str]) -> Tuple[int, int]:
+  start = end = -1
+  for i, line in enumerate(lines):
+    if HEADER_ANCHOR in line:
+      start = i
+    elif start >= 0 and END_MARKER in line:
+      end = i
+      break
+  return start, end
+
+
+def parse_row(line: str) -> List[str]:
+  # assumes caller trimmed line
+  parts = [c.strip() for c in line.split('|')[1:-1]]
+  return parts
+
+
+def validate(filepath: Path) -> bool:
+  errors: List[str] = []
+  warnings: List[str] = []
+
+  content = filepath.read_text(encoding='utf-8')
+  lines = content.splitlines()
+
+  start, end = find_table_bounds(lines)
+  if start < 0:
+    errors.append('Could not find plugin registry table header.')
+    print_report(errors, warnings)
+    return False
+  if end < 0:
+    errors.append(
+        'Could not find end marker: <!-- ADD NEW PLUGINS ABOVE THIS LINE -->.'
+    )
+    print_report(errors, warnings)
+    return False
+
+  rows: List[Dict] = []
+  seen_names = set()
+  seen_pkgs = set()
+
+  for i in range(start + 2, end):
+    raw = lines[i].strip()
+    if not raw:
+      continue
+
+    if not raw.startswith('|') or not raw.endswith('|'):
+      errors.append(
+          f"Line {i+1}: Not a valid table row (must start and end with '|')."
+      )
+      continue
+
+    cols = parse_row(raw)
+    if len(cols) != 6:
+      errors.append(f'Line {i+1}: Expected 6 columns, found {len(cols)}.')
+      continue
+
+    plugin, pypi, maint, repo, desc, issue_link = cols
+
+    # Basic presence checks
+    if not plugin:
+      errors.append(f'Line {i+1}: Plugin Name is required.')
+
+    if not re.fullmatch(PYPI_NORMALIZED, pypi):
+      errors.append(
+          f'Line {i+1}: PyPI package must be backticked and normalized (e.g.,'
+          ' `langextract-provider-foo`).'
+      )
+
+    if not re.fullmatch(GH_MULTI_USER, maint):
+      errors.append(
+          f'Line {i+1}: Maintainer must be one or more GitHub handles as links '
+          '(e.g., [@alice](https://github.com/alice) or comma-separated).'
+      )
+
+    if not re.fullmatch(GH_REPO_LINK, repo):
+      errors.append(
+          f'Line {i+1}: GitHub Repo must be a Markdown link to a GitHub'
+          ' repository.'
+      )
+
+    if not desc or len(desc) < MIN_DESC_LEN:
+      errors.append(
+          f'Line {i+1}: Description must be at least {MIN_DESC_LEN} characters.'
+      )
+
+    # Issue link is required and must point to LangExtract repo
+    if not issue_link:
+      errors.append(f'Line {i+1}: Issue Link is required.')
+    elif not re.fullmatch(LANGEXTRACT_ISSUE_LINK, issue_link):
+      errors.append(
+          f'Line {i+1}: Issue Link must point to a LangExtract issue (e.g.,'
+          ' [#123](https://github.com/google/langextract/issues/123)).'
+      )
+
+    rows.append({
+        'line': i + 1,
+        'plugin': plugin,
+        'pypi': pypi.strip('`').lower() if pypi else '',
+    })
+
+  # Duplicate checks (case-insensitive and PEP 503 normalized)
+  for r in rows:
+    pn_key = r['plugin'].strip().casefold()
+    pk_key = normalize_pypi(r['pypi']) if r['pypi'] else None
+
+    if pn_key in seen_names:
+      errors.append(f"Line {r['line']}: Duplicate Plugin Name '{r['plugin']}'.")
+    seen_names.add(pn_key)
+
+    if pk_key and pk_key in seen_pkgs:
+      errors.append(f"Line {r['line']}: Duplicate PyPI Package '{r['pypi']}'.")
+    if pk_key:
+      seen_pkgs.add(pk_key)
+
+  # Required alphabetical sorting check
+  sorted_by_name = sorted(rows, key=lambda r: r['plugin'].casefold())
+  if [r['plugin'] for r in rows] != [r['plugin'] for r in sorted_by_name]:
+    errors.append('Registry rows must be alphabetically sorted by Plugin Name.')
+
+  # Guardrail: discourage leaving only the example entry
+  if len(rows) == 1 and rows[0]['plugin'].lower().startswith('example'):
+    warnings.append(
+        'The registry currently contains only the example row. Add real'
+        ' providers above the marker.'
+    )
+
+  print_report(errors, warnings)
+  return not errors
+
+
+def print_report(errors: List[str], warnings: List[str]) -> None:
+  if errors:
+    print('❌ Validation failed:')
+    for e in errors:
+      print(f' • {e}')
+  if warnings:
+    print('⚠️ Warnings:')
+    for w in warnings:
+      print(f' • {w}')
+  if not errors and not warnings:
+    print('✅ Table format validation passed!')
+
+
+if __name__ == '__main__':
+  path = Path('COMMUNITY_PROVIDERS.md')
+  if len(sys.argv) > 1:
+    path = Path(sys.argv[1])
+  if not path.exists():
+    print(f'❌ Error: File not found: {path}')
+    sys.exit(1)
+  ok = validate(path)
+  sys.exit(0 if ok else 1)
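
To get a feel for what the validator accepts and rejects, the standalone sketch below re-declares a few of its pieces (`parse_row`, `normalize_pypi`, and two of the patterns) and runs them on sample values. The sample row, handles, and package names are made up for illustration, and this is a reduced copy for experimentation rather than the script itself.

```python
import re

# Patterns copied from the validator in this commit.
GH_NAME = r'[-a-zA-Z0-9]+'
GH_USER_LINK = rf'\[@{GH_NAME}\]\(https://github\.com/{GH_NAME}\)'
GH_MULTI_USER = rf'^{GH_USER_LINK}(,\s*{GH_USER_LINK})*$'
PYPI_NORMALIZED = r'`[a-z0-9]([\-_.]?[a-z0-9]+)*`'


def parse_row(line: str) -> list[str]:
    # Split on every pipe and drop the empty pieces outside the outer pipes.
    return [c.strip() for c in line.split('|')[1:-1]]


def normalize_pypi(name: str) -> str:
    # Collapse runs of '-', '_', '.' into a single '-' and lowercase.
    return re.sub(r'[-_.]+', '-', name.strip().lower())


# Illustrative three-column row (the real table has six columns).
sample_row = (
    '| Foo | `langextract-provider-foo` | [@alice](https://github.com/alice) |'
)
cols = parse_row(sample_row)

# Package cell: must be backticked and lowercase.
pypi_ok = re.fullmatch(PYPI_NORMALIZED, cols[1]) is not None
pypi_bad = re.fullmatch(PYPI_NORMALIZED, '`Foo_Bar`') is not None

# Maintainer cell: one or more profile links, comma-separated.
two_maintainers = (
    '[@alice](https://github.com/alice), [@bob-dev](https://github.com/bob-dev)'
)
maint_ok = re.fullmatch(GH_MULTI_USER, two_maintainers) is not None
maint_bad = re.fullmatch(GH_MULTI_USER, '@alice') is not None

# Duplicate detection compares normalized package names.
same_pkg = normalize_pypi('Foo_Bar') == normalize_pypi('foo.bar')

# The sort check is case-insensitive: casefold ordering, not raw ordering.
names = ['apple', 'Banana']
sorted_casefold = sorted(names, key=str.casefold) == names
sorted_plain = sorted(names) == names

print(cols, pypi_ok, pypi_bad, maint_ok, maint_bad, same_pkg)
print(sorted_casefold, sorted_plain)
```

One consequence of `parse_row` splitting on every `|` is that a description containing a pipe character would be counted as an extra column and fail the six-column check.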
