Commit aa97c4a

Improve provider plugin documentation with plugin creation script (google#144)
* Improve provider plugin documentation with plugin creation script
  - Add comprehensive 7-step checklist for provider creation
  - Create automated plugin generator script (create_provider_plugin.py)
  - Fix sample plugin with proper schema support and apply_schema method
  - Update documentation to simplify testing and add community engagement
* Add links to provider plugin generator script in README files
1 parent b3c6b93 commit aa97c4a

4 files changed

+1095
-15
lines changed

examples/custom_provider_plugin/README.md

Lines changed: 75 additions & 6 deletions
@@ -4,6 +4,12 @@ This example demonstrates how to create a custom provider plugin that extends La
 
 **Note**: This is an example included in the LangExtract repository for reference. It is not part of the LangExtract package and won't be installed when you `pip install langextract`.
 
+**Automated Creation**: Instead of manually copying this example, use the [provider plugin generator script](../../scripts/create_provider_plugin.py):
+```bash
+python scripts/create_provider_plugin.py MyProvider --with-schema
+```
+This will create a complete plugin structure with all boilerplate code ready for customization.
+
 ## Structure
 
 ```
@@ -133,13 +139,76 @@ result = lx.extract(
 # )
 ```
 
-## Creating Your Own Provider
+## Creating Your Own Provider - Step by Step
+
+### 1. Copy and Rename
+```bash
+# Copy this example directory
+cp -r examples/custom_provider_plugin/ ~/langextract-myprovider/
+
+# Rename the package directory
+cd ~/langextract-myprovider/
+mv langextract_provider_example langextract_myprovider
+```
+
+### 2. Update Package Configuration
+Edit `pyproject.toml`:
+- Change `name = "langextract-myprovider"`
+- Update description and author information
+- Change entry point: `myprovider = "langextract_myprovider:MyProvider"`
+
+### 3. Modify Provider Implementation
+Edit `provider.py`:
+- Change class name from `CustomGeminiProvider` to `MyProvider`
+- Update `@register()` patterns to match your model IDs
+- Replace Gemini API calls with your backend
+- Add any provider-specific parameters
+
+### 4. Add Schema Support (Optional)
+Edit `schema.py`:
+- Rename to `MyProviderSchema`
+- Customize `from_examples()` for your extraction format
+- Update `to_provider_config()` for your API requirements
+- Set `supports_strict_mode` based on your capabilities
+
+### 5. Install and Test
+```bash
+# Install in development mode
+pip install -e .
+
+# Test your provider
+python -c "
+import langextract as lx
+lx.providers.load_plugins_once()
+print('Provider registered:', any('myprovider' in str(e) for e in lx.providers.registry.list_entries()))
+"
+```
+
+### 6. Write Tests
+- Test that your provider loads and handles basic inference
+- Verify schema support works (if implemented)
+- Test error handling for your specific API
+
+### 7. Publish to PyPI and Share with Community
+```bash
+# Build package
+python -m build
+
+# Upload to PyPI
+twine upload dist/*
+```
+
+**Share with the community:**
+- Open an issue on [LangExtract GitHub](https://github.com/google/langextract/issues) to announce your provider and get feedback
+- Consider submitting a PR to add your provider to the community providers list (coming soon)
+
+## Common Pitfalls to Avoid
 
-1. Copy this example as a starting point
-2. Update the provider class name and registration pattern
-3. Replace the Gemini implementation with your own backend
-4. Update package name in `pyproject.toml`
-5. Install and test your plugin
+1. **Forgetting to trigger plugin loading** - Plugins load lazily, use `load_plugins_once()` in tests
+2. **Pattern conflicts** - Avoid patterns that conflict with built-in providers
+3. **Missing dependencies** - List all requirements in `pyproject.toml`
+4. **Schema mismatches** - Test schema generation with real examples
+5. **Not handling None schema** - Provider must clear schema when `apply_schema(None)` is called (see provider.py for implementation)
 
 ## License
 
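Reviewer sketch for pitfall 2 in the README change above (pattern conflicts): one way to catch an overly broad registration pattern before publishing is to run sample model IDs against both your patterns and the ones you want to stay clear of. Everything here is an assumption for illustration: the pattern lists are hypothetical (they are not read from the LangExtract registry), `find_pattern_conflicts` is not a LangExtract function, and matching with `re.search` may differ from how the registry actually resolves patterns.

```python
import re


def find_pattern_conflicts(own_patterns, other_patterns, sample_model_ids):
  """Return the sample IDs matched by both pattern sets, in input order."""
  own = [re.compile(p) for p in own_patterns]
  other = [re.compile(p) for p in other_patterns]
  conflicts = []
  for model_id in sample_model_ids:
    hits_own = any(r.search(model_id) for r in own)
    hits_other = any(r.search(model_id) for r in other)
    if hits_own and hits_other:
      conflicts.append(model_id)
  return conflicts


# Hypothetical patterns: "^g" is deliberately too broad to show a conflict.
builtin_patterns = [r"^gemini", r"^gpt-"]
my_patterns = [r"^myprovider-", r"^g"]
samples = ["gemini-2.5-flash", "myprovider-large", "gpt-4o"]

conflicts = find_pattern_conflicts(my_patterns, builtin_patterns, samples)
print(conflicts)
```

Dropping the broad `^g` pattern from `my_patterns` leaves no conflicts for these samples, which is the state a plugin should be in before release.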
examples/custom_provider_plugin/langextract_provider_example/provider.py

Lines changed: 26 additions & 2 deletions
@@ -68,6 +68,8 @@ def __init__(
       temperature: Sampling temperature.
       **kwargs: Additional parameters.
     """
+    super().__init__()
+
     # TODO: Replace with your own client initialization
     try:
       from google import genai  # pylint: disable=import-outside-toplevel
@@ -97,8 +99,6 @@ def __init__(
 
     self._client = genai.Client(api_key=self.api_key)
 
-    super().__init__()
-
   @classmethod
   def get_schema_class(cls) -> type[lx.schema.BaseSchema] | None:
     """Return our custom schema class.
@@ -111,6 +111,30 @@ def get_schema_class(cls) -> type[lx.schema.BaseSchema] | None:
     """
     return custom_schema.CustomProviderSchema
 
+  def apply_schema(self, schema_instance: lx.schema.BaseSchema | None) -> None:
+    """Apply or clear schema configuration.
+
+    This method is called by LangExtract to dynamically apply schema
+    constraints after the provider is instantiated. It's important to
+    handle both the application of a new schema and clearing (None).
+
+    Args:
+      schema_instance: The schema to apply, or None to clear existing schema.
+    """
+    super().apply_schema(schema_instance)
+
+    if schema_instance:
+      # Apply the new schema configuration
+      config = schema_instance.to_provider_config()
+      self.response_schema = config.get('response_schema')
+      self.enable_structured_output = config.get(
+          'enable_structured_output', False
+      )
+    else:
+      # Clear the schema configuration
+      self.response_schema = None
+      self.enable_structured_output = False
+
   def infer(
       self, batch_prompts: Sequence[str], **kwargs: Any
   ) -> Iterator[Sequence[lx.inference.ScoredOutput]]:
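
Reviewer sketch for the `apply_schema` change above: the apply-then-clear behaviour can be exercised without network access or the real provider. `FakeSchema` and `FakeProvider` are hypothetical stand-ins that copy only the branch logic from the diff; they do not subclass any LangExtract type, and the `super().apply_schema(...)` call is omitted because no base class is assumed here.

```python
from typing import Any, Optional


class FakeSchema:
  """Hypothetical stand-in for a schema object (not a LangExtract class)."""

  def to_provider_config(self) -> dict[str, Any]:
    return {
        "response_schema": {"type": "object"},
        "enable_structured_output": True,
    }


class FakeProvider:
  """Hypothetical stand-in that mirrors the apply_schema branches in the diff."""

  def __init__(self) -> None:
    self.response_schema: Optional[dict[str, Any]] = None
    self.enable_structured_output = False

  def apply_schema(self, schema_instance: Optional[FakeSchema]) -> None:
    if schema_instance:
      # Apply the new schema configuration.
      config = schema_instance.to_provider_config()
      self.response_schema = config.get("response_schema")
      self.enable_structured_output = config.get(
          "enable_structured_output", False
      )
    else:
      # Clear any previously applied schema.
      self.response_schema = None
      self.enable_structured_output = False


provider = FakeProvider()
provider.apply_schema(FakeSchema())
applied_state = (provider.response_schema, provider.enable_structured_output)

provider.apply_schema(None)
cleared_state = (provider.response_schema, provider.enable_structured_output)
```

A test for a real plugin would likely follow the same shape: apply a schema, assert the provider's fields are set, apply `None`, and assert they are reset, which is the case pitfall 5 in the README warns about.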
