Commits: 56 (changes shown from 1 commit)
2ce2399
docs(pypi): Improve README display and badge reliability
aksg87 Jul 22, 2025
4fe7580
feat: add trusted publishing workflow and prepare v1.0.0 release
aksg87 Jul 22, 2025
e696a48
Fix: Resolve libmagic ImportError (#6)
aksg87 Aug 1, 2025
5447637
docs: clarify output_dir behavior in medication_examples.md
kleeena Aug 1, 2025
9c47b34
Merge pull request #11 from google/fix/libmagic-dependency-issue
aksg87 Aug 1, 2025
175e075
Removed inline comment in medication example
kleeena Aug 2, 2025
9472099
Merge pull request #15 from kleeena/docs/update-medication_examples.md
aksg87 Aug 2, 2025
e6c3dcd
docs: add output_dir="." to all save_annotated_documents examples
aksg87 Aug 2, 2025
1fb1f1d
Merge pull request #17 from google/fix/output-dir-consistency
aksg87 Aug 2, 2025
7905f93
Fix typo in Ollama API parameter name
Mirza-Samad-Ahmed-Baig Aug 2, 2025
06afc9c
Fix security vulnerability and bugs in Ollama API integration
Mirza-Samad-Ahmed-Baig Aug 2, 2025
13fbd2c
build: add formatting & linting pipeline with pre-commit integration
aksg87 Aug 3, 2025
c8d2027
style: apply pyink, isort, and pre-commit formatting
aksg87 Aug 3, 2025
146a095
ci: enable format and lint checks in tox
aksg87 Aug 3, 2025
aa6da18
Merge pull request #24 from google/feat/code-formatting-pipeline
aksg87 Aug 3, 2025
ed65bca
Add LangExtractError base exception for centralized error handling
aksg87 Aug 3, 2025
6c4508b
Merge pull request #26 from google/feat/exception-hierarchy
aksg87 Aug 3, 2025
8b85225
fix: Remove LangFun and pylibmagic dependencies (v1.0.2)
aksg87 Aug 3, 2025
88520cc
Merge pull request #28 from google/fix/remove-breaking-dep-langfun
aksg87 Aug 3, 2025
75a6f12
Fix save_annotated_documents to handle string paths
aksg87 Aug 3, 2025
a415b94
Merge pull request #29 from google/fix-save-annotated-documents-mkdir
aksg87 Aug 3, 2025
8289b3a
feat: Add OpenAI language model support
aksg87 Aug 3, 2025
c8ef723
Merge pull request #31 from google/feature/add-oai-inference
aksg87 Aug 3, 2025
dfe8188
fix(ui): prevent current highlight border from being obscured. Chan…
tonebeta Aug 4, 2025
0d76530
Merge branch 'google:main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 4, 2025
87c511e
feat: Add live API integration tests (#39)
aksg87 Aug 4, 2025
dc61372
Add PR template validation workflow (#45)
aksg87 Aug 4, 2025
7fc809f
Merge branch 'main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 5, 2025
da771e6
fix: Change OllamaLanguageModel parameter from 'model' to 'model_id' …
aksg87 Aug 5, 2025
e83d5cf
feat: Add CITATION.cff file for proper software citation
aksg87 Aug 5, 2025
337beee
feat: Add Ollama integration with Docker examples and CI tests (#62)
aksg87 Aug 5, 2025
a7ef0bd
chore: Bump version to 1.0.4 for release
aksg87 Aug 5, 2025
87beb4f
build(deps): bump tj-actions/changed-files (#66)
dependabot[bot] Aug 5, 2025
db140d1
Add PR validation workflows and update contribution guidelines (#74)
aksg87 Aug 5, 2025
ed97f73
Fix custom comment in linked issue check (#77)
aksg87 Aug 5, 2025
ad1f27b
Add infrastructure file protection workflow (#76)
aksg87 Aug 5, 2025
41bc9ed
Allow maintainers to bypass community support requirement
aksg87 Aug 5, 2025
54e57db
Add manual trigger capability to validation workflows (#75)
aksg87 Aug 5, 2025
25ebc17
Fix fork PR labeling by using pull_request_target
aksg87 Aug 5, 2025
1290d63
Add workflow_dispatch trigger to CI workflow
aksg87 Aug 6, 2025
42687fc
Add secure label-based testing for fork PRs
aksg87 Aug 6, 2025
234081e
Add base_url to OpenAILanguageModel (#51)
mariano Aug 6, 2025
46b4f0d
Fix validation workflows that were skipping all checks
aksg87 Aug 6, 2025
6fb66cf
Add commit status to revalidation workflow
aksg87 Aug 6, 2025
47a251e
Fix boolean comparison in revalidation workflow
aksg87 Aug 7, 2025
b28e673
Add maintenance scripts for PR management
aksg87 Aug 7, 2025
6b02efb
Fix IPython import warnings and notebook detection (#86)
aksg87 Aug 7, 2025
e6dcc8e
Fix CI to validate PR branch formatting directly
aksg87 Aug 7, 2025
1c3c1a2
Add PR update automation workflows
aksg87 Aug 7, 2025
b60f0b2
Fix workflow formatting
aksg87 Aug 7, 2025
f888bd8
Minor changes
Mirza-Samad-Ahmed-Baig Aug 7, 2025
8659ef3
Merge branch 'fix-ollama-num-threads-typo'
Mirza-Samad-Ahmed-Baig Aug 7, 2025
ea71754
Fix chunking bug and improve test documentation (#88)
aksg87 Aug 7, 2025
82c6644
Fix: Resolve merge conflict and update docstrings in inference.py
Mirza-Samad-Ahmed-Baig Aug 7, 2025
ce0caa5
Changes
Mirza-Samad-Ahmed-Baig Aug 7, 2025
792fd3e
Merge branch 'main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 7, 2025
style: apply pyink, isort, and pre-commit formatting
aksg87 committed Aug 3, 2025
commit c8d2027adabb8eab8cf823e0b3b780de3085870b
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -24,4 +24,4 @@ contact_links:
url: https://g.co/vulnz
about: >
To report a security issue, please use https://g.co/vulnz. The Google Security Team will
respond within 5 working days of your report on https://g.co/vulnz.
respond within 5 working days of your report on https://g.co/vulnz.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -44,4 +44,4 @@ jobs:

- name: Run tox (lint + tests)
run: |
tox
tox
12 changes: 6 additions & 6 deletions .github/workflows/publish.yml
@@ -31,25 +31,25 @@ jobs:
id-token: write
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install build dependencies
run: |
python -m pip install --upgrade pip
pip install build

- name: Build package
run: python -m build

- name: Verify build artifacts
run: |
ls -la dist/
pip install twine
twine check dist/*

- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
uses: pypa/gh-action-pypi-publish@release/v1
2 changes: 1 addition & 1 deletion .gitignore
@@ -51,4 +51,4 @@ docs/_build/
*.swp

# OS-specific
.DS_Store
.DS_Store
2 changes: 1 addition & 1 deletion .hgignore
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

gdm/codeai/codemind/cli/GEMINI.md
gdm/codeai/codemind/cli/GEMINI.md
2 changes: 1 addition & 1 deletion Dockerfile
@@ -13,4 +13,4 @@ WORKDIR /app
RUN pip install --no-cache-dir langextract

# Set default command
CMD ["python"]
CMD ["python"]
2 changes: 1 addition & 1 deletion README.md
@@ -352,4 +352,4 @@ For health-related applications, use of LangExtract is also subject to the

---

**Happy Extracting!**
**Happy Extracting!**
2 changes: 1 addition & 1 deletion docs/examples/longer_text_example.md
@@ -171,4 +171,4 @@ LangExtract combines precise text positioning with world knowledge enrichment, e

---

¹ Models like Gemini 1.5 Pro show strong performance on many benchmarks, but [needle-in-a-haystack tests](https://cloud.google.com/blog/products/ai-machine-learning/the-needle-in-the-haystack-test-and-how-gemini-pro-solves-it) across million-token contexts indicate that performance can vary in multi-fact retrieval scenarios. This demonstrates how LangExtract's smaller context windows approach ensures consistently high quality across entire documents by avoiding the complexity and potential degradation of massive single-context processing.
¹ Models like Gemini 1.5 Pro show strong performance on many benchmarks, but [needle-in-a-haystack tests](https://cloud.google.com/blog/products/ai-machine-learning/the-needle-in-the-haystack-test-and-how-gemini-pro-solves-it) across million-token contexts indicate that performance can vary in multi-fact retrieval scenarios. This demonstrates how LangExtract's smaller context windows approach ensures consistently high quality across entire documents by avoiding the complexity and potential degradation of massive single-context processing.
6 changes: 3 additions & 3 deletions docs/examples/medication_examples.md
@@ -196,8 +196,8 @@ for med_name, extractions in medication_groups.items():
lx.io.save_annotated_documents(
[result],
output_name="medical_ner_extraction.jsonl",
output_dir="."
)
output_dir="."
)

# Generate the interactive visualization
html_content = lx.visualize("medical_relationship_extraction.jsonl")
@@ -243,4 +243,4 @@ This example demonstrates how attributes enable efficient relationship extractio
- **Relationship Extraction**: Groups related entities using attributes
- **Position Tracking**: Records exact positions of extracted entities in the source text
- **Structured Output**: Organizes information in a format suitable for healthcare applications
- **Interactive Visualization**: Generates HTML visualizations for exploring complex medical extractions with entity groupings and relationships clearly displayed
- **Interactive Visualization**: Generates HTML visualizations for exploring complex medical extractions with entity groupings and relationships clearly displayed
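
For context on the medication_examples.md changes above, here is a minimal sketch of the save-and-visualize flow they document. It assumes `result` is an annotated document already returned by `lx.extract` (the prompt, examples, and model setup are omitted), and the output file names are illustrative only.

import langextract as lx

# `result` is assumed to be an annotated document produced by lx.extract(...)
# for the medication text; the extraction call itself is not shown here.
lx.io.save_annotated_documents(
    [result],
    output_name="medical_ner_extraction.jsonl",
    output_dir=".",  # write the JSONL into the current working directory
)

# Build the interactive HTML visualization from the saved JSONL file.
html_content = lx.visualize("medical_ner_extraction.jsonl")
with open("medical_ner_visualization.html", "w") as f:
    # In notebook environments visualize may return a display object,
    # so fall back to its .data attribute when present.
    f.write(html_content.data if hasattr(html_content, "data") else html_content)

Passing output_dir explicitly keeps the write location unambiguous, which is the convention the documentation changes above standardize.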
2 changes: 1 addition & 1 deletion kokoro/presubmit.cfg
@@ -28,4 +28,4 @@ container_properties {
xunit_test_results {
target_name: "pytest_results"
result_xml_path: "git/repo/pytest_results/test.xml"
}
}
2 changes: 1 addition & 1 deletion kokoro/test.sh
@@ -103,4 +103,4 @@ deactivate

echo "========================================="
echo "Kokoro test script for langextract finished successfully."
echo "========================================="
echo "========================================="
7 changes: 3 additions & 4 deletions langextract/__init__.py
@@ -19,13 +19,13 @@
# Ensure libmagic is available before langfun imports python-magic.
# pylibmagic provides pre-built binaries that python-magic needs.
try:
import pylibmagic # noqa: F401 (side-effect import)
import pylibmagic # noqa: F401 (side-effect import)
except ImportError:
pass
pass

from collections.abc import Iterable, Sequence
import os
from typing import Any, Type, TypeVar, cast
from typing import Any, cast, Type, TypeVar
import warnings

import dotenv
@@ -39,7 +39,6 @@
from langextract import schema
from langextract import visualization


LanguageModelT = TypeVar("LanguageModelT", bound=inference.BaseLanguageModel)

# Set up visualization helper at the top level (lx.visualize).
3 changes: 0 additions & 3 deletions langextract/inference.py
@@ -29,12 +29,9 @@
from typing_extensions import override
import yaml



from langextract import data
from langextract import schema


_OLLAMA_DEFAULT_MODEL_URL = 'http://localhost:11434'


5 changes: 1 addition & 4 deletions langextract/io.py
@@ -18,15 +18,12 @@
import dataclasses
import json
import os
import pathlib
from typing import Any, Iterator

import pandas as pd
import requests

import os
import pathlib
import os
import pathlib
from langextract import data
from langextract import data_lib
from langextract import progress
1 change: 1 addition & 0 deletions langextract/progress.py
@@ -16,6 +16,7 @@

from typing import Any
import urllib.parse

import tqdm

# ANSI color codes for terminal output
4 changes: 2 additions & 2 deletions langextract/prompting.py
@@ -16,12 +16,12 @@

import dataclasses
import json
import os
import pathlib

import pydantic
import yaml

import os
import pathlib
from langextract import data
from langextract import schema

1 change: 0 additions & 1 deletion langextract/schema.py
@@ -22,7 +22,6 @@
import enum
from typing import Any


from langextract import data


16 changes: 8 additions & 8 deletions langextract/visualization.py
@@ -28,10 +28,10 @@
import html
import itertools
import json
import textwrap

import os
import pathlib
import textwrap

from langextract import data as _data
from langextract import io as _io

@@ -130,9 +130,9 @@
50% { text-decoration-color: #ff0000; }
100% { text-decoration-color: #ff4444; }
}
.lx-legend {
font-size: 12px; margin-bottom: 8px;
padding-bottom: 8px; border-bottom: 1px solid #e0e0e0;
.lx-legend {
font-size: 12px; margin-bottom: 8px;
padding-bottom: 8px; border-bottom: 1px solid #e0e0e0;
}
.lx-label {
display: inline-block;
@@ -456,12 +456,12 @@ def _extraction_sort_key(extraction):
<button class="lx-control-btn" onclick="nextExtraction()">⏭ Next</button>
</div>
<div class="lx-progress-container">
<input type="range" id="progressSlider" class="lx-progress-slider"
min="0" max="{len(extractions)-1}" value="0"
<input type="range" id="progressSlider" class="lx-progress-slider"
min="0" max="{len(extractions)-1}" value="0"
onchange="jumpToExtraction(this.value)">
</div>
<div class="lx-status-text">
Entity <span id="entityInfo">1/{len(extractions)}</span> |
Entity <span id="entityInfo">1/{len(extractions)}</span> |
Pos <span id="posInfo">{pos_info_str}</span>
</div>
</div>
2 changes: 1 addition & 1 deletion tests/.pylintrc
@@ -49,4 +49,4 @@ max-branches = 15 # Multiple test conditions
good-names=i,j,k,ex,Run,_,id,ok,fd,fp,maxDiff,setUp,tearDown

# Include test-specific naming patterns
method-rgx=[a-z_][a-z0-9_]{2,50}$|test[A-Z_][a-zA-Z0-9]*$|assert[A-Z][a-zA-Z0-9]*$
method-rgx=[a-z_][a-z0-9_]{2,50}$|test[A-Z_][a-zA-Z0-9]*$|assert[A-Z][a-zA-Z0-9]*$
1 change: 1 addition & 0 deletions tests/annotation_test.py
@@ -20,6 +20,7 @@

from absl.testing import absltest
from absl.testing import parameterized

from langextract import annotation
from langextract import data
from langextract import inference
9 changes: 6 additions & 3 deletions tests/chunking_test.py
@@ -14,11 +14,12 @@

import textwrap

from absl.testing import absltest
from absl.testing import parameterized

from langextract import chunking
from langextract import data
from langextract import tokenizer
from absl.testing import absltest
from absl.testing import parameterized


class SentenceIterTest(absltest.TestCase):
@@ -368,7 +369,9 @@ def test_string_output(self):
)""")
document = data.Document(text=text, document_id="test_doc_123")
tokenized_text = tokenizer.tokenize(text)
chunk_iter = chunking.ChunkIterator(tokenized_text, max_char_buffer=7, document=document)
chunk_iter = chunking.ChunkIterator(
tokenized_text, max_char_buffer=7, document=document
)
text_chunk = next(chunk_iter)
self.assertEqual(str(text_chunk), expected)

4 changes: 2 additions & 2 deletions tests/data_lib_test.py
@@ -14,13 +14,13 @@

import json

from absl.testing import absltest
from absl.testing import parameterized
import numpy as np

from langextract import data
from langextract import data_lib
from langextract import tokenizer
from absl.testing import absltest
from absl.testing import parameterized


class DataLibToDictParameterizedTest(parameterized.TestCase):
5 changes: 4 additions & 1 deletion tests/inference_test.py
@@ -13,12 +13,15 @@
# limitations under the License.

from unittest import mock
import langfun as lf

from absl.testing import absltest
import langfun as lf

from langextract import inference


class TestLangFunLanguageModel(absltest.TestCase):

@mock.patch.object(
inference.lf.core.language_model, "LanguageModel", autospec=True
)
4 changes: 3 additions & 1 deletion tests/init_test.py
@@ -18,11 +18,12 @@
from unittest import mock

from absl.testing import absltest
import langextract as lx

from langextract import data
from langextract import inference
from langextract import prompting
from langextract import schema
import langextract as lx


class InitTest(absltest.TestCase):
@@ -142,5 +143,6 @@ def test_lang_extract_as_lx_extract(

self.assertDataclassEqual(expected_result, actual_result)


if __name__ == "__main__":
absltest.main()
1 change: 1 addition & 0 deletions tests/prompting_test.py
@@ -16,6 +16,7 @@

from absl.testing import absltest
from absl.testing import parameterized

from langextract import data
from langextract import prompting
from langextract import schema
1 change: 1 addition & 0 deletions tests/resolver_test.py
@@ -17,6 +17,7 @@

from absl.testing import absltest
from absl.testing import parameterized

from langextract import chunking
from langextract import data
from langextract import resolver as resolver_lib
4 changes: 1 addition & 3 deletions tests/schema_test.py
@@ -16,11 +16,9 @@
import textwrap
from unittest import mock




from absl.testing import absltest
from absl.testing import parameterized

from langextract import data
from langextract import schema

3 changes: 2 additions & 1 deletion tests/tokenizer_test.py
@@ -14,10 +14,11 @@

import textwrap

from langextract import tokenizer
from absl.testing import absltest
from absl.testing import parameterized

from langextract import tokenizer


class TokenizerTest(parameterized.TestCase):

1 change: 1 addition & 0 deletions tests/visualization_test.py
@@ -17,6 +17,7 @@
from unittest import mock

from absl.testing import absltest

from langextract import data as lx_data
from langextract import visualization
