Open
Changes from 1 commit
Commits
8966 commits
f3d353c
Fix notebook error format
DevinTDHa Mar 14, 2025
05000ab
Merge pull request #14242 from JohnSnowLabs/SPARKNLP-1006-Implement-OLMo
maziyarpanahi Mar 16, 2025
6d71770
Merge branch 'release/600-release-candidate' into SPARKNLP-1060-Imple…
maziyarpanahi Mar 16, 2025
44fb92a
Merge pull request #14444 from JohnSnowLabs/SPARKNLP-1060-Implement-P…
maziyarpanahi Mar 16, 2025
f33ce00
Merge branch 'release/600-release-candidate' into SPARKNLP-1033-Imple…
maziyarpanahi Mar 16, 2025
7b65030
Merge pull request #14450 from JohnSnowLabs/SPARKNLP-1033-Implement-L…
maziyarpanahi Mar 16, 2025
92c7e12
Merge branch 'release/600-release-candidate' into SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
2c867de
Merge pull request #14457 from JohnSnowLabs/SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
39ed5e7
Merge branch 'release/600-release-candidate' into SPARKNLP-1077-Imple…
maziyarpanahi Mar 16, 2025
c31306f
Merge pull request #14474 from JohnSnowLabs/SPARKNLP-1077-Implementin…
maziyarpanahi Mar 16, 2025
5417d91
updating python and scala model names (#14488)
ahmedlone127 Mar 16, 2025
b35c90c
Merge pull request #14489 from JohnSnowLabs/feature/SPARKNLP-1102-Add…
maziyarpanahi Mar 16, 2025
e7a79fb
Merge pull request #14491 from JohnSnowLabs/feature/SPARKNLP-1103-Add…
maziyarpanahi Mar 16, 2025
dd57c97
Merge pull request #14492 from JohnSnowLabs/feature/SPARKNLP-1105-Imp…
maziyarpanahi Mar 16, 2025
d7e2851
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
cbbca68
Merge pull request #14493 from JohnSnowLabs/feature/SPARKNLP-1106-Imp…
maziyarpanahi Mar 16, 2025
06ef557
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
7bd3ca0
Merge pull request #14495 from JohnSnowLabs/feature/SPARKNLP-1107-Imp…
maziyarpanahi Mar 16, 2025
c737a27
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
0420d04
Merge pull request #14497 from JohnSnowLabs/feature/SPARKNLP-1108-Imp…
maziyarpanahi Mar 16, 2025
2b363e2
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3999409
Merge pull request #14499 from JohnSnowLabs/feature/SPARKNLP-1098-Add…
maziyarpanahi Mar 16, 2025
7673843
Merge branch 'release/600-release-candidate' into SPARKNLP-1078-Imple…
maziyarpanahi Mar 16, 2025
6283a8f
Merge pull request #14502 from JohnSnowLabs/SPARKNLP-1078-Implement-L…
maziyarpanahi Mar 16, 2025
6194f03
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3def1a0
Merge pull request #14505 from DevinTDHa/feature/SPARKNLP-1079-AutoGG…
maziyarpanahi Mar 16, 2025
e4f1961
Merge pull request #14510 from JohnSnowLabs/Fixing-MXBAI-Embedding-no…
maziyarpanahi Mar 16, 2025
a9d7980
SPARKNLP-1109 Adding Extractor to Sparknlp (#14519)
danilojsl Mar 16, 2025
94290cc
Merge pull request #14524 from JohnSnowLabs/feature/SPARKNLP-1113-Add…
maziyarpanahi Mar 16, 2025
d807b49
Merge branch 'release/600-release-candidate' into SPARKNLP-1088-Imple…
maziyarpanahi Mar 16, 2025
6b80b40
Merge pull request #14532 from JohnSnowLabs/SPARKNLP-1088-Implement-D…
maziyarpanahi Mar 16, 2025
39e60e3
Merge pull request #14533 from DevinTDHa/bug/gguf-embeddings-context
maziyarpanahi Mar 16, 2025
0dc22eb
Adding missing bracket in SparkNLPReader and formatting some files
danilojsl Mar 17, 2025
311f988
Adding misssing return dataframe for PDF reader in Python
danilojsl Mar 18, 2025
9e7c2fc
Updating reader notebooks
danilojsl Mar 19, 2025
8ba1d4c
add janus to resourcedownloader
prabod Mar 27, 2025
3ca997b
update to use pretrained model
prabod Mar 27, 2025
7d48e2f
Bump version [run doc]
maziyarpanahi Apr 24, 2025
77809b8
Update VisionEncoderDecoder.scala (#14553)
ahmedlone127 Apr 24, 2025
57c3211
use ubuntu-latest for docs [run doc]
maziyarpanahi Apr 24, 2025
b005a7d
Merge branch 'release/600-release-candidate' of https://github.com/Jo…
maziyarpanahi Apr 24, 2025
fe66723
fixing name (#14554)
ahmedlone127 Apr 24, 2025
c3baa51
run doc [run doc]
maziyarpanahi Apr 24, 2025
518019c
update build yaml [run doc]
maziyarpanahi Apr 27, 2025
f914289
add missing sbt install [run doc]
maziyarpanahi Apr 27, 2025
308dc87
add sbt deps [run doc]
maziyarpanahi Apr 27, 2025
5c888c0
fix bad docstring [run doc]
maziyarpanahi Apr 27, 2025
17223a7
Update Scala and Python APIs
actions-user Apr 27, 2025
a95c2b6
Models hub (#14557)
maziyarpanahi Apr 28, 2025
7f160f3
release conda 6.0.0 [skip test]
maziyarpanahi Apr 28, 2025
3fce83f
Merge pull request #14534 from JohnSnowLabs/release/600-release-candi…
maziyarpanahi Apr 28, 2025
17e8c4c
Update master with models_hub (#14574)
DevinTDHa May 14, 2025
29b5952
[SPARKNLP-1164] Updating python 3.7 to 3.8 for jobs spark34 and spark…
danilojsl May 5, 2025
3b8b62e
[SPARKNLP-1175] colab_setup.sh: Update default PySpark version to 3.4.4
DevinTDHa May 5, 2025
03c775d
[SPARKNLP-1177] Solving typo in Python wrapper for RoBertaForMultiple…
danilojsl May 6, 2025
bc3f105
[SPARKNLP-1151] Update Apple Silicon Installation
DevinTDHa May 7, 2025
4e44281
[SPARKNLP-1115] Introducing SmolVLM (#14552)
prabod May 12, 2025
95207ef
Fixing typos in notebooks (#14570)
ahmedlone127 May 12, 2025
256ecfc
Default assignee for issue templates
DevinTDHa May 8, 2025
79f474d
AutoGGUFVisionModel notebook minor fixes [skip test]
DevinTDHa May 12, 2025
7462852
[SPARKNLP-1121] Introducing PaliGemma (#14551)
prabod May 12, 2025
81285b4
handle the empty image edge case
prabod May 12, 2025
008adac
update padding tests to c,h,w format
prabod May 12, 2025
7c9727e
[SPARKNLP-1124] Introducing Gemma 3 (#14556)
prabod May 12, 2025
a25e481
[SPARKNLP-1158] Adding Parameters Options to PDF Reader (#14562)
danilojsl May 12, 2025
bbf9595
[SPARKNLP-1179] Bump Version
DevinTDHa May 13, 2025
c57ff8c
[SPARKNLP-1179] Bump Version [skip test]
DevinTDHa May 13, 2025
d1b0fac
Updating demo notebook with new PDF parameter options
danilojsl May 13, 2025
d3ab1e7
BartTransformer Notebook (#14572) [skip test]
ahmedlone127 May 14, 2025
62255e3
Update CHANGELOG [run doc]
DevinTDHa May 13, 2025
e89f8ec
Update Scala and Python APIs
actions-user May 14, 2025
f6d72d0
Update CHANGELOG [skip test]
DevinTDHa May 14, 2025
c3627a2
Update conda meta.yaml for 6.0.1 [skip test]
DevinTDHa May 15, 2025
520d151
Merge Models Hub into Master (#14588)
DevinTDHa May 28, 2025
fb8d2f5
AutoGGUFModel: fix type annotations on python side
DevinTDHa May 20, 2025
9533d68
[SPARKNLP-1123] Introducing InternVL (#14578)
prabod May 23, 2025
d547baa
[SPARKNLP-1113] Adding Partition feature
danilojsl Mar 17, 2025
324acb6
[SPARKNLP-1118] Adding headers, ssl-verify, request timeout, page bre…
danilojsl Mar 31, 2025
176f306
[SPARKNLP-1116] Adding groupBrokenParagraphs option
danilojsl Apr 4, 2025
d053f3d
[SPARKNLP-1116] Adding includeSlideNotes option
danilojsl Apr 7, 2025
d6d54f6
[SPARKNLP-1116] Adding findSubtable option
danilojsl Apr 8, 2025
58241e1
[SPARKNLP-1116] Adding findSubtable option in SparkNLPReader
danilojsl Apr 8, 2025
190748b
[SPARKNLP-1116] Renaming findSubtable to appendCells option in SparkN…
danilojsl Apr 8, 2025
66da5d8
[SPARKNLP-1116] Handling headers null issue in SparkNLPReader for Pyt…
danilojsl Apr 10, 2025
520f0a8
[SPARKNLP-1116] Refactoring parameters spark-nlp reader getters to ma…
danilojsl Apr 11, 2025
4eb5350
[SPARKNLP-1116] Adding Partitioning demo notebook
danilojsl Apr 30, 2025
18d17b3
[SPARKNLP-1174] Adding PartitionTransformer
danilojsl May 5, 2025
9f48048
[SPARKNLP-1174] Adding missing unit tests in readers
danilojsl May 14, 2025
d629ff6
[SPARKNLP-1174] Moving PDF parameters to HasPdfProperties
danilojsl May 14, 2025
d94999c
[SPARKNLP-1174] Adding validation for partition URL content
danilojsl May 16, 2025
83a59bd
[SPARKNLP-1174] Formatting modified files
danilojsl May 16, 2025
4bcd222
[SPARKNLP-1174] Fix reading as text file content
danilojsl May 24, 2025
6771661
[SPARKNLP-1174] Adding PartitionTransformer demo notebook [skip test]
danilojsl May 24, 2025
f899ccc
[SPARKNLP-1174] Updates PartitionTransformer demo notebook [skip test]
danilojsl May 24, 2025
029341a
[SPARKNLP-1174] Updates PartitionTransformer file link [skip test]
danilojsl May 24, 2025
8f16acf
Documentation for SparkNLP Readers and Partition class (#14581)
paulamib123 May 26, 2025
8f89239
[SPARKNLP-1131] - Introducing Florance-2 (#14585)
prabod May 26, 2025
9d12673
Update Partiton demo notbook [skip test]
danilojsl May 26, 2025
ac85a15
Updating Partiton demo notbook [skip test]
danilojsl May 26, 2025
f26054b
Docs: Fix various Issues [skip test]
DevinTDHa May 26, 2025
8a6ea08
Docs: add versioning handling to jekyll [skip test]
DevinTDHa May 26, 2025
fa849e9
Adjust versioning for python and scala [skip test]
DevinTDHa May 26, 2025
30e01e2
Bump Version to 6.0.2 [skip test] [run doc]
DevinTDHa May 26, 2025
780b7a8
Update Scala and Python APIs
actions-user May 28, 2025
4d395c3
Merge Models Hub into Master (#14601)
DevinTDHa Jun 11, 2025
e0bcf88
[SPARKNLP-1138] Adding basic chunking to partition (#14593)
danilojsl May 29, 2025
b91ca43
[SPARKNLP-1163] Adding title chunking strategy (#14594)
danilojsl Jun 2, 2025
5954a57
[SPARKNLP-1125] Adding partition with chunk demo notebook [skip test]
danilojsl Jun 6, 2025
966199f
[SPARKNLP-1125] Fixs partition unit test in python
danilojsl Jun 6, 2025
62f267a
[SPARKNLP-1125] Adding notebook example with RAG showcase
danilojsl Jun 6, 2025
4fa64af
Update SparkNLP_PowerPoint_Reader_Demo.ipynb
thec0dewriter May 29, 2025
f1dd545
scala api for E5V
prabod Jun 9, 2025
15a4cfa
python api for e5v
prabod Jun 9, 2025
54c2479
Add documentation for E5VEmbeddings, detailing usage for multimodal e…
prabod Jun 9, 2025
bc94f90
add resource downloader
prabod Jun 9, 2025
0bd1d2b
add notebook and documentation
prabod Jun 10, 2025
d0c180b
[SPARKNLP-1119] Adding XML reader
danilojsl Jun 9, 2025
b53d541
[SPARKNLP-1119] Adding documentation for XML reader [skip test]
danilojsl Jun 9, 2025
472a236
scalafmt
DevinTDHa Jun 11, 2025
8d840eb
Bump Version [skip test]
DevinTDHa Jun 11, 2025
072e99b
Changelog [run doc] [skip test]
DevinTDHa Jun 11, 2025
e981233
Update Scala and Python APIs
actions-user Jun 11, 2025
2acc77c
Merge Models Hub into Master (#14614)
DevinTDHa Jun 30, 2025
4cfef88
[SPARKNLP-1161] Adding features to PDF Reader (#14596)
danilojsl Jun 23, 2025
7538f7e
[SPARKNLP-1086] Introducing DataFrameOptimizer (#14607)
danilojsl Jun 23, 2025
3646b4e
[SPARKNLP-282] Introducing MiniLMEmbeddings (#14610)
prabod Jun 24, 2025
6b0e620
scalafmt
DevinTDHa Jun 24, 2025
1d5bf1c
Bump Version [run doc]
DevinTDHa Jun 24, 2025
99540a4
Update Scala and Python APIs
actions-user Jun 24, 2025
bd021b7
Remove duplicate notebook
DevinTDHa Jun 24, 2025
78de417
switch release to maven central
DevinTDHa Jun 27, 2025
0d064a7
revert to non sonatype bundle release
DevinTDHa Jun 27, 2025
9a16527
Fix Spark NLP example notebooks (#14620)
AbdullahMubeenAnwar Jul 7, 2025
77f995d
Merge Models hub (#14622)
DevinTDHa Jul 10, 2025
1d0aad7
[SPARKNLP-1215] Updating support for Microsoft Fabric to allow downlo…
danilojsl Jul 1, 2025
7d6aab4
[SPARKNLP-1213] Introducing MarkdownReader (#14618)
danilojsl Jul 7, 2025
5c11366
Fix default pretrained gemma3 model
DevinTDHa Jul 7, 2025
305f584
Bump Version [run doc]
DevinTDHa Jul 7, 2025
9d7709a
Update Scala and Python APIs
actions-user Jul 7, 2025
d6f2ce5
change sbt version and remove redundant sbt plugin [skip test]
DevinTDHa Jul 9, 2025
a0eb537
bump conda version [skip test]
DevinTDHa Jul 10, 2025
cc7e6a5
Merge Model Hub (#14635)
DevinTDHa Jul 23, 2025
624311d
[SPARKNLP-1262] Update HuggingFace_OpenVINO_in_Spark_NLP_Qwen2VL.ipy…
AbdullahMubeenAnwar Jul 23, 2025
8e8b76e
[SPARKNLP-1189] Introducing Phi4 (#14606)
prabod Jul 23, 2025
a104d9a
[SPARKNLP-1235] Adding CSV Reader
danilojsl Jul 4, 2025
c30ecce
[SPARKNLP-1259] Enhancing HTMLReader parsing capabilities
danilojsl Jul 15, 2025
e891e11
[SPARKNLP-1259] Adding sentence metadata to TextReader
danilojsl Jul 16, 2025
c929d8c
[SPARKNLP-1259] Introducing Reader2Doc Annotator
danilojsl Jul 18, 2025
eacc232
[SPARKNLP-1259] Adding XML support to Reader2Doc
danilojsl Jul 20, 2025
77ee026
[SPARKNLP-1259] Adding Reader2Doc documentation
danilojsl Jul 20, 2025
12a034d
[SPARKNLP-1259] Adding missing file for readers tests
danilojsl Jul 20, 2025
4257b46
[SPARKNLP-1259] Adding Reader2Doc demo notebook
danilojsl Jul 21, 2025
70191a6
[SPARKNLP-1259] Adding slow mark for URLs readers tests
danilojsl Jul 21, 2025
f2be633
[SPARKNLP-1259] Adjust doc
DevinTDHa Jul 23, 2025
24eb4f5
[SPARKNLP-1194] Upgrade jsl-llamacpp to newest version (#14633)
DevinTDHa Jul 23, 2025
60fbc62
Using larger github runner + Move models.json to s3 bucket [skip-test…
KshitizGIT Jul 23, 2025
99c3910
Setting default explodeDocs to false in Reader2Doc
danilojsl Jul 23, 2025
8b175a4
Bump Version [run doc] [skip test]
DevinTDHa Jul 23, 2025
6e34a6b
Update Scala and Python APIs
actions-user Jul 23, 2025
b5b4381
Update python/conda to 6.1.0 [skip test]
DevinTDHa Jul 23, 2025
7a3d29f
bump docs version to 6.1.0
DevinTDHa Aug 1, 2025
3283dd2
Merge Models hub (#14645)
DevinTDHa Aug 5, 2025
fa091f7
[SPARKNLP-1183] openVINO: added hyperthreading related options (#14641)
prabod Aug 4, 2025
277f433
[SPARKNLP-1260] Adding non-asci and xml support to MarkdownReader
danilojsl Jul 21, 2025
435195a
[SPARKNLP-1260] Introducing Reader2Table Annotator
danilojsl Jul 26, 2025
d627c6e
[SPARKNLP-1260] Adding outputformat parameter and demo notebook
danilojsl Jul 28, 2025
a1c6e52
[SPARKNLP-1260] Adding support for mixed files to Reader2Doc and Read…
danilojsl Jul 31, 2025
595e489
[SPARKNLP-1260] Updating demo notebooks to Reader2Doc and Reader2Table
danilojsl Jul 31, 2025
4994aa3
[SPARKNLP-1260] Fix python test files path
danilojsl Aug 1, 2025
cb0a227
bump OV version
DevinTDHa Aug 5, 2025
ce16df2
[SPARKNLP-1267] [SPARKNLP-1268] [SPARKNLP-1272] llama.cpp Engine upgr…
DevinTDHa Aug 5, 2025
7025e1d
Bump Version [run doc]
DevinTDHa Aug 5, 2025
cb2c83e
Update Scala and Python APIs
actions-user Aug 5, 2025
d46ae3a
Delete docs/_posts/AbdullahMubeenAnwar/2025-07-18-nuextract_2.0_2B_en.md
AbdullahMubeenAnwar Aug 6, 2025
7e3b0cc
Models Hub (#14654)
DevinTDHa Aug 20, 2025
f52be16
[SPARKNLP-1280] Python side: fix pretrained model AutoGGUFVision
DevinTDHa Aug 8, 2025
883a343
[SPARKNLP-1284] Adjust tmp jsl-llamacpp version
DevinTDHa Aug 18, 2025
713c3a0
Allow protocol prepended paths for GGUF model saving and loading
DevinTDHa Aug 18, 2025
335bf1b
[SPARKNLP-1256] - Introducing AutoGGUFReranker (#14649)
prabod Aug 19, 2025
71525de
Bump Version [run doc]
DevinTDHa Aug 20, 2025
f4df856
Update Scala and Python APIs
actions-user Aug 20, 2025
d197f9b
[SPARKNLP-1244] NerDLGraphChecker
DevinTDHa Aug 26, 2025
0e14db4
searchForSuitableGraph protected function
DevinTDHa Aug 28, 2025
32b9224
[SPARKNLP-1282] Adding improvements to Reader2Doc
danilojsl Aug 20, 2025
2beca05
[SPARKNLP-1282] Minor doc changes
danilojsl Aug 20, 2025
f6ba0d1
[SPARKNLP-1286] GGUFRankingFinisher (#14653)
prabod Sep 1, 2025
6e44010
Bump Version [run doc]
DevinTDHa Sep 1, 2025
0491594
Update Scala and Python APIs
actions-user Sep 1, 2025
4d3aca1
[SPARKNLP-1261] Adding to Reader2Image annotator
danilojsl Aug 27, 2025
7dff6f9
[SPARKNLP-1261] Adding support to mix content in Reader2Image
danilojsl Aug 31, 2025
76cb666
[SPARKNLP-1261] Adding tests and demo notebook to Reader2Image
danilojsl Aug 31, 2025
ae2206c
[SPARKNLP-1261] Updating to right version
danilojsl Aug 31, 2025
0929c7d
[SPARKNLP-1261] Adding support to reading images for emails
danilojsl Sep 3, 2025
bb2d90e
[SPARKNLP-1261] Adding support to reading images to PDF and MSOffice …
danilojsl Sep 20, 2025
8fe9fdc
[SPARKNLP-1261] Refactoring Python wrappers for readers
danilojsl Sep 21, 2025
e047249
[SPARKNLP-1261] Adding backward pyspark compatibility for ner_dl tests
danilojsl Sep 22, 2025
095d711
[SPARKNLP-1261] Adding slow tag to python Reader2Image test
danilojsl Sep 22, 2025
efcc1c2
[SPARKNLP-1261] Adding import for python < 3.4
danilojsl Sep 22, 2025
e71c33f
Commenting flaky test
danilojsl Sep 22, 2025
7fc56b0
[skip test] Adding demo notebook for Reader2Image
danilojsl Sep 23, 2025
d4e84d5
Bump Version [run doc]
danilojsl Sep 23, 2025
7f40ec5
[SPARKNLP-1291] Adding support fort input string column on readers
danilojsl Sep 30, 2025
b76c3e0
[SPARKNLP-1291] Removing unnecessary tests
danilojsl Sep 30, 2025
04dfd33
[SPARKNLP-1292] Adding fault-tolerance support when reading malformed…
danilojsl Oct 8, 2025
b8a3de4
[SPARKNLP-1290] Introducing ReaderAssembler Annotator (#14668)
danilojsl Oct 8, 2025
60aaddd
[SPARKNLP-1296] Improve AutoGGUF FallbackReader (#14667)
C-K-Loan Oct 8, 2025
16b47be
Bump Version [run doc]
DevinTDHa Oct 9, 2025
b193c44
Update Scala and Python APIs
actions-user Oct 9, 2025
b827818
Models Hub Update
AbdullahMubeenAnwar Oct 9, 2025
608e3fe
[SPARKNLP-1288] AutoGGUF close model
DevinTDHa Oct 6, 2025
2f2b9ad
[SPARKNLP-1283] Add remove thinking flag
DevinTDHa Oct 14, 2025
15546f8
[SPARKNLP-1293] Adding extract entities to EntityRuler
danilojsl Oct 15, 2025
79cbbfb
[SPARKNLP-1293] Adding auto mode to DocumentNormalizer and EntityRuler
danilojsl Oct 17, 2025
f0e7e93
[SPARKNLP-1293] Adding unit tests and python wrapper parameters for …
danilojsl Oct 17, 2025
d4a1610
[SPARKNLP-1293] Add FastTest tag to Scala tests
DevinTDHa Oct 20, 2025
0b4a24c
[SPARKNLP-1299] Add Hierarchical Element Identification to HTMLReader
danilojsl Oct 18, 2025
bcfa0e4
[SPARKNLP-1299] Include metadata to Sentence Detectors
danilojsl Oct 19, 2025
efda0a3
[SPARKNLP-1299] Adding python test
danilojsl Oct 20, 2025
553c020
[SPARKNLP-1300] changing token sequence in warmup test (#14677)
ahmedlone127 Oct 21, 2025
bd2a859
[SPARKNLP-1149] Update documentation (#14673)
AbdullahMubeenAnwar Oct 21, 2025
b6a75ff
Bump Version [run doc]
DevinTDHa Oct 21, 2025
91a92e2
Update Scala and Python APIs
actions-user Oct 22, 2025
17ad91c
restore Gemfile
DevinTDHa Oct 22, 2025
3bdb5af
Restore previous doc dependencies
DevinTDHa Oct 22, 2025
fe1fb82
Fix google colab link on Readers Notebooks
danilojsl Oct 22, 2025
554735a
[SPARKNLP-1306] Adding hierarchical support for PDF, MS Word and Mark…
danilojsl Oct 27, 2025
3b9acda
Add Java installation to Colab setup script
AbdullahMubeenAnwar Nov 3, 2025
4ea91e0
[SPARKNLP-1297] Improve performance of prepareBatchWordEmbeddings
DevinTDHa Oct 21, 2025
26e263f
[SPARKNLP-1297] Improve DatasetHelper.randomize implementation
DevinTDHa Oct 28, 2025
5693c4f
[SPARKNLP-1297] NerDLGraphChecker fills label column metadata for gra…
DevinTDHa Oct 30, 2025
b1ce981
[SPARKNLP-1297] NerDLApproach skips dataset params extraction if NerD…
DevinTDHa Oct 30, 2025
cc37df2
[SPARKNLP-1297] NerDLGraphChecker model fills metadata python side
DevinTDHa Oct 30, 2025
a02eb5e
[SPARKNLP-1297] Cache by default to reduce processing
DevinTDHa Oct 28, 2025
afa2c20
[SPARKNLP-1297] Use ArrayBuffer for batch creation
DevinTDHa Oct 30, 2025
916253d
[SPARKNLP-1297] NerDLGraphChecker: Fix tests for pyspark 3.3
DevinTDHa Oct 31, 2025
70b80fd
[SPARKNLP-1297] AutoGGUF python tests fix warning
DevinTDHa Oct 31, 2025
2390a18
[SPARKNLP-1297] NerDL+GraphChecker: Move dataset length computation t…
DevinTDHa Nov 4, 2025
8537c95
Update NerDL documentation
DevinTDHa Nov 5, 2025
3479083
XML Reader and Reader2Doc Improvements (#14691)
DevinTDHa Nov 7, 2025
ad2ee11
Bump Version [run doc]
DevinTDHa Nov 7, 2025
6600098
Update Scala and Python APIs
actions-user Nov 7, 2025
de9ef32
[SPARKNLP-1309] Fix repeating tokens in WordEmbeddings
ahmedlone127 Nov 13, 2025
951331c
NerDL: Improve calculation of dataset count
DevinTDHa Nov 13, 2025
116d2a5
Fix ReaderAssemblerTest PDF test failing
DevinTDHa Nov 13, 2025
da7a6ce
Bump Version
DevinTDHa Nov 13, 2025
5a43dfc
Merge Models Hub (#14697)
DevinTDHa Nov 13, 2025
99fe94d
[SPARKNLP-1315] changing input data type for CamemBertForTokenClassif…
ahmedlone127 Dec 1, 2025
dba4495
[SPARKNLP-1317] Further NerDL Optimizations (#14699)
DevinTDHa Dec 2, 2025
aa63267
Bump Version [run doc]
DevinTDHa Dec 3, 2025
34b1ab5
Update Scala and Python APIs
actions-user Dec 3, 2025
searchForSuitableGraph protected function
DevinTDHa committed Sep 1, 2025
commit 0e14db414e835ac693dec71f23e1a2b5eb84d2a5
@@ -162,7 +162,7 @@ class NerDLGraphChecker(override val uid: String)
new Param[String](this, "graphFolder", "Folder path that contain external graph files")

/** @group getParam */
- private def getGraphFolder: Option[String] = get(graphFolder)
+ protected def getGraphFolder: Option[String] = get(graphFolder)

/** Extracts the graph hyperparameters from the training data (dataset).
*
@@ -177,7 +177,7 @@ class NerDLGraphChecker(override val uid: String)
* a tuple containing the number of labels, number of unique characters, and the embedding
* dim
*/
- private def getGraphParamsDs(
+ protected def getGraphParamsDs(
dataset: Dataset[_],
inputCols: Array[String],
labelsCol: String): (Int, Int, Int) = {
@@ -219,14 +219,16 @@ class NerDLGraphChecker(override val uid: String)
(nLabels, nChars, embeddingsDim)
}

+ protected def searchForSuitableGraph(nLabels: Int, nChars: Int, embeddingsDim: Int): String =
+ NerDLApproach.searchForSuitableGraph(nLabels, embeddingsDim, nChars + 1, getGraphFolder)
+
override def fit(dataset: Dataset[_]): NerDLGraphCheckerModel = {
val (nLabels, nChars, embeddingsDim) =
getGraphParamsDs(dataset, $(inputCols), $(labelColumn))

// Throws exception if no suitable graph found
Try {
- NerDLApproach
- .searchForSuitableGraph(nLabels, embeddingsDim, nChars + 1, getGraphFolder)
+ searchForSuitableGraph(nLabels, nChars, embeddingsDim)
} match {
case Failure(exception: IllegalArgumentException) =>
throw new IllegalArgumentException("NerDLGraphChecker: " + exception.getMessage)
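The effect of this change can be sketched without Spark: once the graph lookup sits behind a `protected` method, a subclass (for example in a test) can replace it while the `fit`-style wrapper keeps prefixing the error message. The sketch below is an illustration only. `GraphCheckerSketch`, `StubbedGraphChecker`, `check`, and the stub graph name are hypothetical names, not part of Spark NLP, and the handling of the success case and of other failures is an assumption, since the diff is cut off before those branches.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for NerDLGraphChecker; no Spark or Spark NLP dependency.
class GraphCheckerSketch {

  // Overridable hook, mirroring the protected method added in this commit.
  // The real method delegates to NerDLApproach.searchForSuitableGraph; this
  // stand-in always fails, to simulate "no suitable graph found".
  protected def searchForSuitableGraph(nLabels: Int, nChars: Int, embeddingsDim: Int): String =
    throw new IllegalArgumentException(
      s"no graph for nLabels=$nLabels, nChars=${nChars + 1}, embeddingsDim=$embeddingsDim")

  // Mirrors the shape of fit(): call the hook and prefix IllegalArgumentException messages.
  // Returning the graph name and rethrowing other failures are assumptions of this sketch.
  def check(nLabels: Int, nChars: Int, embeddingsDim: Int): String =
    Try(searchForSuitableGraph(nLabels, nChars, embeddingsDim)) match {
      case Success(graph) => graph
      case Failure(e: IllegalArgumentException) =>
        throw new IllegalArgumentException("NerDLGraphChecker: " + e.getMessage)
      case Failure(e) => throw e
    }
}

// A subclass can now swap in its own lookup, which a private method would not allow.
class StubbedGraphChecker extends GraphCheckerSketch {
  override protected def searchForSuitableGraph(
      nLabels: Int,
      nChars: Int,
      embeddingsDim: Int): String =
    s"stub_graph_${nLabels}_${embeddingsDim}_${nChars + 1}" // hypothetical name format
}
```

Calling `new StubbedGraphChecker().check(9, 80, 100)` goes through the overridden hook, while the base class still raises an `IllegalArgumentException` whose message starts with `NerDLGraphChecker: `.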