forked from JohnSnowLabs/spark-nlp
Dataset & Encoder refactoring #1
Open: a0x8o wants to merge 8,966 commits into alexxx-db:master from JohnSnowLabs:master
SPARKNLP-1006: Introducing OLMo
…ment-Phi-3.5-Vision
…hi-3.5-Vision SparkNLP 1060: Implement Phi-3.5 Vision
…ment-LLaVA-and-LLaVA-NeXT
…LaVA-and-LLaVA-NeXT SparkNLP 1033: Introducing LLaVA
SparkNLP 1032: Introducing CoHere
…menting-new-Qwen2-VL-models
…g-new-Qwen2-VL-models SparkNLP 1077: Introducing Qwen2-VL
…ing-support-to-read-Excel-files [SPARKNLP-1102] Adding support to read Excel files
…ing-support-to-read-PowerPoint-files [SPARKNLP-1103] Adding support to read PowerPoint files
…lement-ForMultipleChoice-for-ALBERT [SPARKNLP-1105] Introducing AlbertForMultipleChoice Transformer
…06-Implement-ForMultipleChoice-for-DistilBERT
…lement-ForMultipleChoice-for-DistilBERT [SPARKNLP-1106] Introducing DistilBertForMultipleChoice Transformer
…07-Implement-ForMultipeChoice-for-RoBERTa
…lement-ForMultipeChoice-for-RoBERTa [SPARKNLP-1107] Introducing RoBertaForMultipleChoice
…08-Implement-ForMultipleChoice-for-XLMRoBERTa
…lement-ForMultipleChoice-for-XLMRoBERTa [SPARKNLP-1108] Introducing XlmRoBertaForMultipleChoice Transformer
…98-Adding-a-PDF-Reader-to-Spark-NLP
…ing-a-PDF-Reader-to-Spark-NLP [SPARKNLP-1098] Adding PDF reader support
…ment-Llama-3.2-Vision-models
…lama-3.2-Vision-models SparkNLP 1078: Introducing Llama 3.2 Vision models
…79-AutoGGUFVisionModel
…UFVisionModel [SPARKNLP-1079] AutoGGUFVisionModel
…tebook fixing typo in MXBAI notebook
* [SPARKNLP-1109] Adding Extractor annotator
* [SPARKNLP-1109] Adding Cleaner annotator
* [SPARKNLP-1109] Adding missing index parameter in Python
* [SPARKNLP-1109] Adding correct inheritance for Cleaner in Python
* [SPARKNLP-1109] Adding demo notebooks for Cleaner and Extractor
* [SPARKNLP-1110] Adding demo notebook for Email reader and Cleaner
…ing-support-to-enhance-read-TXT-files [SPARKNLP-1113] Adding Text Reader
…ment-DeepSeek-any-to-any-model
The script now checks for Java and installs OpenJDK 11 if not present. JAVA_HOME and PATH are also set to ensure Java is available for subsequent steps.
- Fill the output array in place to reduce RAM usage
- Use sortWithinPartitions instead of a custom map over partitions, so rows are not materialized
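The first bullet describes writing results into an existing output buffer rather than building many short-lived per-row arrays. A minimal, library-agnostic sketch of that idea in Python follows; `encode_all` and the per-row `encode` callback are hypothetical names for illustration, not Spark NLP internals.

```python
from array import array
from typing import Callable, Sequence


def encode_all(rows: Sequence[str], width: int,
               encode: Callable[[str], Sequence[float]]) -> array:
    """Encode every row into one preallocated, flat float buffer.

    `encode` is a hypothetical per-row function that returns `width` values.
    """
    # Allocate the whole output once instead of collecting per-row arrays.
    out = array("d", [0.0]) * (len(rows) * width)
    for r, row in enumerate(rows):
        vec = encode(row)
        if len(vec) != width:
            raise ValueError(f"row {r}: expected {width} values, got {len(vec)}")
        base = r * width
        # Copy each value straight into the existing buffer (in-place fill).
        for c, v in enumerate(vec):
            out[base + c] = v
    return out


if __name__ == "__main__":
    toy = lambda s: [float(len(s)), float(s.count("a"))]
    print(list(encode_all(["spark", "nlp", "banana"], width=2, encode=toy)))
    # prints [5.0, 1.0, 3.0, 0.0, 6.0, 3.0]
```

Peak memory is then roughly one buffer of `rows × width` values, instead of that buffer plus every intermediate row array.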
…LGraphChecker added metadata to dataframe
* Reader2Doc: new defaults always output a single document
* XMLReader improvements
  - no longer outputs empty text
  - can extract tag attribute values
* Reader2Doc improvements
  - adjusted defaults so a single large document is always output
  - a join character can be specified with a new parameter
  - adjusted other readers for the new defaults
* Reader2Doc improvements (Python side)
* ReaderAssembler: fix failing test
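The Reader2Doc change merges all extracted elements into one document joined by a configurable character, and the XMLReader change drops empty text. The sketch below illustrates that behaviour in plain Python, assuming whitespace-only elements count as "empty"; `to_single_document` and its `join_char` argument are hypothetical and are not the Reader2Doc parameter names.

```python
from typing import Iterable


def to_single_document(elements: Iterable[str], join_char: str = "\n") -> str:
    """Merge extracted text elements into one document string.

    Hypothetical helper mirroring the described behaviour (one output
    document, configurable join character); not Reader2Doc's real API.
    """
    # Drop empty or whitespace-only elements, then join the rest once.
    kept = [e for e in elements if e.strip()]
    return join_char.join(kept)


if __name__ == "__main__":
    print(repr(to_single_document(["Title", "", "  ", "Body text"], join_char=" ")))
    # prints 'Title Body text'
```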
* Add model 2025-04-09-sent_arabic_monomodel_monotok_en * Add model 2025-04-09-sent_schwurpert_pipeline_de * Add model 2025-04-08-wav2vec2_large_xls_r_300m_hindi_devendr_en * Add model 2025-04-08-dialogpt_medium_harry_pipeline_en * Add model 2025-04-09-gpt_2_finetuning_airaid_en * Add model 2025-04-08-mchammer_pipeline_en * Add model 2025-04-09-wav2vec2_large_xls_r_300m_kor_11385_2_en * Add model 2025-04-09-sent_bert_base_stackoverflow_comments_2m_pipeline_en * Add model 2025-04-08-shape_nato_pipeline_en * Add model 2025-04-09-burmese_awesome_wnut_model_ai_pipeline_en * Add model 2025-04-09-vit_female_age_classification_en * Add model 2025-04-09-vit_base_oxford_iiit_pets_niko132_pipeline_en * Add model 2025-04-09-koriposting_en * Add model 2025-04-09-rockdrigoma_pipeline_en * Add model 2025-04-09-vit_base_patch16_224_finetuned_cedar_en * Add model 2025-04-09-williamblakebot_pipeline_en * Add model 2025-04-09-bert_base_train_book_ent_15p_ra_en * Add model 2025-04-09-tinybert_train_book_ent_15p_en * Add model 2025-04-08-exp_w2v2t_indonesian_xlsr_53_s358_id * Add model 2025-04-08-bert_finetuned_ner_accelerate_atichets_pipeline_en * Add model 2025-04-09-brad_buchsbaum_en * Add model 2025-04-09-honeytech_pipeline_en * Add model 2025-04-09-extended_gender_classifier_en * Add model 2025-04-09-smids_1x_deit_tiny_rms_001_fold3_pipeline_en * Add model 2025-04-09-icelynjennings_pipeline_en * Add model 2025-04-09-jackposobiec_pipeline_en * Add model 2025-04-09-sent_finnish_monomodel_monotok_pipeline_en * Add model 2025-04-08-exp5_10partition_modelo_asl6000_pipeline_en * Add model 2025-04-08-output_pipeline_pt * Add model 2025-04-09-bert_finetuned_ner_huizhoucheng_en * Add model 2025-04-09-icelynjennings_en * Add model 2025-04-09-sent_tiny_mlm_glue_mnli_from_scratch_custom_tokenizer_expand_vocab_en * Add model 2025-04-09-sent_drclips_en * Add model 2025-04-09-sent_nbme_bio_clinicalbert_en * Add model 2025-04-09-finetune_model_bert_en * Add model 
2025-04-09-bert_finetuned_ner_fundrais123_en * Add model 2025-04-09-filler_username_pipeline_en * Add model 2025-04-09-gpt2_chatbot_kuttersn_en * Add model 2025-04-09-musebiihi_pipeline_en * Add model 2025-04-09-disconcision_pipeline_en * Add model 2025-04-09-arxiv_classifier_debertav3_en * Add model 2025-04-08-wenger_en * Add model 2025-04-08-burmese_awesome_model_recod_en * Add model 2025-04-09-exp_w2v2t_portuguese_norwegian_pretraining_s84_pt * Add model 2025-04-09-sent_bert_base_uncased_finetuned_mol_mlm_0_3_en * Add model 2025-04-09-sent_tlm_rct_20k_large_scale_pipeline_en * Add model 2025-04-08-jen_122_pipeline_en * Add model 2025-04-09-dkulchar_pipeline_en * Add model 2025-04-09-pico8degalaleo_pipeline_en * Add model 2025-04-09-dialogpt_medium_captainprice_extended_en * Add model 2025-04-09-wav2vec2_gujarati_stt_pipeline_en * Add model 2025-04-08-smids_5x_deit_small_rms_00001_fold1_en * Add model 2025-04-09-sent_minilm_l12_h384_uncased_finetuned_imdb_en * Add model 2025-04-09-bert_suicide_detection_hk_large_nepal_bhasa_pipeline_en * Add model 2025-04-09-distilbert_base_uncased_news_sentiment_finetuned_english_en * Add model 2025-04-08-monopolyfornite_en * Add model 2025-04-08-dialogpt_small_shy_en * Add model 2025-04-09-distilbert_token_itr0_0_0001_editorials_01_03_2022_15_20_12_pipeline_en * Add model 2025-04-09-kehlani_pipeline_en * Add model 2025-04-09-burmese_awesome_humanaction_model_pipeline_en * Add model 2025-04-09-tigers_side_vit_en * Add model 2025-04-09-stp_classifier_13_1_en * Add model 2025-04-08-nepali_grammar_error_detection_20250311_1323_en * Add model 2025-04-09-mldz4shad_en * Add model 2025-04-09-exp_w2v2t_swedish_northern_sami_xlsr_53_s328_en * Add model 2025-04-09-bert_base_uncased_token_itr0_0_0001_train_essays_test_test_set_05_03_2022_05_58_31_en * Add model 2025-04-09-wav2vec2_xlsr_53_marathi_large_en * Add model 2025-04-09-hushem_5x_deit_small_adamax_0001_fold1_pipeline_en * Add model 2025-04-09-lora_toxic_comment_pipeline_en * Add 
model 2025-04-09-absa_turkish_bert_based_small_tr * Add model 2025-04-08-smids_1x_deit_tiny_rms_001_fold5_en * Add model 2025-04-09-wav2vec2_base_timit_demo_colab_bsen_pipeline_en * Add model 2025-04-09-bert_base_turkish_sentiment_analysis_pipeline_tr * Add model 2025-04-09-bert_base_turkish_sentiment_analysis_tr * Add model 2025-04-09-bert_base_turkish_offensive_pipeline_tr * Add model 2025-04-09-document_type_identification_en * Add model 2025-04-09-sent_bnbert_pipeline_en * Add model 2025-04-09-wav2vec2_large_xls_r_300m_tamil_colab_aakhilesh_en * Add model 2025-04-08-sent_mbert_tlm_sent_english_chinese_en * Add model 2025-04-08-pii_protection_model_pipeline_en * Add model 2025-04-09-bert_tiny_finetuned_xglue_ner_en * Add model 2025-04-08-wav2vec2_large_xls_r_300m_urdu_colab_pipeline_en * Add model 2025-04-09-sent_bert_base_uncased_issues_128_xxr_pipeline_en * Add model 2025-04-09-sent_mbert_tlm_chat_english_german_en * Add model 2025-04-09-db_slr_1_1e_en * Add model 2025-04-08-cher_pipeline_en * Add model 2025-04-09-wav2vec2_base_libir_zenodo_pipeline_en * Add model 2025-04-09-vit_epochs5_batch32_lr5e_05_size224_tiles4_seed3_q3_dropout_v2_en * Add model 2025-04-09-wav2vec2_base_test_pipeline_en * Add model 2025-04-09-lesseyecontact_en * Add model 2025-04-09-wav2vec2_base_swbd_turn_eos_long_short_utt_removed_5percent_pipeline_en * Add model 2025-04-09-micbucci_pipeline_en * Add model 2025-04-09-veganseltzer_pipeline_en * Add model 2025-04-08-dialogpt_medium_ff7_en * Add model 2025-04-09-sent_storieslm_v1_1945_pipeline_en * Add model 2025-04-09-sent_mbert_tlm_chat_english_chinese_pipeline_en * Add model 2025-04-09-dialogpt_medium_milo_en * Add model 2025-04-09-dataandme_en * Add model 2025-04-09-lumetroid_en * Add model 2025-04-09-dialogpt_medium_milo_pipeline_en * Add model 2025-04-09-bbcqos_fitslut63_kellyg_official_en * Add model 2025-04-09-stp_classifier_13_1_pipeline_en * Add model 2025-04-09-vit_base_beans_demo_v5_hwooo92_pipeline_en * Add model 
2025-04-09-ridiculouscrabs_en * Add model 2025-04-08-autotrain_20_12_2022_exam_part3_2543877946_pipeline_en * Add model 2025-04-09-zemfira_en * Add model 2025-04-09-michaeltrazzi_pipeline_en * Add model 2025-04-09-absa_turkish_bert_based_small_pipeline_tr * Add model 2025-04-09-gunna_pipeline_en * Add model 2025-04-09-ourqueeningreen_pipeline_en * Add model 2025-04-09-jenslennartsson_pipeline_en * Add model 2025-04-09-sent_bottleneckbertsmall_en * Add model 2025-04-09-dialogpt_mid_hpai_en * Add model 2025-04-09-shelbythanna_en * Add model 2025-04-09-macintoxic_en * Add model 2025-04-09-square_rundi_square_rundi_second_vote_full_pic_25_age_gender_en * Add model 2025-04-09-sent_first_try_rubert_200_16_16_25ep_en * Add model 2025-04-09-postpostpostr_en * Add model 2025-04-09-richardsocher_en * Add model 2025-04-09-bert_base_german_cased_finetuned_subj_v1_pipeline_en * Add model 2025-04-09-guggersylvain_pipeline_en * Add model 2025-04-09-guggersylvain_en * Add model 2025-04-09-macegrunow_en * Add model 2025-04-09-macegrunow_pipeline_en * Add model 2025-04-09-nueclear333_pipeline_en * Add model 2025-04-09-olikuchi_en * Add model 2025-04-09-wav2vec2_large_xlsr_53_full_train_full_train_pipeline_en * Add model 2025-04-09-lanalilligant_en * Add model 2025-04-08-peppa_pipeline_en * Add model 2025-04-08-3_epochs_classifier_en * Add model 2025-04-08-bert_base_greek_uncased_v1_finetuned_ner_pipeline_en * Add model 2025-04-09-deit_base_patch16_224_rice_leaf_disease_augmented_tagalog_pipeline_en * Add model 2025-04-08-wav2vec2_large_xlsr_estonian_m3hrdadfi_pipeline_et * Add model 2025-04-08-sent_bert_base_uncased_multi_128_pipeline_en * Add model 2025-04-09-mspunks_en * Add model 2025-04-09-mspunks_pipeline_en * Add model 2025-04-09-vit_base_patch16_224_masaratti_pipeline_en * Add model 2025-04-09-burmese_awesome_emotion_identifier_model_en * Add model 2025-04-09-wav2vec2_large_xls_r_300m_chichewa_colab_en * Add model 2025-04-09-lesseyecontact_pipeline_en * Add model 
2025-04-07-dialogpt_small_rick_havokx_pipeline_en * Add model 2025-04-08-wav2vec2_large_uralic_voxpopuli_v2_sami_parl_ext_ft_en * Add model 2025-04-09-dnlklr_pipeline_en * Add model 2025-04-09-wav2vec2_base_cynthia_timit_pipeline_en * Add model 2025-04-09-mri_classifier_djibri_pipeline_en * 2025-04-11-smolvlm_instruct_int4_en (#14550) * Add model 2025-04-11-smolvlm_instruct_int4_en * Add model 2025-04-14-paligemma_3b_pt_224_int4_en * Add model 2025-04-15-paligemma_3b_ft_vqav2_448_int4_en * Add model 2025-04-15-paligemma_3b_pt_224_int4_en * Add model 2025-04-15-paligemma2_3b_pt_448_int4_en * Add model 2025-04-15-paligemma2_3b_mix_224_int4_en * Add model 2025-04-28-gemma_3_4b_it_int4_en * Add model 2025-04-28-gemma_3_4b_pt_int4_en --------- Co-authored-by: prabod <[email protected]> * 2025-05-16-internvl2_1b_int4_en (#14577) * Add model 2025-05-16-internvl2_1b_int4_en * Add model 2025-05-16-internvl2_5_1b_int4_en * Add model 2025-05-16-internvl3_1b_int4_en * Add model 2025-05-16-internvl3_2b_int4_en * Add model 2025-05-16-internvl3_8b_int4_en * Add model 2025-05-16-internvl2_5_4b_int4_en * Add model 2025-05-27-florence_2_base_ft_int4_en * Add model 2025-05-27-florence_2_base_int4_en * Add model 2025-05-27-florence_2_large_ft_int4_en * Add model 2025-05-27-florence_2_large_int4_en --------- Co-authored-by: prabod <[email protected]> * 2025-05-17-internvl3_8b_int4_en (#14580) * Add model 2025-05-17-internvl3_8b_int4_en * Add model 2025-05-20-mmarco_mminilmv2_l12_h384_v1_nreimers_en * Add model 2025-05-20-mmarco_mminilmv2_l12_h384_v1_nreimers_pipeline_en * Add model 2025-05-20-bge_reranker_base_baai_en * Add model 2025-05-20-xlm_roberta_base_language_detection_xx * Add model 2025-05-20-bge_reranker_base_baai_pipeline_en * Add model 2025-05-20-xlm_roberta_base_language_detection_pipeline_xx * Add model 2025-05-20-twitter_xlm_roberta_base_sentiment_multilingual_xx * Add model 2025-05-20-korean_reranker_ko * Add model 2025-05-20-korean_reranker_pipeline_ko * Add model 
2025-05-20-twitter_xlm_roberta_base_sentiment_multilingual_pipeline_xx * Add model 2025-05-20-bce_reranker_base_v1_maidalun1020_pipeline_en * Add model 2025-05-20-bce_reranker_base_v1_maidalun1020_en * Add model 2025-05-20-multilingual_iptc_news_topic_classifier_xx * Add model 2025-05-20-bge_reranker_v2_m3_en * Add model 2025-05-20-multilingual_iptc_news_topic_classifier_pipeline_xx * Add model 2025-05-20-bge_reranker_v2_m3_pipeline_en * Add model 2025-05-20-xlm_roberta_base_romanian_ner_ronec_ro * Add model 2025-05-20-xlm_roberta_ner_japanese_ja * Add model 2025-05-20-xlm_roberta_base_romanian_ner_ronec_pipeline_ro * Add model 2025-05-20-xlm_roberta_ner_japanese_pipeline_ja * Add model 2025-05-20-xlm_roberta_large_finetuned_conll03_english_xx * Add model 2025-05-20-xlm_roberta_large_finetuned_conll03_german_xx * Add model 2025-05-20-fullstop_punctuation_multilang_large_en * Add model 2025-05-20-xlm_roberta_large_finetuned_conll03_english_pipeline_xx * Add model 2025-05-20-fullstop_punctuation_multilang_large_pipeline_en * Add model 2025-05-20-xlm_roberta_large_finetuned_conll03_german_pipeline_xx * Add model 2025-05-20-xlm_roberta_large_ner_spanish_es * Add model 2025-05-20-sent_twitter_xlm_roberta_base_en * Add model 2025-05-20-sent_twitter_xlm_roberta_base_pipeline_en * Add model 2025-05-20-sent_infoxlm_base_en * Add model 2025-05-20-sent_mminilmv2_l12_h384_distilled_from_xlmr_large_en * Add model 2025-05-20-sent_infoxlm_base_pipeline_en * Add model 2025-05-20-sent_mminilmv2_l12_h384_distilled_from_xlmr_large_pipeline_en * Add model 2025-05-20-sent_infoxlm_large_en * Add model 2025-05-20-sent_xlm_roberta_large_xx * Add model 2025-05-21-clip_vit_base_patch16_en * Add model 2025-05-21-fashion_clip_en * Add model 2025-05-21-clip_vit_base_patch16_pipeline_en * Add model 2025-05-21-fashion_clip_pipeline_en * Add model 2025-05-21-zero_shot_classifier_clip_vit_base_patch32_en * Add model 2025-05-21-zero_shot_classifier_clip_vit_base_patch32_pipeline_en * Add model 
2025-05-21-clip_vit_large_patch14_336_en * Add model 2025-05-21-xlmroberta_qa_ukrainian_uk * Add model 2025-05-21-xlmroberta_qa_ukrainian_pipeline_uk * Add model 2025-05-21-xlm_roberta_qa_xlm_roberta_base_arabic_ar * Add model 2025-05-21-xlm_roberta_qa_xlm_roberta_base_arabic_pipeline_ar * Add model 2025-05-21-xlm_roberta_qa_xlm_roberta_base_squad2_distilled_en * Add model 2025-05-21-xlm_roberta_qa_xlm_roberta_base_squad2_distilled_pipeline_en * Add model 2025-05-21-xlmr_large_qa_persian_farsi_fa * Add model 2025-05-21-persian_xlm_roberta_large_en * Add model 2025-05-21-xlmr_large_qa_persian_farsi_pipeline_fa * Add model 2025-05-21-persian_xlm_roberta_large_pipeline_en * Add model 2025-05-21-xlm_roberta_large_qa_multilingual_finedtuned_russian_xx * Add model 2025-05-21-xlm_roberta_large_qa_multilingual_finedtuned_russian_pipeline_xx * Add model 2025-05-21-xlm_roberta_large_xquad_en * Add model 2025-05-21-xlm_roberta_large_xquad_pipeline_en * Add model 2025-05-21-mminilmv2_l12_h384_distilled_from_xlmr_large_en * Add model 2025-05-21-mminilmv2_l12_h384_distilled_from_xlmr_large_pipeline_en * Add model 2025-05-21-twitter_xlm_roberta_base_en * Add model 2025-05-21-twitter_xlm_roberta_base_pipeline_en * Add model 2025-05-21-xlm_roberta_base_xx * Add model 2025-05-21-xlm_roberta_base_pipeline_xx * Add model 2025-05-21-infoxlm_large_en * Add model 2025-05-21-infoxlm_base_en * Add model 2025-05-21-infoxlm_base_pipeline_en * Add model 2025-05-21-xlm_roberta_large_xx * Add model 2025-05-21-xlm_v_base_xx * Add model 2025-05-21-infoxlm_large_pipeline_en * Add model 2025-05-21-xlm_roberta_large_pipeline_xx * Add model 2025-05-21-xlm_v_base_pipeline_xx * Add model 2025-05-21-robbert_v2_dutch_ner_nl * Add model 2025-05-21-roberta_large_ner_english_en * Add model 2025-05-21-robbert_v2_dutch_ner_pipeline_nl * Add model 2025-05-21-roberta_large_tweetner7_all_en * Add model 2025-05-21-roberta_large_ner_english_pipeline_en * Add model 
2025-05-21-roberta_token_classifier_sayula_popoluca_tagger_id * Add model 2025-05-21-roberta_token_classifier_sayula_popoluca_tagger_pipeline_id * Add model 2025-05-21-roberta_large_tweetner7_all_pipeline_en * Add model 2025-05-22-twitter_roberta_base_sentiment_en * Add model 2025-05-22-roberta_hate_speech_dynabench_r4_target_en * Add model 2025-05-22-twitter_roberta_base_sentiment_latest_en * Add model 2025-05-22-robertuito_sentiment_analysis_pipeline_es * Add model 2025-05-22-roberta_base_go_emotions_en * Add model 2025-05-22-roberta_hate_speech_dynabench_r4_target_pipeline_en * Add model 2025-05-22-roberta_classifier_emotion_english_distil_base_pipeline_en * Add model 2025-05-22-robertuito_sentiment_analysis_es * Add model 2025-05-22-twitter_roberta_base_sentiment_latest_pipeline_en * Add model 2025-05-22-twitter_roberta_base_sentiment_pipeline_en * Add model 2025-05-22-roberta_classifier_emotion_english_distil_base_en * Add model 2025-05-22-roberta_large_mnli_pipeline_en * Add model 2025-05-22-roberta_large_mnli_en * Add model 2025-05-22-roberta_base_go_emotions_pipeline_en * Add model 2025-05-22-twitter_roberta_base_sentiment_latest_en * Add model 2025-05-22-roberta_hate_speech_dynabench_r4_target_en * Add model 2025-05-22-twitter_roberta_base_sentiment_en * Add model 2025-05-22-robertuito_sentiment_analysis_pipeline_es * Add model 2025-05-22-twitter_roberta_base_sentiment_pipeline_en * Add model 2025-05-22-roberta_base_go_emotions_pipeline_en * Add model 2025-05-22-roberta_hate_speech_dynabench_r4_target_pipeline_en * Add model 2025-05-22-roberta_classifier_emotion_english_distil_base_en * Add model 2025-05-22-robertuito_sentiment_analysis_es * Add model 2025-05-22-roberta_large_mnli_en * Add model 2025-05-22-roberta_large_mnli_pipeline_en * Add model 2025-05-22-roberta_classifier_emotion_english_distil_base_pipeline_en * Add model 2025-05-22-roberta_base_go_emotions_en * Add model 2025-05-22-twitter_roberta_base_sentiment_latest_pipeline_en * Add model 
2025-05-22-distilroberta_base_en * Add model 2025-05-22-codebert_python_en * Add model 2025-05-22-distilroberta_base_pipeline_en * Add model 2025-05-22-roberta_base_en * Add model 2025-05-22-chemberta_zinc_base_v1_en * Add model 2025-05-22-roberta_base_pipeline_en * Add model 2025-05-22-roberta_large_en * Add model 2025-05-22-chemberta_zinc_base_v1_pipeline_en * Add model 2025-05-22-codebert_python_pipeline_en * Add model 2025-05-22-roberta_large_pipeline_en * Add model 2025-05-22-amd_power_dialer_v1_en * Add model 2025-05-22-coherence_all_mpnet_base_v2_en * Add model 2025-05-22-information_content_model_en * Add model 2025-05-22-icelandic_nepal_bhasa_dataset_teacher_model_en * Add model 2025-05-22-amd_full_phonetree_v1_pipeline_en * Add model 2025-05-22-amd_partial_phonetree_v1_en * Add model 2025-05-22-amd_partial_v1_en * Add model 2025-05-22-burmese_setfit_classifier_threat_en * Add model 2025-05-22-coherence_all_mpnet_base_v2_pipeline_en * Add model 2025-05-22-hub_report_20241202125641_pipeline_en * Add model 2025-05-22-amd_partial_v1_pipeline_en * Add model 2025-05-22-amd_partial_phonetree_v1_pipeline_en * Add model 2025-05-22-burmese_setfit_classifier_threat_pipeline_en * Add model 2025-05-22-setfit_model_en * Add model 2025-05-22-icelandic_nepal_bhasa_dataset_teacher_model_pipeline_en * Add model 2025-05-22-setfit_model_pipeline_en * Add model 2025-05-22-amd_power_dialer_v1_pipeline_en * Add model 2025-05-22-hub_report_20241202125641_en * Add model 2025-05-22-amd_full_phonetree_v1_en * Add model 2025-05-22-information_content_model_pipeline_en * Add model 2025-05-22-autotrain_kjxi3_hql8x_en * Add model 2025-05-22-multi_qa_mpnet_base_dot_v1_finetuned_squad2_all_en * Add model 2025-05-22-covid_qa_mpnet_en * Add model 2025-05-22-multi_qa_mpnet_base_dot_v1_finetuned_squad2_all_pipeline_en * Add model 2025-05-22-covid_qa_mpnet_pipeline_en * Add model 2025-05-22-autotrain_kjxi3_hql8x_pipeline_en * Add model 
2025-05-22-multi_qa_mpnet_base_cos_v1_sentence_transformers_en * Add model 2025-05-22-multi_qa_mpnet_base_dot_v1_en * Add model 2025-05-22-paraphrase_mpnet_base_v2_en * Add model 2025-05-22-patentsberta_en * Add model 2025-05-22-all_mpnet_base_v2_sentence_transformers_pipeline_en * Add model 2025-05-22-multi_qa_mpnet_base_cos_v1_sentence_transformers_pipeline_en * Add model 2025-05-22-patentsberta_pipeline_en * Add model 2025-05-22-fin_mpnet_base_en * Add model 2025-05-22-nli_mpnet_base_v2_en * Add model 2025-05-22-fin_mpnet_base_pipeline_en * Add model 2025-05-22-biolord_2023_c_en * Add model 2025-05-22-all_mpnet_base_v2_sentence_transformers_en * Add model 2025-05-22-paraphrase_mpnet_base_v2_pipeline_en * Add model 2025-05-22-biolord_2023_pipeline_en * Add model 2025-05-22-multi_qa_mpnet_base_dot_v1_pipeline_en * Add model 2025-05-22-biolord_2023_c_pipeline_en * Add model 2025-05-22-biolord_2023_en * Add model 2025-05-22-nli_mpnet_base_v2_pipeline_en * Add model 2025-05-22-e5_small_v2_intfloat_en * Add model 2025-05-22-e5_small_en * Add model 2025-05-22-e5_small_v2_intfloat_pipeline_en * Add model 2025-05-22-e5_small_pipeline_en * Add model 2025-05-22-e5_base_v2_intfloat_pipeline_en * Add model 2025-05-22-e5_base_pipeline_en * Add model 2025-05-22-sentence_transformers_e5_large_v2_en * Add model 2025-05-22-e5_large_en * Add model 2025-05-22-sentence_transformers_e5_large_v2_pipeline_en * Add model 2025-05-22-e5_base_en * Add model 2025-05-22-e5_base_v2_intfloat_en * Add model 2025-05-22-e5_large_pipeline_en * Add model 2025-05-22-e5_large_v2_intfloat_en * Add model 2025-05-22-e5_large_v2_intfloat_pipeline_en * Add model 2025-05-24-e5_small_en * Add model 2025-05-24-e5_small_v2_intfloat_en * Add model 2025-05-24-e5_small_pipeline_en * Add model 2025-05-24-e5_base_v2_intfloat_pipeline_en * Add model 2025-05-24-e5_small_v2_intfloat_pipeline_en * Add model 2025-05-24-sentence_transformers_e5_large_v2_en * Add model 2025-05-24-e5_base_v2_intfloat_en * Add model 
2025-05-24-e5_large_en * Add model 2025-05-24-e5_base_pipeline_en * Add model 2025-05-24-sentence_transformers_e5_large_v2_pipeline_en * Add model 2025-05-24-e5_large_v2_intfloat_en * Add model 2025-05-24-e5_large_pipeline_en * Add model 2025-05-24-e5_base_en * Add model 2025-05-25-distilbert_tok_classifier_typo_detector_en * Add model 2025-05-25-biomedical_ner_all_d4data_en * Add model 2025-05-25-distilbert_ner_distilbert_base_cased_finetuned_conll03_english_en * Add model 2025-05-25-distilbert_ner_distilbert_base_cased_finetuned_conll03_english_pipeline_en * Add model 2025-05-25-distilbert_finetuned_ai4privacy_v2_en * Add model 2025-05-25-distilbert_ner_distilbert_base_multilingual_cased_ner_hrl_nl * Add model 2025-05-25-biomedical_ner_all_d4data_pipeline_en * Add model 2025-05-25-distilbert_base_multilingual_cased_pii_xx * Add model 2025-05-25-distilbert_token_classifier_keyphrase_extraction_inspec_pipeline_en * Add model 2025-05-25-chonky_distilbert_base_uncased_1_en * Add model 2025-05-25-distilbert_ner_dslim_en * Add model 2025-05-25-distilbert_tok_classifier_typo_detector_pipeline_en * Add model 2025-05-25-chonky_distilbert_base_uncased_1_pipeline_en * Add model 2025-05-25-distilbert_finetuned_ai4privacy_v2_pipeline_en * Add model 2025-05-25-distilbert_base_multilingual_cased_pii_pipeline_xx * Add model 2025-05-25-distilbert_ner_distilbert_base_multilingual_cased_ner_hrl_pipeline_nl * Add model 2025-05-25-distilbert_ner_dslim_pipeline_en * Add model 2025-05-25-distilbert_token_classifier_keyphrase_extraction_inspec_en * Add model 2025-05-25-distilbert_base_uncased_go_emotions_student_en * Add model 2025-05-25-toxic_comment_model_en * Add model 2025-05-25-nsfw_text_classifier_en * Add model 2025-05-25-distilbert_nsfw_text_classifier_pipeline_en * Add model 2025-05-25-distilbert_base_uncased_go_emotions_student_pipeline_en * Add model 2025-05-25-toxic_comment_model_pipeline_en * Add model 2025-05-25-nsfw_text_classifier_pipeline_en * Add model 
2025-05-25-distilbert_nsfw_text_classifier_en * Add model 2025-05-25-multilingual_sentiment_analysis_xx * Add model 2025-05-25-multilingual_sentiment_analysis_pipeline_xx * Add model 2025-05-27-thainer_corpus_v2_base_model_th * Add model 2025-05-27-thainer_corpus_v2_base_model_pipeline_th * Add model 2025-05-27-phayathaibert_thainer_th * Add model 2025-05-27-nermembert_base_4entities_fr * Add model 2025-05-27-cas_biomedical_sayula_popoluca_tagging_fr * Add model 2025-05-27-phayathaibert_thainer_pipeline_th * Add model 2025-05-27-nermembert_large_3entities_fr * Add model 2025-05-27-nermembert_large_3entities_pipeline_fr * Add model 2025-05-27-cas_biomedical_sayula_popoluca_tagging_pipeline_fr * Add model 2025-05-27-nermembert_base_4entities_pipeline_fr * Add model 2025-05-27-rubert_base_cased_nli_threeway_ru * Add model 2025-05-27-rubert_base_cased_nli_threeway_pipeline_ru --------- Co-authored-by: ahmedlone127 <[email protected]> * Add model 2025-06-10-e5v_int4_en (#14599) Co-authored-by: prabod <[email protected]> * Add model 2025-06-23-minilm_l6_v2_en * Add model 2025-06-22-bert_classifier_finbert_tone_en * Add model 2025-06-22-bert_classifier_finbert_tone_pipeline_en * Add model 2025-06-22-finbert_pipeline_en * Add model 2025-06-22-bert_base_multilingual_uncased_sentiment_xx * Add model 2025-06-22-bert_base_multilingual_uncased_sentiment_pipeline_xx * Add model 2025-06-22-finbert_en * Add model 2025-06-22-bert_base_multilingual_cased_google_bert_xx * Add model 2025-06-22-bert_base_multilingual_cased_google_bert_pipeline_xx * Add model 2025-06-22-bert_base_uncased_google_bert_en * Add model 2025-06-22-bert_base_cased_google_bert_pipeline_en * Add model 2025-06-22-bert_base_cased_google_bert_en * Add model 2025-06-22-bert_base_uncased_google_bert_pipeline_en * Add model 2025-06-22-sent_bert_base_multilingual_cased_xx * Add model 2025-06-22-sent_bert_base_multilingual_cased_pipeline_xx * Add model 2025-06-22-sent_bert_base_cased_en * Add model 
2025-06-22-sent_bert_base_cased_pipeline_en * Add model 2025-06-22-sent_bert_base_uncased_pipeline_en * Add model 2025-06-22-sent_bert_base_uncased_en * Add model 2025-06-22-camembert_bio_base_fr * Add model 2025-06-22-camembert_bio_base_pipeline_fr * Add model 2025-06-22-drbert_7gb_fr * Add model 2025-06-22-umberto_commoncrawl_cased_v1_it * Add model 2025-06-22-camembert_base_fr * Add model 2025-06-22-umberto_commoncrawl_cased_v1_pipeline_it * Add model 2025-06-22-drbert_7gb_pipeline_fr * Add model 2025-06-22-sloberta_pipeline_sl * Add model 2025-06-22-camembert_base_pipeline_fr * Add model 2025-06-22-sloberta_sl * Add model 2025-06-22-wangchanberta_finetuned_sentiment_th * Add model 2025-06-22-wangchanberta_finetuned_sentiment_pipeline_th * Add model 2025-06-22-feel_italian_italian_emotion_it * Add model 2025-06-22-feel_italian_italian_sentiment_it * Add model 2025-06-22-finance_sentiment_french_base_fr * Add model 2025-06-22-feel_italian_italian_emotion_pipeline_it * Add model 2025-06-22-finance_sentiment_french_base_pipeline_fr * Add model 2025-06-22-ag_nli_dets_sentence_similarity_v4_pipeline_xx * Add model 2025-06-22-ag_nli_dets_sentence_similarity_v4_xx * Add model 2025-06-22-feel_italian_italian_sentiment_pipeline_it * Add model 2025-06-24-efficient_splade_vietnamese_bt_large_doc_en * Add model 2025-06-24-distilbert_base_cased_en * Add model 2025-06-24-distilbert_base_multilingual_cased_pipeline_xx * Add model 2025-06-24-distilbert_base_german_cased_de * Add model 2025-06-24-distilbert_base_cased_pipeline_en * Add model 2025-06-24-distilbert_base_multilingual_cased_xx * Add model 2025-06-24-distilbert_base_uncased_en * Add model 2025-06-24-opensearch_neural_sparse_encoding_v2_distill_en * Add model 2025-06-24-opensearch_neural_sparse_encoding_v2_distill_pipeline_en * Add model 2025-06-24-distilbert_base_uncased_pipeline_en * Add model 2025-06-24-efficient_splade_vietnamese_bt_large_doc_pipeline_en * Add model 2025-06-24-clinicalbert_pipeline_en * Add model 
2025-06-24-opensearch_neural_sparse_encoding_doc_v2_distill_pipeline_en * Add model 2025-06-24-opensearch_neural_sparse_encoding_doc_v2_distill_en * Add model 2025-06-24-clinicalbert_en * Add model 2025-06-24-distilbert_base_german_cased_pipeline_de * Add model 2025-06-24-tiny_distilbert_base_cased_distilled_squad_en * Add model 2025-06-24-tiny_distilbert_base_cased_distilled_squad_pipeline_en * Add model 2025-06-24-distilbert_base_uncased_distilled_squad_distilbert_en * Add model 2025-06-24-distilbert_base_uncased_distilled_squad_distilbert_pipeline_en * Add model 2025-06-24-question_answering_v2_pipeline_en * Add model 2025-06-24-distilbert_base_cased_distilled_squad_distilbert_en * Add model 2025-06-24-distilbert_base_uncased_finetuned_squad_full_pipeline_en * Add model 2025-06-24-distilbert_base_cased_distilled_squad_distilbert_pipeline_en * Add model 2025-06-24-question_answering_v2_en * Add model 2025-06-24-distilbert_base_uncased_finetuned_squad_full_en * Add model 2025-06-24-hubert_large_japanese_asr_ja * Add model 2025-06-24-hubert_large_arabic_egyptian_ar * Add model 2025-06-24-hubert_large_japanese_asr_pipeline_ja * Add model 2025-06-24-hubert_large_arabic_egyptian_pipeline_ar * Add model 2025-06-24-distilbart_mnli_12_6_en * Add model 2025-06-24-distilbart_mnli_12_3_en * Add model 2025-06-24-distilbart_mnli_12_6_pipeline_en * Add model 2025-06-24-distilbart_mnli_12_1_en * Add model 2025-06-24-awesome_fb_model_en * Add model 2025-06-24-distilbart_mnli_12_3_pipeline_en * Add model 2025-06-24-distilbart_mnli_12_1_pipeline_en * Add model 2025-06-24-distilbart_mnli_12_9_pipeline_en * Add model 2025-06-24-bart_mnli_cnn_256_pipeline_en * Add model 2025-06-24-distilbart_mnli_12_9_en * Add model 2025-06-24-bart_mnli_cnn_256_en * Add model 2025-06-24-awesome_fb_model_pipeline_en * Add model 2025-06-24-bart_large_mnli_yahoo_answers_joeddav_pipeline_en * Add model 2025-06-24-bart_large_mnli_yahoo_answers_joeddav_en * Add model 
2025-07-03-phi_3.5_mini_instruct_int4_en
* 2025-07-15-bge_medembed_base_v0_1_openvino_en (#14629)
  * Add model 2025-07-15-bge_medembed_base_v0_1_openvino_en
  * Add model 2025-07-15-bge_medembed_large_v0_1_openvino_en
  * Add model 2025-07-15-all_mpnet_base_v2_openvino_en
  * Update 2025-07-15-bge_medembed_base_v0_1_openvino_en.md
  * Update 2025-07-15-bge_medembed_large_v0_1_openvino_en.md
  * Add model 2025-07-18-nuextract_2.0_2B_en
  Co-authored-by: AbdullahMubeenAnwar
  Co-authored-by: Abdullah mubeen
* 2025-07-25-phi_4_mini_instruct_q4_k_m_gguf_en (#14638)
  * Add model 2025-07-25-phi_4_mini_instruct_q4_k_m_gguf_en
  * Add model 2025-07-25-phi_4_mini_instruct_q8_0_gguf_en
  * Add model 2025-07-25-phi_4_mini_instruct_bf16_gguf_en
  * Add model 2025-07-25-Phi_4_mini_instruct_int4_openvino_en
  * Update 2025-07-25-Phi_4_mini_instruct_int4_openvino_en.md
  * Update 2025-07-25-Phi_4_mini_instruct_int4_openvino_en.md
  * Add model 2025-07-25-phi_4_mini_instruct_int8_openvino_en
  Co-authored-by: AbdullahMubeenAnwar
  Co-authored-by: Abdullah mubeen
* 2025-07-31-qwen3_4b_q4_k_m_gguf_en (#14639)
  * Add model 2025-07-31-qwen3_4b_q4_k_m_gguf_en
  * Add model 2025-07-31-qwen3_4b_q8_0_gguf_en
  * Add model 2025-07-31-qwen3_4b_bf16_gguf_en
  Co-authored-by: AbdullahMubeenAnwar
* 2025-08-04-Qwen3_Embedding_0.6B_Q8_0_gguf_en (#14642)
  * Add model 2025-08-04-Qwen3_Embedding_0.6B_Q8_0_gguf_en
  * Update 2025-08-04-Qwen3_Embedding_0.6B_Q8_0_gguf_en.md
  * Add model 2025-08-04-Phi_4_mini_instruct_Q4_K_M_gguf_en
  * Add model 2025-08-04-Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf_en
  Co-authored-by: DevinTDHa
  Co-authored-by: Devin Ha
* Add model 2025-08-11-qwen2_vl_2b_instruct_q4_gguf_en (#14648)
  Co-authored-by: AbdullahMubeenAnwar
* Add model 2025-09-01-bge_reranker_v2_m3_Q4_K_M_en
* Add model 2025-09-15-qwen2.5_vl_7b_instruct_q16_gguf_en (#14661)
  Co-authored-by: AbdullahMubeenAnwar
* 2025-11-03-umlsbert_eng_onnx_en (#14683)
  * Add model 2025-11-03-umlsbert_eng_onnx_en
  * Update 2025-11-03-umlsbert_eng_onnx_en.md
  Co-authored-by: AbdullahMubeenAnwar
  Co-authored-by: Abdullah mubeen
* Add model 2025-11-04-bert_base_uncased_multiple_choice_en
* Add model 2025-11-07-distilbert_base_cased_en (#14688)
  Co-authored-by: AbdullahMubeenAnwar
* Add model 2025-11-07-bge_base_en_v1_5_onnx_en (#14690)
  Co-authored-by: AbdullahMubeenAnwar
* Add model 2025-11-09-glove2024_wikigiga_200d_en

Co-authored-by: ahmedlone127
Co-authored-by: jsl-models
Co-authored-by: prabod
Co-authored-by: AbdullahMubeenAnwar
Co-authored-by: Abdullah mubeen
…ic… (#14701)
* SPARKNLP-1315: change the input data type for CamemBertForTokenClassification from int64 to int32
* SPARKNLP-1315: add a test for TensorFlow models
* NerDLGraphChecker: add the missing setter on the Scala side
* Introduce NerDLDataLoader for NerDLApproach
  The threaded NerDLDataLoader fetches batches in the background while NerDLApproach is training, which reduces idle time in the driver thread.
* NerDLApproach: optimize the partitioning flag
  Allow NerDLApproach to repartition the input dataset so the driver does not run out of memory when training on large partitions.
* NerDL optimizations on the Python side
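The NerDLDataLoader change describes background prefetching. One thread prepares upcoming batches while the training loop consumes the current one, so the consumer rarely waits. The Python sketch below illustrates that producer/consumer idea with a bounded `queue.Queue`. It is not the actual NerDLDataLoader implementation. The names `PrefetchingLoader` and `make_batches`, and the `prefetch` parameter, are hypothetical and chosen only for illustration.

```python
import queue
import threading
from typing import Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical end-of-stream marker (not part of any Spark NLP API).
_END = object()


class PrefetchingLoader(Generic[T]):
    """Hypothetical single-pass loader: a daemon thread pulls batches from
    `batches` into a bounded queue while the caller consumes them."""

    def __init__(self, batches: Iterable[T], prefetch: int = 2) -> None:
        self._batches = batches
        # maxsize bounds memory: the producer blocks once `prefetch` batches are waiting.
        self._queue: "queue.Queue[object]" = queue.Queue(maxsize=prefetch)
        self._error: Optional[BaseException] = None
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()  # prefetching begins immediately

    def _produce(self) -> None:
        try:
            for batch in self._batches:
                self._queue.put(batch)  # blocks while the queue is full
        except BaseException as exc:  # remember the failure for the consumer
            self._error = exc
        finally:
            self._queue.put(_END)  # always signal the end of the stream

    def __iter__(self) -> Iterator[T]:
        while True:
            item = self._queue.get()
            if item is _END:
                break
            yield item
        self._thread.join()
        if self._error is not None:
            raise self._error  # re-raise producer errors in the consumer thread


def make_batches(n: int, size: int) -> Iterator[List[int]]:
    """Hypothetical batch source standing in for reading training data."""
    for start in range(0, n, size):
        yield list(range(start, min(start + size, n)))


# Usage: a real loop would pass each batch to a training step; here we collect them.
collected = list(PrefetchingLoader(make_batches(10, 4), prefetch=2))
print(collected)
```

Keeping the queue bounded is the key design choice. If the queue were unbounded, a fast producer could buffer the entire dataset in memory, and that memory pressure is what the partitioning change in the same commit tries to avoid on the driver.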
Description
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: