
GSQA

Environment Settings

pip3 install -r requirements.txt
# pip3 install -r requirements_2.txt # Oscar's local env settings

Fine-tuned LM List

HuBERT Unit: long-t5-base-SQA-hubert-100
mHuBERT Unit: long-t5-base-SQA-mhubert-1000

Training

Datasets: NMSQA

T5-series Model: long-T5

Training Script:

python3 main.py
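Before a T5-style model can be fine-tuned on NMSQA, the discrete HuBERT/mHuBERT units have to be serialized into token sequences. The repo's actual preprocessing lives in main.py; the sketch below is only a hypothetical illustration, where the `v_` prefix and the collapsing of consecutive duplicate units are assumptions, not confirmed details of this codebase.

```python
from itertools import groupby

def collapse_units(unit_ids):
    """Drop consecutive duplicate unit IDs, a common step in
    unit-based speech LM pipelines to shorten sequences."""
    return [unit for unit, _ in groupby(unit_ids)]

def units_to_text(unit_ids, prefix="v_"):
    """Serialize discrete speech units into a whitespace-separated
    string a T5 tokenizer can consume (hypothetical format)."""
    return " ".join(f"{prefix}{u}" for u in collapse_units(unit_ids))
```

For example, `units_to_text([3, 3, 7, 7, 7, 3])` yields `"v_3 v_7 v_3"`; collapsing repeats keeps input sequences short enough for long-T5's context window.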

Multi-Task Training

Datasets

Unit Datasets: GSQA/speech-alpaca-gpt4-unit
Speech Datasets: GSQA/spoken-alpaca-gpt4

Models Hub

T5-series Model: long-T5 alpaca-TQA-init
T5-series Model: LongT5-alpaca-TQA

1. Setup

Log in to the GSQA-authorized Hugging Face account:

$ huggingface-cli login

Log in to a wandb account to record the training curves:

$ wandb login --relogin

2. Training script

# choose one of the aux_task combinations to pass to --aux_task
$ python3 main_multiTask.py --aux_task qt,at,qu
# (choices: 'qt,qu', 'qt,at,qu', 'qu,at', 'at')
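The allowed combinations suggest that --aux_task takes one comma-separated string which the script then splits into individual auxiliary tasks. A hypothetical sketch of that argument handling (main_multiTask.py's real parser may differ):

```python
import argparse

# Assumed choices, copied from the README; the helper name is illustrative.
AUX_TASK_CHOICES = ["qt,qu", "qt,at,qu", "qu,at", "at"]

def parse_aux_tasks(argv):
    """Validate --aux_task against the allowed combinations and
    split it into individual task codes, e.g. "qt,at,qu" -> ["qt", "at", "qu"]."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--aux_task", choices=AUX_TASK_CHOICES, required=True)
    args = parser.parse_args(argv)
    return args.aux_task.split(",")
```

Using `choices=` makes argparse reject any combination outside the four listed above with a usage error, rather than failing later in training.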

3. After training finishes, push the model to https://huggingface.co/GSQA


Unit-to-unit Evaluation

ASR Model: Whisper (TBD)

Evaluating Script:

# step 1: run
python3 whisper_evaluate.py --model /path/to/the/huggingface/model --auto_split_dataset
# (see whisper_evaluate.py for more optional arguments)
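whisper_evaluate.py presumably transcribes the model's generated speech with Whisper and scores the transcripts against reference answers; word error rate is one standard metric for such a comparison. The following self-contained WER implementation is offered only as an illustration of that scoring step, not as the script's actual code:

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance between hypothesis and reference,
    normalized by reference length (single-row dynamic programming)."""
    hyp, ref = hypothesis.split(), reference.split()
    d = list(range(len(hyp) + 1))  # d[j] = distance(ref[:0], hyp[:j])
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or (mis)match
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)
```

A perfect transcript gives 0.0; one substitution in a three-word reference gives 1/3.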

# step 2a: for the alpaca dataset (BertScore), run
python3 BertScore_eval.py
# (remember to change the evaluation file path first)

# step 2b: for datasets with context, run
python3 eval_score.py  # remember to check the names of the output files
# Note: put the best reported score into the Overleaf table.
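For context-based QA datasets, answer quality is commonly scored with token-overlap F1, as in SQuAD-style evaluation. What eval_score.py actually computes is not shown here, so the function below is only an illustrative sketch of that metric:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer
    (SQuAD-style; lowercased whitespace tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise F1 is 0.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat sat", "the cat")` has precision 2/3 and recall 1, giving F1 = 0.8.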

About

Generative Spoken Question Answering
