Commit 00e77cc

author
nonstoptimm
committed
doku update
1 parent 9cc7b6a commit 00e77cc

7 files changed

+141
-74
lines changed

README.md

Lines changed: 114 additions & 45 deletions
Original file line number | Diff line number | Diff line change
@@ -1,61 +1,127 @@
11
![GLUE](assets/img/glue_logo.png)
22

3-
GLUE a lightweight, Python-based collection of scripts to support you at succeeding with speech and text use-cases based on [Microsoft Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/). It not only allows you to batch-process data, rather glues together the services of your choice in an end-to-end pipeline.
3+
## About GLUE
4+
GLUE is a lightweight, Python-based collection of scripts that supports you in succeeding with speech and text use-cases based on [Microsoft Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/). It not only allows you to batch-process data, but also glues together the services of your choice in one place and ensures an end-to-end view of the training and testing process.
45

5-
- Batch-transcribe audio files to text transcripts using [Microsoft Speech to Text Service](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/)
6-
- Batch-synthesize text data using [Microsoft Text to Speech Service](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/)
6+
## Modules
7+
GLUE consists of multiple modules, which can either be executed separately or run as a central pipeline:
8+
- Batch-transcribe audio files to text transcripts using [Microsoft Speech to Text Service](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/) (STT)
9+
- Batch-synthesize text data using [Microsoft Text to Speech Service](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (TTS)
10+
- Batch-evaluate reference transcriptions and recognitions
11+
12+
- Batch-score text strings on an existing, pre-trained [Microsoft LUIS](https://luis.ai)-model
13+
14+
TBD:
715
- Batch-translate text data using [Microsoft Translator](https://azure.microsoft.com/en-us/services/cognitive-services/translator/)
8-
- Batch-score text strings on a [Microsoft LUIS](https://luis.ai)-model
9-
- [to extract LUIS files from Excel sheets and with that create test data sets to be scored using a LUIS endpoint.]
1016

11-
## Know before you go
12-
This toolkit is based on multiple, free and/or open source software components. This section helps you to check whether you are all set for using it.
17+
## Getting Started
18+
This section describes how you get started with GLUE and which requirements need to be fulfilled by your working environment.
1319

1420
### Prerequisites
1521
Before getting your hands on the toolkits, make sure your local computer is equipped with the following frameworks and base packages:
1622
- [Python](https://www.python.org/downloads/windows/) (required, Version 3.8 is recommended)
17-
- [VSCode](https://code.visualstudio.com/docs/?dv=win) (recommended)
18-
- alternatively, you can also run the scripts using PowerShell or PyCharm
19-
- [git](https://git-scm.com/downloads) (recommended, alternatively download the repository as zip)
20-
- Internet access for installing your environment and scoring the files
21-
22-
After making sure these are all available on your system, the environment can be set up.
23-
24-
### Setup of virtual environment
25-
1. Open your PowerShell or open VSCode
26-
1. Change the directory to your preferred workspace (using `cd`)
27-
1. Download the repository as a ZIP-archive and unpack your file locally to the respective folder
28-
1. Enter the root folder of your repository
29-
1. Set up the virtual environment<br>
30-
`python -m venv venv`
31-
1. Activate the virtual environment<br> `venv\Scripts\activate`
32-
1. Install the requirements<br>
33-
`pip install -r requirements.txt`
34-
1. After successfully installing the requirements-file, your environment is set up and you can go ahead.
35-
Afterwards, you should be able to see the activated environment in the command line:<br>`(txttool)`
36-
37-
### Get your keys
38-
In the root directory, you will find a file named `config.sample.ini`. This is the file where all the LUIS keys have to be set. First, create a copy of this file and rename it to `config.ini`. You only need the keys for the services you use during your experiment. However, keep the structure of the `config.ini`-file as it is to avoid errors. The toolkit will just set the variable values as _none_, but will throw an error when the keys cannot be found.
39-
40-
An instruction on how to get the keys can be found [here](getyourkeys.md).
41-
42-
## How to use
43-
44-
### File guidelines
45-
There are some rules how the input files have to look like:
46-
- tab-delimited file (If you only have an Excel sheet, you can create it using Excel -> Save as -> .txt (tab-delimited))
23+
- [VSCode](https://code.visualstudio.com/docs/?dv=win) (recommended), but you can also run the scripts using PowerShell, Bash etc.
24+
- Stable internet connection for installing your environment and scoring the files
25+
26+
### Setup of Virtual Environment
27+
1. Open a command line of your choice (PowerShell, Bash)
28+
2. Change the directory to your preferred workspace (using `cd`)
29+
3. Clone the repository (alternatively, download the repository as a zip-archive and unpack your file locally to the respective folder)
30+
```
31+
git clone https://github.com/microsoft/glue
32+
```
33+
4. Enter the root folder of the cloned repository
34+
```
35+
cd glue
36+
```
37+
5. Set up the virtual environment
38+
```
39+
python -m venv .venv
40+
```
41+
6. Activate the virtual environment
42+
```bash
43+
# Windows:
44+
.venv\Scripts\activate
45+
# Linux:
46+
source .venv/bin/activate
47+
```
48+
7. Install the requirements
49+
```
50+
pip install -r requirements.txt
51+
```
52+
8. (optional) If you want to use Jupyter Notebooks, you can register your activated environment using the command below
53+
```
54+
python -m ipykernel install --user --name glue --display-name "Python (glue)"
55+
```
56+
After successfully installing the requirements-file, your environment is set up and you can go ahead with the next step.
57+
58+
### API Keys
59+
In the root directory of the repository, you can find a file named `config.sample.ini`. This is the file where the API keys and some other essential configuration parameters have to be set, depending on which services you would like to use. First, create a copy of `config.sample.ini` and rename it to `config.ini` in the same directory. You only need the keys for the services you use during your experiment. However, keep the structure of the `config.ini`-file as it is to avoid errors. The toolkit accepts empty values, but throws an error when a key cannot be found at all.
60+
61+
Instructions on how to get the keys can be found [here](GetYourKeys.md).
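The difference between an empty value and a missing key can be illustrated with a minimal sketch. It assumes only the `[dir]` and `[stt]` sections and the option names `output_folder`, `key` and `endpoint`, which appear in `params.py`; the sample text and the `read_config` helper are hypothetical and not part of the toolkit.

```python
import configparser

# Hypothetical stand-in for config.ini. Only the section and option names
# are taken from params.py; the values are placeholders.
SAMPLE_CONFIG = """
[dir]
output_folder = output

[stt]
key =
endpoint =
"""


def read_config(text):
    """Parse config text and return a few settings as a dict (illustrative helper)."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # Direct indexing raises KeyError when a section or option is missing,
    # while an option that is present but blank is returned as ''.
    return {
        "output_folder": config["dir"]["output_folder"],
        "stt_key": config["stt"]["key"],
        "stt_endpoint": config["stt"]["endpoint"],
    }


settings = read_config(SAMPLE_CONFIG)
print(settings)
```

Removing the `[stt]` section from the sample text makes `read_config` raise a `KeyError`, which is why the structure of the file should stay intact even for services you do not use.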
62+
63+
### Input Parameters
64+
The following table shows and describes the available modes along with their input parameters as well as dependencies.
65+
66+
| __Mode__ | __Command line parameter__ | __Description__ | __Dependencies__ | |
67+
|--------------------|----------------------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|
68+
| __TTS__ | `--do_synthesize` | Activate text-to-speech synthetization | Requires csv file with `text`-column, see `--input` | |
69+
| __STT__ | `--do_transcribe` | Activate speech-to-text processing | Requires audio files, see `--audio` | |
70+
| __STT__ | `--audio` | Path to folder with audio files | Audio files have to be provided as WAV files with the parameters described [here](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train) | |
71+
| __STT-Evaluation__ | `--do_evaluate` | Activate evaluation of transcriptions based on reference transcriptions | Requires csv-file with `text`-column and intent names | |
72+
| __LUIS__ | `--do_scoring` | Activate LUIS model scoring | Requires csv-file with `intent` and `text` columns | |
73+
| __STT / TTS__ | `--input` | Path to comma-separated text input file | | |
74+
75+
The requirements for the input files (`--input` and `--audio`) are described in the section _Input File Guidelines_.
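As a rough sketch of how these parameters fit together, the snippet below rebuilds the flags with `argparse`, following the definitions in `params.py` from this commit (where the audio option is named `--audio`). The `build_parser` helper and the path `input/testset.csv` are hypothetical and only serve the illustration.

```python
import argparse


def build_parser():
    """Build a parser with the GLUE flags (illustrative, modelled on params.py)."""
    parser = argparse.ArgumentParser(description="GLUE command line sketch")
    parser.add_argument("--input", type=str,
                        help="Path to comma-separated text input file")
    parser.add_argument("--audio", type=str,
                        help="Path to folder with audio files")
    # Each mode is a boolean switch that stays False unless passed explicitly.
    modes = [
        ("--do_transcribe", "Activate speech-to-text processing"),
        ("--do_scoring", "Activate LUIS model scoring"),
        ("--do_synthesize", "Activate text-to-speech synthetization"),
        ("--do_evaluate", "Activate evaluation of transcriptions"),
    ]
    for flag, description in modes:
        parser.add_argument(flag, default=False, action="store_true",
                            help=description)
    return parser


# Hypothetical invocation: score a text file against a LUIS model.
args = build_parser().parse_args(["--do_scoring", "--input", "input/testset.csv"])
print(args.do_scoring, args.do_transcribe, args.input)
```

Several modes can be combined in one call, since every switch is independent of the others.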
76+
77+
## GLUE-Modules
78+
This section describes the single components of GLUE, which can either be run autonomously or, ideally, using the central orchestrator.
79+
80+
`glue.py`
81+
- Central application orchestrator of the toolkit.
82+
- Glues together the single modules in one place as needed.
83+
- Reads input files and writes output files.
84+
85+
`stt.py`
86+
- Batch-transcription of audio files using [Microsoft Speech to Text API](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/).
87+
- Allows baseline models as well as custom endpoints.
88+
- Functionality is limited to the languages and locales listed on the [language support](https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/language-support#speech-to-text) page.
89+
90+
`tts.py`
91+
- Batch-synthetization of text strings using [Microsoft Text to Speech API](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/).
92+
- Supports Speech Synthesis Markup Language (SSML) to fine-tune and customize the pronunciation, as described in the [documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=python).
93+
- Functionality is limited to the languages and fonts listed on the [language support](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech) page.
94+
- Make sure the voice of your choice is available in the respective Azure region ([see documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech#standard-and-neural-voices)).
95+
96+
`luis.py`
97+
- Batch-scoring of intent-text combinations using an existing LUIS model
98+
- See the following [quickstart documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-get-started-create-app) in case you need some inspiration for your first LUIS-app.
99+
- Configurable scoring threshold, in case predictions should only be accepted above a certain confidence score returned by the API.
100+
- Writes scoring report as comma-separated file.
101+
- Returns classification report and confusion matrix based on [scikit-learn](https://github.com/scikit-learn/scikit-learn).
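The idea behind the scoring threshold can be sketched in plain Python. This is not the code in `luis.py`: the helper names, the sample data and the choice to map low-confidence predictions to a `"None"` intent are assumptions made for illustration. The default of 0.85 matches the `--treshold` default in the earlier version of `params.py`.

```python
from collections import Counter


def apply_threshold(predictions, threshold=0.85, fallback="None"):
    """Keep a predicted intent only if its score reaches the threshold.

    Assumption: rejected predictions are mapped to a fallback intent.
    """
    return [intent if score >= threshold else fallback
            for intent, score in predictions]


def confusion_counts(y_true, y_pred):
    """Count (true intent, predicted intent) pairs, a flat confusion matrix."""
    return Counter(zip(y_true, y_pred))


# Hypothetical ground truth and (intent, confidence) pairs from the API.
y_true = ["BookFlight", "CancelFlight", "BookFlight"]
raw_predictions = [("BookFlight", 0.97), ("BookFlight", 0.62), ("BookFlight", 0.91)]

y_pred = apply_threshold(raw_predictions)
counts = confusion_counts(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(y_pred, dict(counts), accuracy)
```

For a full report, the resulting `y_true` and `y_pred` lists could be passed to scikit-learn's `classification_report` and `confusion_matrix`.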
102+
103+
`evaluate.py`
104+
- Evaluation of transcription results by comparing them with reference transcripts.
105+
- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), Word Recognition Rate (WRR).
106+
- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).
107+
- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).
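To make the metrics more concrete, here is a minimal sketch of WER and SER. It is a textbook edit-distance formulation and not the implementation used in `evaluate.py` or asr-evaluation; the function names and sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, computed row by row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def sentence_error_rate(pairs):
    """SER = share of sentences whose recognition differs from the reference."""
    errors = sum(ref.split() != hyp.split() for ref, hyp in pairs)
    return errors / len(pairs)


# Hypothetical (reference, recognition) pairs.
pairs = [
    ("turn on the light", "turn of the light"),
    ("book a flight", "book a flight"),
]
print(word_error_rate(*pairs[0]))   # 0.25: one substitution in four words
print(sentence_error_rate(pairs))   # 0.5: one of two sentences has an error
```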
108+
109+
`params.py`
110+
- Collects API and configuration parameters from the command line (ArgumentParser) and the `config.ini`.
111+
112+
`helper.py`
113+
- Collection of helper functions which do not have a purpose on their own, but rather complement the orchestrator and keep the code neat and clean.
114+
115+
### Input File Guidelines
116+
Depending on your use-case, you have to provide an input text file and/or audio files. In these cases, you have to pass the path to the respective input file or folder via the command line. The input files have to follow a few rules.
117+
118+
-
119+
- Comma-separated file (if you only have an Excel sheet, you can create it using Excel: _Save as_ -> comma-separated)
47120
- UTF-8 encoding (to make sure it has the correct encoding, open it with a text editor such as [Notepad++](https://notepad-plus-plus.org/downloads/) -> Encoding -> Convert to UTF-8)
48121
- Column names with the respective values dependent on the mode
49122
- of columns _intent_ (ground-truth LUIS-intent) and _text_ (utterance of the text, max length of 500 characters)
50123
- We recommend you to put the input file in the subfolder `input`.
51124

52-
| | __"intent"-column__ | __"text"-column__ | __"Audio File"-folder__ |
53-
|-----------------|---------------------|-------------------|-------------------------|
54-
| --do_synthesize | | X | |
55-
| --do_transcribe | | | x |
56-
| --do_evaluate | | X | |
57-
| --do_scoring | X | | |
58-
| --audio_files | | | X |
59125

60126
You can find an example file [here](input/testset-example.txt).
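A quick way to check a file against these rules before starting a run is sketched below, using only the standard library. The `load_rows` helper and the sample data are hypothetical and not part of the toolkit; the required columns depend on the mode you run.

```python
import csv
import io

MAX_TEXT_LENGTH = 500  # maximum utterance length stated in the guidelines


def load_rows(handle, required_columns=("intent", "text")):
    """Read a comma-separated file and check columns and text length."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in required_columns if column not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    rows = []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        text = row.get("text") or ""
        if len(text) > MAX_TEXT_LENGTH:
            raise ValueError(f"line {line_number}: text exceeds {MAX_TEXT_LENGTH} characters")
        rows.append(row)
    return rows


# In-memory sample; for a real file use open(path, encoding="utf-8", newline="").
sample = io.StringIO("intent,text\nBookFlight,I want to fly to Berlin\n")
rows = load_rows(sample)
print(rows)
```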
61127

@@ -117,4 +183,7 @@ To get deeper insights into the classification performance, there is a Jupyter n
117183
1. Place the scoring file from the output folder in the same folder as the notebook or just keep the directory in mind. There is an example file in the notebooks-folder as well
118184
1. Change the file name in the `Import data` section. If you want to reference to the file in the output folder, change it to `../../output/[date-of-case]-case/[date-of-case]-case.txt`.
119185
1. Execute all the fields - this might take a while especially during the plotting phase of the confusion matrix
120-
1. If you want to store the evaluation report, you can do this by "File -> Export -> .html" and open it with any modern internet browser
186+
1. If you want to store the evaluation report, you can do this by "File -> Export -> .html" and open it with any modern internet browser
187+
188+
## Limitations
189+
This toolkit is the right starting point for your bring-your-own data use cases. However, it does not provide automated training runs.

src/main.py renamed to src/glue.py

Lines changed: 9 additions & 10 deletions
@@ -3,21 +3,21 @@
33
''' Supports Text-To-Speech (TTS), Speech-To-Text (STT) and LUIS-Scoring '''
44

55
# Import standard packages
6-
import logging
7-
import argparse
86
import os
97
import sys
10-
import configparser
118
import shutil
9+
import logging
10+
import argparse
11+
import configparser
1212
import pandas as pd
1313

1414
# Import custom modules
15+
import luis
16+
import stt
17+
import tts
1518
import params as pa
1619
import helper as he
1720
import evaluate as eval
18-
import score_luis as luis
19-
import speech_transcribe as stt
20-
import synthesize_text as tts
2121

2222
''' COMMAND EXAMPLES '''
2323
# python .\src\main.py --do_synthesize --input input/scoringfile.txt
@@ -28,9 +28,8 @@
2828

2929
# Set arguments
3030
fname = args.input
31-
subfolder = args.subfolder
3231
luis_treshold = args.treshold
33-
audio_files = args.audio_files
32+
audio_files = args.audio
3433
do_synthesize = args.do_synthesize
3534
do_scoring = args.do_scoring
3635
do_transcribe = args.do_transcribe
@@ -45,9 +44,9 @@
4544
if __name__ == '__main__':
4645
logging.info('[INFO] - Starting Cognitive Services Tools - v0.1')
4746

48-
# Case management
47+
# Case Management
4948
if any([do_scoring, do_synthesize, do_transcribe, do_evaluate]):
50-
output_folder, case = he.create_case(pa.output_folder, subfolder)
49+
output_folder, case = he.create_case(pa.output_folder)
5150
logging.info(f'[INFO] - Created case {case}')
5251
try:
5352
shutil.copyfile(fname, f'{output_folder}/{case}/input/{os.path.basename(fname)}')

src/helper.py

Lines changed: 1 addition & 4 deletions
@@ -14,11 +14,10 @@
1414
from sklearn.model_selection import train_test_split
1515

1616
# Helper Functions
17-
def create_case(output_folder, subfolders):
17+
def create_case(output_folder):
1818
""" Create case for project
1919
Args:
2020
output_folder: directory of output folder
21-
subfolders: list of folders to be created as subfolders
2221
Returns:
2322
output_folder: directory of output folder
2423
case: name of the created case
@@ -27,8 +26,6 @@ def create_case(output_folder, subfolders):
2726
# Create Case
2827
case = f"{datetime.today().strftime('%Y-%m-%d_%H-%M-%S')}"
2928
os.makedirs(f"{output_folder}/{case}", exist_ok=True)
30-
for folder in subfolders.split(","):
31-
os.makedirs(f"{output_folder}/{case}/{folder}", exist_ok=True)
3229
return output_folder, case
3330

3431
def create_df(fname):
File renamed without changes.

src/params.py

Lines changed: 17 additions & 15 deletions
@@ -5,52 +5,54 @@
55
def get_params(parser):
66
'''
77
Collect arguments from command line
8+
Args:
9+
parser: ArgumentParser-object
10+
Returns:
11+
args: object with parsed arguments
812
'''
913
parser.add_argument("--input",
1014
type=str,
11-
#default="input/example_testset_flights.txt",
12-
help="give the whole path to tab-delimited file")
15+
help="Path to comma-separated text input file")
1316
parser.add_argument("--subfolder",
1417
default="input",
1518
type=str,
1619
help="Input folders, pass comma-separated if multiple ones")
17-
parser.add_argument("--audio_files",
18-
#default="input/audio/",
20+
parser.add_argument("--audio",
1921
type=str,
20-
help="Input folders, pass comma-separated if multiple ones")
21-
parser.add_argument("--treshold",
22-
default=0.85,
23-
type=float,
24-
help="Set minimum confidence score between 0.00 and 1.00")
22+
help="Path to folder with audio files")
2523
parser.add_argument("--do_transcribe",
2624
default=False,
2725
action="store_true",
28-
help="Speech to Text using Microsoft Speech Service")
26+
help="Activate speech-to-text processing")
2927
parser.add_argument("--do_scoring",
3028
default=False,
3129
action="store_true",
32-
help="Model testing using LUIS API")
30+
help="Activate LUIS model scoring")
3331
parser.add_argument("--do_synthesize",
3432
default=False,
3533
action="store_true",
36-
help="Text to speech using Microsoft Speech API")
34+
help="Activate text-to-speech synthetization")
3735
parser.add_argument("--do_evaluate",
3836
default=False,
3937
action="store_true",
40-
help="Evaluate speech transcriptions")
38+
help="Activate evaluation of transcriptions based on reference transcriptions")
4139
args = parser.parse_args()
4240
return args
4341

44-
def get_config():
42+
def get_config(fname_config='config.ini'):
4543
'''
4644
Collect parameters from config file
45+
Args:
46+
fname_config: file name of config file
47+
Returns:
48+
Sets parsed arguments as global variables
4749
'''
4850
# Get config file
4951
sys.path.append('./')
5052
config = configparser.ConfigParser()
5153
global output_folder, luis_appid, luis_key, luis_region, luis_endpoint, luis_slot, luis_treshold, stt_key, stt_endpoint, stt_region, tts_key, tts_region, tts_resource_name, tts_language, tts_font
5254
try:
53-
config.read('config.ini')
55+
config.read(fname_config)
5456
output_folder = config['dir']['output_folder']
5557
stt_key = config['stt']['key']
5658
stt_endpoint = config['stt']['endpoint']
File renamed without changes.
File renamed without changes.
