## About GLUE
GLUE is a lightweight, Python-based collection of scripts to support you in succeeding with speech and text use-cases based on [Microsoft Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/). It not only allows you to batch-process data, but also glues together the services of your choice in one place and ensures an end-to-end view of the training and testing process.

## Modules
GLUE consists of multiple modules, which can either be executed separately or run as a central pipeline:
- Batch-transcribe audio files to text transcripts using [Microsoft Speech to Text Service](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/) (STT)
- Batch-synthesize text data using [Microsoft Text to Speech Service](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (TTS)
- Batch-evaluate reference transcriptions and recognitions
- Batch-score text strings on an existing, pre-trained [Microsoft LUIS](https://luis.ai) model

TBD:
- Batch-translate text data using [Microsoft Translator](https://azure.microsoft.com/en-us/services/cognitive-services/translator/)

## Getting Started
This section describes how to get started with GLUE and which requirements your working environment needs to fulfill.

### Prerequisites
Before getting your hands on the toolkit, make sure your local computer is equipped with the following frameworks and base packages:
- [Python](https://www.python.org/downloads/windows/) (required, version 3.8 is recommended)
- [VSCode](https://code.visualstudio.com/docs/?dv=win) (recommended), but you can also run the scripts using PowerShell, Bash, etc.
- Stable internet connection for installing your environment and scoring the files

### Setup of Virtual Environment
1. Open a command line of your choice (PowerShell, Bash)
2. Change the directory to your preferred workspace (using `cd`)
3. Clone the repository (alternatively, download the repository as a zip-archive and unpack it locally to the respective folder)
```
git clone https://github.com/microsoft/glue
```
4. Enter the root folder of the cloned repository
```
cd glue
```
5. Set up the virtual environment
```
python -m venv .venv
```
6. Activate the virtual environment
```bash
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate
```
7. Install the requirements
```
pip install -r requirements.txt
```
8. (optional) If you want to use Jupyter notebooks, you can register your activated environment using the command below
```
python -m ipykernel install --user --name glue --display-name "Python (glue)"
```
After successfully installing the requirements, your environment is set up and you can go ahead with the next step.

### API Keys
In the root directory of the repository, you can find a file named `config.sample.ini`. This is the file where the API keys and some other essential configuration parameters have to be set, depending on which services you would like to use. First, create a copy of `config.sample.ini` and rename it to `config.ini` in the same directory. You only need the keys for the services you use during your experiment. However, keep the structure of the `config.ini` file as it is to avoid errors. The toolkit will just set missing values as empty, but will throw an error when the keys cannot be found at all.

Instructions on how to get the keys can be found [here](GetYourKeys.md).

### Input Parameters
The following table describes the available modes along with their input parameters and dependencies.

| __Mode__           | __Command line parameter__ | __Description__                                                         | __Dependencies__                                                                                                                                                                               |
|--------------------|----------------------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| __TTS__            | `--do_synthesize`          | Activate text-to-speech synthesis                                       | Requires a csv file with a `text` column, see `--input`                                                                                                                                        |
| __STT__            | `--do_transcribe`          | Activate speech-to-text processing                                      | Requires audio files, see `--audio_files`                                                                                                                                                      |
| __STT__            | `--audio_files`            | Path to folder with audio files                                         | Audio files have to be provided as WAV files with the parameters described [here](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train) |
| __STT-Evaluation__ | `--do_evaluate`            | Activate evaluation of transcriptions based on reference transcriptions | Requires a csv file with reference transcriptions in the `text` column                                                                                                                         |
| __LUIS__           | `--do_scoring`             | Activate LUIS model scoring                                             | Requires a csv file with `intent` and `text` columns                                                                                                                                           |
| __STT / TTS__      | `--input`                  | Path to comma-separated text input file                                 |                                                                                                                                                                                                |

The requirements for the input files (`--input` and `--audio_files`) are described in the Input File Guidelines section below.
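As a sketch of how the modes combine into one pipeline run, the call below transcribes a folder of audio files and evaluates the results against reference transcripts (the file and folder names are placeholders; the flags are the ones from the table above):

```bash
python glue.py --do_transcribe --audio_files input/audio --do_evaluate --input input/testset.csv
```

Modes that are not activated via their flag are simply skipped.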

## GLUE-Modules
This section describes the single components of GLUE, which can either be run autonomously or, ideally, combined through the central orchestrator.

`glue.py`
- Central application orchestrator of the toolkit.
- Glues together the single modules in one place as needed.
- Reads input files and writes output files.

`stt.py`
- Batch-transcription of audio files using the [Microsoft Speech to Text API](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/).
- Allows baseline models as well as custom endpoints.
- Functionality is limited to the languages and locales listed on the [language support](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text) page.

`tts.py`
- Batch-synthesis of text strings using the [Microsoft Text to Speech API](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/).
- Supports Speech Synthesis Markup Language (SSML) to fine-tune and customize the pronunciation, as described in the [documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=python).
- Functionality is limited to the languages and voices listed on the [language support](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech) page.
- Make sure the voice of your choice is available in the respective Azure region ([see documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech#standard-and-neural-voices)).
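Since SSML is supported, an input text can carry markup such as the following minimal sketch (the voice name is only an example from the Azure docs and has to be available in your region):

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <!-- slow the utterance down slightly -->
    <prosody rate="-10%">Welcome to GLUE.</prosody>
  </voice>
</speak>
```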

`luis.py`
- Batch-scoring of intent-text combinations using an existing LUIS model.
  - See the [quickstart documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-get-started-create-app) in case you need some inspiration for your first LUIS app.
- Configurable scoring threshold, in case predictions should only be accepted above a certain confidence score returned by the API.
- Writes the scoring report as a comma-separated file.
- Returns a classification report and confusion matrix based on [scikit-learn](https://github.com/scikit-learn/scikit-learn).
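The classification report and confusion matrix mentioned above are plain scikit-learn functionality; the intents and predictions below are made up for illustration:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical ground-truth intents vs. LUIS predictions (illustrative only)
y_true = ["book_flight", "book_flight", "cancel", "cancel"]
y_pred = ["book_flight", "cancel", "cancel", "cancel"]

# Per-intent precision/recall/F1 and the raw confusion matrix
print(classification_report(y_true, y_pred, zero_division=0))
print(confusion_matrix(y_true, y_pred, labels=["book_flight", "cancel"]))
```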

`evaluate.py`
- Evaluation of transcription results by comparing them with reference transcripts.
- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), and Word Recognition Rate (WRR).
- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).
- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).
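To illustrate the WER metric itself (a minimal word-level edit-distance sketch, not the implementation used by `evaluate.py`, which builds on asr-evaluation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

print(wer("the quick brown fox", "the quick brown dog"))  # → 0.25 (1 of 4 words wrong)
```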

`params.py`
- Collects API and configuration parameters from the command line (ArgumentParser) and the `config.ini`.
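A rough sketch of this pattern (the config section and key names below are invented for illustration; the real names come from `config.sample.ini`):

```python
import argparse
import configparser

# Command-line flags, as in the Input Parameters table above
parser = argparse.ArgumentParser()
parser.add_argument("--do_transcribe", action="store_true")
parser.add_argument("--input", default=None)
args = parser.parse_args(["--do_transcribe", "--input", "input/testset.csv"])

# Stand-in for config.ini; section/key names here are hypothetical
config = configparser.ConfigParser()
config.read_string("[speech]\nkey =\nregion =\n")

# Missing keys fall back to an empty value instead of crashing,
# mirroring the behaviour described in the API Keys section
speech_key = config.get("speech", "key", fallback="")
print(args.do_transcribe, args.input, repr(speech_key))
```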

`helper.py`
- Collection of helper functions which do not serve a purpose on their own, but complement the orchestrator and keep the code neat and clean.

### Input File Guidelines
Depending on your use-case, you have to provide an input text file and/or audio files. In these cases, you have to pass the path to the respective input file or folder via the command line. There are some rules for how the input files have to look:

- Comma-separated file (if you only have an Excel sheet, you can create one via Excel: _Save as_ -> CSV (comma-separated))
- UTF-8 encoding (to make sure it has the correct encoding, open it with a text editor such as [Notepad++](https://notepad-plus-plus.org/downloads/) -> Encoding -> Convert to UTF-8)
- Column names with the respective values dependent on the mode
  - e.g. the columns _intent_ (ground-truth LUIS intent) and _text_ (utterance text, max. length of 500 characters)
- We recommend putting the input file in the subfolder `input`.
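For the LUIS scoring mode, for example, a minimal input file could look like this (the intents and utterances are made up for illustration):

```
intent,text
book_flight,I would like to fly from Stuttgart to Singapore tomorrow
cancel_booking,please cancel my trip
```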

You can find an example file [here](input/testset-example.txt).

To get deeper insights into the classification performance, there is a Jupyter notebook:
1. Place the scoring file from the output folder in the same folder as the notebook or just keep the directory in mind. There is an example file in the notebooks folder as well.
1. Change the file name in the `Import data` section. If you want to reference the file in the output folder, change it to `../../output/[date-of-case]-case/[date-of-case]-case.txt`.
1. Execute all the fields - this might take a while, especially during the plotting phase of the confusion matrix.
1. If you want to store the evaluation report, you can do this via _File_ -> _Export_ -> _.html_ and open it with any modern web browser.

## Limitations
This toolkit is the right starting point for your bring-your-own-data use cases. However, it does not provide automated training runs.