diff --git a/README.md b/README.md
index d51eff0a9..f0ee8bb61 100644
--- a/README.md
+++ b/README.md
@@ -227,6 +227,8 @@
 which is close to state-of-the art. If training on a single GPU, try the
 or larger data-sets (e.g., for English-French), try the big model with
 `--hparams_set=transformer_big`.
+See this [example](https://github.com/Styleoshin/tensor2tensor/blob/Transformer_tutorial/tensor2tensor/notebooks/Transformer_translate.ipynb) for an end-to-end walkthrough of how translation works, from data generation to decoding.
+
 ## Basics

 ### Walkthrough
diff --git a/tensor2tensor/notebooks/Transformer_translate.ipynb b/tensor2tensor/notebooks/Transformer_translate.ipynb
new file mode 100644
index 000000000..07c350351
--- /dev/null
+++ b/tensor2tensor/notebooks/Transformer_translate.ipynb
@@ -0,0 +1,1102 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "Transformer_translate.ipynb",
+      "version": "0.3.2",
+      "provenance": [],
+      "collapsed_sections": [],
+      "toc_visible": true,
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/Styleoshin/tensor2tensor/blob/Transformer_tutorial/tensor2tensor/notebooks/Transformer_translate.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "e7PMze9tKHX9",
+        "colab_type": "text"
+      },
+      "source": [
+        "# Welcome to the [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor) Colab\n",
+        "\n",
+        "Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and [accelerate ML research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html). In this notebook we use the library for a translation task, walking through all the necessary steps: we define a problem, generate the data, train the model, measure its quality, translate some sequences and visualize the attention. We also show how to download a pre-trained model."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "KC8jNpnyKJdm",
+        "colab_type": "code",
+        "cellView": "form",
+        "colab": {}
+      },
+      "source": [
+        "#@title\n",
+        "# Copyright 2018 Google LLC.\n",
+        "\n",
+        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "\n",
+        "# https://www.apache.org/licenses/LICENSE-2.0\n",
+        "\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License."
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "AYUy570fKRcw",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Install deps\n",
+        "!pip install -q -U tensor2tensor"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
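+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "A quick, optional sanity check: confirm that the packages import and print their versions."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Verify the installation before going further.\n",
+        "import tensorflow as tf\n",
+        "print(\"TensorFlow:\", tf.__version__)\n",
+        "!pip show tensor2tensor | grep Version"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },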
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hEhFfyVNbB_D",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 1. Initialization\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "i23pCAVwegx3",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 1.1. Make some directories"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "oUf4e18_8E31",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "import tensorflow as tf\n",
+        "import os\n",
+        "\n",
+        "DATA_DIR = os.path.expanduser(\"/t2t/data\")  # This folder contains the generated data\n",
+        "TMP_DIR = os.path.expanduser(\"/t2t/tmp\")  # Temporary storage for the raw dataset\n",
+        "TRAIN_DIR = os.path.expanduser(\"/t2t/train\")  # This folder contains the model checkpoints\n",
+        "EXPORT_DIR = os.path.expanduser(\"/t2t/export\")  # This folder contains the exported model for production\n",
+        "TRANSLATIONS_DIR = os.path.expanduser(\"/t2t/translation\")  # This folder contains all translated sequences\n",
+        "EVENT_DIR = os.path.expanduser(\"/t2t/event\")  # Events used when testing the BLEU score\n",
+        "USR_DIR = os.path.expanduser(\"/t2t/user\")  # This folder contains our own data that we want to add\n",
+        "\n",
+        "tf.gfile.MakeDirs(DATA_DIR)\n",
+        "tf.gfile.MakeDirs(TMP_DIR)\n",
+        "tf.gfile.MakeDirs(TRAIN_DIR)\n",
+        "tf.gfile.MakeDirs(EXPORT_DIR)\n",
+        "tf.gfile.MakeDirs(TRANSLATIONS_DIR)\n",
+        "tf.gfile.MakeDirs(EVENT_DIR)\n",
+        "tf.gfile.MakeDirs(USR_DIR)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "HIuzsMzgbLv9",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 1.2. Init parameters\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ZQaURmfKBGus",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "PROBLEM = \"translate_enfr_wmt32k\"  # An English-to-French translation problem with a 32,768-subword vocabulary\n",
+        "MODEL = \"transformer\"  # Our model\n",
+        "HPARAMS = \"transformer_big\"  # Default hyperparameters for the model\n",
+        "                             # If you have a single GPU, use transformer_big_single_gpu instead"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "EikK-hW5m-ax",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Show all problems and models\n",
+        "\n",
+        "from tensor2tensor.utils import registry\n",
+        "from tensor2tensor import problems\n",
+        "\n",
+        "problems.available()    # Show all problems\n",
+        "registry.list_models()  # Show all registered models\n",
+        "\n",
+        "# Or from the command line:\n",
+        "!t2t-trainer --registry_help  # Show all registered problems, models and hparams sets"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "78kBAIMQbeO6",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 2. Data generation\n",
+        "\n",
+        "Generate the data (download the dataset and encode it).\n",
+        "\n",
+        "---\n",
+        "\n",
+        "You can choose between the command line or code."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CrDy3V7ibpQH",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 2.1. Generate with terminal\n",
+        "For more information: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_datagen.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "0Dfr8nFXmg1o",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!t2t-datagen \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --tmp_dir=$TMP_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --t2t_usr_dir=$USR_DIR"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
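+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "Data generation takes a while. A quick way to check what was produced (exact file names depend on the problem) is to list `DATA_DIR`: it should contain the subword vocabulary plus train/dev TFRecord shards."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# List a few of the generated files.\n",
+        "for name in sorted(tf.gfile.ListDirectory(DATA_DIR))[:10]:\n",
+        "  print(name)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },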
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tMvCiiBtbuzh",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 2.2. Generate with code"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Of5bHYVJmbwH",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "t2t_problem = problems.problem(PROBLEM)\n",
+        "t2t_problem.generate_data(DATA_DIR, TMP_DIR)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "UkSwoqBzb47T",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 3. Train the model\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1JVF2PJn7ByQ",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 3.1. Init parameters\n",
+        "\n",
+        "You can choose between the command line or code.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "batch_size: the number of subword tokens per batch; prefer the largest value that fits in memory.\n",
+        "\n",
+        "---\n",
+        "train_steps: the research paper reports 300k steps on 8 GPUs for the big Transformer, so with 1 GPU you will need to train the model about 8x longer (see https://arxiv.org/abs/1706.03762 for more information).\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "yw6HgVWA7AQF",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "train_steps = 300000  # Total number of train steps for all epochs\n",
+        "eval_steps = 100  # Number of steps to perform for each evaluation\n",
+        "batch_size = 4096\n",
+        "save_checkpoints_steps = 1000\n",
+        "ALPHA = 0.1  # Learning rate\n",
+        "schedule = \"continuous_train_and_eval\""
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ze_YvVnIfD8z",
+        "colab_type": "text"
+      },
+      "source": [
+        "You can choose the schedule:\n",
+        "\n",
+        "* train (no evaluation, so quality is hard to track)\n",
+        "* continuous_train_and_eval (default)\n",
+        "* train_and_eval\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "-zAub7Ggb8tj",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 3.2. Train with terminal\n",
+        "https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_trainer.py\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "kSYAi4BsnpSD",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!t2t-trainer \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --model=$MODEL \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --hparams=\"batch_size=$batch_size\" \\\n",
+        "  --schedule=$schedule \\\n",
+        "  --output_dir=$TRAIN_DIR \\\n",
+        "  --train_steps=$train_steps \\\n",
+        "  --eval_steps=$eval_steps \\\n",
+        "  --worker_gpu=1"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "bNfNBWtNVMwO",
+        "colab_type": "text"
+      },
+      "source": [
+        "--worker_gpu=1 trains on 1 GPU (optional).\n",
+        "\n",
+        "---\n",
+        "\n",
+        "For distributed training see: https://github.com/tensorflow/tensor2tensor/blob/master/docs/distributed_training.md\n"
+      ]
+    },
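+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "Checkpoints are written to `TRAIN_DIR` every `save_checkpoints_steps` steps. A quick way to check that training is progressing:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Print the newest checkpoint in TRAIN_DIR; re-running the trainer\n",
+        "# with the same --output_dir resumes from it automatically.\n",
+        "print(tf.train.latest_checkpoint(TRAIN_DIR))"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },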
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "nnSoC1AUcLG6",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 3.3. Train with code\n",
+        "create_hparams: https://github.com/tensorflow/tensor2tensor/blob/28adf2690c551ef0f570d41bef2019d9c502ec7e/tensor2tensor/utils/hparams_lib.py#L42\n",
+        "\n",
+        "---\n",
+        "Change hyperparameters:\n",
+        "https://github.com/tensorflow/tensor2tensor/blob/28adf2690c551ef0f570d41bef2019d9c502ec7e/tensor2tensor/models/transformer.py#L1627\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "RJ91vQ2hyIPx",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensor2tensor.utils.trainer_lib import create_run_config, create_experiment\n",
+        "from tensor2tensor.utils.trainer_lib import create_hparams\n",
+        "from tensor2tensor.utils import registry\n",
+        "from tensor2tensor import models\n",
+        "from tensor2tensor import problems\n",
+        "\n",
+        "# Init Hparams object from T2T Problem\n",
+        "hparams = create_hparams(HPARAMS)\n",
+        "\n",
+        "# Make changes to Hparams\n",
+        "hparams.batch_size = batch_size\n",
+        "hparams.learning_rate = ALPHA\n",
+        "#hparams.max_length = 256\n",
+        "\n",
+        "# You can see all Hparams with the code below:\n",
+        "#import json\n",
+        "#print(json.loads(hparams.to_json()))"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "KZX1cwK3TEXs",
+        "colab_type": "text"
+      },
+      "source": [
+        "create_run_config: https://github.com/tensorflow/tensor2tensor/blob/28adf2690c551ef0f570d41bef2019d9c502ec7e/tensor2tensor/utils/trainer_lib.py#L105\n",
+        "\n",
+        "---\n",
+        "\n",
+        "create_experiment: https://github.com/tensorflow/tensor2tensor/blob/28adf2690c551ef0f570d41bef2019d9c502ec7e/tensor2tensor/utils/trainer_lib.py#L611"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "yByKcs7XvAXL",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "RUN_CONFIG = create_run_config(\n",
+        "    model_dir=TRAIN_DIR,\n",
+        "    model_name=MODEL,\n",
+        "    save_checkpoints_steps=save_checkpoints_steps\n",
+        ")\n",
+        "\n",
+        "tensorflow_exp_fn = create_experiment(\n",
+        "    run_config=RUN_CONFIG,\n",
+        "    hparams=hparams,\n",
+        "    model_name=MODEL,\n",
+        "    problem_name=PROBLEM,\n",
+        "    data_dir=DATA_DIR,\n",
+        "    train_steps=train_steps,\n",
+        "    eval_steps=eval_steps,\n",
+        "    #use_xla=True  # For acceleration\n",
+        ")\n",
+        "\n",
+        "tensorflow_exp_fn.train_and_evaluate()"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "03xuR70jce_2",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 4. See the BLEU score"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "MiwyVWPhhGrk",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# INIT FILES FOR TRANSLATION\n",
+        "\n",
+        "SOURCE_TEST_TRANSLATE_DIR = TMP_DIR + \"/dev/newstest2014-fren-src.en.sgm\"  # English source\n",
+        "REFERENCE_TEST_TRANSLATE_DIR = TMP_DIR + \"/dev/newstest2014-fren-ref.fr.sgm\"  # French reference\n",
+        "BEAM_SIZE = 1"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
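+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "The test sets are SGML-wrapped plain text, so you can peek at the source segments before translating:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Show the first few lines of the newstest2014 source file.\n",
+        "!head -n 10 $SOURCE_TEST_TRANSLATE_DIR"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },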
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "agnSg_89cr63",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 4.1. Translate all\n",
+        "https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_translate_all.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Jrt5fwqsg3pl",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!t2t-translate-all \\\n",
+        "  --source=$SOURCE_TEST_TRANSLATE_DIR \\\n",
+        "  --model_dir=$TRAIN_DIR \\\n",
+        "  --translations_dir=$TRANSLATIONS_DIR \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --output_dir=$TRAIN_DIR \\\n",
+        "  --t2t_usr_dir=$USR_DIR \\\n",
+        "  --beam_size=$BEAM_SIZE \\\n",
+        "  --model=$MODEL"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "O-pKKU2Acv8Q",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 4.2. Test the BLEU score\n",
+        "The BLEU score for all translations: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_bleu.py#L68\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "EULP9TdPc58d",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!t2t-bleu \\\n",
+        "  --translations_dir=$TRANSLATIONS_DIR \\\n",
+        "  --model_dir=$TRAIN_DIR \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --source=$SOURCE_TEST_TRANSLATE_DIR \\\n",
+        "  --reference=$REFERENCE_TEST_TRANSLATE_DIR \\\n",
+        "  --event_dir=$EVENT_DIR"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "13j50bpAc-bM",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 5. Predict a sentence\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "8WHPnqxhdQl6",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 5.1. Predict with terminal\n",
+        "https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_decoder.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "3SD-XhImnwpo",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!echo \"the business of the house\" > \"inputs.en\"\n",
+        "!echo -e \"les affaires de la maison\" > \"reference.fr\"  # You can add other references\n",
+        "\n",
+        "!t2t-decoder \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --model=$MODEL \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --output_dir=$TRAIN_DIR \\\n",
+        "  --decode_hparams=\"beam_size=1,alpha=$ALPHA\" \\\n",
+        "  --decode_from_file=\"inputs.en\" \\\n",
+        "  --decode_to_file=\"outputs.fr\"\n",
+        "\n",
+        "# See the translations\n",
+        "!cat outputs.fr"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
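+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "Beam search usually improves quality at the cost of decoding speed. The paper decodes with a beam of 4 and a length penalty alpha of 0.6, which you can try as a variant of the command above:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Decode again with the beam-search settings from the paper.\n",
+        "!t2t-decoder \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --model=$MODEL \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --output_dir=$TRAIN_DIR \\\n",
+        "  --decode_hparams=\"beam_size=4,alpha=0.6\" \\\n",
+        "  --decode_from_file=\"inputs.en\" \\\n",
+        "  --decode_to_file=\"outputs_beam.fr\"\n",
+        "\n",
+        "!cat outputs_beam.fr"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },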
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "sGOC25N4dWdM",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 5.2. Predict with code"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "S6u4QmhPIbDx",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "import tensorflow as tf\n",
+        "\n",
+        "# After training the model, restart the runtime, run this cell first, then predict.\n",
+        "\n",
+        "tfe = tf.contrib.eager\n",
+        "tfe.enable_eager_execution()\n",
+        "Modes = tf.estimator.ModeKeys"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "PaCkILfjz9x3",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Config\n",
+        "\n",
+        "from tensor2tensor import models\n",
+        "from tensor2tensor import problems\n",
+        "from tensor2tensor.layers import common_layers\n",
+        "from tensor2tensor.utils import trainer_lib\n",
+        "from tensor2tensor.utils import t2t_model\n",
+        "from tensor2tensor.utils import registry\n",
+        "from tensor2tensor.utils import metrics\n",
+        "import numpy as np\n",
+        "import os\n",
+        "\n",
+        "enfr_problem = problems.problem(PROBLEM)\n",
+        "\n",
+        "# The vocab file used to encode inputs and decode model outputs\n",
+        "vocab_name = \"vocab.translate_enfr_wmt32k.32768.subwords\"\n",
+        "vocab_file = os.path.join(DATA_DIR, vocab_name)\n",
+        "\n",
+        "# Get the encoders from the problem\n",
+        "encoders = enfr_problem.feature_encoders(DATA_DIR)\n",
+        "\n",
+        "ckpt_path = tf.train.latest_checkpoint(TRAIN_DIR)\n",
+        "print(ckpt_path)\n",
+        "\n",
+        "def translate(inputs):\n",
+        "  encoded_inputs = encode(inputs)\n",
+        "  with tfe.restore_variables_on_create(ckpt_path):\n",
+        "    model_output = translate_model.infer(encoded_inputs)[\"outputs\"]\n",
+        "  return decode(model_output)\n",
+        "\n",
+        "def encode(input_str, output_str=None):\n",
+        "  \"\"\"Input str to features dict, ready for inference.\"\"\"\n",
+        "  inputs = encoders[\"inputs\"].encode(input_str) + [1]  # Add EOS id\n",
+        "  batch_inputs = tf.reshape(inputs, [1, -1, 1])  # Make it 3D\n",
+        "  return {\"inputs\": batch_inputs}\n",
+        "\n",
+        "def decode(integers):\n",
+        "  \"\"\"List of ints to str.\"\"\"\n",
+        "  integers = list(np.squeeze(integers))\n",
+        "  if 1 in integers:\n",
+        "    integers = integers[:integers.index(1)]  # Cut at EOS\n",
+        "  return encoders[\"inputs\"].decode(np.squeeze(integers))"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "5zE8yHLUA2He",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Predict\n",
+        "\n",
+        "hparams = trainer_lib.create_hparams(HPARAMS, data_dir=DATA_DIR, problem_name=PROBLEM)\n",
+        "translate_model = registry.model(MODEL)(hparams, Modes.PREDICT)\n",
+        "\n",
+        "inputs = \"the animal didn't cross the river because it was too tired\"\n",
+        "ref = \"l'animal n'a pas traversé la rivière parce qu'il était trop fatigué\"  # A reference translation, used to evaluate the quality of the output\n",
+        "outputs = translate(inputs)\n",
+        "\n",
+        "print(\"Inputs: %s\" % inputs)\n",
+        "print(\"Outputs: %s\" % outputs)\n",
+        "\n",
+        "# Write the translation and the reference for the BLEU evaluation below\n",
+        "with open(\"outputs.fr\", \"w+\") as translation_file:\n",
+        "  translation_file.write(outputs)\n",
+        "\n",
+        "with open(\"reference.fr\", \"w+\") as reference_file:\n",
+        "  reference_file.write(ref)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
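+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "The translate helper defined above can be reused for any number of sentences:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Translate a few more sentences with the same model.\n",
+        "sentences = [\n",
+        "    \"the business of the house\",\n",
+        "    \"it is a beautiful day\"\n",
+        "]\n",
+        "for sentence in sentences:\n",
+        "  print(sentence, \"->\", translate(sentence))"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },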
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "y6jbQ6FoRsmG",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 5.3. Evaluate the BLEU score\n",
+        "BLEU score for a single translation: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_bleu.py#L24"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "il2oevmXRrbf",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!t2t-bleu \\\n",
+        "  --translation=outputs.fr \\\n",
+        "  --reference=reference.fr"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "FXegHzD1I67e",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 6. Attention visualization\n",
+        "We need a sentence that was predicted with code (section 5.2), since the model object keeps the attention weights."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ISHauPT8I-3S",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 6.1. Attention utils\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "2RHCTrc9I55K",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensor2tensor.visualization import attention\n",
+        "from tensor2tensor.data_generators import text_encoder\n",
+        "\n",
+        "SIZE = 35  # Maximum number of subwords shown in the visualization\n",
+        "\n",
+        "def encode_eval(input_str, output_str):\n",
+        "  inputs = tf.reshape(encoders[\"inputs\"].encode(input_str) + [1], [1, -1, 1, 1])  # Make it 4D\n",
+        "  outputs = tf.reshape(encoders[\"inputs\"].encode(output_str) + [1], [1, -1, 1, 1])  # Make it 4D\n",
+        "  return {\"inputs\": inputs, \"targets\": outputs}\n",
+        "\n",
+        "def get_att_mats():\n",
+        "  enc_atts = []\n",
+        "  dec_atts = []\n",
+        "  encdec_atts = []\n",
+        "\n",
+        "  for i in range(hparams.num_hidden_layers):\n",
+        "    enc_att = translate_model.attention_weights[\n",
+        "        \"transformer/body/encoder/layer_%i/self_attention/multihead_attention/dot_product_attention\" % i][0]\n",
+        "    dec_att = translate_model.attention_weights[\n",
+        "        \"transformer/body/decoder/layer_%i/self_attention/multihead_attention/dot_product_attention\" % i][0]\n",
+        "    encdec_att = translate_model.attention_weights[\n",
+        "        \"transformer/body/decoder/layer_%i/encdec_attention/multihead_attention/dot_product_attention\" % i][0]\n",
+        "    enc_atts.append(resize(enc_att))\n",
+        "    dec_atts.append(resize(dec_att))\n",
+        "    encdec_atts.append(resize(encdec_att))\n",
+        "  return enc_atts, dec_atts, encdec_atts\n",
+        "\n",
+        "def resize(np_mat):\n",
+        "  # Keep at most SIZE tokens and normalize across heads\n",
+        "  np_mat = np_mat[:, :SIZE, :SIZE]\n",
+        "  row_sums = np.sum(np_mat, axis=0)\n",
+        "  layer_mat = np_mat / row_sums[np.newaxis, :]\n",
+        "  lsh = layer_mat.shape\n",
+        "  # Add an extra dim for the viz code to work\n",
+        "  layer_mat = np.reshape(layer_mat, (1, lsh[0], lsh[1], lsh[2]))\n",
+        "  return layer_mat\n",
+        "\n",
+        "def to_tokens(ids):\n",
+        "  ids = np.squeeze(ids)\n",
+        "  subtokenizer = hparams.problem_hparams.vocabulary['targets']\n",
+        "  tokens = []\n",
+        "  for _id in ids:\n",
+        "    if _id == 0:\n",
+        "      tokens.append('<PAD>')\n",
+        "    elif _id == 1:\n",
+        "      tokens.append('<EOS>')\n",
+        "    elif _id == -1:\n",
+        "      tokens.append('<UNK>')\n",
+        "    else:\n",
+        "      tokens.append(subtokenizer._subtoken_id_to_subtoken_string(_id))\n",
+        "  return tokens\n",
+        "\n",
+        "def call_html():\n",
+        "  import IPython\n",
+        "  display(IPython.core.display.HTML('''\n",
+        "      <script src=\"/static/components/requirejs/require.js\"></script>\n",
+        "      <script>\n",
+        "        requirejs.config({\n",
+        "          paths: {\n",
+        "            base: '/static/base',\n",
+        "            \"d3\": \"https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min\",\n",
+        "            jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',\n",
+        "          },\n",
+        "        });\n",
+        "      </script>\n",
+        "      '''))"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
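+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "If the interactive view below does not render, you can still draw a static heatmap of any layer with matplotlib once the attention matrices have been collected in section 6.2:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "import matplotlib.pyplot as plt\n",
+        "\n",
+        "def plot_attention(att_mat, x_labels, y_labels):\n",
+        "  # att_mat: one entry of enc_atts / dec_atts / encdec_atts,\n",
+        "  # shaped (1, num_heads, out_len, in_len); average over the heads.\n",
+        "  mat = att_mat[0].mean(axis=0)\n",
+        "  fig, ax = plt.subplots(figsize=(8, 8))\n",
+        "  ax.imshow(mat, cmap=\"viridis\")\n",
+        "  ax.set_xticks(range(mat.shape[1]))\n",
+        "  ax.set_xticklabels(x_labels[:mat.shape[1]], rotation=90)\n",
+        "  ax.set_yticks(range(mat.shape[0]))\n",
+        "  ax.set_yticklabels(y_labels[:mat.shape[0]])\n",
+        "  plt.show()\n",
+        "\n",
+        "# Example, after running section 6.2:\n",
+        "#plot_attention(enc_atts[0], inp_text, inp_text)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },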
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "9PGwUbJuJHJS",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 6.2. Display attention"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ijTOlrt8JI4t",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "import numpy as np\n",
+        "\n",
+        "# Convert inputs and outputs to subwords\n",
+        "inp_text = to_tokens(encoders[\"inputs\"].encode(inputs))\n",
+        "out_text = to_tokens(encoders[\"inputs\"].encode(outputs))\n",
+        "\n",
+        "hparams = trainer_lib.create_hparams(HPARAMS, data_dir=DATA_DIR, problem_name=PROBLEM)\n",
+        "\n",
+        "# Run eval to collect attention weights\n",
+        "example = encode_eval(inputs, outputs)\n",
+        "with tfe.restore_variables_on_create(ckpt_path):\n",
+        "  translate_model.set_mode(Modes.EVAL)\n",
+        "  translate_model(example)\n",
+        "# Get normalized attention weights for each layer\n",
+        "enc_atts, dec_atts, encdec_atts = get_att_mats()\n",
+        "\n",
+        "call_html()\n",
+        "attention.show(inp_text, out_text, enc_atts, dec_atts, encdec_atts)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "r8yAQUDZdm1p",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 7. Export the model\n",
+        "For more information: https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/serving"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "c2yulC7J8_I9",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Export the model\n",
+        "!t2t-exporter \\\n",
+        "  --data_dir=$DATA_DIR \\\n",
+        "  --output_dir=$TRAIN_DIR \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --model=$MODEL \\\n",
+        "  --hparams_set=$HPARAMS \\\n",
+        "  --decode_hparams=\"beam_size=1,alpha=$ALPHA\" \\\n",
+        "  --export_dir=$EXPORT_DIR"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
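+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text"
+      },
+      "source": [
+        "The exported model can then be served with TensorFlow Serving and queried with t2t-query-server, following the serving guide linked above. A sketch, where the server address and servable name are placeholders:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Serve the exported model (normally run outside this notebook):\n",
+        "#   tensorflow_model_server --port=9000 --model_name=transformer \\\n",
+        "#     --model_base_path=$EXPORT_DIR\n",
+        "\n",
+        "# Then send it sentences to translate:\n",
+        "!t2t-query-server \\\n",
+        "  --server=localhost:9000 \\\n",
+        "  --servable_name=transformer \\\n",
+        "  --problem=$PROBLEM \\\n",
+        "  --data_dir=$DATA_DIR"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },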
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2ltjEr3JX5-e",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 8. Load a pretrained model from Google Storage\n",
+        "We use the pretrained English-German translation model."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "QgY3Fw261bZC",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 8.1. See the existing stored content"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "7P7aJClG0t8c",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "print(\"checkpoint: \")\n",
+        "!gsutil ls \"gs://tensor2tensor-checkpoints\"\n",
+        "\n",
+        "print(\"data: \")\n",
+        "!gsutil ls \"gs://tensor2tensor-data\""
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wP8jrR5bbu7e",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 8.2. Init model"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "AnYU7lrazkMm",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "PROBLEM_PRETRAINED = \"translate_ende_wmt32k\"\n",
+        "MODEL_PRETRAINED = \"transformer\"\n",
+        "HPARAMS_PRETRAINED = \"transformer_base\""
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "DTgPvq4q1VAr",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 8.3. Load the content from Google Storage"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "FrxOAVcyinll",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "import tensorflow as tf\n",
+        "import os\n",
+        "\n",
+        "DATA_DIR_PRETRAINED = os.path.expanduser(\"/t2t/data_pretrained\")\n",
+        "CHECKPOINT_DIR_PRETRAINED = os.path.expanduser(\"/t2t/checkpoints_pretrained\")\n",
+        "\n",
+        "tf.gfile.MakeDirs(DATA_DIR_PRETRAINED)\n",
+        "tf.gfile.MakeDirs(CHECKPOINT_DIR_PRETRAINED)\n",
+        "\n",
+        "gs_data_dir = \"gs://tensor2tensor-data/\"\n",
+        "vocab_name = \"vocab.translate_ende_wmt32k.32768.subwords\"\n",
+        "vocab_file = os.path.join(gs_data_dir, vocab_name)\n",
+        "\n",
+        "gs_ckpt_dir = \"gs://tensor2tensor-checkpoints/\"\n",
+        "ckpt_name = \"transformer_ende_test\"\n",
+        "gs_ckpt = os.path.join(gs_ckpt_dir, ckpt_name)\n",
+        "\n",
+        "TRAIN_DIR_PRETRAINED = os.path.join(CHECKPOINT_DIR_PRETRAINED, ckpt_name)\n",
+        "\n",
+        "!gsutil cp {vocab_file} {DATA_DIR_PRETRAINED}\n",
+        "!gsutil -q cp -R {gs_ckpt} {CHECKPOINT_DIR_PRETRAINED}\n",
+        "\n",
+        "CHECKPOINT_NAME_PRETRAINED = tf.train.latest_checkpoint(TRAIN_DIR_PRETRAINED)  # For translating with code"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "LP6cro9Xbygf",
+        "colab_type": "text"
+      },
+      "source": [
+        "## 8.4. Translate"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "CBoNpy5HbzoF",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "!echo \"the business of the house\" > \"inputs.en\"\n",
+        "!echo -e \"das Geschäft des Hauses\" > \"reference.de\"\n",
+        "\n",
+        "!t2t-decoder \\\n",
+        "  --data_dir=$DATA_DIR_PRETRAINED \\\n",
+        "  --problem=$PROBLEM_PRETRAINED \\\n",
+        "  --model=$MODEL_PRETRAINED \\\n",
+        "  --hparams_set=$HPARAMS_PRETRAINED \\\n",
+        "  --output_dir=$TRAIN_DIR_PRETRAINED \\\n",
+        "  --decode_hparams=\"beam_size=1\" \\\n",
+        "  --decode_from_file=\"inputs.en\" \\\n",
+        "  --decode_to_file=\"outputs.de\"\n",
+        "\n",
+        "# See the translations\n",
+        "!cat outputs.de\n",
+        "\n",
+        "!t2t-bleu \\\n",
+        "  --translation=outputs.de \\\n",
+        "  --reference=reference.de"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "bKI4WF0DgoFd",
+        "colab_type": "text"
+      },
+      "source": [
+        "# 9. Add your dataset/problem\n",
+        "To add a new dataset/problem, subclass Problem and register it with @registry.register_problem, as sketched below. See TranslateEnfrWmt8k for an example:\n",
+        "https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/translate_enfr.py\n",
+        "\n",
+        "---\n",
+        "Adding your own components: https://github.com/tensorflow/tensor2tensor#adding-your-own-components\n",
+        "\n",
+        "---\n",
+        "\n",
+        "See this example: https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/test_data/example_usr_dir"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "mB1SIrJNqy1N",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensor2tensor.data_generators import translate_enfr\n",
+        "from tensor2tensor.utils import registry\n",
+        "\n",
+        "@registry.register_problem\n",
+        "class MyTranslateEnFr(translate_enfr.TranslateEnfrWmt8k):\n",
+        "\n",
+        "  def generator(self, data_dir, tmp_dir, train):\n",
+        "    # Yield dicts of {\"inputs\": ..., \"targets\": ...} built from your own data here.\n",
+        "    raise NotImplementedError"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    }
+  ]
+}
\ No newline at end of file