|
1 | 1 | { |
2 | 2 | "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Recommendation Model with Approximate Item Matching\n", |
| 8 | + "\n", |
| 9 | + "This notebook shows how to train a simple Neural Collaborative Filtering model for recommending movies to users. We also show how the learned movie embeddings are stored in an approximate similarity matching index, using Spotify's [Annoy library](https://github.com/spotify/annoy), so that we can quickly find and recommend the most relevant movies to a given user. We then show how to use this index to search for similar movies.\n", |
| 10 | + "\n", |
| 11 | + "In essence, this tutorial works as follows:\n", |
| 12 | + "1. Download the movielens dataset.\n", |
| 13 | + "2. Train a simple Neural Collaborative Filtering model using a TensorFlow custom estimator.\n", |
| 14 | + "3. Extract the learned movie embeddings.\n", |
| 15 | + "4. Build an approximate similarity matching index for the movie embeddings.\n", |
| 16 | + "5. Export the trained model, which receives a user Id and outputs the user embedding.\n", |
| 17 | + "\n", |
| 18 | + "Recommendations are served as follows:\n", |
| 19 | + "1. Receive a user Id\n", |
| 20 | + "2. Get the user embedding from the exported model\n", |
| 21 | + "3. Find the movie embeddings in the index most similar to the user embedding\n", |
| 22 | + "4. Return the movie Ids of these embeddings as recommendations\n", |
| 23 | + "\n", |
| 24 | + "<a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/tf-estimator-tutorials/blob/master/Experimental/Movielens%20Recommendation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" |
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "markdown", |
| 29 | + "metadata": {}, |
| 30 | + "source": [ |
| 31 | + "## Setup" |
| 32 | + ] |
| 33 | + }, |
| 34 | + { |
| 35 | + "cell_type": "code", |
| 36 | + "execution_count": null, |
| 37 | + "metadata": {}, |
| 38 | + "outputs": [], |
| 39 | + "source": [ |
| 40 | + "!pip install annoy" |
| 41 | + ] |
| 42 | + }, |
3 | 43 | { |
4 | 44 | "cell_type": "code", |
5 | 45 | "execution_count": 1, |
|
32 | 72 | "cell_type": "markdown", |
33 | 73 | "metadata": {}, |
34 | 74 | "source": [ |
35 | | - "## Download Data" |
| 75 | + "## 1. Download Data" |
36 | 76 | ] |
37 | 77 | }, |
38 | 78 | { |
|
373 | 413 | "cell_type": "markdown", |
374 | 414 | "metadata": {}, |
375 | 415 | "source": [ |
376 | | - "## Define Metadata" |
| 416 | + "## 2. Build the TensorFlow Model" |
| 417 | + ] |
| 418 | + }, |
| 419 | + { |
| 420 | + "cell_type": "markdown", |
| 421 | + "metadata": {}, |
| 422 | + "source": [ |
| 423 | + "### 2.1 Define Metadata" |
377 | 424 | ] |
378 | 425 | }, |
379 | 426 | { |
|
393 | 440 | "cell_type": "markdown", |
394 | 441 | "metadata": {}, |
395 | 442 | "source": [ |
396 | | - "## Define Data Input Function" |
| 443 | + "### 2.2 Define Data Input Function" |
397 | 444 | ] |
398 | 445 | }, |
399 | 446 | { |
400 | 447 | "cell_type": "code", |
401 | | - "execution_count": 18, |
| 448 | + "execution_count": null, |
402 | 449 | "metadata": {}, |
403 | 450 | "outputs": [], |
404 | 451 | "source": [ |
|
418 | 465 | " num_epochs=num_epochs,\n", |
419 | 466 | " shuffle= (mode==tf.estimator.ModeKeys.TRAIN)\n", |
420 | 467 | " )\n", |
421 | | - " \n", |
422 | | - " iterator = dataset.make_one_shot_iterator()\n", |
423 | | - " features, target = iterator.get_next()\n", |
424 | | - " return features, target\n", |
| 468 | + " return dataset\n", |
425 | 469 | " \n", |
426 | 470 | " return _input_fn" |
427 | 471 | ] |
|
430 | 474 | "cell_type": "markdown", |
431 | 475 | "metadata": {}, |
432 | 476 | "source": [ |
433 | | - "## Create Feature Columns" |
| 477 | + "### 2.3 Create Feature Columns" |
434 | 478 | ] |
435 | 479 | }, |
436 | 480 | { |
|
466 | 510 | "cell_type": "markdown", |
467 | 511 | "metadata": {}, |
468 | 512 | "source": [ |
469 | | - "## Define Model Function" |
| 513 | + "### 2.4 Define Model Function" |
470 | 514 | ] |
471 | 515 | }, |
472 | 516 | { |
|
506 | 550 | " mode=mode,\n", |
507 | 551 | " loss=loss,\n", |
508 | 552 | " train_op=train_op\n", |
509 | | - " )\n" |
| 553 | + " )" |
510 | 554 | ] |
511 | 555 | }, |
512 | 556 | { |
513 | 557 | "cell_type": "markdown", |
514 | 558 | "metadata": {}, |
515 | 559 | "source": [ |
516 | | - "## Create Estimator" |
| 560 | + "### 2.5 Create Estimator" |
517 | 561 | ] |
518 | 562 | }, |
519 | 563 | { |
|
537 | 581 | "cell_type": "markdown", |
538 | 582 | "metadata": {}, |
539 | 583 | "source": [ |
540 | | - "## Define Experiment" |
| 584 | + "### 2.6 Define Experiment" |
541 | 585 | ] |
542 | 586 | }, |
543 | 587 | { |
|
612 | 656 | "cell_type": "markdown", |
613 | 657 | "metadata": {}, |
614 | 658 | "source": [ |
615 | | - "## Run Experiment with Parameters" |
| 659 | + "### 2.7 Run Experiment with Parameters" |
616 | 660 | ] |
617 | 661 | }, |
618 | 662 | { |
|
710 | 754 | "cell_type": "markdown", |
711 | 755 | "metadata": {}, |
712 | 756 | "source": [ |
713 | | - "## Extract Movie Embeddings " |
| 757 | + "## 3. Extract Movie Embeddings " |
714 | 758 | ] |
715 | 759 | }, |
716 | 760 | { |
|
766 | 810 | "cell_type": "markdown", |
767 | 811 | "metadata": {}, |
768 | 812 | "source": [ |
769 | | - "## Build Annoy Index" |
| 813 | + "## 4. Build Annoy Index" |
770 | 814 | ] |
771 | 815 | }, |
772 | 816 | { |
|
1145 | 1189 | "cell_type": "markdown", |
1146 | 1190 | "metadata": {}, |
1147 | 1191 | "source": [ |
1148 | | - "## Export the Model\n", |
| 1192 | + "## 5. Export the Model\n", |
1149 | 1193 | "This is needed to receive a userId and produce the embedding for that user." |
1150 | 1194 | ] |
1151 | 1195 | }, |
|
1234 | 1278 | "print(output)" |
1235 | 1279 | ] |
1236 | 1280 | }, |
| 1281 | + { |
| 1282 | + "cell_type": "markdown", |
| 1283 | + "metadata": {}, |
| 1284 | + "source": [ |
| 1285 | + "## Serve Movie Recommendations to a User" |
| 1286 | + ] |
| 1287 | + }, |
1237 | 1288 | { |
1238 | 1289 | "cell_type": "code", |
1239 | 1290 | "execution_count": 190, |
|
1276 | 1327 | ] |
1277 | 1328 | }, |
1278 | 1329 | { |
1279 | | - "cell_type": "code", |
1280 | | - "execution_count": null, |
| 1330 | + "cell_type": "markdown", |
1281 | 1331 | "metadata": {}, |
1282 | | - "outputs": [], |
1283 | | - "source": [] |
| 1332 | + "source": [ |
| 1333 | + "## License" |
| 1334 | + ] |
| 1335 | + }, |
| 1336 | + { |
| 1337 | + "cell_type": "markdown", |
| 1338 | + "metadata": {}, |
| 1339 | + "source": [ |
| 1340 | + "---\n", |
| 1341 | + "\n", |
| 1342 | + "Author: Khalid Salama\n", |
| 1343 | + "\n", |
| 1344 | + "\n", |
| 1345 | + "---\n", |
| 1346 | + "***Disclaimer***: This is not an official Google product. This sample code is provided for educational purposes.\n", |
| 1347 | + "\n", |
| 1348 | + "---\n", |
| 1349 | + "\n", |
| 1350 | + "Copyright 2019 Google LLC\n", |
| 1351 | + "\n", |
| 1352 | + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", |
| 1353 | + "you may not use this file except in compliance with the License.\n", |
| 1354 | + "You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.\n", |
| 1355 | + "\n", |
| 1356 | + "Unless required by applicable law or agreed to in writing, software\n", |
| 1357 | + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", |
| 1358 | + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", |
| 1359 | + "See the License for the specific language governing permissions and\n", |
| 1360 | + "limitations under the License.\n", |
| 1361 | + "\n", |
| 1362 | + "\n", |
| 1363 | + "---\n", |
| 1364 | + "\n", |
| 1365 | + "\n" |
| 1366 | + ] |
1284 | 1367 | } |
1285 | 1368 | ], |
1286 | 1369 | "metadata": { |
|