{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CNTK 303: Deep Structured Semantic Modeling with LSTM Networks\n", "\n", "DSSM stands for Deep Structured Semantic Model, or more general, Deep Semantic Similarity Model. DSSM, developed by the MSR Deep Learning Technology Center(DLTC), is a deep neural network (DNN) modeling technique for representing text strings (sentences, queries, predicates, entity mentions, etc.) in a continuous semantic space and modeling semantic similarity between two text strings (e.g., Sent2Vec). DSSM has wide applications including information retrieval and web search ranking ([Huang et al. 2013](https://www.microsoft.com/en-us/research/publication/learning-deep-structured-semantic-models-for-web-search-using-clickthrough-data/); [Shen et al. 2014a](https://www.microsoft.com/en-us/research/publication/learning-semantic-representations-using-convolutional-neural-networks-for-web-search/),[2014b](https://www.microsoft.com/en-us/research/publication/a-latent-semantic-model-with-convolutional-pooling-structure-for-information-retrieval/); [Palangi et al. 2016](https://www.microsoft.com/en-us/research/publication/deep-sentence-embedding-using-long-short-term-memory-networks-analysis-application-information-retrieval/)), ad selection/relevance, contextual entity search and interestingness tasks ([Gao et al. 2014a](https://www.microsoft.com/en-us/research/publication/modeling-interestingness-with-deep-neural-networks/), question answering ([Yih et al., 2014](https://www.microsoft.com/en-us/research/publication/semantic-parsing-for-single-relation-question-answering/)), image captioning ([Fang et al., 2014](https://arxiv.org/abs/1411.4952)), and machine translation ([Gao et al., 2014b](https://www.microsoft.com/en-us/research/publication/learning-continuous-phrase-representations-for-translation-modeling/)) etc. \n", "\n", "DSSM can be used to develop latent semantic models that project entities of different types (e.g., queries and documents) into a common low-dimensional semantic space for a variety of machine learning tasks such as ranking and classification. For example, in web search ranking, the relevance of a document given a query can be readily computed as the distance between them in that space. With the latest GPUs from Nvidia, we can train our models on billions of words. Readers that are interested in deep learning for text processing may refer to the tutorial by [He et al., 2014](https://www.microsoft.com/en-us/research/publication/deep-learning-for-natural-language-processing-theory-and-practice-tutorial/).\n", "We released the predictors and trained model files of the DSSM (also a.k.a. Sent2Vec).\n", "\n", "## Goal\n", "\n", "To develop mechanism such that given a pair of documents say a query and a set of web page documents, the model would map the inputs to a pair of feature vectors in a continuous, low dimensional space where one could compare the semantic similarity between the text strings using the cosine similarity between their vectors in that space. \n", "\n", "![](http://kubicode.me/img/Study-With-Deep-Structured-Semantic-Model/dssm_arch.png)\n", "\n", "In the figure above one can see how given a query ($Q$) and set of documents ($D_1, D_2, \\ldots, D_n$), one can generate latent representation a.k.a. semantic features, which can then be used to generate pairwise distance metric. The metric evaluated can be used for ranking." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the picture above, one can see that the query and the document are each mapped to a term vector. While a [bag of word](https://en.wikipedia.org/wiki/Bag-of-words_model) based modeling is a first step one takes while building NLP models, they are limited in their ability to capture relative positions amongst words. Convolution based, or recurrence based models perform better due to their inherent ability to leverage the positions of words. In this tutorial, we will use a simple illustrative model using LSTM to encode the term vector following the work done by [Palangi et. al.](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/02/LSTM_DSSM_IEEE_TASLP.pdf). \n", "\n", "In this tutorial, we show you how to build such a network. We use a small sample from the Question-Answering corpus. Additionally we will use a recurrent network to develop the semantic model as it allows to inherently incorporate the positional information with the word tokens. \n", "\n", "**Note**: The data set is very small and the emphasis of this tutorial is in showing how to create an end-to-end modeling workflow for the DSSM network and not so much on the specific numerical performance we are able to get on this small data set. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Import the relevant libraries\n", "import math\n", "import numpy as np\n", "import os\n", "from __future__ import print_function # Use a function definition from future version (say 3.x from 2.7 interpreter)\n", "\n", "import cntk as C\n", "import cntk.tests.test_utils\n", "cntk.tests.test_utils.set_device_from_pytest_env() # (only needed for our build system)\n", "C.cntk_py.set_fixed_random_seed(1) # fix a random seed for CNTK components" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Preparation\n", "\n", "### Download\n", "\n", "We use a sampling of the Question Answering data set for illustrating how to model DSSM networks. The data set consists of pair of sentences with [Questions and Answers](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ACL15-STAGG.pdf). In this tutorial, we have preprocessed the data into two parts:\n", "- Vocabulary files: 1 file each for question and answers. There are 1204 and 1019 words in the question and answers vocabulary, respectively.\n", "- QA files: 1 file each for training and validation data (hold-out) where each of the files are converted in the [CTF format](https://cntk.ai/pythondocs/CNTK_202_Language_Understanding.html). The training and validation files have 3500 and 409 sentence pairs respectively.\n", "\n", "Note: a small portion of the original data was provided by the author of the paper for creating an exemplar network for illustration purposes." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting download: vocab_A.wl\n", "http://www.cntk.ai/jup/dat/DSSM/vocab_A.wl.csv\n", "Download completed\n", "Starting download: train.pair.tok.ctf\n", "http://www.cntk.ai/jup/dat/DSSM/train.pair.tok.ctf.csv\n", "Download completed\n", "Starting download: valid.pair.tok.ctf\n", "http://www.cntk.ai/jup/dat/DSSM/valid.pair.tok.ctf.csv\n", "Download completed\n", "Starting download: vocab_Q.wl\n", "http://www.cntk.ai/jup/dat/DSSM/vocab_Q.wl.csv\n", "Download completed\n" ] } ], "source": [ "location = os.path.normpath('data/DSSM')\n", "data = {\n", " 'train': { 'file': 'train.pair.tok.ctf' },\n", " 'val':{ 'file': 'valid.pair.tok.ctf' },\n", " 'query': { 'file': 'vocab_Q.wl' },\n", " 'answer': { 'file': 'vocab_A.wl' }\n", "}\n", "\n", "import requests\n", "\n", "def download(url, filename):\n", " \"\"\" utility function to download a file \"\"\"\n", " response = requests.get(url, stream=True)\n", " with open(filename, \"wb\") as handle:\n", " for data in response.iter_content():\n", " handle.write(data)\n", "\n", "if not os.path.exists(location):\n", " os.mkdir(location)\n", " \n", "for item in data.values():\n", " path = os.path.normpath(os.path.join(location, item['file']))\n", "\n", " if os.path.exists(path):\n", " print(\"Reusing locally cached:\", path)\n", " \n", " else:\n", " print(\"Starting download:\", item['file'])\n", " url = \"http://www.cntk.ai/jup/dat/DSSM/%s.csv\"%(item['file'])\n", " print(url)\n", " download(url, path)\n", " print(\"Download completed\")\n", " item['file'] = path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reader\n", "\n", "We will be using the CTF deserializer to read the input data. However, one can write their own readers or use numpy arrays to provide data into CNTK modeling workflow. You may want to open the CTF files with a text editor to parse the input. Note, the CTF deserializer has the capability to scale across production scale data sizes spanning mulitple disks. The reader also abstracts the randomization of the large scale with a simple flag, an added convenience and time savings for the programmer. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Define the vocabulary size (QRY-stands for question and ANS stands for answer)\n", "QRY_SIZE = 1204\n", "ANS_SIZE = 1019\n", "\n", "def create_reader(path, is_training):\n", " return C.io.MinibatchSource(C.io.CTFDeserializer(path, C.io.StreamDefs(\n", " query = C.io.StreamDef(field='S0', shape=QRY_SIZE, is_sparse=True),\n", " answer = C.io.StreamDef(field='S1', shape=ANS_SIZE, is_sparse=True)\n", " )), randomize=is_training, max_sweeps = C.io.INFINITELY_REPEAT if is_training else 1)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data\\DSSM\\train.pair.tok.ctf\n", "data\\DSSM\\valid.pair.tok.ctf\n" ] } ], "source": [ "train_file = data['train']['file']\n", "print(train_file)\n", "\n", "if os.path.exists(train_file):\n", " train_source = create_reader(train_file, is_training=True)\n", "else:\n", " raise ValueError(\"Cannot locate file {0} in current directory {1}\".format(train_file, os.getcwd()))\n", "\n", "validation_file = data['val']['file']\n", "print(validation_file)\n", "if os.path.exists(validation_file):\n", " val_source = create_reader(validation_file, is_training=False)\n", "else:\n", " raise ValueError(\"Cannot locate file {0} in current directory {1}\".format(validation_file, os.getcwd()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model creation\n", "\n", "The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. The `last` block is then projected to a `query_vector` space, also referred to semantic feature in the figure above.\n", "\n", "\n", " \"query vector\"\n", " ^\n", " |\n", " +-------+ \n", " | Dense | \n", " +-------+ \n", " ^ \n", " | \n", " +---------+ \n", " | Dropout | \n", " +---------+\n", " ^\n", " | \n", " +-------+ \n", " | Dense | \n", " +-------+ \n", " ^ \n", " | \n", " +------+ \n", " | last | \n", " +------+ \n", " ^ \n", " | \n", " +------+ +------+ +------+ +------+ +------+ \n", " 0 -->| LSTM |-->| LSTM |-->| LSTM |-->| LSTM |-->| LSTM |\n", " +------+ +------+ +------+ +------+ +------+ \n", " ^ ^ ^ ^ ^\n", " | | | | |\n", " +-------+ +-------+ +-------+ +-------+ +-------+\n", " | Embed | | Embed | | Embed | | Embed | | Embed | \n", " +-------+ +-------+ +-------+ +-------+ +-------+\n", " ^ ^ ^ ^ ^\n", " | | | | |\n", " query ------>+--------->+--------->+--------->+--------->+\n", " \n", " \n", " Similarly we can project the answer sentence to `answer_vector`. However, before we create our model. Let us define the input variables for our model. Note, there is a query and paired with it there is an answer. Given both of these are a sequence of words we define " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create the containers for input feature (x) and the label (y)\n", "qry = C.sequence.input_variable(QRY_SIZE)\n", "ans = C.sequence.input_variable(ANS_SIZE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notice**: Do you smell any problem with the aforementioned statements. If you want to see what would happen if you were to go with the declarations above, please comment out the 4 statements below and run the model. You will find that your model throws an exception. The details of the exception is explained [here](https://cntk.ai/pythondocs/Manual_How_to_debug.html#Runtime-errors).\n", "\n", "Each sequence in CNTK, is associated with a dynamic axis representing the number of words in the sequence. Intuitively, when you have sequences of different sizes and vocabularies, each of them need to have their own dynamic axis. This is facilitated by declaring the input data containers with a named axis. Strictly speaking you could name just one, the other one would be a default dynamic axis. However, for clarity we name the two axis separately." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create the containers for input feature (x) and the label (y)\n", "axis_qry = C.Axis.new_unique_dynamic_axis('axis_qry')\n", "qry = C.sequence.input_variable(QRY_SIZE, sequence_axis=axis_qry)\n", "\n", "axis_ans = C.Axis.new_unique_dynamic_axis('axis_ans')\n", "ans = C.sequence.input_variable(ANS_SIZE, sequence_axis=axis_ans)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can create the model we need to specify a few parameters associated with the network architecture." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "EMB_DIM = 25 # Embedding dimension\n", "HIDDEN_DIM = 50 # LSTM dimension\n", "DSSM_DIM = 25 # Dense layer dimension \n", "NEGATIVE_SAMPLES = 5\n", "DROPOUT_RATIO = 0.2" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def create_model(qry, ans):\n", " with C.layers.default_options(initial_state=0.1):\n", " qry_vector = C.layers.Sequential([\n", " C.layers.Embedding(EMB_DIM, name='embed'),\n", " C.layers.Recurrence(C.layers.LSTM(HIDDEN_DIM), go_backwards=False),\n", " C.sequence.last,\n", " C.layers.Dense(DSSM_DIM, activation=C.relu, name='q_proj'),\n", " C.layers.Dropout(DROPOUT_RATIO, name='dropout qdo1'),\n", " C.layers.Dense(DSSM_DIM, activation=C.tanh, name='q_enc')\n", " ])\n", " \n", " ans_vector = C.layers.Sequential([\n", " C.layers.Embedding(EMB_DIM, name='embed'),\n", " C.layers.Recurrence(C.layers.LSTM(HIDDEN_DIM), go_backwards=False),\n", " C.sequence.last,\n", " C.layers.Dense(DSSM_DIM, activation=C.relu, name='a_proj'),\n", " C.layers.Dropout(DROPOUT_RATIO, name='dropout ado1'),\n", " C.layers.Dense(DSSM_DIM, activation=C.tanh, name='a_enc')\n", " ])\n", "\n", " return {\n", " 'query_vector': qry_vector(qry),\n", " 'answer_vector': ans_vector(ans)\n", " }\n", "\n", "# Create the model and store reference in `network` dictionary\n", "network = create_model(qry, ans)\n", "\n", "network['query'], network['axis_qry'] = qry, axis_qry\n", "network['answer'], network['axis_ans'] = ans, axis_ans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "\n", "Now that we have created a network, the next step is to find a suitable loss function where if a `question` is paired with the correct `answer`, the loss would be 0 else it would be 1. In other words, this loss should maximize the similarity (dot  product) between the answer vector which appears close to the answer vector and minimize the similarity of between the answer and question vector that do not answer each other.  \n", "\n", "The use cases of DSSM often appear in information retrieval where for a given query or question there are few answers amongst an ocean of poor or non-answers. The input data as in this case is a pair of query and answer (document or advertisement) that attracted a click. A classical way to train would be a a binary classifier to predict click / no-click (or equivalently a 2-class classifier - one class each for click or no click). One could generate pairs of query and incorrect answers (as no-click data). However, one way to simulate no-click data is to use query and answers for other queries within a minibatch. This is the concept behind `cosine_distance_with_negative_samples` function. Note: This function returns 1 for correct the question and answer pair and 0 for incorrect, which is referred to as *similarity*. Hence, we use 1- `cosine_distance_with_negative_samples` as our loss function." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def create_loss(vector_a, vector_b):\n", " qry_ans_similarity = C.cosine_distance_with_negative_samples(vector_a, \\\n", " vector_b, \\\n", " shift=1, \\\n", " num_negative_samples=5)\n", " return 1 - qry_ans_similarity" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Model parameters\n", "MAX_EPOCHS = 5\n", "EPOCH_SIZE = 10000\n", "MINIBATCH_SIZE = 50" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create trainer\n", "def create_trainer(reader, network):\n", " \n", " # Setup the progress updater\n", " progress_writer = C.logging.ProgressPrinter(tag='Training', num_epochs=MAX_EPOCHS)\n", "\n", " # Set learning parameters\n", " lr_per_sample = [0.0015625]*20 + \\\n", " [0.00046875]*20 + \\\n", " [0.00015625]*20 + \\\n", " [0.000046875]*10 + \\\n", " [0.000015625]\n", " lr_schedule = C.learning_parameter_schedule_per_sample(lr_per_sample, \\\n", " epoch_size=EPOCH_SIZE)\n", " mms = [0]*20 + [0.9200444146293233]*20 + [0.9591894571091382]\n", " mm_schedule = C.learners.momentum_schedule(mms, \\\n", " epoch_size=EPOCH_SIZE, \\\n", " minibatch_size=MINIBATCH_SIZE)\n", " l2_reg_weight = 0.0002\n", "\n", " model = C.combine(network['query_vector'], network['answer_vector'])\n", "\n", " #Notify the network that the two dynamic axes are indeed same\n", " query_reconciled = C.reconcile_dynamic_axes(network['query_vector'], network['answer_vector'])\n", " \n", " network['loss'] = create_loss(query_reconciled, network['answer_vector'])\n", " network['error'] = None\n", "\n", " print('Using momentum sgd with no l2')\n", " dssm_learner = C.learners.momentum_sgd(model.parameters, lr_schedule, mm_schedule)\n", "\n", " network['learner'] = dssm_learner\n", " \n", " print('Using local learner')\n", " # Create trainer\n", " return C.Trainer(model, (network['loss'], network['error']), network['learner'], progress_writer) " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using momentum sgd with no l2\n", "Using local learner\n" ] } ], "source": [ "# Instantiate the trainer\n", "trainer = create_trainer(train_source, network)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Train \n", "def do_train(network, trainer, train_source):\n", " # define mapping from intput streams to network inputs\n", " input_map = {\n", " network['query']: train_source.streams.query,\n", " network['answer']: train_source.streams.answer\n", " } \n", "\n", " t = 0\n", " for epoch in range(MAX_EPOCHS): # loop over epochs\n", " epoch_end = (epoch+1) * EPOCH_SIZE\n", " while t < epoch_end: # loop over minibatches on the epoch\n", " data = train_source.next_minibatch(MINIBATCH_SIZE, input_map= input_map) # fetch minibatch\n", " trainer.train_minibatch(data) # update model with it\n", " t += MINIBATCH_SIZE\n", "\n", " trainer.summarize_training_progress()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Learning rate per 1 samples: 0.0015625\n", "Momentum per 1 samples: 0.0\n", "Finished Epoch[1 of 5]: [Training] loss = 0.343046 * 1522, metric = 0.00% * 1522 5.720s (266.1 samples/s);\n", "Finished Epoch[2 of 5]: [Training] loss = 0.102804 * 1530, metric = 0.00% * 1530 3.464s (441.7 samples/s);\n", "Finished Epoch[3 of 5]: [Training] loss = 0.066461 * 1525, metric = 0.00% * 1525 3.402s (448.3 samples/s);\n", "Finished Epoch[4 of 5]: [Training] loss = 0.048511 * 1534, metric = 0.00% * 1534 3.390s (452.5 samples/s);\n", "Finished Epoch[5 of 5]: [Training] loss = 0.035384 * 1510, metric = 0.00% * 1510 3.383s (446.3 samples/s);\n" ] } ], "source": [ "do_train(network, trainer, train_source)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate\n", "\n", "Once the model is trained we want to select a model that has similar error with the validation (hold-out set) as the error with the training set. \n", "\n", "**Suggested Activity**: Vary the number of epochs and check the training and the validation error.\n", "\n", "The chosen model would then be used for prediction. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Validate\n", "def do_validate(network, val_source):\n", " # process minibatches and perform evaluation\n", " progress_printer = C.logging.ProgressPrinter(tag='Evaluation', num_epochs=0)\n", "\n", " val_map = {\n", " network['query']: val_source.streams.query,\n", " network['answer']: val_source.streams.answer\n", " } \n", "\n", " evaluator = C.eval.Evaluator(network['loss'], progress_printer)\n", "\n", " while True:\n", " minibatch_size = 100\n", " data = val_source.next_minibatch(minibatch_size, input_map=val_map)\n", " if not data: # until we hit the end\n", " break\n", "\n", " evaluator.test_minibatch(data)\n", "\n", " evaluator.summarize_test_progress()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished Evaluation [1]: Minibatch[1-35]: metric = 0.02% * 410;\n" ] } ], "source": [ "do_validate(network, val_source)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Predict\n", "\n", "We will now create a vector representation of the query and the answer. Then compute the cosine similarity between the two vectors. When the answer is close to the question one would get a high similarity, while an incorrect / partially relevant question / answer pair would result in a smaller similarity. These scores are often used for ranking web documents in response to a query." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Query Indices: [1202, 1154, 267, 321, 357, 648, 1070, 905, 549, 6, 1203]\n", "Answer Indices: [1017, 135, 91, 137, 1018]\n", "Poor Answer Indices: [1017, 501, 452, 533, 1018]\n" ] } ], "source": [ "# load dictionaries\n", "query_wl = [line.rstrip('\\n') for line in open(data['query']['file'])]\n", "answers_wl = [line.rstrip('\\n') for line in open(data['answer']['file'])]\n", "query_dict = {query_wl[i]:i for i in range(len(query_wl))}\n", "answers_dict = {answers_wl[i]:i for i in range(len(answers_wl))}\n", "\n", "# let's run a sequence through\n", "qry = 'BOS what contribution did e1 made to science in 1665 EOS'\n", "ans = 'BOS book author book_editions_published EOS'\n", "ans_poor = 'BOS language human_language main_country EOS'\n", "\n", "qry_idx = [query_dict[w+' '] for w in qry.split()] # convert to query word indices\n", "print('Query Indices:', qry_idx)\n", "\n", "ans_idx = [answers_dict[w+' '] for w in ans.split()] # convert to answer word indices\n", "print('Answer Indices:', ans_idx)\n", "\n", "ans_poor_idx = [answers_dict[w+' '] for w in ans_poor.split()] # convert to fake answer word indices\n", "print('Poor Answer Indices:', ans_poor_idx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the query, answer and the fake answer to one-hot representation. This is a necessary step since the input to our trained network takes one-hot encoded input. " ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create the one hot representations\n", "qry_onehot = np.zeros([len(qry_idx),len(query_dict)], np.float32)\n", "for t in range(len(qry_idx)):\n", " qry_onehot[t,qry_idx[t]] = 1\n", " \n", "ans_onehot = np.zeros([len(ans_idx),len(answers_dict)], np.float32)\n", "for t in range(len(ans_idx)):\n", " ans_onehot[t,ans_idx[t]] = 1\n", " \n", "ans_poor_onehot = np.zeros([len(ans_poor_idx),len(answers_dict)], np.float32)\n", "for t in range(len(ans_poor_idx)):\n", " ans_poor_onehot[t, ans_poor_idx[t]] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each of the query and the answer one-hot encoded input, create the embeddings. Note: we use the answer embedding for both the correct answer and the poor answer. We compute the cosine similarity between the query and answer pair. The relative value of the cosine similarity with a higher value indicating a better answer." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Query to Answer similarity: 0.99995367043\n", "Query to poor-answer similarity: 0.999941420215\n" ] } ], "source": [ "qry_embedding = network['query_vector'].eval([qry_onehot])\n", "ans_embedding = network['answer_vector'].eval([ans_onehot])\n", "ans_poor_embedding = network['answer_vector'].eval([ans_poor_onehot])\n", "\n", "from scipy.spatial.distance import cosine\n", "\n", "print('Query to Answer similarity:', 1-cosine(qry_embedding, ans_embedding))\n", "print('Query to poor-answer similarity:', 1-cosine(qry_embedding, ans_poor_embedding))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }