Playground for bAbI tasks

23 Feb 2016

We recently implemented Dynamic memory networks in Theano and trained them on Facebook’s bAbI tasks, which are designed to test basic reasoning abilities. Our implementation currently solves 8 of the 20 bAbI tasks, which is still behind the state of the art. Today we are releasing a web application for testing and comparing several network architectures and pretrained models.

Attention module
Architecture extensions
Results
Visualizing Dynamic memory networks
Looking for feedback

Attention module

One of the key parts of the DMN architecture, as described in the original paper, is its attention system. The DMN obtains internal representations of the input sentences and the question and passes them to the episodic memory module. The episodic memory makes several passes over the facts, generating one episode per pass, and the episodes are finally combined into a memory. Each episode is created by looking at all input sentences according to some attention. The attention system assigns a score to each sentence, and a sentence with a low score is ignored when the episode is constructed.
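The gating idea can be sketched in a few lines of NumPy. This is a minimal illustration, not our Theano code: the paper uses a GRU for the per-sentence update, while the sketch uses a plain tanh layer as a stand-in, and the names `build_episode`, `W` and `U` are hypothetical.

```python
import numpy as np

def build_episode(facts, gates, W, U):
    """Fold sentence vectors into one episode vector.

    facts: array of shape (n_sentences, d_in)
    gates: attention scores in [0, 1], one per sentence
    W, U:  weights of the stand-in update (the paper uses a GRU here)
    """
    h = np.zeros(W.shape[0])
    for c, g in zip(facts, gates):
        candidate = np.tanh(W @ c + U @ h)   # simplified stand-in for a GRU step
        h = g * candidate + (1.0 - g) * h    # low score: previous state is kept
    return h

rng = np.random.default_rng(0)
facts = rng.normal(size=(3, 4))
W = rng.normal(size=(5, 4))
U = rng.normal(size=(5, 5))

# Sentences 2 and 3 get a zero score, so they do not change the episode.
episode = build_episode(facts, [1.0, 0.0, 0.0], W, U)
first_only = build_episode(facts[:1], [1.0], W, U)
```

With scores of zero for the last two sentences, `episode` equals `first_only`: those sentences are ignored.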

The attention system is a simple two-layer neural network whose input is a feature vector computed from the input sentence, the question, and the current state of the memory. The paper describes this feature vector as follows:

$$z(c, m, q) = \left[\, c,\ m,\ q,\ c \circ q,\ c \circ m,\ |c - q|,\ |c - m|,\ c^T W^{(b)} q,\ c^T W^{(b)} m \,\right]$$

where c is an input sentence, q is the question, and m is the current state of the memory. In our first implementation we tried to stay as close to the original as possible, but we probably took these expressions too literally. We implemented |c-q| as the elementwise absolute value of the difference of two vectors, which caused a lot of trouble: Theano’s implementation of the abs function (or of its gradient) produced NaNs at random during training. In addition, the terms cWq and cWm each produce a single number, and two scalars have almost no effect inside a large feature vector.
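To make the size argument concrete, here is a small NumPy sketch of a literal reading of the feature vector, assuming the form given in the DMN paper as we understand it (the vectors, their elementwise products, the absolute differences, and the two bilinear terms). The function name `attention_features` and the random inputs are illustrative only.

```python
import numpy as np

def attention_features(c, m, q, W_b):
    """Concatenate the attention features for one sentence (literal reading)."""
    bilinear_q = c @ W_b @ q   # a single scalar
    bilinear_m = c @ W_b @ m   # a single scalar
    return np.concatenate([
        c, m, q,
        c * q, c * m,                  # elementwise products
        np.abs(c - q), np.abs(c - m),  # elementwise absolute differences
        [bilinear_q, bilinear_m],
    ])

rng = np.random.default_rng(0)
d = 100                                  # size of the internal representations
c, m, q = rng.normal(size=(3, d))
W_b = rng.normal(size=(d, d))

z = attention_features(c, m, q, W_b)
print(z.shape)   # (702,)
```

With 100-dimensional representations the vector has 7 * 100 + 2 = 702 entries, and only the last two come from the cWq and cWm terms.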

Later we implemented another version, called dmn_smooth, which uses the Euclidean distance between the two vectors instead of abs. This version is much more stable and gives better results. Interestingly, it trains faster on CPU than on our GPU (GTX 980). This could be due to our suboptimal code or to some issue in Theano’s scan function.
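The exact form of the replacement is not spelled out above. One plausible reading, which is an assumption here, is to swap the elementwise |c - q| for the elementwise squared difference, whose sum is the squared Euclidean distance. The sketch below only contrasts the two gradients; it is not a diagnosis of the NaNs we saw in Theano.

```python
import numpy as np

def abs_features(c, q):
    return np.abs(c - q)

def squared_features(c, q):
    # Assumed "smooth" variant: sums to the squared Euclidean distance.
    return (c - q) ** 2

def grad_abs(c, q):
    # d|x|/dx = sign(x): jumps from -1 to +1 around c == q.
    return np.sign(c - q)

def grad_squared(c, q):
    # d(x^2)/dx = 2x: continuous everywhere, zero at c == q.
    return 2.0 * (c - q)

c = np.array([1.0, 2.0, 3.0])
q = np.array([1.0, 0.0, 5.0])

smooth = squared_features(c, q)
distance = np.linalg.norm(c - q)   # Euclidean distance between c and q
```

The gradient of the squared term shrinks smoothly to zero as c approaches q, while the gradient of the absolute value keeps magnitude 1 and flips sign.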

Architecture extensions

The only significant difference between our implementation and the original DMN, as we understand it, is the fixed number of episodes. In the paper the authors describe a stop condition that lets the network itself decide whether it needs to compute more episodes. We have not implemented it yet.

Our implementations overfit heavily on many tasks. We tried several techniques to fight this, with little luck. First, we implemented a version of dmn_smooth that supports mini-batch training. Then we applied dropout and batch normalization on top of the memory module (before its output is passed to the answer module). Each of these tricks helps on some tasks for some hyperparameters, but none of them beat the results of the simple dmn_smooth trained without mini-batches.
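For readers unfamiliar with these two regularizers, here is a generic NumPy sketch of what applying them to a mini-batch of memory vectors could look like. It is not our Theano code: the order of the two operations, the dropout rate, the batch size and the omission of the learned scale and shift in batch normalization are all assumptions made for brevity.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
memory = rng.normal(size=(32, 100))   # hypothetical batch of memory vectors

# Output that would be passed on to the answer module during training.
regularized = dropout(batch_norm(memory), rate=0.5, rng=rng)
```

Note that batch normalization needs more than one example per step, which is one reason these tricks go together with mini-batch training.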

We plan to bring some ideas from the Neural Reasoner paper, especially the idea of recovering the input sentences based on the outputs of the input module.

Results

We train our implementations on bAbI tasks in a weakly supervised setting, as described in our previous post. Here we compare our results to End-to-end memory networks (MemN2N).

So far our best results are obtained by training dmn_smooth with 100 neurons for internal representations, 5 memory hops, using simple gradient descent for 11 epochs. We train jointly on all 20 bAbI tasks.

Task	MemN2N best version	Joint100 75.05%
1. Single supporting fact	99.9%	100%
2. Two supporting facts	81.2%	39.7%
3. Three supporting facts	68.3%	41.5%
4. Two argument relations	82.5%	75.5%
5. Three argument relations	87.1%	50.1%
6. Yes/no questions	98%	97.7%
7. Counting	89.9%	91.4%
8. Lists/sets	93.9%	95.2%
9. Simple negation	98.5%	99%
10. Indefinite knowledge	97.4%	87.3%
11. Basic coreference	96.7%	100%
12. Conjunction	100%	87%
13. Compound coreference	99.5%	96.4%
14. Time reasoning	98%	73.1%
15. Basic deduction	98.2%	53.9%
16. Basic induction	49%	49.5%
17. Positional reasoning	57.4%	59.3%
18. Size reasoning	90.8%	98.3%
19. Path finding	9.4%	9%
20. Agent’s motivations	99.8%	97.1%
Average accuracy	84.775%	75.05%
Solved tasks	10	8

We solve 8 tasks, where a task counts as solved at more than 95% accuracy. Our system outperforms MemN2N on some tasks, but its average accuracy is about 10 percentage points lower (75.05% vs. 84.775%). The experiments show that our networks fail to connect several sentences at once (tasks 2, 3, etc.). Task 19 (path finding) remains the most difficult one, and it is the only task on which none of our implementations overfit. The authors of Neural Reasoner claim some success on that task when training on 10 000 examples. We use only 1000 samples per task in all experiments.
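The summary rows of the table can be recomputed from the per-task numbers. The short script below copies the accuracies from the table and applies the same "more than 95%" threshold; the helper name `summarize` is ours.

```python
# Per-task accuracies (%) copied from the table, tasks 1 to 20.
memn2n = [99.9, 81.2, 68.3, 82.5, 87.1, 98, 89.9, 93.9, 98.5, 97.4,
          96.7, 100, 99.5, 98, 98.2, 49, 57.4, 90.8, 9.4, 99.8]
joint100 = [100, 39.7, 41.5, 75.5, 50.1, 97.7, 91.4, 95.2, 99, 87.3,
            100, 87, 96.4, 73.1, 53.9, 49.5, 59.3, 98.3, 9, 97.1]

def summarize(accuracies, threshold=95.0):
    """Return (average accuracy, number of tasks above the threshold)."""
    average = sum(accuracies) / len(accuracies)
    solved = sum(a > threshold for a in accuracies)
    return average, solved

memn2n_avg, memn2n_solved = summarize(memn2n)
joint_avg, joint_solved = summarize(joint100)

# Task numbers on which Joint100 is ahead of MemN2N.
ahead = [i + 1 for i, (ours, theirs) in enumerate(zip(joint100, memn2n))
         if ours > theirs]

print(round(memn2n_avg, 3), memn2n_solved)
print(round(joint_avg, 3), joint_solved)
print(ahead)
```

The gap between the two averages is 9.725 percentage points, and Joint100 is ahead on tasks 1, 7, 8, 9, 11, 16, 17 and 18.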

Visualizing Dynamic memory networks

We have created a web application (a playground) for Dynamic memory networks focused on bAbI tasks. It lets you choose a pretrained model and send custom input sentences and questions. The app shows the predicted answer and visualizes the attention scores for each memory step.


Figure: Web-based playground for bAbI tasks

These visualizations show that the network does not significantly change its attention between episodes, which makes it very hard to answer questions from tasks 2 and 3 correctly, since they require combining several supporting facts.

The web app is accessible at http://yerevann.com/dmn-ui/. Note that the vocabulary of the bAbI tasks is quite limited, and our implementation of DMN cannot process out-of-vocabulary words. The Sample button is a good starting point: it loads a random sample from the bAbI test set.
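Before sending custom text, it can help to check it against the vocabulary. The sketch below is only an illustration: the tokenization rule, the function names and the toy vocabulary built from one bAbI-style story are assumptions, not the code the server runs.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

def out_of_vocabulary(text, vocab):
    """Return the tokens of `text` that are missing from `vocab`."""
    return [word for word in tokenize(text) if word not in vocab]

# Toy vocabulary built from a single bAbI-style story.
story = "Mary moved to the bathroom. John went to the hallway. Where is Mary?"
vocab = set(tokenize(story))

unknown = out_of_vocabulary("Where is Alice?", vocab)
print(unknown)   # ['alice']
```

In practice the vocabulary would be collected from the training files of the chosen task, and any question that yields a non-empty list would need rephrasing.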

Looking for feedback

Everything described in this post is available on GitHub. The DMN implementations are here, the Flask-based RESTful server of the web app is in the /server/ folder, and the UI is in another repository. Feel free to fork, report issues, and please share your thoughts.

YerevaNN Blog on neural networks