A few months ago we showed how well an LSTM network can perform transliteration. Transliteration is a relatively easy task for humans, so it is interesting to see whether the network captures the patterns people normally use to transliterate text. In this post we'll try to understand what individual neurons of the network actually learn.
Today we officially registered the YerevaNN scientific and educational foundation, which aims to promote world-class AI research in Armenia and to develop high-quality educational programs in machine learning and related disciplines. The board members of the foundation are Gor Vardanyan, founder of FimeTech, Vazgen Hakobjanyan, co-founder of Teamable, and Rouben Meschian, founder of Arminova Technologies. Hrant Khachatrian is the director of the foundation.
The success of neural word embedding models like word2vec and GloVe motivated research on representing sentences in an n-dimensional space. Michael Manukyan and Hrayr Harutyunyan reviewed several sentence representation algorithms and their applications in state-of-the-art automated question answering systems during a talk at the Armenian NLP meetup. The slides of the talk are below. Follow us on SlideShare to get the latest slides from YerevaNN.
Many languages have their own non-Latin alphabets, but the web is full of content in those languages written in Latin letters, which makes that content inaccessible to various NLP tools (e.g. automatic translation). Transliteration is the process of converting the romanized text back to the original writing system. In theory every language has a strict set of romanization rules, but in practice people do not follow them, and most romanized content is hard to transliterate with rule-based algorithms. We believe this problem can be solved with state-of-the-art NLP tools, and we demonstrate a high-quality solution for Armenian based on recurrent neural networks. We invite everyone to adapt our system to more languages.
Last year Hrayr used convolutional networks to identify the spoken language of short audio recordings for a TopCoder contest and reached 95% accuracy. After the contest ended we decided to try recurrent neural networks and their combinations with CNNs on the same task. The best combination reached 99.24% accuracy, and an ensemble of 33 models reached 99.67%. This work became Hrayr's bachelor's thesis.