Software and datasets

Software, Data & Cheatsheets

ABAE-PyTorch: github

Yet another PyTorch implementation of the model described in the paper "An Unsupervised Neural Attention Model for Aspect Extraction" by Ruidan He, Wee Sun Lee, Hwee Tou Ng and Daniel Dahlmeier (ACL 2017).
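
For orientation, here is the core of the ABAE model as we read it in the paper. The notation follows the paper, and the repository's implementation details may differ. A sentence with word embeddings e_{w_1}, ..., e_{w_n} is encoded with attention. The encoding is reduced to a mixture over K aspect embeddings (the rows of T) and then reconstructed. Training uses a hinge loss against m negative sentences n_i plus an orthogonality penalty on the row-normalized aspect matrix T_n:

```latex
\begin{aligned}
y_s &= \frac{1}{n}\sum_{i=1}^{n} e_{w_i}, \qquad
d_i = e_{w_i}^{\top} M\, y_s, \qquad
a_i = \frac{\exp(d_i)}{\sum_{j=1}^{n}\exp(d_j)},\\
z_s &= \sum_{i=1}^{n} a_i\, e_{w_i}, \qquad
p_t = \operatorname{softmax}(W z_s + b), \qquad
r_s = T^{\top} p_t,\\
J(\theta) &= \sum_{s \in D}\sum_{i=1}^{m} \max\bigl(0,\; 1 - r_s^{\top} z_s + r_s^{\top} n_i\bigr),\\
U(\theta) &= \bigl\lVert T_n T_n^{\top} - I \bigr\rVert, \qquad
L(\theta) = J(\theta) + \lambda\, U(\theta).
\end{aligned}
```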

keras: Generating Sentences from a Continuous Space: github

Keras implementation of the LSTM variational autoencoder described in the paper "Generating Sentences from a Continuous Space" by Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz and Samy Bengio (2015).
It doesn't follow the paper exactly, but implements its main ideas. A toy dataset is included.
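
Two of the paper's central ideas are KL cost annealing and word dropout. The sketch below covers only the annealed objective, for a diagonal-Gaussian posterior and a standard-normal prior. The logistic schedule and its constants (`midpoint`, `steepness`) are hypothetical illustration choices, not necessarily what the paper or this repository uses.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def kl_weight(step: int, midpoint: int = 2500, steepness: float = 0.0025) -> float:
    """Logistic annealing: the KL weight rises from ~0 to ~1 around `midpoint`.

    `midpoint` and `steepness` are hypothetical defaults, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))


def vae_loss(reconstruction_nll: float, mu: Sequence[float],
             logvar: Sequence[float], step: int) -> float:
    """Annealed negative ELBO: reconstruction NLL plus the weighted KL term."""
    return reconstruction_nll + kl_weight(step) * gaussian_kl(mu, logvar)
```

Early in training the weight is close to 0, so the decoder first learns to use the latent code. As the weight approaches 1, the objective becomes the full ELBO.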

pytrovich: Python3 port of Petrovich: github, pypi

A library that inflects Russian names for a given grammatical case. It supports first names, last names and middle names (patronymics). Since version 0.0.2, gender detection is also available.
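
Petrovich-style inflection works by matching name suffixes against rules and applying a per-case modification. The sketch below shows that idea and assumes the rule convention where each leading "-" deletes one final character, the rest is appended, and "." leaves the word unchanged. The rule table is a tiny hypothetical subset for last names only, not pytrovich's API or rule file.

```python
# Oblique cases in the order used by the hypothetical rule table below.
CASES = ("genitive", "dative", "accusative", "instrumental", "prepositional")

# Hypothetical subset of last-name rules: (suffixes, one modification per case).
LASTNAME_RULES = {
    "female": [(("ова", "ева", "ина"), ("-ой", "-ой", "-у", "-ой", "-ой"))],
    "male": [(("ов", "ев", "ин"), ("а", "у", "а", "ым", "е"))],
}


def apply_mod(word: str, mod: str) -> str:
    """Apply one modification: strip one char per leading '-', then append the rest."""
    if mod == ".":
        return word
    ending = mod.lstrip("-")
    stripped = len(mod) - len(ending)
    return word[: len(word) - stripped] + ending


def inflect_lastname(word: str, gender: str, case: str) -> str:
    """Inflect a last name using the first rule whose suffix matches."""
    idx = CASES.index(case)
    for suffixes, mods in LASTNAME_RULES[gender]:
        if word.lower().endswith(suffixes):
            return apply_mod(word, mods[idx])
    return word  # no rule matched: leave the name uninflected


print(inflect_lastname("Иванов", "male", "instrumental"))  # Ивановым
```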

Glyphnet-PyTorch: github

Cracking Egyptologist's MNIST: PyTorch implementation of the Glyphnet model introduced in "A Deep Learning Approach to Ancient Egyptian Hieroglyphs Classification", Barucci et al., 2021.

Gardiner2Unicode: github, pypi

A Python package that provides convenient out-of-the-box access to the mapping from Gardiner's Sign List codes to Unicode characters, and renders hieroglyphs as images.
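
The mapping can be illustrated with Python's standard `unicodedata` module, because Unicode names Egyptian hieroglyphs after their Gardiner codes ("A1" is U+13000, EGYPTIAN HIEROGLYPH A001). This is a minimal sketch under that assumption. It is not Gardiner2Unicode's API, and it does not cover the package's image generation.

```python
import re
import unicodedata

# Gardiner codes: category letter(s), a number, an optional variant letter ("A1", "G17", "Aa1").
GARDINER_RE = re.compile(r"^([A-Z][a-z]?)(\d+)([A-Za-z]?)$")


def gardiner_to_char(code: str) -> str:
    """Map a Gardiner code to its hieroglyph via the standard Unicode character name."""
    match = GARDINER_RE.match(code)
    if match is None:
        raise ValueError(f"not a Gardiner code: {code!r}")
    category, number, variant = match.groups()
    # Unicode names zero-pad the number to three digits and upper-case the letters,
    # e.g. "A1" -> "EGYPTIAN HIEROGLYPH A001".
    name = f"EGYPTIAN HIEROGLYPH {category.upper()}{int(number):03d}{variant.upper()}"
    return unicodedata.lookup(name)  # raises KeyError if Unicode has no such sign


print(f"U+{ord(gardiner_to_char('A1')):04X}")  # U+13000
```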

Mystem-Scala: github, mvn

A Scala wrapper for the Yandex.Mystem morphological analyzer (an early version, by I. Segalovich and V. Titov, is described here).

RuConceptNet: github, pypi

A Python package providing efficient out-of-the-box retrieval from the Russian part of ConceptNet 5.7. Open Source ODS Award 2020.
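
ConceptNet 5 assertion dumps are tab-separated rows of the form (assertion URI, relation, start concept, end concept, JSON metadata). Russian concepts carry the "/c/ru/" prefix. The sketch below builds an in-memory index of Russian-to-Russian edges from such rows. It is an illustration of the retrieval task only, not how RuConceptNet actually stores or queries the data. Keeping only edges where both ends are Russian is an assumption of this sketch.

```python
import csv
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (relation, other concept, weight)


def build_russian_index(rows: Iterable[List[str]]) -> Dict[str, List[Edge]]:
    """Index assertion rows whose start and end concepts are both Russian."""
    index: Dict[str, List[Edge]] = defaultdict(list)
    for _uri, relation, start, end, info in rows:
        if start.startswith("/c/ru/") and end.startswith("/c/ru/"):
            weight = json.loads(info).get("weight", 1.0)
            index[start].append((relation, end, weight))
    return index


def load_index(path: str) -> Dict[str, List[Edge]]:
    """Read a ConceptNet assertions dump (tab-separated, no header row)."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: the JSON column contains double quotes that must not be parsed.
        return build_russian_index(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
```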

YSK-minimal-TGBot: github

A minimal Telegram bot that transcribes Russian speech recordings of up to 30 seconds using Yandex SpeechKit. Access tokens from both Telegram and Yandex are required.
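
The recognition step can be sketched as building a request to SpeechKit's synchronous recognition endpoint (API v1) for a Telegram voice message. Telegram voice messages are OGG/Opus and carry a `duration` field in seconds. This is a sketch under stated assumptions, not the bot's actual code: IAM-token authentication and the `folderId` parameter are deployment assumptions, and the function is hypothetical. It only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"
MAX_SECONDS = 30  # synchronous recognition accepts short clips only


def build_stt_request(audio: bytes, duration: int, iam_token: str,
                      folder_id: str) -> urllib.request.Request:
    """Build (but do not send) a Russian-language recognition request for OGG/Opus audio."""
    if duration > MAX_SECONDS:
        raise ValueError(f"recording is {duration} s; at most {MAX_SECONDS} s is supported")
    query = urllib.parse.urlencode({"lang": "ru-RU", "format": "oggopus", "folderId": folder_id})
    return urllib.request.Request(
        f"{STT_URL}?{query}",
        data=audio,
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return JSON whose `result` field holds the recognized text. Checking the duration before uploading avoids spending a request on a clip the endpoint would reject.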

Datasets


Hogweed: Ground-Level View: github

Non-aerial photographic images of Sosnowsky's hogweed, manually annotated for semantic segmentation.

Other


Awesome Azeri NLP: github

A curated list of Azerbaijani language processing software, relevant datasets, etc.

Awesome Kyrgyz NLP: github

A curated list of Kyrgyz language processing software, relevant datasets, etc.

Apertium's List of Symbols: PDF

An A0 poster with the symbol tables copied from the Apertium Project Wiki.

Apertium tags description: PDF

A glossary of Apertium tags, translated into Russian where possible.
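
In Apertium's stream format, each lexical unit looks like ^surface/lemma<tag1><tag2>...$, with alternative analyses separated by "/". The sketch below shows how such tags can be looked up in a glossary. The glossary excerpt is hypothetical (English glosses only, not the Russian translations from the PDF), and the sketch keeps only the first analysis of each unit.

```python
import re
from typing import List, Tuple

LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")  # ^surface/analyses$
TAG_RE = re.compile(r"<([^>]+)>")            # <tag>

# Hypothetical glossary excerpt: tag -> gloss.
GLOSSARY = {"n": "noun", "pl": "plural", "sg": "singular",
            "vblex": "standard verb", "pri": "present indicative"}


def explain(stream: str) -> List[Tuple[str, str, List[str]]]:
    """Return (surface form, lemma, glossed tags) for the first analysis of each unit."""
    result = []
    for surface, analyses in LU_RE.findall(stream):
        first = analyses.split("/")[0]
        lemma = first.split("<", 1)[0]
        tags = [GLOSSARY.get(tag, tag) for tag in TAG_RE.findall(first)]  # unknown tags kept as-is
        result.append((surface, lemma, tags))
    return result


print(explain("^cats/cat<n><pl>$"))  # [('cats', 'cat', ['noun', 'plural'])]
```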