Summer workshop on Natural Language Processing, July 3-7
– Traditional word and document representations: BoW / TF-IDF / latent space factorization models.
– Traditional text classification models (Naive Bayes, GLM).
– Practice: Spam Filtering, Sentiment Analysis.
– Word and document representations: word embeddings (word2vec, GloVe, fastText, word2gauss, LexVec, NNSE), doc2vec.
– Co-occurrence matrix, SVD.
– Practice: doc2vec for medical text similarity.
– Language modeling: N-grams. Smoothing.
– Language modeling: RNN / LSTM-based.
– Practice: LM for sentence completion.
– Semantic similarity analysis: BoW-based, ontology-based.
– Semantic similarity analysis: embedding-based, Word Mover’s Distance.
– Practice: Sentence paraphrase detection.
– Sequence-to-sequence modeling (Question Answering, Summarization, NMT).
– Attention mechanisms in neural networks.
– Practice: Training a seq2seq model for text summarization.
Python, Machine Learning basics, Neural Networks. Familiarity with at least one Deep Learning framework is helpful.
NLP expert and former tech lead of the NLP team at Grammarly, where he researched mistake detection and correction for the English language. Currently, Vsevolod is growing a cognitive computing consultancy called (m8n)ware. He is a co-founder of the lang-uk project, which aims to create, gather, release, and maintain data sets, models, and tools for Ukrainian language processing. More information about Vsevolod is available on his personal web page.
Contacts: [email protected]
Machine Learning Engineer at DataRobot, focused on real-world ML automation at enterprise scale. Yuriy is a relentless practitioner with over 10 years of industry experience and a contributor to many DS/ML educational initiatives: LITS, [email protected], Kyivstar Big Data School, DataFest.