Normalisation and Analysis of Social Media Texts (NormSoMe)

 

at LREC 2016, Portorož (Slovenia), 28 May 2016 (morning session)

Social media texts provide large quantities of interesting and useful data as well as new challenges for NLP. Social media texts include chats, online commentaries, reviews, blogs, emails, forums, and other genres. Typically, the texts are informal and notoriously noisy. Thus, many NLP tools have difficulties processing and normalizing the data.

Because English social media has been investigated most widely, we also invited papers on other languages, especially those rich in inflections and diacritics, which cause additional processing problems. Besides English, the NormSoMe programme includes papers on Dutch, German, Hungarian, Lithuanian, and Slovene.

The workshop is aimed at researchers who have solutions, insights, and ideas for tackling the processing of social media texts, or who are interested in this field of research.

Time, Place and Duration

This was a half-day event held at the conference venue, the Grand Hotel Bernardin Conference Center (Europa C), in the morning session of Saturday, 28 May 2016.

Proceedings

The Proceedings of the workshop can be found here.

Workshop Programme

28 May 2016 (morning session)
9:00 – 9:05 – Introduction to the Workshop (by Andrius Utka)
9:05 – 9:45 – Session 1 (Chair: Martin Volk)
Torsten Zesch (keynote speech): Your noise is my research question! - Limitations of normalizing social media data (PDF)
9:45 – 10:30 – Session 2 (Chair: Michi Amsler)
  • Judit Ács, József Halmi: Hunaccent: Small Footprint Diacritic Restoration for Social Media
  • Andrius Utka, Darius Amilevičius: Normalisation of Lithuanian Social Media Texts: Towards Morphological Analysis of User-Generated Comments
10:30 – 11:00 – Coffee break
11:00 – 13:00 – Session 3 (Chair: Andrius Utka)
  • Jaka Čibej, Darja Fišer, Tomaž Erjavec: Normalisation, Tokenisation and Sentence Segmentation of Slovene Tweets
  • Hans van Halteren, Nelleke Oostdijk: Listening to the Noise: Model Improvement on the Basis of Variation Patterns in Tweets
  • Rob van der Goot: Normalizing Social Media Texts by Combining Word Embeddings and Edit Distances in a Random Forest Regressor
  • Ronja Laarmann-Quante, Stefanie Dipper: An Annotation Scheme for the Comparison of Different Genres of Social Media with a Focus on Normalization
  • Tatjana Scheffler, Elina Zarisheva: Dialog Act Recognition for Twitter Conversations

Organising Committee

  • Andrius Utka (Vytautas Magnus University, Kaunas)
  • Jolanta Kovalevskaitė (Vytautas Magnus University, Kaunas)
  • Danguolė Kalinauskaitė (Vytautas Magnus University, Kaunas)
  • Martin Volk (University of Zurich)
  • Rita Butkienė (Kaunas University of Technology)
  • Jurgita Vaičenonienė (Vytautas Magnus University, Kaunas)

Programme Committee

  • Darius Amilevičius (Vytautas Magnus University, Kaunas)
  • Michi Amsler (University of Zurich)
  • Loic Boizou (Vytautas Magnus University, Kaunas)
  • Gintarė Grigonytė (Stockholm University)
  • Jurgita Kapočiūtė-Dzikienė (Vytautas Magnus University, Kaunas)
  • Tomas Krilavičius (Vytautas Magnus University, Kaunas)
  • Joakim Nivre (Uppsala University)
  • Raivis Skadiņš (Tilde, Riga, Latvia)
  • Andrius Utka (Vytautas Magnus University, Kaunas)
  • Martin Volk (University of Zurich)

Contact Organising Committee

normsome@hmf.vdu.lt

Conference Website

http://donelaitis.vdu.lt/~normsome2016/