CALCS Shared Tasks

Machine Translation (MT)



English - Hinglish (ENG - HINGLISH)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



English - Spanish (ENG - SPA)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



English - Spanglish (ENG - SPANGLISH)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



Spanglish - English (SPANGLISH - ENG)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



Spanglish - Spanish (SPANGLISH - SPA)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



Modern Standard Arabic-Egyptian Arabic - English (MSAEA - ENG)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



Modern Standard Arabic-Egyptian Arabic - Spanish (MSAEA - SPA)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.



English - Modern Standard Arabic-Egyptian Arabic (ENG - MSAEA)

[Paper] [Data] [Bibtex]

[License Agreement]: Twitter data is distributed for non-commercial use and for research purposes only, following Twitter's own Developer Agreement and Policy.

Language Identification (LID)



Spanish - English (SPA - ENG)

Overview for the Second Shared Task on Language Identification in Code-Switched Data (CALCS 2016)
Giovanni Molina, Nicolas Rey-Villamizar, Thamar Solorio, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Mona Diab
[Paper] [Data] [Bibtex]




Hindi - English (HIN - ENG)

Language Identification and Analysis of Code-Switched Social Media Text
Deepthi Mave, Suraj Maharjan, Thamar Solorio
[Paper] [Data] [Bibtex]




Nepali - English (NEP - ENG)

Overview for the First Shared Task on Language Identification in Code-Switched Data (CALCS 2014)
Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung
[Paper] [Data] [Bibtex]




Modern Standard Arabic - Egyptian Arabic (MSA - EA)

Overview for the Second Shared Task on Language Identification in Code-Switched Data (CALCS 2016)
Giovanni Molina, Nicolas Rey-Villamizar, Thamar Solorio, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Mona Diab
[Paper] [Data] [Bibtex]


Part-of-Speech Tagging (POS)



Spanish - English (SPA - ENG)

Part of Speech Tagging for Code Switched Data
Fahad AlGhamdi, Giovanni Molina, Mona Diab, Thamar Solorio, Abdelati Hawwari, Victor Soto, Julia Hirschberg
[Paper] [Data] [Bibtex]




Hindi - English (HIN - ENG)

A Twitter Corpus for Hindi-English Code Mixed POS Tagging
Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
[Paper] [Data] [Bibtex]


Named Entity Recognition (NER)



Spanish - English (SPA - ENG)

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task
Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio
[Paper] [Data] [Bibtex]




Modern Standard Arabic - Egyptian Arabic (MSA - EA)

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task
Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio
[Paper] [Data] [Bibtex]




Hindi - English (HIN - ENG)

Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets
Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
[Paper] [Data] [Bibtex]


Sentiment Analysis (SA)



Spanish - English (SPA - ENG)

SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Parth Patwa, Gustavo Aguilar, Sudipta Kar, Suraj Pandey, Srinivas PYKL, Björn Gambäck, Tanmoy Chakraborty, Thamar Solorio, Amitava Das
[Paper] [Data] [Bibtex]