UMUTeam at EXIST 2021: Sexist Language Identification based on Linguistic Features and Transformers in Spanish and English

García-Díaz, José Antonio; Colomo-Palacios, Ricardo; Valencia-García, Rafael

dc.contributor.author	García-Díaz, José Antonio
dc.contributor.author	Colomo-Palacios, Ricardo
dc.contributor.author	Valencia-García, Rafael
dc.date.accessioned	2021-11-23T13:27:23Z
dc.date.available	2021-11-23T13:27:23Z
dc.date.created	2021-09-20T13:59:13Z
dc.date.issued	2021
dc.identifier.citation	CEUR Workshop Proceedings. 2021, 2943, 512-521.	en_US
dc.identifier.issn	1613-0073
dc.identifier.uri	https://hdl.handle.net/11250/2831044
dc.description.abstract	Sexism is harmful behaviour that can make women feel worthless, promoting self-censorship and gender inequality. In the digital era, misogynists have found in social networks a place in which they can spread their oppressive discourse towards women. Although this particular form of oppressive speech is banned and punished on most social networks, its identification is quite challenging due to the large number of messages posted every day. Moreover, sexist comments can go unnoticed when they are phrased as condescending or friendly statements, which hinders their identification even by humans. With the aim of improving automatic sexism identification on social networks, we participate in EXIST-2021. This shared task involves the identification and categorisation of sexist language in Spanish and English documents compiled from micro-blogging platforms. Specifically, two tasks were proposed: one concerning the binary classification of sexist utterances and another regarding the multi-class identification of sexist traits. Our proposal for solving both tasks is grounded on the combination of linguistic features and state-of-the-art transformers by means of ensembles and multi-input neural networks. To address the multi-language problem, we tackle each language independently and merge the results at the end. Our best results were an accuracy of 75.14% in task 1 and 61.70% in task 2.	en_US
dc.language.iso	eng	en_US
dc.publisher	Technical University of Aachen	en_US
dc.relation.uri	http://ceur-ws.org/Vol-2943/exist_paper19.pdf
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.subject	sexism identification	en_US
dc.subject	document classification	en_US
dc.subject	feature engineering	en_US
dc.subject	natural language processing	en_US
dc.title	UMUTeam at EXIST 2021: Sexist Language Identification based on Linguistic Features and Transformers in Spanish and English	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2021 for this paper by its authors.	en_US
dc.subject.nsi	VDP::Humaniora: 000::Språkvitenskapelige fag: 010	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551	en_US
dc.source.pagenumber	512-521	en_US
dc.source.volume	2943	en_US
dc.source.journal	CEUR Workshop Proceedings	en_US
dc.identifier.cristin	1936075
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Associated file(s)

Filename: Colomo-PalaciosSexist2021.pdf
Size: 451.7Kb
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonsteknologi og kommunikasjon [134]
This unit contains contributions from staff at the Institutt for informasjonsteknologi og kommunikasjon

Unless otherwise stated, this item is licensed under Attribution 4.0 International