Publications
An up-to-date list is available on Google Scholar.
2025
- NAACL 2025. Echoes of Discord: Forecasting Hater Reactions to Counterspeech. Xiaoying Song, Sharon Lisseth Perez, Xinchen Yu, Eduardo Blanco, and Lingzi Hong. In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025
Hate speech (HS) erodes the inclusiveness of online users and propagates negativity and division. Counterspeech has been recognized as a way to mitigate the harmful consequences. While some research has investigated the impact of user-generated counterspeech on social media platforms, few have examined and modeled haters’ reactions toward counterspeech, despite the immediate alteration of haters’ attitudes being an important aspect of counterspeech. This study fills the gap by analyzing the impact of counterspeech from the hater’s perspective, focusing on whether the counterspeech leads the hater to reenter the conversation and if the reentry is hateful. We compile the Reddit Echoes of Hate dataset (ReEco), which consists of triple-turn conversations featuring haters’ reactions, to assess the impact of counterspeech. To predict haters’ behaviors, we employ two strategies: a two-stage reaction predictor and a three-way classifier. The linguistic analysis sheds insights on the language of counterspeech to hate eliciting different haters’ reactions. Experimental results demonstrate that the 3-way classification model outperforms the two-stage reaction predictor, which first predicts reentry and then determines the reentry type. We conclude the study with an assessment showing the most common errors identified by the best-performing model.
@inproceedings{song-etal-2025-echoes, title = {Echoes of Discord: Forecasting Hater Reactions to Counterspeech}, author = {Song, Xiaoying and Perez, Sharon Lisseth and Yu, Xinchen and Blanco, Eduardo and Hong, Lingzi}, editor = {Chiruzzo, Luis and Ritter, Alan and Wang, Lu}, booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025}, month = apr, year = {2025}, address = {Albuquerque, New Mexico}, publisher = {Association for Computational Linguistics}, pages = {4892--4905}, isbn = {979-8-89176-195-7}, }
2024
- ICWSM 2025. Measuring and Forecasting Conversation Incivility: the Role of Antisocial and Prosocial Behaviors. Xinchen Yu, Hayden Arnold, Benjamin Su, and Eduardo Blanco. Apr 2024
@misc{yu2024measuringforecastingconversationincivility, title = {Measuring and Forecasting Conversation Incivility: the Role of Antisocial and Prosocial Behaviors}, author = {Yu, Xinchen and Arnold, Hayden and Su, Benjamin and Blanco, Eduardo}, year = {2024}, eprint = {2412.02911}, archiveprefix = {arXiv}, primaryclass = {cs.CY}, }
- ICWSM 2024. Hate Cannot Drive out Hate: Forecasting Conversation Incivility following Replies to Hate Speech. Xinchen Yu, Eduardo Blanco, and Lingzi Hong. In Proceedings of the International AAAI Conference on Web and Social Media, Apr 2024
@inproceedings{yu2024hate, title = {Hate Cannot Drive out Hate: Forecasting Conversation Incivility following Replies to Hate Speech}, author = {Yu, Xinchen and Blanco, Eduardo and Hong, Lingzi}, booktitle = {Proceedings of the International AAAI Conference on Web and Social Media}, volume = {18}, pages = {1740--1752}, year = {2024} }
2023
- EMNLP 2023. A Fine-Grained Taxonomy of Replies to Hate Speech. Xinchen Yu, Ashley Zhao, Eduardo Blanco, and Lingzi Hong. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
Countering rather than censoring hate speech has emerged as a promising strategy to address hatred. There are many types of counterspeech in user-generated content: addressing the hateful content or its author, generic requests, well-reasoned counter arguments, insults, etc. The effectiveness of counterspeech, which we define as subsequent incivility, depends on these types. In this paper, we present a theoretically grounded taxonomy of replies to hate speech and a new corpus. We work with real, user-generated hate speech and all the replies it elicits rather than replies generated by a third party. Our analyses provide insights into the content real users reply with as well as which replies are empirically most effective. We also experiment with models to characterize the replies to hate speech, thereby opening the door to estimating whether a reply to hate speech will result in further incivility.
@inproceedings{yu-etal-2023-fine, title = {A Fine-Grained Taxonomy of Replies to Hate Speech}, author = {Yu, Xinchen and Zhao, Ashley and Blanco, Eduardo and Hong, Lingzi}, editor = {Bouamor, Houda and Pino, Juan and Bali, Kalika}, booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing}, month = dec, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, doi = {10.18653/v1/2023.emnlp-main.450}, pages = {7275--7289}, }
2022
- NAACL 2022. Hate Speech and Counter Speech Detection: Conversational Context Does Matter. Xinchen Yu, Eduardo Blanco, and Lingzi Hong. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022
Hate speech is plaguing the cyberspace along with user-generated content. Adding counter speech has become an effective way to combat hate speech online. Existing datasets and models target either (a) hate speech or (b) hate and counter speech but disregard the context. This paper investigates the role of context in the annotation and detection of online hate and counter speech, where context is defined as the preceding comment in a conversation thread. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral. Our analyses indicate that context is critical to identify hate and counter speech: human judgments change for most comments depending on whether we show annotators the context. A linguistic analysis draws insights into the language people use to express hate and counter speech. Experimental results show that neural networks obtain significantly better results if context is taken into account. We also present qualitative error analyses shedding light into (a) when and why context is beneficial and (b) the remaining errors made by our best model when context is taken into account.
@inproceedings{yu-etal-2022-hate, title = {Hate Speech and Counter Speech Detection: Conversational Context Does Matter}, author = {Yu, Xinchen and Blanco, Eduardo and Hong, Lingzi}, booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, month = jul, year = {2022}, address = {Seattle, United States}, publisher = {Association for Computational Linguistics}, doi = {10.18653/v1/2022.naacl-main.433}, pages = {5918--5930}, }
- WebSci 2022. Multi-Task Models for Multi-Faceted Classification of Pandemic Information on Social Media. Xinchen Yu, Zhuoli Xie, Afra Mashhadi, and Lingzi Hong. In Proceedings of the 14th ACM Web Science Conference 2022, Jul 2022
Social media data such as tweets have been seen as a convenient source of information to enhance situational awareness, and to assist local governments in decision making and response actions in crisis. However, extracting the relevant information for different types of situational awareness has been challenging. Existing studies have investigated classifications of crisis information on social media, but not much focus has been put on the classification of pandemic related information. Pandemics are public health related crises that present unique characteristics. We propose to classify pandemic tweets from three perspectives, i.e., informativeness, geographic view, and information source, after a comprehensive analysis of the factors determining the relevance of information to situational awareness. The joint use of three-faceted classifications will enable the identification of relevant data for multiple purposes of situational awareness. We manually annotate a dataset with COVID-19 tweets and explore multi-task learning models for the classification of three tasks simultaneously. The proposed multi-task neural network models show improved performance compared with single learning models. We also find that pretraining multi-task models with relevant crisis datasets can further boost the performance. Specifically, multi-task models can significantly increase the recall of ‘informative’ and ‘local’ tweets, which are important for local response actions and policy decision making.
@inproceedings{10.1145/3501247.3531552, author = {Yu, Xinchen and Xie, Zhuoli and Mashhadi, Afra and Hong, Lingzi}, title = {Multi-Task Models for Multi-Faceted Classification of Pandemic Information on Social Media}, year = {2022}, isbn = {9781450391917}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, doi = {10.1145/3501247.3531552}, booktitle = {Proceedings of the 14th ACM Web Science Conference 2022}, pages = {327--335}, numpages = {9}, keywords = {Deep Learning, COVID-19, Multi-task, Social Media, Pandemics, Classification}, location = {Barcelona, Spain}, series = {WebSci '22}, }
- EUSSET 2022. Causal impact model to evaluate the diffusion effect of social media campaigns. Xinchen Yu, Afra Mashhadi, Jeremy Boy, Rene Clausen Nielsen, and Lingzi Hong. Jul 2022
Organized information campaigns on social media platforms influence collective opinions through processes such as social influence and majority opinion formation. Evaluating the effect of such campaigns has become a critical question. We propose a method that first characterizes user engagement and the semantics of public discussions using social media data, then applies a causal impact analysis to measure the effect. We conducted a case study to examine the effect of the 16 Days Campaign (a campaign organized by UN Women) through changes in public discussions of MeToo, a related topic the campaign aimed to impact. Results showed that there were significantly more discussions of MeToo after the launch of the campaign. Hashtags on 16Days topics were used more often and by more people. The proposed method evaluates the direct and indirect diffusion effects of a campaign by quantifying, from social media data, the difference from what would have been observed had the campaign not taken place. The method makes it possible to evaluate the overall outcome of collaborative work in a social media campaign.
@article{yu2022causal, title = {Causal impact model to evaluate the diffusion effect of social media campaigns}, author = {Yu, Xinchen and Mashhadi, Afra and Boy, Jeremy and Nielsen, Rene Clausen and Hong, Lingzi}, year = {2022}, publisher = {European Society for Socially Embedded Technologies (EUSSET)}, }
- ECSM 2022. Linguistic Characteristics of Social Media Messages Spreading across Geographic and Linguistic Boundaries. Xinchen Yu, Jeremy Boy, Rene Clausen Nielsen, Lingzi Hong, and UN Global Pulse. In ECSM 2022 9th European Conference on Social Media, Jul 2022
Social media enable messages to be exchanged beyond geographic constraints. Some of the messages could be shared and forwarded by people with different cultural backgrounds across different geographical regions. Studying the content of messages that can reach diverse populations is important for practices such as movement propagation and global marketing. Existing studies mainly investigated the characteristics of messages that are popular, i.e., shared or forwarded by more users. As the diffusion of information is prone to be echoed inside certain geographical and linguistic boundaries, popular messages are not always shared and spread across geographical and linguistic boundaries. We investigated the linguistic characteristics of social media messages that can reach and be disseminated by people across nations, and across geographic and linguistic boundaries in the MeToo movement. Specifically, we analyzed the diffusion paths of messages according to the geolocation of tweets and conducted statistical analysis to compare the linguistic characteristics of tweets that spread across geographical or linguistic boundaries with those that do not. We focus on the linguistic characteristics from three aspects: ‘emotions’, ‘social relations’, and ‘economics, politics, and religion’. Our findings reveal that popular messages tend to contain more negative emotions; however, messages with negative emotions are unlikely to be disseminated across geographical or linguistic boundaries. On the other hand, messages on economic topics or non-adults’ issues are more likely to be disseminated universally.
@inproceedings{yu2022linguistic, title = {Linguistic Characteristics of Social Media Messages Spreading across Geographic and Linguistic Boundaries}, author = {Yu, Xinchen and Boy, Jeremy and Nielsen, Rene Clausen and Hong, Lingzi and {UN Global Pulse}}, booktitle = {ECSM 2022 9th European Conference on Social Media}, year = {2022}, organization = {Academic Conferences and publishing limited}, }
2021
- JMIR 2021. Temporal Variations and Spatial Disparities in Public Sentiment Toward COVID-19 and Preventive Practices in the United States: Infodemiology Study of Tweets. Alexander Kahanek, Xinchen Yu, Lingzi Hong, Ana Cleveland, and Jodi Philbrick. JMIR Infodemiology, Dec 2021
Background: During the COVID-19 pandemic, US public health authorities and county, state, and federal governments recommended or ordered certain preventative practices, such as wearing masks, to reduce the spread of the disease. However, individuals had divergent reactions to these preventive practices. Objective: The purpose of this study was to understand the variations in public sentiment toward COVID-19 and the recommended or ordered preventive practices from the temporal and spatial perspectives, as well as how the variations in public sentiment are related to geographical and socioeconomic factors. Methods: The authors leveraged machine learning methods to investigate public sentiment polarity in COVID-19–related tweets from January 21, 2020 to June 12, 2020. The study measured the temporal variations and spatial disparities in public sentiment toward both general COVID-19 topics and preventive practices in the United States. Results: In the temporal analysis, we found a 4-stage pattern from high negative sentiment in the initial stage to decreasing and low negative sentiment in the second and third stages, to the rebound and increase in negative sentiment in the last stage. We also identified that public sentiment to preventive practices was significantly different in urban and rural areas, while poverty rate and unemployment rate were positively associated with negative sentiment to COVID-19 issues. Conclusions: The differences between public sentiment toward COVID-19 and the preventive practices imply that actions need to be taken to manage the initial and rebound stages in future pandemics. The urban and rural differences should be considered in terms of the communication strategies and decision making during a pandemic. 
This research also presents a framework to investigate time-sensitive public sentiment at the county and state levels, which could guide local and state governments and regional communities in making decisions and developing policies in crises.
@article{info:doi/10.2196/31671, author = {Kahanek, Alexander and Yu, Xinchen and Hong, Lingzi and Cleveland, Ana and Philbrick, Jodi}, title = {Temporal Variations and Spatial Disparities in Public Sentiment Toward COVID-19 and Preventive Practices in the United States: Infodemiology Study of Tweets}, journal = {JMIR Infodemiology}, year = {2021}, month = dec, day = {30}, volume = {1}, number = {1}, pages = {e31671}, keywords = {COVID-19; preventive practices; temporal variations; spatial disparities; Twitter; public sentiment; socioeconomic factors}, issn = {2564-1891}, doi = {10.2196/31671}, }
- The disciplinary research landscape of data science reflected in data science journals. Lingzi Hong, William Moen, Xinchen Yu, and Jiangping Chen. Information Discovery and Delivery, Dec 2021
@article{hong2021disciplinary, title = {The disciplinary research landscape of data science reflected in data science journals}, author = {Hong, Lingzi and Moen, William and Yu, Xinchen and Chen, Jiangping}, journal = {Information Discovery and Delivery}, volume = {49}, number = {4}, pages = {287--297}, year = {2021}, publisher = {Emerald Publishing Limited}, }
2020
- SocInfo 2020. The Effect of Structural Affinity on the Diffusion of a Transnational Online Movement: The Case of #MeToo. Xinchen Yu, Shashidhar Reddy Daida, Jeremy Boy, and Lingzi Hong. In Social Informatics, Dec 2020
Social media platforms intrinsically afford the diffusion of social movements like #MeToo across countries. However, little is known about how extrinsic, country-level factors affect the uptake of such a transnational movement. In this paper, we present a macro, comparative study of the transnational #MeToo movement on Twitter across 33 countries. Our aim is to identify how socio-economic and cultural variables might have influenced the in-country scale of participation, as well as the timings of in-country peak surges of messages related to #MeToo. Our results show that total in-country participation over a three-year timeframe is highly related to the scale of participation in peak surges; the scale of in-country participation is related to the population size and income level of the country; and the timings of peak surges are related to the country’s population size, gender equality score, and the language used in messages. We believe these findings resonate with theoretical frameworks that describe the formation of transnational social movements, and provide a quantitative perspective that complements much of the qualitative work in the literature.
@inproceedings{10.1007/978-3-030-60975-7_33, author = {Yu, Xinchen and Daida, Shashidhar Reddy and Boy, Jeremy and Hong, Lingzi}, editor = {Aref, Samin and Bontcheva, Kalina and Braghieri, Marco and Dignum, Frank and Giannotti, Fosca and Grisolia, Francesco and Pedreschi, Dino}, title = {The Effect of Structural Affinity on the Diffusion of a Transnational Online Movement: The Case of {\#}MeToo}, booktitle = {Social Informatics}, year = {2020}, publisher = {Springer International Publishing}, address = {Cham}, pages = {447--460}, isbn = {978-3-030-60975-7}, dimensions = {true}, }
- ASIS&T 2020. Characteristics of information spreading across nations. Xinchen Yu, Daida Shashidhar Reddy, Lasya Bentula, and Lingzi Hong. Proceedings of the Association for Information Science and Technology, Dec 2020
The mechanism of information diffusion on social media platforms such as retweeting in Twitter has been largely studied. Existing studies mainly look into the social or discursive features of information that are popular in virtual space; few have related the diffusion of information to the real-world context and studied the characteristics of information that can spread across nations. In the context of globalization, understanding information spreading across nations not only facilitates marketing and propagation but also helps to understand transnational activities such as transnational social movements. We conduct a preliminary study to analyze the sentiment and cognition components of tweets that disseminate transnationally in the MeToo movement. Tweets that spread across nations are generally popular, that is, retweeted more. However, popular tweets do not always spread across nations. We find popular tweets that disseminate transnationally contain more elements of politics, religion, or social bond indicators. Our study provides insights for propagation in globalization and may help social movement organizations that aim for transnational activities.
@article{https://doi.org/10.1002/pra2.309, author = {Yu, Xinchen and Reddy, Daida Shashidhar and Bentula, Lasya and Hong, Lingzi}, title = {Characteristics of information spreading across nations}, journal = {Proceedings of the Association for Information Science and Technology}, volume = {57}, number = {1}, pages = {e309}, keywords = {content analysis, information diffusion, transnational dissemination, Twitter}, doi = {10.1002/pra2.309}, url = {https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/pra2.309}, year = {2020}, }