Labeling in Their Shoes: Improving Text Annotation with Cognitive Empathy Priming

Sung Hyun Kwon, Jessica Clark, Jui Ramaprasad, Il-Horn Hann

Abstract

Human-annotated labels are crucial for training and evaluating machine learning models, especially in subjective domains. In settings where individuals' perspectives affect their labels, crowdsourced workers may both fail to reach consensus and systematically differ from expert consensus on average. One such perspective-sensitive context is identifying sexist content. We demonstrate that, in this domain, there is systematic inter-annotator misalignment and annotator-expert misalignment, both of which persist even with large numbers of annotators. This misalignment significantly degrades the performance of LLMs fine-tuned on the annotated data. To address this challenge, we introduce cognitive empathy priming (CEP), a scalable psychological intervention that enhances annotators' ability to recognize perspectives different from their own. Our results show that CEP substantially improves label quality: treated workers demonstrate around 8-20 percentage points higher alignment with expert consensus than workers who received no priming. Inter-rater consistency also increases significantly. These improvements translate directly into model performance: LLMs fine-tuned on CEP-treated labels show approximately 16% higher agreement with expert-determined labels than LLMs fine-tuned on control-group labels. Sensitivity analyses confirm these results are robust even when accounting for potential expert biases. This research provides organizations with a scalable and cost-effective way to enhance AI training data quality, particularly for applications such as content moderation and bias detection.
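A minimal sketch of how the two label-quality metrics above might be computed, assuming one binary label per worker per item; the data layout, and the use of Fleiss' kappa for inter-rater consistency, are illustrative assumptions rather than the paper's actual measures:

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def expert_alignment(worker_labels, expert_labels):
    # worker_labels: (n_items, n_workers) binary labels;
    # expert_labels: (n_items,) expert-consensus labels.
    # Returns the mean per-worker rate of agreement with the expert consensus.
    return (worker_labels == expert_labels[:, None]).mean(axis=0).mean()

def inter_rater_consistency(worker_labels):
    # Fleiss' kappa across workers: one of several chance-corrected
    # agreement measures (the paper's exact statistic is not specified here).
    counts, _ = aggregate_raters(worker_labels)
    return fleiss_kappa(counts)

# Hypothetical comparison of the CEP-treated and control groups:
# gain = expert_alignment(cep_labels, expert) - expert_alignment(control_labels, expert)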

Under review



Target Variable Engineering

Jessica Clark

Abstract

How does the formulation of a target variable affect performance within the ML pipeline? The experiments in this study examine numeric targets that have been binarized by comparison against a threshold. We compare the predictive performance of regression models trained to predict the numeric targets vs. classifiers trained to predict their binarized counterparts. Specifically, we make this comparison at every point of a randomized hyperparameter optimization search to understand the effect of the computational resource budget on the tradeoff between the two. We find that regression requires significantly more computational effort to converge on optimal performance and is more sensitive to both randomness and heuristic choices in the training process. Although classification can and does benefit from systematic hyperparameter tuning and model selection, the improvements are much smaller than for regression. This work constitutes the first systematic comparison of regression and classification within the framework of computational resource requirements. Our findings contribute to calls for greater replicability and efficiency within the ML pipeline for the sake of building more sustainable and robust AI systems.
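A minimal sketch of the experimental setup described above, using scikit-learn on synthetic data; the threshold (the target's median), model family, and search space are illustrative assumptions, not the paper's actual configuration:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterSampler, train_test_split

# Synthetic numeric target, binarized against a threshold.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
threshold = np.median(y)
y_bin = (y > threshold).astype(int)

X_tr, X_te, y_tr, y_te, yb_tr, yb_te = train_test_split(
    X, y, y_bin, test_size=0.3, random_state=0)

# One shared random search over a common hyperparameter space,
# evaluated at every point for both problem formulations.
space = {"n_estimators": [50, 100, 200], "max_depth": [2, 3, 5],
         "learning_rate": [0.01, 0.05, 0.1]}

for i, params in enumerate(ParameterSampler(space, n_iter=20, random_state=0)):
    # Regression: predict the numeric target, then binarize the predictions.
    reg = GradientBoostingRegressor(random_state=0, **params).fit(X_tr, y_tr)
    acc_reg = accuracy_score(yb_te, (reg.predict(X_te) > threshold).astype(int))

    # Classification: predict the binarized target directly.
    clf = GradientBoostingClassifier(random_state=0, **params).fit(X_tr, yb_tr)
    acc_clf = accuracy_score(yb_te, clf.predict(X_te))

    print(f"iter {i:2d}  regression-then-binarize acc={acc_reg:.3f}  "
          f"classifier acc={acc_clf:.3f}")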


Under review

Automated Promotion? A Study of the Fairness-Economic Tradeoffs in Reducing Crowdfunding Disparities via AI/ML

Lauren Rhue, Jessica Clark

Abstract

Digital platforms have a widely documented issue with racial disparities, which can result in adverse reputational and economic consequences. Equitable promotion of projects across racial groups can mitigate these disparities. Our research explores how platforms can more equitably determine which projects to promote. Platforms typically rely on their employees to decide what content to highlight, but human decisions are subject to cognitive and implicit biases. We examine whether an algorithmic approach to choosing which projects to promote can generate more equitable outcomes for people in traditionally marginalized groups while preserving equivalent economic outcomes. We perform an observational and simulation study on more than 100,000 projects gathered from the crowdfunding platform Kickstarter.com to determine whether machine learning models would diversify the set of promoted projects.

Our analysis yields three main findings. First, machine learning models, both fairness-unaware and fairness-aware, identify a more diverse set of projects to promote than those selected by employees. Second, promoting a more diverse set of projects diminishes but does not completely eliminate disparities between racial groups. Third, a more equitable promotion scheme does not substantially harm core business outcomes for the platform. This study contributes to the information systems literature on using machine learning to reduce racial disparities and to research examining the fairness-economic trade-off. Furthermore, this paper provides a practical path forward for digital platforms that want to increase participation from diverse groups, and for potential crowdfunding participants.
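A minimal sketch of the kind of promotion simulation described above: score projects with a predictive model, promote the top k, and inspect the demographic composition of the promoted set. The column names and model choice are illustrative assumptions, not the paper's actual schema or method:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def simulate_promotion(df, feature_cols, k):
    # df: one row per project, with a binary `funded` outcome, a
    # `creator_race` column, and pre-launch feature columns (all
    # hypothetical names). Fit on one half, score the other half,
    # and "promote" the k highest-scoring held-out projects.
    train, test = train_test_split(df, test_size=0.5, random_state=0)
    model = LogisticRegression(max_iter=1000)
    model.fit(train[feature_cols], train["funded"])
    scored = test.assign(score=model.predict_proba(test[feature_cols])[:, 1])
    promoted = scored.nlargest(k, "score")
    # Demographic composition of the model's picks, to be compared
    # against the employee-curated set (e.g., Kickstarter's staff picks).
    return promoted["creator_race"].value_counts(normalize=True)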


Under review

Not time but place: Location vs. previous choices for prosocial crowdfunding recommendation strategies

Lauren Rhue, Atiya Avery, Jessica Clark

Abstract

Donors on prosocial crowdfunding platforms have two critical motivations: supporting local causes and supporting social connections. Platform recommendation strategies often leverage prior donor choices; however, donors' choices may be driven by supply constraints rather than their true preferences. In these instances, a recommendation strategy based on donor attributes such as location may better reflect their true preferences. To understand the relative effectiveness of these two recommendation strategies, we conducted a randomized experiment with 200,000 donors in partnership with a prosocial crowdfunding platform. Donors were randomly selected to receive project recommendations that were either geographically close to their home or geographically close to their previously supported cause. We found that the local recommendation strategy increased the likelihood of clicks and donations. These results are driven by donors without social connections to the platform, indicating that social motivations supersede geographic motivations and suggesting that digital platforms should consider a hierarchical approach to recommendations. We also found evidence that the local recommendation strategy yields a rich-get-richer effect. We discuss the implications of our findings for digital platforms, as well as the practical implications for our research context of education in the United States.
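A minimal sketch of how the two experimental arms could be compared on click-through, using a standard two-proportion z-test; the counts below are purely hypothetical placeholders, not the experiment's actual results:

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical arm-level outcomes: local vs. previous-choice
# recommendations, assuming 100,000 donors randomized to each arm.
clicks = np.array([5200, 4700])         # donors who clicked, per arm
exposed = np.array([100_000, 100_000])  # donors who received recommendations

stat, pval = proportions_ztest(count=clicks, nobs=exposed)
print(f"local CTR = {clicks[0] / exposed[0]:.3%}, "
      f"previous-choice CTR = {clicks[1] / exposed[1]:.3%}, "
      f"z = {stat:.2f}, p = {pval:.2g}")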