Welcome!

I am a Ph.D. candidate in economics at the School of Economics and Management, Tsinghua University. My research interests include experimental economics, behavioral economics, the digital economy (AI), and health economics. You can download my CV here. Email: shuwang719 [at] gmail.com


Research

Working Papers

4.
Understanding the Mechanism of Altruism in Large Language Models, Apr. 2026.
Zhang, Shuhuai, Shu Wang, Zijun Yao, Chuanhao Li, Xiaozhi Wang, Songfa Zhong, Tracy Xiao Liu. [Abstract]
Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful shifts in allocations. Leveraging this contrast, we identify a set of SAE features (0.024% of all features across the model's layers) whose activations are strongly associated with the behavioral shift. To interpret these features, we use benchmark tasks motivated by dual-process theories to classify a subset as primarily heuristic (System 1) or primarily deliberative (System 2). Causal interventions validate their functional role: activation patching and continuous steering along the direction defined by these features reliably shift allocation distributions, with System 2 features exerting a more proximal influence on the model's final output than System 1 features. The same steering direction generalizes across multiple social-preference games. Together, these results enhance our understanding of artificial cognition by translating altruistic behaviors into identifiable network states and provide a framework for aligning LLM behavior with human values, thereby informing more transparent and value-aligned deployment.
3.
When Experimental Economics Meets Large Language Models: Evidence-based Tactics, Oct. 2025.
Wang, Shu, Zijun Yao, Shuhuai Zhang, Jianuo Gai, Tracy Xiao Liu, Songfa Zhong. [arXiv] [Abstract]
Advancements in large language models (LLMs) have sparked growing interest in measuring and understanding their behavior through experimental economics. However, established guidelines for designing economic experiments for LLMs are still lacking. Drawing on principles from experimental economics and insights from LLM research in artificial intelligence, we outline key considerations in the experimental design and implementation stages, and perform two sets of experiments to assess the impact of these considerations on LLMs' responses. Based on our findings, we discuss seven practical tactics for conducting experiments with LLMs. Our study enhances the design, replicability, and generalizability of LLM experiments, and broadens the scope of experimental economics in the digital age.
2.
How General Are Measures of Choice Consistency? Evidence from Experimental and Scanner Data, Sep. 2025.
Chen, Mingshi, Tracy Xiao Liu, You Shan, Shu Wang, Songfa Zhong, Yanju Zhou. [Abstract]
Choice consistency with utility maximization, a key assumption in economics, has been used extensively to evaluate the decision quality of individuals and to predict real-world outcomes across contexts. Here we investigate the generalizability of consistency measures derived from budgetary decisions in a lab-in-the-field experiment and from purchasing decisions in supermarket scanner data. In the first study, we observe a lack of correlation between consistency scores derived from risky decisions in the experiment and those derived from supermarket food purchases. In the second study, we observe moderate correlations across experimental tasks, and low to moderate correlations across purchasing categories and time periods within the supermarket. Moreover, consistency measured in the two settings has distinct predictive validity for consumer behavior. These results suggest that choice consistency, as a measure of decision quality, may be better characterized as a multidimensional skill set than as a single-dimensional ability.
1.
The Surprising Benefits of Base Rate Neglect in Robust Aggregation, June 2024.
Kong, Yuqing, Shu Wang, Ying Wang. [Abstract]
Robust aggregation integrates predictions from multiple experts without knowledge of experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate and reveals that a certain degree of base rate neglect helps robust forecast aggregation. Specifically, we consider a two-expert forecast aggregation problem with a binary world state. Experts exhibit base rate neglect, incorporating the base rate information to degree \( \lambda \in [0,1] \). Aggregators' performance is measured by the worst-case regret, which is the maximum regret across the set of considered information structures compared to an omniscient benchmark. Our results reveal the surprising V-shape of regret as a function of experts' base rate consideration degree \( \lambda \), meaning that predictions with intermediate base rate neglect can counter-intuitively lead to better aggregated predictions than perfect Bayesian posteriors.

Extended abstract at EC’24. [Publisher]
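For readers curious about the setup in the base rate neglect paper above, the sketch below computes a single expert's forecast in a binary-state world when the base rate is incorporated only to degree λ. It assumes a common formalization from the behavioral literature in which the prior odds enter the posterior odds raised to the power λ, so λ = 1 gives the Bayesian posterior and λ = 0 ignores the base rate entirely. This functional form and the name `neglect_forecast` are illustrative assumptions, not necessarily the paper's exact definition, and the sketch does not compute the worst-case regret.

```python
def neglect_forecast(prior: float, p_signal_given_1: float,
                     p_signal_given_0: float, lam: float) -> float:
    """Forecast of Pr[state = 1 | signal] with base rate weight lam in [0, 1].

    Assumed form (illustrative, not taken from the paper):
        posterior odds = (prior odds) ** lam * likelihood ratio
    lam = 1 is the Bayesian posterior; lam = 0 ignores the base rate.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weighted_1 = prior ** lam * p_signal_given_1          # state 1 term
    weighted_0 = (1.0 - prior) ** lam * p_signal_given_0  # state 0 term
    return weighted_1 / (weighted_1 + weighted_0)


# Base rate 0.2, signal twice as likely in state 1 as in state 0.
for lam in (0.0, 0.5, 1.0):
    print(lam, round(neglect_forecast(0.2, 0.8, 0.4, lam), 3))
# prints 0.667, 0.5, 0.333 for lam = 0.0, 0.5, 1.0
```

As λ falls from 1 to 0, the forecast moves from the Bayesian posterior toward the value implied by the signal alone, which is the degree of base rate neglect the paper varies.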

Publications in English

1.
Wang, Yipei, Shu Wang (Co-first author), Ke Zhang, Zhijie Liu, Qingbian Ma, Hong Ji, Zheng Hou, Tracy Xiao Liu, Xuedong Xu, Xinxia Wu, Changxiao Jin, Association Between Physician Communication Features and Patient Outcomes in Telemedicine: Retrospective Cross-Sectional Observational Study, Journal of Medical Internet Research, 2026, 28, e86977. [Publisher] [Abstract]
Background: Asynchronous telemedicine is a crucial component of multichannel health care, where effective communication drives satisfaction. However, the effectiveness of communication features remains poorly understood. Prior research relied on subjective surveys or small-scale simulations, failing to link features to objective outcomes. Understanding these features is critical for optimizing physician engagement and establishing quality indicators to enhance the patient experience.
Objective: This study aimed to bridge this gap by leveraging a large-scale real-world dataset to quantify the association between physicians’ communication features—including response modalities, length, and sequence—and patient repurchase behavior, as well as review scores, within a high-autonomy health care setting.
Methods: This retrospective cross-sectional study analyzed 304,337 paid, patient-initiated virtual visits from a Chinese academic medical center (2021‐2023), which included 823,135 physician responses. The sample was selected after applying a series of exclusion criteria, such as free consultations, team-based visits, and outlier data. The key exposures were the modality of physician responses, response length, and response sequence. Outcomes included patient loyalty and satisfaction. Loyalty was operationalized as follow-up visits within 6 months, with a 30-day exclusion period applied to same-physician (fv1) and same-department (fv2) revisits to filter out clinical necessity, but not to hospital-wide revisits (fv3). Satisfaction was measured by the review scores. We used probit and ordinary least squares regressions to examine the relationships between communication features and patient outcomes.
Results: Regarding loyalty, audio-only visits were associated with the lowest fv1, with an average marginal effect (AME) of −0.030 (95% CI −0.043 to −0.016, P<.001), translating to a 30.9% (0.030/0.097) reduction compared to text-only visits. Regarding satisfaction, audio messages were associated with a significantly increased likelihood of patients providing reviews, with an AME of 0.041 (95% CI 0.006‐0.076, P=.02), but they did not affect review scores after adjusting for inverse Mills ratios. Increased numbers of text and audio replies were (marginally) associated with improved fv1, with AMEs of 0.009 (95% CI 0.006‐0.011, P<.001) and 0.007 (95% CI −0.000 to 0.016, P=.06), respectively. Visits beginning with a sub-5-second audio response and ending with text had significantly higher fv1 than text-only visits, with an AME of 0.069 (95% CI 0.018‐0.120, P=.008). The same patterns hold for fv2 and fv3. Under a Bonferroni correction, coefficients with a P value smaller than α=.050/3=.017 (patient loyalty) or α=.050/2=.025 (patient satisfaction) were regarded as significant.
Conclusions: Physician communication practices were significantly associated with patient loyalty and satisfaction. This study is innovative in leveraging large-scale real-world data to systematically examine physician communication, moving beyond the limitations of prior survey-based research. It points to an effective hybrid approach that balances human connection with the clarity of text. In practice, it provides data-driven evidence to guide clinicians and policymakers in designing high-quality telemedicine services.
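The relative effect size and the Bonferroni thresholds in the Results above can be checked with simple arithmetic. The Python sketch below only recomputes numbers already reported; the helper names are hypothetical, and 0.097 is taken as the text-only fv1 baseline implied by the ratio 0.030/0.097.

```python
def relative_change(ame: float, baseline: float) -> float:
    """Express an average marginal effect as a percentage of a baseline rate."""
    return 100.0 * ame / baseline


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


loyalty_alpha = bonferroni_alpha(0.05, 3)       # threshold for loyalty outcomes
satisfaction_alpha = bonferroni_alpha(0.05, 2)  # threshold for satisfaction outcomes

print(round(relative_change(-0.030, 0.097), 1))   # -30.9
print(round(loyalty_alpha, 3), satisfaction_alpha)  # 0.017 0.025

# Reported P values checked against the corrected thresholds.
print(0.008 < loyalty_alpha, 0.06 < loyalty_alpha)  # True False
print(0.02 < satisfaction_alpha)                    # True
```

This makes explicit why the audio-reply count (P=.06) is described as only marginally associated with fv1, while the audio-then-text pattern (P=.008) clears the loyalty threshold.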

Publications in Chinese

3.
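Liu, Tianhan, Shu Wang (Corresponding author), The Sunk Cost Effect and Online Learning Performance, The Journal of World Economy, 2026, No. 2: 195-220. (Cover article)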
2.
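Mao, Qilin, Shu Wang, The Impact of Foreign Mergers and Acquisitions on the Capacity Utilization of Chinese Firms, Journal of International Trade, 2022, No. 1: 113-129.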
1.
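Mao, Qilin, Shu Wang, How Does Local Financial Liberalization Affect Chinese Firms' Exports? Evidence from the Development of City Commercial Banks, World Economy Studies, 2019, No. 8: 11-29. (Cover article)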


Teaching

Teaching Assistant

  • Experimental Economics (graduate), Tsinghua University; 2023 spring, 2024 spring, 2025 spring
  • Behavioral Economics (graduate), Tsinghua University; 2022 spring
  • Intermediate Macroeconomics (undergraduate), Tsinghua University; 2021 fall
  • Econometrics (undergraduate), Nankai University; 2019 fall


Professional Activities

Referee

Journal of Economic Behavior & Organization, Journal of Behavioral and Experimental Economics