Abstract
Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks. While LLMs excel at traditional programming tasks such as code generation and bug fixing, they struggle with visually-oriented coding tasks, often producing suboptimal aesthetics. In this paper, we introduce a new pipeline to enhance the aesthetic quality of LLM-generated code. We first construct AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics. Next, we propose agentic reward feedback, a multi-agent system that evaluates executability, static aesthetics, and interactive aesthetics. Building on this, we develop GRPO-AR, which integrates these signals into the GRPO algorithm for joint optimization of functionality and code aesthetics. Finally, we present OpenDesign, a benchmark for assessing code aesthetics. Experimental results show that combining supervised fine-tuning on AesCode-358K with reinforcement learning using agentic reward feedback significantly improves performance on OpenDesign and also enhances results on existing benchmarks such as PandasPlotBench. Notably, our AesCoder-4B surpasses GPT-4o and GPT-4.1, and achieves performance comparable to large open-source models with 480B–685B parameters, underscoring the effectiveness of our approach.
Our Method
Agentic Reward Framework for Visually-Oriented Code Generation
To enhance LLMs' ability in visually-oriented coding tasks such as webpage design and chart generation, we propose the Agentic Reward Framework, a multi-agent system that provides comprehensive aesthetic feedback from three complementary perspectives.
• Execution Agent ensures that generated code is executable and functionally correct.
• Static Aesthetics Agent evaluates the visual quality of the rendered result, judging layout, color harmony, and stylistic alignment using multimodal LLMs.
• Interactive Aesthetics Agent further assesses usability and interaction, autonomously exploring webpage elements to verify responsive and functional design.
By combining these agents, our system delivers textual, visual, and interactive reward signals, enabling reinforcement learning to align model behavior with human aesthetic judgment. This framework bridges the gap between code correctness and design aesthetics, allowing models not only to write working code but also to understand beauty in design.
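The sketch below illustrates, in simplified Python, how signals from the three agents could be gathered and combined into a single scalar reward for GRPO-style training. It is a minimal sketch under our own assumptions: the agent interfaces, the executability gating, the equal weights, and the group-normalized advantage computation are illustrative placeholders, not the released AesCoder or GRPO-AR implementation.

from dataclasses import dataclass
from statistics import mean, pstdev

# Illustrative sketch only: agent interfaces, weights, and scoring scales are
# assumptions, not the paper's actual implementation.

@dataclass
class AgentScores:
    executability: float           # 1.0 if the code runs and renders, else 0.0
    static_aesthetics: float       # multimodal-LLM judgment of the screenshot, in [0, 1]
    interactive_aesthetics: float  # score from automated interaction checks, in [0, 1]

def execution_agent(code: str) -> float:
    """Run the generated code; 1.0 on success, 0.0 on any error."""
    try:
        exec(compile(code, "<generated>", "exec"), {})  # placeholder for real sandboxed rendering
        return 1.0
    except Exception:
        return 0.0

def static_aesthetics_agent(screenshot_path: str) -> float:
    """Stub for an MLLM judge rating layout, color harmony, and style of the rendered page."""
    return 0.5  # stand-in for a model-produced score in [0, 1]

def interactive_aesthetics_agent(url: str) -> float:
    """Stub for a browser agent that explores the page and scores interaction quality."""
    return 0.5  # stand-in for a model-produced score in [0, 1]

def aesthetic_reward(s: AgentScores, w_static: float = 0.5, w_interactive: float = 0.5) -> float:
    """Gate aesthetic credit on executability, then take a weighted sum (weights are assumed)."""
    if s.executability == 0.0:
        return 0.0  # code that does not run earns no aesthetic reward
    return w_static * s.static_aesthetics + w_interactive * s.interactive_aesthetics

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's reward within its group."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

In this sketch, a group of sampled programs would be scored with aesthetic_reward and converted to advantages with group_advantages before the usual GRPO policy update; the real system replaces the stubs with a sandboxed renderer, a multimodal LLM judge, and an interactive browsing agent.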
Case Study
We compare our AesCoder-4B with other leading models across different categories. The results show that our model is comparable to large open-source models with 480B–685B parameters, highlighting the effectiveness of our approach.
BibTeX
@misc{xiao2025codeaestheticsagenticreward,
  title={Code Aesthetics with Agentic Reward Feedback},
  author={Bang Xiao and Lingjie Jiang and Shaohan Huang and Tengchao Lv and Yupan Huang and Xun Wu and Lei Cui and Furu Wei},
  year={2025},
  eprint={2510.23272},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.23272},
}