RuleR: Improving LLM Controllability by Rule-based Data Recycling (NAACL'25)
Chinese Version: Zhihu
This is the repo for the RuleR project, which proposes a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, without any human/LLM editing on responses.
(Feel free to email minglii@umd.edu for any questions or feedback.)
News
- [2025/01] Our paper has been accepted to the NAACL 2025 main conference!
- [2024/06] We initialized the RuleR repo.
Contents
- Overview
- Highlights
- Install
- Run Code
- ToDo
- Citation
- Our Related Works
Overview
Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which incurs additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules, creating new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to the responses and appending the rule-instructions to the original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
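To make the "recycling" idea concrete, here is a minimal illustrative sketch with a hypothetical rule and sample (not the project's actual rule set): a sentence-count constraint is appended to the instruction, and the response is edited by a rule to satisfy it.

```python
# Hypothetical RuleR-style recycling of one Alpaca-format sample.
original = {
    "instruction": "Name a primary color.",
    "output": "Red. Other primary colors are blue and yellow.",
}

# Rule (illustrative): keep only the first sentence of the response,
# and append the matching constraint to the instruction.
first_sentence = original["output"].split(". ")[0].rstrip(".") + "."
recycled = {
    "instruction": original["instruction"] + " Answer in exactly one sentence.",
    "output": first_sentence,
}

print(recycled["instruction"])
print(recycled["output"])
```

No human or model is needed here: the edited response satisfies the appended constraint by construction.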
Comparing existing methods (top) and our RuleR (bottom) for enhancing LLM controllability. Most existing methods rely on extra human/model supervision to generate or edit instructions and responses, neglecting the remaining potential of the original data. In contrast, RuleR demonstrates that simple rule-based (human/model-free) editing of existing data can greatly improve LLM controllability.
Highlights
- RuleR is the first human/model-free data augmentation approach designed to improve LLM controllability in enforcing multiple constraints on LLM-generated responses.
Install
- Install the dependencies
Note: RuleR itself only requires the spacy and tqdm packages. We recommend installing these two packages manually; you do not need to install everything from requirements.txt.
- Install the spaCy model
Run Code
Single-Round Data (Alpaca format)
--data_path xxx.json \ # Alpaca format needed here
--save_path xxx_augmented.json \
--augment_rate 0.9 \
--epo_num 2 \
--concate_layer 3
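The file passed to `--data_path` must be in the Alpaca format: a JSON list of records with `instruction`, `input`, and `output` fields. A minimal example (the sample text is illustrative):

```python
import json

# Minimal Alpaca-format data file, as expected by --data_path.
samples = [
    {
        "instruction": "Summarize the following text.",
        "input": "RuleR recycles existing SFT data with rule-based edits.",
        "output": "RuleR augments SFT data using predefined rules.",
    }
]

with open("alpaca_example.json", "w") as f:
    json.dump(samples, f, indent=2)
```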
Multi-Round Data (ShareGPT format)
--data_path xxx.json \ # ShareGPT format needed here
--save_path xxx_augmented.json \
--augment_rate 0.9 \
--epo_num 2 \
--concate_layer 3
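For multi-round data, `--data_path` expects the ShareGPT format: each record holds a `conversations` list of alternating `human`/`gpt` turns. A minimal example (the turn contents are illustrative):

```python
import json

# Minimal ShareGPT-format data file, as expected by --data_path.
samples = [
    {
        "conversations": [
            {"from": "human", "value": "Hi"},
            {"from": "gpt", "value": "Hello! How can I help you?"},
        ]
    }
]

with open("sharegpt_example.json", "w") as f:
    json.dump(samples, f, indent=2)
```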
--data_path: Input data path.
--save_path: Save data path.
--augment_rate: The probability of applying augmentation to each sample.
--epo_num: The number of times the random augmentation process is run.
--concate_layer: The max rule number for each sample.
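A rough sketch of how these three flags interact (hypothetical simplification, not the repository's actual implementation): the random augmentation pass runs `epo_num` times over the data; each pass augments a sample with probability `augment_rate`, attaching up to `concate_layer` rules to it.

```python
import random

def augment(samples, augment_rate=0.9, epo_num=2, concate_layer=3,
            rules=("rule_a", "rule_b", "rule_c")):
    # Illustrative only: rule names and selection logic are placeholders.
    out = []
    for _ in range(epo_num):           # --epo_num passes over the data
        for s in samples:
            applied = []
            if random.random() < augment_rate:   # --augment_rate
                k = random.randint(1, concate_layer)  # --concate_layer cap
                applied = random.sample(rules, k)
            out.append({"sample": s, "rules": applied})
    return out

data = augment(["s1", "s2"], augment_rate=1.0, epo_num=2, concate_layer=3)
```

With `augment_rate=1.0` and two passes, the two input samples yield four augmented records, each carrying between one and three rules.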
Training
We use the prompt and code base from FastChat:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am .........
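The prompt above follows the Vicuna v1.1 template from FastChat. A minimal sketch of assembling it (the helper name is ours; the system text is quoted from the template, and `</s>` is the separator after each completed assistant turn):

```python
SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.")

def build_prompt(turns):
    """turns: list of (user, assistant) pairs; assistant is None for the
    final turn the model is expected to complete."""
    prompt = SYSTEM + " "
    for user, assistant in turns:
        prompt += f"USER: {user} ASSISTANT:"
        if assistant is not None:
            prompt += f" {assistant}</s>"  # close each finished turn
    return prompt

prompt = build_prompt([("Hi", "Hello."), ("Who are you?", None)])
```

The training loss is then typically computed only on the assistant spans of this concatenated prompt.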