Fanyou Wu | 吴凡优

Fanyou Wu

Applied Scientist II
PXT Central Science (PXTCS)
Amazon

I am Fanyou Wu, an Applied Scientist at Amazon PXT Central Science (PXTCS). I received my Ph.D. in Forestry from the Department of Forestry and Natural Resources at Purdue University (2021). Before attending Purdue, I received my master's degree from the University of Eastern Finland (2018) and my bachelor's degree from Nanjing Forestry University (2015), both in Wood Material Science. I was also an exchange student at the University of British Columbia (2013).

My research focuses on applying machine learning to human resources. Competing in machine learning competitions is my side interest, and I have earned many first-place and runner-up finishes, including in competitions hosted at top conferences: KDD, IJCAI, NeurIPS, and CVPR.

News

Feb 21, 2026: I have migrated my website from Jekyll to Next.js using Claude Code.

Selected publications

  1. Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning
    Zheyuan Liu, Suraj Maharjan, Fanyou Wu, Rahil Parikh, Belhassen Bayar, Srinivasan H. Sengamedu, Meng Jiang
    Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics 2025
    @inproceedings{liu2025disentangling,
      title = {Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning},
      author = {Liu, Zheyuan and Maharjan, Suraj and Wu, Fanyou and Parikh, Rahil and Bayar, Belhassen and Sengamedu, Srinivasan H. and Jiang, Meng},
      booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
      year = {2025},
      doi = {10.18653/v1/2025.acl-long.305}
    }

    The rapid development of Large Language Models (LLMs) has led to their widespread adoption across various domains, leveraging vast pre-training knowledge and impressive generalization capabilities. However, these models often inherit biased knowledge, resulting in unfair decisions in sensitive applications. It is challenging to remove this biased knowledge without compromising reasoning abilities due to the entangled nature of the learned knowledge within LLMs. To solve this problem, existing approaches have attempted to mitigate the bias using techniques such as fine-tuning with unbiased datasets, model merging, and gradient ascent. While these methods have experimentally proven effective, they can still be sub-optimum in fully disentangling biases from reasoning. To address this gap, we propose Selective Disentanglement Unlearning (SDU), a novel unlearning framework that selectively removes biased knowledge while preserving reasoning capabilities. SDU operates in three stages: identifying biased parameters using a shadow LLM, fine-tuning with unbiased data, and performing selective parameter updates based on weight saliency. Experimental results across multiple LLMs show that SDU improves fairness accuracy by 14.7% and enhances reasoning performance by 62.6% compared to existing baselines.

  2. Synthesizing conversations from unlabeled documents using automatic response segmentation
    Fanyou Wu, Weijie Xu, Chandan K. Reddy, Srinivasan H. Sengamedu
    Findings of the Association for Computational Linguistics: ACL 2024
    @inproceedings{wu2024SynCARS,
      title = {Synthesizing conversations from unlabeled documents using automatic response segmentation},
      author = {Wu, Fanyou and Xu, Weijie and Reddy, Chandan K. and Sengamedu, Srinivasan H.},
      booktitle = {Findings of the Association for Computational Linguistics: ACL},
      year = {2024}
    }

  3. Can language models be used for real-world urban-delivery route optimization?
    Yang Liu, Fanyou Wu, Zhiyuan Liu, Kai Wang, Feiyue Wang, Xiaobo Qu
    The Innovation 2023
    @article{liu2023can,
      title = {Can language models be used for real-world urban-delivery route optimization?},
      author = {Liu, Yang and Wu, Fanyou and Liu, Zhiyuan and Wang, Kai and Wang, Feiyue and Qu, Xiaobo},
      journal = {The Innovation},
      year = {2023},
      doi = {10.1016/j.xinn.2023.100520},
      publisher = {Elsevier}
    }

  4. Data Collection and Deep Learning-Based Detection of Wood Growth Rings
    Fanyou Wu, Yunmei Huang, Bedrich Benes, Charles Warner, Rado Gazo
    Information Processing in Agriculture 2023
    @article{wu2023ring,
      title = {Data Collection and Deep Learning-Based Detection of Wood Growth Rings},
      author = {Wu, Fanyou and Huang, Yunmei and Benes, Bedrich and Warner, Charles and Gazo, Rado},
      journal = {Information Processing in Agriculture},
      year = {2023}
    }

    Tree-ring dating enables gathering necessary knowledge about trees, and it is essential in many areas, including forest management and the timber industry. Tree-ring dating can be conducted on either wood's clean cross-sections or tree trunks' rough end cross-sections. However, the measurement process is still time-consuming and frequently requires experts who use special devices, such as stereoscopes. Modern approaches based on image processing using deep learning have been successfully applied in many areas, and they can succeed in recognizing tree rings. While supervised deep learning-based methods often produce excellent results, they also depend on extensive datasets of tediously annotated data. To our knowledge, there are only a few publicly available ring image datasets with annotations. We introduce a new carefully captured dataset of images of hardwood species automatically annotated for tree ring detection. We capture each wood cookie twice, once in the rough form, similar to industrial settings, and then after careful cleaning, that reveals all growth rings. We carefully overlap the images and use them for an automatic ring annotation in the rough data. We then use the Feature Pyramid Network with Resnet encoder that obtains an overall pixel-level area under the curve score of 85.72% and ring level F1 score of 0.7348.

  5. Deep dispatching: A deep reinforcement learning approach for vehicle dispatching on online ride-hailing platform
    Yang Liu, Fanyou Wu, Cheng Lyu, Shen Li, Jieping Ye, Xiaobo Qu
    Transportation Research Part E 2022
    @article{liu2022learning,
      title = {Deep dispatching: A deep reinforcement learning approach for vehicle dispatching on online ride-hailing platform},
      author = {Liu, Yang and Wu, Fanyou and Lyu, Cheng and Li, Shen and Ye, Jieping and Qu, Xiaobo},
      journal = {Transportation Research Part E},
      year = {2022},
      doi = {10.1016/j.tre.2022.102694}
    }

    The vehicle dispatching system is one of the most critical problems in online taxi-hailing platforms, which requires adapting the operation and management strategy to the dynamics of demand and supply. In this paper, we propose a single-agent deep reinforcement learning approach for the vehicle repositioning problem by reallocating vacant vehicles to regions with a large demand gap in advance. The simulator and the vehicle repositioning algorithm are designed based on industrial-scale real-world data and the workflow of online taxi-hailing platforms, ensuring the practical value of our approach. Besides, the vehicle repositioning problem is translated in analogy with the load balancing problem in computers. Inspired by the recommendation system, the high concurrency of repositioning requests is addressed by sorting the actions as a recommendation list, whereby matching action with requests. Experiments demonstrate that the proposed approach is superior to the existing ones. It is also worth noting that the proposed approach won first place in the vehicle repositioning task of KDD Cup 2020.

  6. Deep BarkID: A Portable Tree Bark Identification System by Knowledge Distillation
    Fanyou Wu, Rado Gazo, Bedrich Benes, Eva Haviarova
    European Journal of Forest Research 2021
    @article{wu2021bark,
      title = {Deep BarkID: A Portable Tree Bark Identification System by Knowledge Distillation},
      author = {Wu, Fanyou and Gazo, Rado and Benes, Bedrich and Haviarova, Eva},
      journal = {European Journal of Forest Research},
      year = {2021},
      doi = {10.1007/s10342-021-01407-7}
    }

    Species identification is one of the key steps in the management and conservation planning of many forest ecosystems. We introduce Deep BarkID, a portable tree identification system that detects tree species from bark images. Existing bark identification systems rely heavily on massive computing power access, which may be scarce in many locations. Our approach is deployed as a smartphone application that does not require any connection to a database. Its intended use is in a forest, where internet connection is often unavailable. The tree bark identification is expressed as a bark image classification task, and it is implemented as a convolutional neural network (CNN). This research focuses on developing light-weight CNN models through knowledge distillation. Overall, we achieved 96.12% accuracy for tree species classification tasks for ten common tree species in Indiana, USA. We also captured and prepared thousands of bark images—a dataset that we call Indiana Bark Dataset—and we make it available at https://github.com/wufanyou/DBID.

  7. Wood Identification Based on Longitudinal Section Images by Using Deep Learning
    Fanyou Wu, Rado Gazo, Eva Haviarova, Bedrich Benes
    Wood Science and Technology 2021
    @article{wu2021wood,
      title = {Wood Identification Based on Longitudinal Section Images by Using Deep Learning},
      author = {Wu, Fanyou and Gazo, Rado and Haviarova, Eva and Benes, Bedrich},
      journal = {Wood Science and Technology},
      year = {2021},
      doi = {10.1007/s00226-021-01261-1},
      volume = {55},
      number = {2},
      pages = {553--563}
    }

    Automatic species identification has the potential to improve the efficacy and automation of wood processing systems significantly. Recent advances in deep learning allowed for the automation of many previously difficult tasks, and in this paper, we investigate the feasibility of using Deep Convolutional Neural Networks (CNNs) for hardwood lumber identification. In particular, we tested two highly effective CNNs (ResNet-50 and DenseNet-121) as well as lightweight MobileNet-V2. Overall, we achieved 98.2% accuracy for 11 common hardwood species classification tasks.