Recent advances in large language models (LMs) have enabled them to synthesize program code. However, this capability has also raised concerns about intellectual property (IP) rights violations. Despite the significance of this issue, it has received relatively little attention. In this paper, we aim to bridge the gap by presenting CodeIPPrompt, a platform for automatic evaluation of the extent to which code language models may reproduce licensed programs. It comprises two key components: prompts constructed from a licensed code database to elicit LMs into generating IP-violating code, and a measurement tool to evaluate the extent of IP violation by code LMs. We conducted an extensive evaluation of existing open-source code LMs and commercial products and found IP violations to be prevalent across all of these models. We further identified the root cause as the substantial proportion of the training corpus subject to restrictive licenses, a result of both intentional inclusion and inconsistent licensing practices in the real world. To address this issue, we also explored potential mitigation strategies, including fine-tuning and dynamic token filtering. Our study provides a testbed for evaluating IP-violation issues in existing code generation platforms and underscores the need for better mitigation strategies.
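To make the measurement idea concrete, below is a minimal, hypothetical sketch of how reproduction of licensed code could be quantified: a prompt (e.g., the header of a function taken from a licensed file) is fed to a code LM, and the completion is scored against the original licensed body with a string-similarity ratio. The similarity metric, threshold, and example snippet here are illustrative assumptions; CodeIPPrompt's actual prompt construction and scoring may differ.

```python
import difflib


def reproduction_score(completion: str, licensed_body: str) -> float:
    """Similarity between a model completion and the licensed reference body.

    Uses difflib's SequenceMatcher ratio (0.0 = disjoint, 1.0 = identical)
    as an illustrative metric; the paper's measurement tool may differ.
    """
    return difflib.SequenceMatcher(None, completion, licensed_body).ratio()


# Hypothetical example: the prompt is the part of a licensed function that
# would be shown to the code LM, and licensed_body is the remainder of the
# original file that the model should ideally not reproduce verbatim.
prompt = 'def gcd(a, b):\n    """Return the greatest common divisor."""\n'
licensed_body = "    while b:\n        a, b = b, a % b\n    return a\n"
completion = licensed_body  # pretend the model reproduced the body exactly

score = reproduction_score(completion, licensed_body)
print(f"similarity to licensed reference: {score:.2f}")  # -> 1.00
if score > 0.8:  # illustrative threshold, not taken from the paper
    print("potential reproduction of license-protected code")
```

A decode-time mitigation such as the dynamic token filtering mentioned above could, in principle, apply a check like this incrementally during generation and steer away from continuations whose similarity to a licensed reference grows too high.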
Author Information
Zhiyuan Yu (Washington University in St. Louis)
I am a Ph.D. candidate in the Department of Computer Science and Engineering at Washington University in St. Louis [(CV)](https://batyu.github.io/zhiyuanyu/files/CV_ZhiyuanYu.pdf). I am currently working in the Computer Security and Privacy Lab (CSPL), supervised by Professor Ning Zhang. My research interests include cyber-physical security, adversarial machine learning, and usable privacy. Prior to my Ph.D., I received a B.S. degree in Electrical Engineering from Huazhong University of Science and Technology in 2019.
Yuhao Wu (Washington University in St. Louis)
Ning Zhang (Washington University in St. Louis)
Chenguang Wang (University of California, Berkeley)
Yevgeniy Vorobeychik (Washington University in St. Louis)
Chaowei Xiao (University of Michigan)
More from the Same Authors
- 2021: Improving Adversarial Robustness in 3D Point Cloud Classification via Self-Supervisions
  Jiachen Sun · Yulong Cao · Christopher Choy · Zhiding Yu · Chaowei Xiao · Anima Anandkumar · Zhuoqing Morley Mao
- 2021: Auditing AI models for Verified Deployment under Semantic Specifications
  Homanga Bharadhwaj · De-An Huang · Chaowei Xiao · Anima Anandkumar · Animesh Garg
- 2021: Delving into the Remote Adversarial Patch in Semantic Segmentation
  Yulong Cao · Jiachen Sun · Chaowei Xiao · Qi Chen · Zhuoqing Morley Mao
- 2023: ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback
  Shengchao Liu · Jiongxiao Wang · Yijin Yang · Chengpeng Wang · Ling Liu · Hongyu Guo · Chaowei Xiao
- 2023 Poster: A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification
  Jiachen Sun · Jiongxiao Wang · Weili Nie · Zhiding Yu · Zhuoqing Morley Mao · Chaowei Xiao
- 2022 Poster: Diffusion Models for Adversarial Purification
  Weili Nie · Brandon Guo · Yujia Huang · Chaowei Xiao · Arash Vahdat · Animashree Anandkumar
- 2022 Spotlight: Diffusion Models for Adversarial Purification
  Weili Nie · Brandon Guo · Yujia Huang · Chaowei Xiao · Arash Vahdat · Animashree Anandkumar
- 2022 Poster: Understanding The Robustness in Vision Transformers
  Zhou Daquan · Zhiding Yu · Enze Xie · Chaowei Xiao · Animashree Anandkumar · Jiashi Feng · Jose M. Alvarez
- 2022 Spotlight: Understanding The Robustness in Vision Transformers
  Zhou Daquan · Zhiding Yu · Enze Xie · Chaowei Xiao · Animashree Anandkumar · Jiashi Feng · Jose M. Alvarez
- 2021: Contributed Talk-4. Auditing AI models for Verified Deployment under Semantic Specifications
  Chaowei Xiao
- 2021: Contributed Talk-3. FERMI: Fair Empirical Risk Minimization Via Exponential Rényi Mutual Information
  Chaowei Xiao
- 2021: Contributed Talk-2. Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions
  Chaowei Xiao
- 2021: Kai-Wei Chang. Societal Bias in Language Generation
  Chaowei Xiao
- 2021: Contributed Talk-1. Machine Learning API Shift Assessments
  Chaowei Xiao
- 2021: Nicolas Papernot. What Does it Mean for ML to be Trustworthy
  Chaowei Xiao
- 2021: Olga Russakovsky. Revealing, Quantifying, Analyzing and Mitigating Bias in Visual Recognition
  Chaowei Xiao
- 2021: Jun Zhu. Understand and Benchmark Adversarial Robustness of Deep Learning
  Chaowei Xiao
- 2021: Anima Anandkumar. Opening remarks
  Chaowei Xiao
- 2021 Workshop: Workshop on Socially Responsible Machine Learning
  Chaowei Xiao · Animashree Anandkumar · Mingyan Liu · Dawn Song · Raquel Urtasun · Jieyu Zhao · Xueru Zhang · Cihang Xie · Xinyun Chen · Bo Li