Detecting AI-Paraphrased Code Theft

This research introduces methods to detect LLM-paraphrased code and to identify which LLM produced the paraphrase, addressing the problem of protecting intellectual property (IP) in source code.

Achieves 95% accuracy in detecting code that has been paraphrased using LLMs
Successfully identifies which specific LLM was used for paraphrasing
Uses coding style features that persist even after paraphrasing
Demonstrates effectiveness across multiple programming languages
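
To make the idea of coding style features concrete, here is a minimal sketch in Python. The summary does not list the features the research actually uses, so the four below (naming-convention ratios, comment density, average line length) and the function name `style_features` are illustrative assumptions, not the authors' feature set.

```python
import re


def style_features(source: str) -> dict:
    """Compute a few illustrative style features for a code snippet.

    Hypothetical feature set chosen for illustration only; it is not
    taken from the research summarized above.
    """
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]

    # Rough identifier scan: also matches keywords and words inside comments.
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source)
    n_ids = max(len(identifiers), 1)

    # Naming conventions: interior underscore vs. lower-to-upper transition.
    snake = sum(1 for name in identifiers if "_" in name.strip("_"))
    camel = sum(1 for name in identifiers if re.search(r"[a-z][A-Z]", name))

    # Whole-line comments in '#' or '//' style.
    comments = sum(1 for ln in non_blank if ln.lstrip().startswith(("#", "//")))
    n_lines = max(len(non_blank), 1)

    return {
        "snake_case_ratio": snake / n_ids,
        "camel_case_ratio": camel / n_ids,
        "comment_line_ratio": comments / n_lines,
        "avg_line_length": sum(len(ln) for ln in non_blank) / n_lines,
    }
```

A detector along these lines would presumably compute such a vector for each code sample and pass it to a classifier, either to separate original from paraphrased code or to distinguish between candidate LLMs. That pipeline is an assumption about the general shape of the approach, not a description of the published method.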

This research gives organizations a technical countermeasure to an emerging security threat: the use of AI to disguise the theft of proprietary code.

Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features