LLMs for Penetration Testing Education

This study evaluates how effectively large language models (LLMs) carry out realistic penetration testing tasks, with the aim of supporting cybersecurity education.

GPT-4o outperforms the other evaluated models across 15 representative penetration testing scenarios
All evaluated models struggle with technical tasks requiring environmental interaction
LLM performance varies significantly based on task complexity and technical requirements
Strategic integration of LLMs could enhance cybersecurity training while addressing current limitations

For cybersecurity educators, this research shows how LLMs can serve as supplementary tools in practical security training, and where their current limitations lie.

Towards Supporting Penetration Testing Education with Large Language Models: an Evaluation and Comparison