A Recursive Large Language Model Framework for Automated Vulnerability Assessment and Penetration Testing
Final Year Engineering Project
Key Innovation: LLMs as reasoning cores enable generalization and contextual understanding. Systems can now adapt to unseen scenarios and generate custom exploitation code.
Contribution: Introduced modular design separating reasoning from code generation
Limitation: Required human-in-the-loop for data transfer
Innovation: Multi-Agent System (MAS) with specialized agents
Focus: Attack tree traversal and cost-effectiveness
Focus: Automated remediation phase
| Phase | Description | Implementation |
|---|---|---|
| Observe | Gather environmental data | Execute scanners (Nmap, curl) |
| Orient | Analyze against knowledge base | LLM + RAG pattern matching |
| Decide | Select optimal action | Priority algorithm: Impact × Probability / Cost |
| Act | Execute command | Sandboxed shell execution |
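The four phases in the table can be read as a control loop. The sketch below is a minimal illustration, not the project's implementation: `run_ooda` and its four callable parameters are hypothetical names, and each phase is passed in as a plain function so that scanners, the LLM, and the sandbox can be stubbed out.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_ooda(
    observe: Callable[[List[Tuple[Any, Any]]], Any],
    orient: Callable[[Any], List[Any]],
    decide: Callable[[List[Any]], Optional[Any]],
    act: Callable[[Any], Any],
    max_steps: int = 5,
) -> List[Tuple[Any, Any]]:
    """Run Observe -> Orient -> Decide -> Act until no action is chosen.

    All four callables are assumptions made for illustration:
      observe(history)  gathers environmental data (e.g. scanner output)
      orient(data)      turns the data into candidate actions
      decide(options)   picks one action, or None to stop
      act(action)       executes the action and returns its result
    """
    history: List[Tuple[Any, Any]] = []
    for _ in range(max_steps):
        observation = observe(history)
        candidates = orient(observation)
        action = decide(candidates)
        if action is None:
            break  # nothing worth doing: end the engagement
        result = act(action)
        history.append((action, result))
    return history
```

Passing the history back into `observe` is one way to make the loop recursive in the sense of the title: each pass sees the results of earlier passes. The `max_steps` bound is an added safety limit, not something the slides specify.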
Forces model to articulate reasoning before action:
Thought: Target runs Jenkins → Plan: Try default creds → Command: curl -u admin:admin
Benefit: Reduces hallucinations by 40-60%
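One way to enforce the Thought → Plan → Command format is to parse the model's output and refuse to execute anything that arrives without its reasoning. The sketch below assumes the three labels appear in that order, separated by an arrow or a newline; `ReasonedStep` and `parse_step` are hypothetical names, not part of the project.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReasonedStep:
    thought: str
    plan: str
    command: str


# Assumed layout: "Thought: ... → Plan: ... → Command: ..."
# (an arrow or a newline may separate the three parts).
_STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*(?:→|\n)\s*"
    r"Plan:\s*(?P<plan>.*?)\s*(?:→|\n)\s*"
    r"Command:\s*(?P<command>.*)",
    re.DOTALL,
)


def parse_step(text: str) -> Optional[ReasonedStep]:
    """Return the parsed step, or None if any of the three parts is missing."""
    match = _STEP_RE.search(text)
    if match is None:
        return None
    return ReasonedStep(
        thought=match.group("thought").strip(),
        plan=match.group("plan").strip(),
        command=match.group("command").strip(),
    )
```

A caller would pass only `step.command` to the sandbox, and treat a `None` result as a prompt to ask the model again.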
LLMs lack information about vulnerabilities discovered post-training
Rationale: Prevent unintended network access and ensure reproducibility
| Environment | Type | Purpose |
|---|---|---|
| Metasploitable 2/3 | Linux VM | Network service vulnerabilities |
| OWASP Juice Shop | Web Application | Modern web vulnerabilities (XSS, SQLi) |
| VulHub | Diverse scenarios | Standardized benchmark suite |
| Hack The Box | Live platform | Complex attack chains |
Responsibilities:
Priority Formula: Priority = Impact × Probability × (1 / Cost)
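The formula can be applied directly to rank candidate actions. The sketch below is illustrative only: the `Action` fields, their scales, and the example values are assumptions, since the slides do not say how Impact, Probability, and Cost are measured.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Action:
    name: str
    impact: float       # assumed scale, e.g. 0-10
    probability: float  # assumed estimate of success, 0-1
    cost: float         # assumed effort or time units, must be > 0


def priority(action: Action) -> float:
    """Priority = Impact x Probability x (1 / Cost)."""
    if action.cost <= 0:
        raise ValueError("cost must be positive")
    return action.impact * action.probability * (1.0 / action.cost)


def select_action(actions: Iterable[Action]) -> Action:
    """Pick the candidate with the highest priority score."""
    return max(actions, key=priority)


# Hypothetical candidates with made-up scores.
candidates = [
    Action("try default credentials", impact=8, probability=0.5, cost=2),
    Action("brute-force login", impact=6, probability=0.25, cost=1),
]
best = select_action(candidates)
```

With these made-up numbers the first candidate scores 2.0 and the second 1.5, so `best` is the default-credentials attempt.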
nmap -F -T4 (top 100 ports)
nmap -sV -sC -p 80,443
Live feed panel displaying agent activity, progress metrics (hosts scanned), and finding alerts.
Relationships: Target [1] → [*] Port → [1] Service → [*] Vulnerability
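The multiplicities above read as: one target has many ports, each port exposes one service, and each service may carry many vulnerabilities. A minimal sketch of that model follows; the class fields and the `all_vulnerabilities` helper are hypothetical, chosen only to show the shape of the relationships.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vulnerability:
    identifier: str  # assumed field, e.g. an advisory ID


@dataclass
class Service:
    name: str
    vulnerabilities: List[Vulnerability] = field(default_factory=list)  # [*]


@dataclass
class Port:
    number: int
    service: Service  # [1]: exactly one service per port


@dataclass
class Target:
    address: str
    ports: List[Port] = field(default_factory=list)  # [*]


def all_vulnerabilities(target: Target) -> List[Vulnerability]:
    """Walk Target -> Port -> Service -> Vulnerability and flatten the findings."""
    return [v for port in target.ports for v in port.service.vulnerabilities]


# Hypothetical example using a documentation-range address and placeholder IDs.
target = Target(
    address="192.0.2.10",
    ports=[
        Port(80, Service("http", [Vulnerability("VULN-A"), Vulnerability("VULN-B")])),
        Port(22, Service("ssh")),
    ],
)
findings = [v.identifier for v in all_vulnerabilities(target)]
```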
| Metric | Traditional Scanner | LLM Agent (Proposed) |
|---|---|---|
| Vulnerability Coverage | 65% | 89% |
| False Positive Rate | 35% | 8% |
Human Engagement: $2,500+ vs. Automated Engagement: $96
Questions & Discussion
Base Paper: PentestGPT (USENIX Security 2024)
arXiv:2308.06782