Authored by: PlexTrac Author Posted on: August 19, 2025 How Do I Pentest My LLM? In the world of cybersecurity, AI is the perpetual topic du jour, and more specifically Generative AI. The use of LLMs for all kinds of use cases is the craze and the AI ecosystem continues to move at a rapid pace. When it comes to pentesting, the job of every tester is to keep up with the evolution of technology to identify major weaknesses and vulnerabilities within these new and evolving solutions. It’s the blessing and curse of pentesting—you get exposed to new and emerging technology but also bear the burden of needing to learn quickly how to attack it. The world of AI and LLMs is no different. What are some things pentesters should know when it comes to pentesting LLMs? In many respects, a tester must treat AI as any other application or underlying component within a larger application. The fundamental approaches in application security testing still apply—such as input validation, unauthorized data leakage, or even business logic flaws. However, AI and LLMs bring their own unique attack vectors that should be included in your methodology. We’ve compiled a list of great resources that can help you on your AI pentesting journey. The lowdown on methodologies and frameworks Known TTPs via MITRE ATLASIt’s always helpful to know what kinds of techniques you should be testing against an LLM as well as what known threat actors are using in their attacks. This is why MITRE ATLAS is a great resource. It’s structured similarly to MITRE ATT&CK broken down into the major tactics, then techniques within those tactics, and individual procedures to test within the techniques. This is a great resource for helping you lay out your test plan and being a reference for new test cases. Just as with ATT&CK, ATLAS has a community around it and updates its set of TTPs regularly. ATLAS may prove to be the best resource for staying current on TTPs for attacking AI systems. Keep this resource close as you build your testing program around AI and LLMs. OWASP Top 10 for LLM ApplicationsOWASP always does a great job of highlighting the most common and impactful vulnerabilities as it relates to application security. They have produced a handy Top 10 guide for LLM in similar fashion to their traditional Top 10 for web applications. This guide will help ensure you’ve covered the most prolific vulnerabilities in the world of LLMs. NIST RMFNIST has produced an AI Risk Management Framework to support organizations when discussing risks related to AI and LLMs. This serves as a great communication tool when writing up your findings and being able to correlate them back to specific controls. This also can serve as a reference for your remediation recommendations when writing up your findings. Tools to support test automation Frameworks and methodologies serve as a great resource, and the next focus in pentesting is “how” to do it. Obviously, as pentesters, we can learn how technologies work and test them as best as we can, but having tools to support test automation is very helpful. There are a lot of tools available and the list will continue to grow. It’s hard to provide an exhaustive list, but here are some recommendations based on our experience in the market as well as some references from others Burpference Pynt Github Zap by Checkmarx Arcanum security bot Test your skills If you want to get access to a playground or challenges to test your skills in a non-production format, Microsoft recently announced its playground labs. Stay tuned as more labs and CTFs become available in the industry. The world of AI and LLMs is nascent, but as AI red teaming and pentesting matures, more and more CTF type exercises will become available. Conclusion: Maintain a continuous testing program In conclusion, the resources for pentesting AI and LLMs will continue to grow and the resources we highlighted in this article will continue to expand and stay current. Due to the rapidly changing nature of AI, it’s important to implement a continuous testing process. You should be pentesting continuously in some fashion regardless of the target, but when it comes to AI, you must prioritize continuous testing. AI models and implementations change daily and that level of change can change your exposure and risk in a volatile way. The best way to maintain a continuous testing program is to ensure your reporting and tracking of these engagements and resulting vulnerabilities are reported in a streamlined and automated manner. PlexTrac provides the platform to help you in your continuous testing, reporting, tracking, and remediation lifecycle. Book a Demo PlexTrac Author At PlexTrac, we bring together insights from a diverse range of voices. Our blog features contributions from industry experts, ethical hackers, CTOs, influencers, and PlexTrac team members—all sharing valuable perspectives on cybersecurity, pentesting, and risk management.
What FedRAMP’s New Vulnerability Management Standard Means for Pentesters and Vuln Managers Breaking Down the New RFC-0012 Standard Under FedRAMP and How It Can Change Your Daily Security Operations If you work in vulnerability management or penetration testing for cloud systems under FedRAMP, buckle up because the new RFC-0012: FedRAMP Continuous Vulnerability Management Standard is going to change how your work is scoped, tracked, and prioritized. The... READ ARTICLE
Beneath the Hat: My Black Hat 2025 Takeaways, Including the AI Imperative As I write this from the airport, the desert heat of Las Vegas is finally fading and I’m reflecting on the whirlwind that was Black Hat USA 2025. For me, this conference is always about two things: the people and the ideas. We hosted our annual Customer Appreciation Night and ran a Pentest Reporting Bootcamp,... READ ARTICLE
Welcome to the Dataverse: Deliver Automated Vulnerability Lifecycle Management Organizations today are living in a fragmented reality—trapped in outdated prioritization and remediation workflows. Prioritization and remediation orchestration often relies on spreadsheets and decentralized coordination. READ ARTICLE