Skip to content
NOW AVAILABLE Feature Release! Learn About Our Enhanced Capabilities for Prioritizing Remediation CTEM Prioritization >>

Authored by: PlexTrac Author

Posted on: August 19, 2025

How Do I Pentest My LLM?

In the world of cybersecurity, AI is the perpetual topic du jour, and more specifically Generative AI. The use of LLMs for all kinds of use cases is the craze and the AI ecosystem continues to move at a rapid pace. When it comes to pentesting, the job of every tester is to keep up with the evolution of technology to identify major weaknesses and vulnerabilities within these new and evolving solutions. It’s the blessing and curse of pentesting—you get exposed to new and emerging technology but also bear the burden of needing to learn quickly how to attack it. The world of AI and LLMs is no different. 

What are some things pentesters should know when it comes to pentesting LLMs? 

In many respects, a tester must treat AI as any other application or underlying component within a larger application. The fundamental approaches in application security testing still apply—such as input validation, unauthorized data leakage, or even business logic flaws. However, AI and LLMs bring their own unique attack vectors that should be included in your methodology. We’ve compiled a list of great resources that can help you on your AI pentesting journey.

The lowdown on methodologies and frameworks

  • Known TTPs via MITRE ATLAS
    It’s always helpful to know what kinds of techniques you should be testing against an LLM as well as what known threat actors are using in their attacks. This is why MITRE ATLAS is a great resource. It’s structured similarly to MITRE ATT&CK broken down into the major tactics, then techniques within those tactics, and individual procedures to test within the techniques. This is a great resource for helping you lay out your test plan and being a reference for new test cases. Just as with ATT&CK, ATLAS has a community around it and updates its set of TTPs regularly. ATLAS may prove to be the best resource for staying current on TTPs for attacking AI systems. Keep this resource close as you build your testing program around AI and LLMs.
  • OWASP Top 10 for LLM Applications
    OWASP always does a great job of highlighting the most common and impactful vulnerabilities as it relates to application security. They have produced a handy Top 10 guide for LLM in similar fashion to their traditional Top 10 for web applications. This guide will help ensure you’ve covered the most prolific vulnerabilities in the world of LLMs.
  • NIST RMF
    NIST has produced an AI Risk Management Framework to support organizations when discussing risks related to AI and LLMs. This serves as a great communication tool when writing up your findings and being able to correlate them back to specific controls. This also can serve as a reference for your remediation recommendations when writing up your findings.

Tools to support test automation 

Frameworks and methodologies serve as a great resource, and the next focus in pentesting is “how” to do it. Obviously, as pentesters, we can learn how technologies work and test them as best as we can, but having tools to support test automation is very helpful. There are a lot of tools available and the list will continue to grow. It’s hard to provide an exhaustive list, but here are some recommendations based on our experience in the market as well as some references from others

Test your skills 

If you want to get access to a playground or challenges to test your skills in a non-production format, Microsoft recently announced its playground labs. Stay tuned as more labs and CTFs become available in the industry. The world of AI and LLMs is nascent, but as AI red teaming and pentesting matures, more and more CTF type exercises will become available.

Conclusion: Maintain a continuous testing program

In conclusion, the resources for pentesting AI and LLMs will continue to grow and the resources we highlighted in this article will continue to expand and stay current. Due to the rapidly changing nature of AI, it’s important to implement a continuous testing process. You should be pentesting continuously in some fashion regardless of the target, but when it comes to AI, you must prioritize continuous testing. AI models and implementations change daily and that level of change can change your exposure and risk in a volatile way. The best way to maintain a continuous testing program is to ensure your reporting and tracking of these engagements and resulting vulnerabilities are reported in a streamlined and automated manner. PlexTrac provides the platform to help you in your continuous testing, reporting, tracking, and remediation lifecycle. 

PlexTrac Author
PlexTrac Author At PlexTrac, we bring together insights from a diverse range of voices. Our blog features contributions from industry experts, ethical hackers, CTOs, influencers, and PlexTrac team members—all sharing valuable perspectives on cybersecurity, pentesting, and risk management.

Liked what you saw?

We’ve got more content for you

Request a Demo

PlexTrac supercharges the efforts of cybersecurity teams of any size in the battle against attackers.

See the platform in action for your environment and use case.