NOW AVAILABLE Feature Release! Learn About Our Enhanced Capabilities for Prioritizing Remediation CTEM Prioritization >>

Authored by: PlexTrac Author

Posted on: August 19, 2025

How Do I Pentest My LLM?

In the world of cybersecurity, AI is the perpetual topic du jour, and more specifically Generative AI. The use of LLMs for all kinds of use cases is the craze and the AI ecosystem continues to move at a rapid pace. When it comes to pentesting, the job of every tester is to keep up with the evolution of technology to identify major weaknesses and vulnerabilities within these new and evolving solutions. It’s the blessing and curse of pentesting—you get exposed to new and emerging technology but also bear the burden of needing to learn quickly how to attack it. The world of AI and LLMs is no different.

What are some things pentesters should know when it comes to pentesting LLMs?

In many respects, a tester must treat AI as any other application or underlying component within a larger application. The fundamental approaches in application security testing still apply—such as input validation, unauthorized data leakage, or even business logic flaws. However, AI and LLMs bring their own unique attack vectors that should be included in your methodology. We’ve compiled a list of great resources that can help you on your AI pentesting journey.

The lowdown on methodologies and frameworks

Known TTPs via MITRE ATLAS
It’s always helpful to know what kinds of techniques you should be testing against an LLM as well as what known threat actors are using in their attacks. This is why MITRE ATLAS is a great resource. It’s structured similarly to MITRE ATT&CK broken down into the major tactics, then techniques within those tactics, and individual procedures to test within the techniques. This is a great resource for helping you lay out your test plan and being a reference for new test cases. Just as with ATT&CK, ATLAS has a community around it and updates its set of TTPs regularly. ATLAS may prove to be the best resource for staying current on TTPs for attacking AI systems. Keep this resource close as you build your testing program around AI and LLMs.

OWASP Top 10 for LLM Applications
OWASP always does a great job of highlighting the most common and impactful vulnerabilities as it relates to application security. They have produced a handy Top 10 guide for LLM in similar fashion to their traditional Top 10 for web applications. This guide will help ensure you’ve covered the most prolific vulnerabilities in the world of LLMs.

NIST RMF
NIST has produced an AI Risk Management Framework to support organizations when discussing risks related to AI and LLMs. This serves as a great communication tool when writing up your findings and being able to correlate them back to specific controls. This also can serve as a reference for your remediation recommendations when writing up your findings.

Tools to support test automation

Frameworks and methodologies serve as a great resource, and the next focus in pentesting is “how” to do it. Obviously, as pentesters, we can learn how technologies work and test them as best as we can, but having tools to support test automation is very helpful. There are a lot of tools available and the list will continue to grow. It’s hard to provide an exhaustive list, but here are some recommendations based on our experience in the market as well as some references from others

Test your skills

If you want to get access to a playground or challenges to test your skills in a non-production format, Microsoft recently announced its playground labs. Stay tuned as more labs and CTFs become available in the industry. The world of AI and LLMs is nascent, but as AI red teaming and pentesting matures, more and more CTF type exercises will become available.

Conclusion: Maintain a continuous testing program

In conclusion, the resources for pentesting AI and LLMs will continue to grow and the resources we highlighted in this article will continue to expand and stay current. Due to the rapidly changing nature of AI, it’s important to implement a continuous testing process. You should be pentesting continuously in some fashion regardless of the target, but when it comes to AI, you must prioritize continuous testing. AI models and implementations change daily and that level of change can change your exposure and risk in a volatile way. The best way to maintain a continuous testing program is to ensure your reporting and tracking of these engagements and resulting vulnerabilities are reported in a streamlined and automated manner. PlexTrac provides the platform to help you in your continuous testing, reporting, tracking, and remediation lifecycle.

Book a Demo

PlexTrac Author At PlexTrac, we bring together insights from a diverse range of voices. Our blog features contributions from industry experts, ethical hackers, CTOs, influencers, and PlexTrac team members—all sharing valuable perspectives on cybersecurity, pentesting, and risk management.

Liked what you saw?

We’ve got more content for you

PlexTrac Named in the Gartner® Magic Quadrant™ for Exposure Assessment Platforms

Today I’m excited to share that PlexTrac has been named as a Niche Player in the latest Gartner Magic Quadrant for Exposure Assessment Platforms (EAP). I couldn’t be prouder of our team for this recognition. I wanted to share why this is important for PlexTrac and our customers, as well as why we believe this...

READ ARTICLE

Friends Friday Recap: How AI Is Reshaping Offensive Security And Why Humans Still Matter

The latest PlexTrac Friends Friday podcast episode brought together host Dan DeCloss, PlexTrac’s founder and CEO, and returning guest Rey Bango, a seasoned penetration tester and educator from a Fortune 100 telecommunication company. Dan and Rey revisited a topic from their last podcast episode, over 18 months ago, on how artificial intelligence is reshaping offensive...

READ ARTICLE

The Great Exposure Management Shift: From Point-in-Time Scans to Continuous Resilience

For years, security teams have relied on point-in-time scans and assessments to gauge their organization’s security posture. The results from these efforts, like quarterly vulnerability scans, annual pentests, and compliance audits, have served as the backbone of most vulnerability management programs. But the landscape has changed. Today, assets spin up and disappear in hours, new...

READ ARTICLE

Request a Demo

PlexTrac supercharges the efforts of cybersecurity teams of any size in the battle against attackers.

See the platform in action for your environment and use case.

SUBMIT REQUEST