

How AI Impacts the Future of Pentest Reporting and Vulnerability Management

Cybersecurity leader Jason Haddix from Arcanum Security and AI expert Michael Bell from PlexTrac lead an educational session on the transformative impact of artificial intelligence on offensive security processes and workflows. This session delves into how AI can enhance the speed, accuracy, and comprehensiveness of pentest reporting and empower visibility and prioritization in vulnerability management. 


Category: PlexTrac Update Series



Mike, Jason, thank you so much for being here with me today. Mike, I think you’re going to kick us off and give us a little bit of an idea of what our audience has in store. So I’m going to hand things over to you.

Yeah, thanks, Jess. Yeah, so I’m Mike Bell. I’m the head of AI at PlexTrac. Just working to provide some knowledge on how we’re using AI and kind of how it’s affecting our industry.
Jason, you want to introduce yourself?

Yeah, absolutely. My name is Jason Haddix. I’ve been in offensive security for about 20 years. I’ve also sat in the CISO seat, and now I run my own company doing training for offensive security people. And I actually run a class on this topic, on using AI in cybersecurity for red, blue, and purple teams. So, you know, Michael and the PlexTrac guys were like, hey, you want to do some Q&A and just, like, a cool webinar where we talk about this stuff? And I was more than excited to come on and just kind of brain dump and brainstorm all these really cool topics. As we prefaced, it’s a wonderful, fast-moving topic right now.

Yeah. So we’re going to be just touching on a couple of points as an overall overview. We both kind of like to be informal and have a real Q&A and back and forth with you guys. So if anybody has any questions, please jump on that chat and ask those questions. But we’re going to be going over some stuff about enhanced pentest reporting and how we can leverage AI for that, vulnerability management, concerns around data security and customer and user data when using AI, and then, you know, a little plug at the end for how we’re implementing it, with a little demo for that.

So I will move to the next slide. Yeah, so we are, like I said, we’re going to be going over how we can enhance pentest reporting, empowering vulnerability management, and these couple other points. Yeah, Jason, so enhanced pentest reporting, it’s kind of a ball of wax.

Yeah. Yeah. I mean, I’m sure you guys work with a lot of pentest companies who are designing their pentest reports through PlexTrac, right? You see a lot of variance in what people put in their reports. In my career, I went from pretty much pure offensive security, then took a break to be a CISO, and then came back to red teaming, hardcore red teaming. One of the things that really drove a lot of success when I was doing that was the report, because it’s the final artifact that you give to the client. No matter how good your findings are, if they’re presented poorly, then your whole project is kind of put in a bad light. And so reporting was one of the biggest things that I really wanted to make sure we were doing really, really well. Not only design, but I wanted to extend what red teams and pentest teams were doing in the world of reporting. I went out and basically downloaded every public pentest report that existed on the Internet. There are some repos out there that collect these things, and some people post sample reports. And I was looking to see what was the best of the best, and what I found was that, okay, there are a lot of really good pentest reports that present the vulnerabilities well, and they’re designed well, and they’re written well.

What I found was that the remediation and the defensive part of the reporting was not very good across the whole industry. And so I took a good, long sit down, and I was like, well, how can I make reporting better? And so what I did was I went on a bunch of defensive research to make my pentest reports 50% findings, and then for every one of those findings, also 50% defensive information for the developers. Because at the end of the day, the burden sits on the developer of the application or the maintainer of the infrastructure. I started using AI to do this, which was fantastic. It worked way better than I ever thought it would. Happy to dive into some of the things that I was able to put in there. That was the first place where I really used AI in pentest reporting: to basically add a whole bunch of defensive information that I think my clients had never gotten. And when my clients received these reports, they were like, this is amazing. We have never seen a pentest company go this deep in remediation. Can you do more of this? Our red team reports end up looking like purple team reports sometimes, which is really interesting.

Yeah. And I kind of found the same thing. And the mindset that we went into this with was to be able to communicate effectively between those testers and the people consuming the reports, because I found the same thing, you know. I was a pentester for a number of years, same kind of path. And, yeah, just being able to communicate that. Like you said, it’s the last deliverable, it better look good. You can do as much hard hacking as you want and get into all the cool things during a pentest. But if you can’t communicate why that’s important to the board or to the person you’re delivering the report to, the report doesn’t really mean anything, and you don’t keep business by doing that. So. Yeah, yeah, for sure, yeah.

One of the things I found in the research was that technical testers are usually the first drafters of pentest reports. And so the first thing that was really apparent is that, in their findings descriptions, they spoke to other testers, right? They spoke, you know, a lot of techie, which is great. I speak techie and I love techie, but there is, you know, a different audience in the executive section, and there’s actually a different audience in the findings too, you know, developer tie-ins and stuff like that. And speaking to a developer rather than a security person has to be something you do.

And then the other place I found was just business impact, right? Not a lot of people were really tying their vulnerabilities into business impact. The AI was great at both of those things: taking a very technical topic and helping me realize when I was being too technical and breaking it down for another audience, and then also giving me developer-focused communication, which was things you don’t think of every day. So when a developer gets a vulnerability out of a report, a lot of times what testers give them is like a vulnerability scanner output or a Burp Suite screenshot or something like that, one of these tools that security people use. Developers don’t use our same toolsets. And so the first thing that the AI told me to do was: developers know how to use curl, so present all of your findings as curl scripts so that they can reproduce the vulnerability live. And I was like, that is fantastic. We’re doing that for every finding so that developers can reproduce via a curl script.
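As an illustration of the curl idea, here is a minimal sketch of turning one recorded HTTP request from a finding into a copy-pasteable curl command. The function name `finding_to_curl`, its parameters, and the example URL are assumptions made for this sketch, not anything shown in the session.

```python
import shlex


def finding_to_curl(method, url, headers=None, body=None):
    """Build a shell-safe curl command that replays one HTTP request."""
    parts = ["curl", "-i", "-X", method.upper()]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        parts += ["--data-raw", body]
    parts.append(url)
    # shlex.quote keeps payload characters such as < > ( ) from being
    # interpreted by the developer's shell.
    return " ".join(shlex.quote(p) for p in parts)


# Hypothetical reflected XSS finding against a placeholder test host.
cmd = finding_to_curl(
    "get",
    "https://app.example.test/search?q=<script>alert(1)</script>",
    headers={"Accept": "text/html"},
)
print(cmd)
```

Quoting every argument matters here because finding payloads are exactly the kind of strings a shell would otherwise mangle.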

And then in the remediation section, I had very specific ideas around how good remediation should be and what remediation meant. And so I broke it down into three levels. One was code-level remediation via a library that you can bake into your application, and then holistic server-level protections that protected against the vulnerability in question. When I prompted ChatGPT to do some of this in AI, it was easily able to break down my mental model into really good recommendations at the developer level for a lot of remediation which was way better than our competition usually.

Yeah, yeah. And that’s what I kind of saw as well, with that communication gap. I threw up the bridging the communication gap slide. But, yeah, to your point, there’s going to be different audiences for almost every report and every organization. You know, if you’re a consultant providing pentesting services, you’re going to hack a bank one day, a hospital the next, and a mom-and-pop restaurant, you know, the third. Yeah, yeah. And it’s hard to stay an expert or stay abreast in all of those different industries to be able to communicate what the priorities are for them. You know, it could be HIPAA or it could be PII protection, you know, any of those things. And so.

Oh, yeah, I think that those compliances, to teach our juniors. Yeah, I mean, I don’t know about your experience, but the compliance also like taking what I had from a technical tester point of view and breaking it down into why it matters to them via whatever compliance they’re subject to, was always a long and arduous kind of writing and research journey in every report and with AI, because it’s trained on that data set of what the compliances are. So many people have written about the compliances. AI can do that really quickly and save the report writers a lot of time.

Yeah. And special fine-tuned models that are even further augmented by current compliance documentation and other things like that. It’s just continuing to grow. The whole field is. So with the ability to kind of identify those, those compliance violations will be, you know, it’s going to be a very important topic coming up, especially with, you know, even AI compliance looping back through and into that.

Sam Smith had asked, how can the AI predict vulnerabilities? Is it based on similar vulnerabilities that have been discovered? And would it be better to build a pattern in your development systems or deployment process?

That’s a great question, Sam. I think that, yes, there’s a combination of it. There’s always going to be ways that you can improve your development process. You know, your SDLC can always have, you know, updates to it. But I think that, you know, as we can always develop, hackers and attackers and threat actors are always developing as well, to find new vulnerabilities and to find those things that us as humans on a small scale don’t take into account. And so leveraging AI and different language models to identify those things and pick out those things on a large scale is kind of where I see the industry going.

So this was an interesting one, Sam. So it’s a great question. And what you can do is you can take your vulnerability management, all of your vulnerabilities that you have in your vulnerability management program. You can ask the AI based on subsections of that data to help you discover things like regressions or other places where those vulns might exist on your networks, both internal and external and also similar vulnerabilities. Now is AI the best tool for that always? Sometimes AI is the best tool for that. Sometimes a hybrid solution of AI and actual programming and or scripting is a solution to that.

A lot of people jump to, like, oh, let the AI do it all itself, right? And we’re not quite there yet, at least in my opinion, to just have automated pentesting and vulnerability discovery. Even the white papers that are coming out right now are saying, you know, we can get somewhat there, but it has to be very discrete in what the AI is looking for. So one of the things that I do in my pentesting, or my bug bounty hunting (I still do bug bounty hunting in my free time), is, let’s say a CVE comes out on a piece of infrastructure that a client of mine has, right? And let’s just say it’s a very simple vulnerability, like a cross-site scripting vulnerability. And they’ve patched that vulnerability. I have an AI bot where I give it context about the old vulnerability that they’ve now patched and secured, and the attack string. And what it’ll do is change the attack string using different tricks that I usually use as a pentester and give me five or ten new attack strings to see if I can bypass whatever fix they put in there. And so for these discrete problems for discrete vulnerabilities, the bot is already really good.

AI is already really good at this as long as you can prompt really well. And this is the new skill set that you have to have as an engineer these days. Not even a security engineer, but any engineer: knowing how to lead the bot using prompt engineering. And so I have a whole bunch of those for different vulnerability types. And so sometimes I discover new, you know, technically they’re zero days, they’re new CVEs and bypasses to stuff, but they’re not technically complicated. They’re still very shallow, at the web app level and stuff like that. For vulnerabilities that are like exploits and binary stuff, it’s not there yet.

No, I agree. And I think I was saying, yeah, just add an extra couple of dots for directory traversal even though they said they patched it. Yeah, yeah, and I agree. Another way that it kind of shines as well is on the reverse side of that, decoding complex attack strings. Whether it’s attackers using multiple encodings or Base64 strings to hide their payloads or queries, you know, AI does a fairly good job at deciphering that and pulling out the required information, and it really helps you bridge that gap of knowledge and time. You know, for a lot of incident responders, it’s a time thing. You just got to sift through everything. You’ll have your tools, but that validation, that initial check of those things, is very time-consuming.
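For the simple cases, the layered decoding described here can also be done with plain scripting, which is one way to check an AI’s answer. The sketch below is a heuristic under stated assumptions: it only handles URL-encoding and Base64, and the name `peel_encodings` and the sample payload are made up for the example.

```python
import base64
import binascii
from urllib.parse import quote, unquote


def peel_encodings(value, max_layers=5):
    """Strip URL-encoding and Base64 layers; return every intermediate form."""
    layers = [value]
    for _ in range(max_layers):
        current = layers[-1]
        decoded = unquote(current)
        if decoded == current:
            # Not URL-encoded, so try strict Base64 that decodes to UTF-8 text.
            try:
                decoded = base64.b64decode(current, validate=True).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError, ValueError):
                break
        if not decoded or decoded == current:
            break
        layers.append(decoded)
    return layers


# Build a sample: Base64, then URL-encode, then Base64 again.
inner = "<script>alert(1)</script>"
layer1 = base64.b64encode(inner.encode()).decode()
layer2 = quote(layer1, safe="")
layer3 = base64.b64encode(layer2.encode()).decode()

for step in peel_encodings(layer3):
    print(step)
```

Short plain words can happen to be valid Base64, so a result like this still needs the human check the speakers keep coming back to.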

In the future, we’ll have systems of agents, and we’re already finding AI agent systems. If you’re not familiar with an AI agent system, it’s the idea of a whole bunch of discrete problems being handled by small AIs, what we call agents, but they’re just small models that do one specific thing, and then they’re controlled by an overseer AI that controls the whole project. So your instruction to the overseer bot could be like, “pentest this network. Here’s all the information I have about it.” And then the overseer bot knows how to split up that project into discrete things and give them to agents. And the agents are equipped with tools. So some of them can run an attack string, and some of them can do research on an attack and maybe mutate it like we were talking about.
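The overseer-and-agents shape can be sketched in a few lines. Everything below is a hypothetical skeleton: the class names, the fixed three-step plan, and the stub agents are assumptions for illustration, and no model or tool is actually called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """One narrow worker: in practice a small model plus the tools it may call."""
    name: str
    handle: Callable[[str], str]


@dataclass
class Overseer:
    """Splits a goal into discrete tasks and routes each one to a named agent."""
    agents: Dict[str, Agent]
    log: List[str] = field(default_factory=list)

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # Hypothetical fixed plan; a real overseer would ask a model to produce it.
        return [
            ("recon", f"list in-scope hosts for: {goal}"),
            ("research", f"look up known issues for: {goal}"),
            ("report", f"summarize results for: {goal}"),
        ]

    def run(self, goal: str) -> List[str]:
        results = []
        for agent_name, task in self.plan(goal):
            self.log.append(f"{agent_name} <- {task}")
            results.append(self.agents[agent_name].handle(task))
        return results


# Stub agents that only echo their task; no scanning or network access happens here.
agents = {
    name: Agent(name, lambda task, n=name: f"[{n}] done: {task}")
    for name in ("recon", "research", "report")
}
overseer = Overseer(agents)
results = overseer.run("lab network 10.0.0.0/24")
print("\n".join(results))
```

Keeping a log of every task handed to an agent is a deliberate choice in this sketch, since a record of what each agent was asked to do is what a human reviewer would need.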

So these systems are what everybody wants to build towards because they get us more towards autonomous, automatic pentesting. But it’s still very, very much in its infancy right now. And I’m on the cutting edge of this. I’m researching it every day, and there’s still a lot of stuff to be ironed out in the agent world right now.

Yeah, I agree. It’s one of the big ones in the OWASP Top 10 for LLMs: giving them too much agency, excessive agency. That’s where I see a lot of that coming in. And a lot of people in our industry, rightly so, are paranoid about how AI is going to interact. And so giving it capabilities to do those sorts of things is definitely a space where I think we need to take deliberate steps as we expand. With that top-down approach, I’ve seen a lot of research papers come out recently where they’re doing that and combining it with sandboxed regression testing. So FunSearch was one of them, where they were doing the bin packing problem and they were able to achieve a higher level of efficiency. And so systems like that, where an AI can kind of be in a sandboxed environment and test things over and over again, I think is a really interesting path that people are going down and could really help penetration testers, exploit developers, those kinds of things.

We got one here from Steve Chang. I think we got two good questions, actually. So due to data poisoning, AI could provide false analysis results. Current vuln scans or pentests are already known to have false positives sometimes. How can we use AI with vuln scans and pentests to reduce false positives instead of adding more false positives to the report?

This is a good question. I think, at least in my experience, I always tell anyone who’s using the current gen of LLMs, which is what most people are using right now, large language models like ChatGPT or Llama or whatever, you can trust, but you got to verify, which is why none of our jobs are going anywhere anytime soon, because there is still hallucination.

My example in my class that I teach about this stuff is the recent Google AI that they had integrated, which was, somebody asked, hey, my cheese is not sticking to my pizza. What do I do? The LLM behind it basically said, hey, if you add a little bit of Elmer’s glue to your pizza sauce, as long as it’s nontoxic, your cheese will stick better to your pizza. And so when people went down the rabbit hole of trying to find out where that training data came for the LLM, it turns out it came from Reddit comments that were like five years ago. And so, you know, one of the things for the model makers is choosing clean and reliable data in order to go into training the model or post-training out all of the ridiculous stuff that is in there. And so obviously nobody wants to put Elmer’s glue in their pizza sauce. So you got to make sure that you can trust at first glance, but you got to verify as well. Yeah, no, I definitely agree.

And I think an added element that can kind of reduce those hallucinations and inaccurate data is grounding methods, having data stores, using RAG-based implementations, those sorts of things, to kind of give the AI guardrails for your data, so to speak, and for the data that it’s ingested. And more to your point, Steve, on reducing false positives: models are continuously growing. And with our processes for providing data to those models, training those models, or building our different plugins or sub-agents, as Jason was talking about, we’re going to continuously learn what works better and what doesn’t. So in the last wave of AI we got anomaly detection, behavior analytics for firewalls, those kinds of things. I think this wave of AI is going to be a lot bigger, and I think that the ability to kind of talk to your data with LLMs and get that puzzle view solved is where we’re heading with this.

And so taking that and asking, do I have anything that could be a false positive? And the LLM knows, like, yeah, you have 500 PHP vulnerabilities, everybody does, because you’re one package behind, you know, things like that. If you’re bringing in a scan, you have to de-dupe it somehow. So I think right now, I mean, this is where a hybrid system comes into play, right? So let’s say you’re a pentesting company and you have to do vulnerability scans as the first level of coverage, right? Which everybody does, whether you’re using Nessus or something else, right? Well, the place where LLMs can help right now, because like I said, they’ve been trained on all of the documentation and blogging around all these tools that we, you know, offensive security people use, is that they’re really great at creating templates.
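The de-duplication step is a good example of the scripting half of a hybrid system. A minimal sketch, assuming each scanner row is a dict with hypothetical `plugin_id`, `title`, and `host` keys (real scanner exports use their own field names):

```python
from collections import defaultdict


def dedupe_findings(rows):
    """Collapse per-host scanner rows into one finding per (plugin_id, title)."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[(row["plugin_id"], row["title"])].add(row["host"])
    return [
        {"plugin_id": pid, "title": title, "hosts": sorted(hosts)}
        for (pid, title), hosts in sorted(grouped.items(), key=lambda kv: kv[0])
    ]


# Made-up sample rows; the IDs and hosts are placeholders.
rows = [
    {"plugin_id": 1001, "title": "Outdated PHP package", "host": "10.0.0.9"},
    {"plugin_id": 1001, "title": "Outdated PHP package", "host": "10.0.0.7"},
    {"plugin_id": 1001, "title": "Outdated PHP package", "host": "10.0.0.9"},
    {"plugin_id": 2002, "title": "SMBv1 enabled", "host": "10.0.0.5"},
]

for finding in dedupe_findings(rows):
    print(finding["title"], "->", ", ".join(finding["hosts"]))
```

Collapsing the rows first also shrinks what has to be sent to a model, leaving it to work on a handful of grouped findings rather than hundreds of near-identical lines.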

So the two bots that I use the most: I have a NASL script bot for Tenable Nessus, which builds custom NASL scripts so I can identify vulnerabilities as soon as I hear about them. I just feed it context or the GitHub repo around a proof of concept and it’ll build me a NASL script. And I have a Nuclei template creator as well. Nuclei is an open-source vulnerability scanning tool. And since those formats have been around (I mean, NASL scripting has been around forever, and Nuclei is very well documented on the Internet), when I provide it the context around what the vulnerability is, maybe a proof of concept for the vulnerability, it can write me a detection script at like a 95 percent success rate. Like, I’ve only had a few that have failed before in those two cases. And then you can automatically run those using scripting, or your testers can do that. And what that allows you to do is just be a better pentesting outfit, right?

I mean, first of all, if the scripts are written well, then you can verify there are no more false positives, right? You can verify the vulnerability. And then it also allows you to be faster than even some of those companies. You know, sometimes it takes a research company two, three, four days to build a check for a certain vulnerability that comes out because, you know, maybe it’s a smaller deployment of software across the Internet and it’s not as popular, but it’s used in your organization. And so one of the things that’s really great to empower your team with is to have a chat internally, whether you use Teams or Slack or whatever, and bring your developers in. What we did at one of the companies is basically everyone’s invited to the security channel, and when there’s a new proof of concept, someone just drops it in. They’re like, hey, do we care about this? And then you use, like, emojis to vote it up or vote it down like Reddit. And people are like, yeah, this is deployed pretty widely and security cares about this because it could be an issue for us. And so that decides whether you need to action it. And then you have an AI bot, with all the information that you provided it, build a proof of concept, and you can find these vulns and correct them sometimes before even the bigger-name vulnerability research companies come out with a check, which was fantastic because we were able to burn down vulnerabilities really quickly.

Yeah, the clients love it. You’re able to turn around when nobody has an answer for it and say, here you go.

This is how you can validate it, how you can detect it. And I like it too because the conversational aspect, instead of having to know how to structure those queries and do all that kind of stuff and know those languages, the LLMs fine-tuned on that data and what’s important can do it in such a shorter amount of time and then you can just talk to it and say, I also want to include this check or I want to include this ability and because you see something that could be a result of implementing the fix. So yeah, there’s definitely a lot of different avenues that it’s going to be able to assist in.

Kathy asked, which I’ll kick over to you, Jason, is what training do you recommend for beginners to help us understand AI reporting and scripting to verify AI results?

Yeah. So just for, you know, general usage of AI and kind of breaking it down, there’s a lot of new courses coming out. I think one of the key things that you have to learn is a little bit about the models themselves, how they work, and then a lot about prompt engineering to get the models to do what you want. And so I’m a big fan of one, not sponsored or anything like that. I just like them. It’s a company called Learn Prompting. And the reason I like it is I met the guy who’s behind it, his name is Sander, at DEFCON, one of the hacker conferences. And we were talking, and he did this competition where he put up a website in front of a whole bunch of open-source models and then opened it to the world like a CTF competition. And everybody went after these models trying to get them to do certain things that were against their nature. And so then he took all that, analyzed it as a grad student at his university, and then wrote probably one of the most fantastic prompt engineering white papers I’ve ever seen. And he actually won an award for it at the AI Congress Conference. And so he has turned that into a training for normal people. He has an introduction to LLMs class for absolutely new people, all the way to advanced prompt engineering, all the way to image stuff. If you’ve never worked with image AI, he talks about that kind of stuff. And it’s really affordable, too. So it’s been one of the things that I provide.

Oh, sorry about that. Don’t they cover prompt hacking and prompt injection methods as well? I think I have a couple of occasions through them, and it’s really good course content. I like it. Yeah. And it does a very good job at really breaking everything down and making it really, really digestible.

Was he the one behind the wizard, like, hacking the wizard? Was that, what?

No, that’s the other one I recommend, Lakera. So Lakera is an AI security company that has done several CTF competitions, and they have an open one. It’s called Gandalf. And so this is for when you’re done with the Learn Prompting stuff to get your feet wet, to understand AI and prompting. Then you can go do the CTF, or if you already have some knowledge, you can go to the CTF directly, and it gives you a fantastic eight-level challenge to try to get this wizard to give you his password. And each level is harder and harder. And now I think they have up to 14 levels. And so Lakera is also at the forefront of prompt security and prompt firewalling. They’ve released some excellent papers, and really recently they did a competition at RSA which was fantastic, which I played in. So, yeah, Lakera has a lot of really good resources as well.

Yeah. Oh, and to just touch on Learn Prompting again, obviously they’re not involved, sponsored or anything, but they’re also accessible cost-wise. I think it’s like $1.39 a month to be able to take any of their courses or anything like that. So if you want to learn about AI and prompting, which is a huge piece of what AI and LLMs are going to be going forward, it’s a great place to start.

Do we have any other questions? Let’s see.

So, for vuln scans and pentests that are run on a regular basis. Steve asked this one, and it’s timely because part of that is up on the screen. So for vulnerability management, it’s kind of a touchy word in AI, but biases are built in not just to AIs and LLMs, obviously, but to us as humans. And so when we’re looking at the security data, if we’ve done what we need to do to ground the model and train it and ensure that the quality is high for that model, we can use that to kind of remove the bias in what we’re looking at. So if we run across a vulnerability, say SMBv1 is enabled, and that’s obviously bad, we’re going to do the sniffing and get your hashes and break into everything.

You know, your IT manager, your security manager says, oh, well, that’s not a high priority because, you know, we’ve got these protections. But going through your scans, going through your alerts, and having AI look at that as a whole picture, using traditional methods of machine learning and deep learning combined with the LLM, you know, it allows us to remove a lot of that bias that we would inject into the process of deciding what we’re going to do for remediation and what findings are the highest priority. And that can be achieved through ingesting threat data, intelligence feeds, and log data and generating embeddings for that kind of stuff, and really using those methods, retrieval-augmented generation and a few other things, to give you a much better picture of what you have going on for vulnerabilities. And then doing that continuous validation with methods like Jason mentioned earlier is just invaluable.

Yeah. I mean, as a person who wants to use this just as much as you do, Steve. You know, for cool stuff, I agree that we want, you know, what everyone’s trying to sell you right now in security products across the board, as a consumer of security products, is the idea of these autonomous like oracles that take all of this log data and tell you how to be easily secure and do all of your rules for you. And we’re not there yet.

I believe in a future state, we will be there. So right now, with your tools that do the vuln scans and vulnerability management and stuff, that data can be augmented by AI, can be cleaned and enriched by AI. But right now, it’s not a full oracle system, this awesome Cortana-from-Halo thing that’ll just tell you, hey, let’s do this thing. There’s a lot of context that needs to be built into these bots. And even in RAG, which, if any of you haven’t heard of retrieval-augmented generation, is the idea that before you send a question to your actual model, you include a whole bunch of extra context from documents or from spreadsheets or from whatever to make the answer from the model better. And it can pull answers directly from your documents. So it’s like talking to your documents.
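The retrieve-then-ask flow just described can be sketched without any model at all. The example below swaps in naive keyword overlap for the retrieval step (production systems typically use embeddings and a vector store), and the document IDs, field names, and prompt wording are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase word and number tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents sharing the most tokens with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d["text"])), reverse=True)
    return [d for d in ranked[:top_k] if q & tokenize(d["text"])]


def build_prompt(question, documents, top_k=2):
    """Prepend the retrieved context to the question before it goes to a model."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up documents standing in for scan results and policy text.
documents = [
    {"id": "scan-12", "text": "SMBv1 is enabled on host 10.0.0.5"},
    {"id": "policy-3", "text": "Password rotation policy is 90 days"},
    {"id": "scan-19", "text": "Outdated PHP package on host 10.0.0.9"},
]

prompt = build_prompt("Which host has SMBv1 enabled?", documents)
print(prompt)
```

The prompt string is where the sketch stops; sending it to a model, and checking the answer against the cited document IDs, is left out on purpose.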

And even with RAG, the implementations are still getting figured out right now. It’s not as easy and as fine-tuned as you want it, but what you can do in the proactive detection kind of world is, like I said, build checks and stuff. So in the defensive world, let’s say you want to get more towards proactive detection. Well, the bots right now are really good at building threat hunting rules, detection engineering rules, Splunk queries. And so you can automatically build these types of queries or templates to empower your people and get closer towards proactive detection like you’re talking about, right? So a lot of teams right now, I know, are like, there’s always a Splunk guy at every organization, right? A lot of people use Splunk. I mean, not everybody uses Splunk, right? But you have a SIEM, and you have one person there who’s usually your SIEM expert, who knows the query language, who knows how to build the, you know, the alerts, who knows how to tune it really well. And that’s a hard job to do as one person. The AI can scale your one person to the power of two to three to four based on how fast it can help you build these detection rules. But there’s still a human in the loop right now for a lot of these things.

So there’s also other ways to scale, right? Let’s say you find, you know, a vulnerability, like I said, that exists in one place, and you find it via a pentest report because your pentest vendor gave it to you. Well, your pentest person, or you, can take that finding, build a Nuclei template, scan across your whole infrastructure, and remediate that whole finding holistically, which is closer towards proactive detection. And, you know, even your pentest vendors can do that for you, right? They’re definitely capable of doing that for you.

Yeah, no, I agree. And kind of part of what you were alluding to was cutting down time and cutting down that kind of stuff. And it’s a good segue into how the reporting can benefit from using it. And I think the big one is the last bullet. A lot of pentesters, a lot of cybersecurity people will just grind through and get things done. And it takes time, lots and lots of time. And so, you know, saving time, which saves the business money more efficiently. But the biggest thing, I think, is burnout, you know, being able to leverage the tools that you have available to you to reduce the time that you’re spending on something and reduce those iterations that it takes to, you know, write a report. It’s what, what PlexTrac was kind of founded as was to reduce the time it takes to write reports, make them more consistent. And so I think this is just the next evolution for that, to help testers and kind of increase the overall cybersecurity posture of the organizations that we help and subsequently the country with everything going on right now. So, but I think that those kind of things, it’ll allow testers to kind of take a step back and be able to focus more on their technical work and implementing those things, helping, advising the doers on how to fix it.

Yeah, I wouldn’t be too scared right now. The things it’s good at right now are helping us parse documentation and building queries and templates. It’s really good with words, but it’s not very good yet at complex workflows, connecting the dots, or being creative in technical disciplines, and a lot of times it’s also never going to find anything that hasn’t already been talked about in its training data. And so we’re still going to need technical people to come up with really creative solutions to problems. I don’t see it really taking anyone’s job. I just see it as a tool to be aware of and use to your advantage. It’s just like when a new hacking tool comes out that’s awesome: you need to learn it, and, you know, sometimes that’s not trivial. This is another tool in your tool belt to be a better technical person.

Yeah, I agree. It’s not the hammer for the thumbtack. You know, a lot of people think everybody’s just going to throw AI at it and LLMs are going to solve, you know, the world’s problems. And it is going to help, you know, it already is. But, yeah, to your point, I think that there’s a lot of work to be done in continuous validation, and I don’t think that the human in the loop is really ever going to be removed wholly. I think that, you know, at a certain point, we’re going to hit an inflection point of AI-generated data replacing human-generated data, and then what is it learning off of? It’s learning off of its own generated data. Yeah. Which is scary.
There’s a lot of white papers discussing that right now.

Yeah. Yeah. So it’s definitely some interesting problems that the industry is dealing with. And I think there’s a lot of great papers out there. I’ll post some stuff on my LinkedIn if anybody’s interested in looking at it. But there’s many research papers about how we’re solving this and how we’re trying to just increase security across the board for everyone. And, you know, for us, at least, before we hit the hour here, the biggest thing, I think, when interacting with these models is knowing the limitations and knowing what you’re doing with them. I think that, you know, data security is a big piece that not a lot of people are thinking about. They’re used to going to Google and just typing in, you know, “I need to buy this product,” and it pops up. And I think that ChatGPT and other systems like that are providing that same level of convenience. And so users are just doing that without thinking of the follow-on implications or where that data is going. And so, especially in cybersecurity, from a pentester or technical executor position, putting those sensitive pieces of data in there can be bad for those specific clients.

But, you know, from an organizational perspective, like a CSO, controlling where your organization is putting that data is also going to be key, because if you have employees that are just dropping it in different locations, you end up with a situation where proprietary data is revealed in responses from ChatGPT. So I think organizations reining in that data privacy thing is going to be big. One of the basic examples was, I think it was Samsung, where their employees started using ChatGPT and had opted into training on their data. And so then they started seeing their proprietary information and code come out as answers in ChatGPT. So they had to ban their developers from using it internally. So that’s the first thing to think about: your proprietary data can end up in the training sets, the post-training sets.
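
One guardrail that follows from the point about sensitive data ending up in an external service is to scrub obvious identifiers before any text leaves your environment. The sketch below is a minimal illustration, not a complete data loss prevention control: the two regexes and the `redact` helper are assumptions made for this example, and real client data would need a much broader pattern set plus human review.

```python
import re

# Illustrative patterns only; a real deployment would cover far more
# (hostnames, credentials, client names, internal URLs, and so on).
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count replacements."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


note = "Admin panel at 10.0.0.5 accepts default creds; contact admin@example.com"
clean_note, counts = redact(note)
```

The counts make it easy to log how much was scrubbed from each note without storing the sensitive values themselves.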

Everybody right now is scrambling to build their own model, or take one of the open source models that exist that are licensable for internal use and deploy that internally, and then anybody who wants to use AI can use that internal model, which never goes out to the Internet. The only one that controls it is you. And until really recently, we hadn’t had anything that had really hit the quality standard of GPT-4. But in the last few months, we’ve gotten Llama 3, which gets very close, not quite there, but very close to the standard. And you can deploy that internally at your organization, and you can add RAG to it and post-train it for your purposes. And so everyone’s scrambling right now to deploy Llama internally to be their internal model, which is, you know, I’ve got a box right here running Llama for my specialized use cases. So, yeah, that’s kind of where everyone’s going.
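
For anyone new to the term, the retrieval half of RAG (retrieval-augmented generation) can be sketched without any model at all: find the internal documents most similar to the question and put them in the prompt. This sketch is an assumption-laden stand-in. It uses simple word-count cosine similarity where a real deployment would typically use an embedding model and a vector store, and `retrieve` and `build_prompt` are hypothetical helper names. The resulting prompt would then go to whatever internally hosted model you run.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Ground the question in retrieved context before it reaches the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


# Invented knowledge-base entries, for illustration only.
docs = [
    "Rotate API keys every 90 days.",
    "SQL injection is fixed with parameterized queries.",
    "The office closes at 6 pm.",
]
prompt = build_prompt("How do we fix SQL injection?", docs)
```

Because retrieval happens before the model is called, the model itself can stay static, and what it sees is limited to the documents you chose to index.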

Yeah, there’s lots of ways that, you know, individual practitioners can get these things on their local box, get the smaller models and run them on there, and kind of do all this pretty securely. And, you know, we kind of had that same sort of mentality going into this, with that continuous training loop and those kind of things. It is good for increasing the quality of model responses, but not when you’re dealing with sensitive pentesting data and sensitive customer data that they don’t want models to be trained on, because that could potentially be repeated back. I think with ChatGPT, all you had to do was ask it to repeat “poem” a bunch of times and it would start dumping training data for the model. That’s been patched. But, yeah, so there’s all kinds of new vulnerabilities coming out, and AI and LLMs are not immune to those same types of vulnerabilities.

And so us coming into it, we really wanted to figure out how to do it and how to do it securely. And so static models, not continuous training, those kind of things, but really accentuating it with the other methods and grounding it were very important. And so we’ve got the benefits up here on the slides. But yeah, we’ve got the speed-up of reporting, which we’ve kind of talked about a bunch on here, and the communication. But I think another good piece, the last point, increasing security data awareness, is going to be a big point that we’re going to be pursuing, to incorporate threat intelligence and those other things to kind of keep the practitioners abreast and reports current with accurate security data.

I mean, to be honest, if you haven’t used PlexTrac’s new AI feature, just as a consumer, it’s a little button in the bottom of each section when you build a finding or a report to help you write out those sections. And this is not because they brought me on the website. It is absolutely fantastic.

If you’ve ever written reports before, you know how much of a pain it can be, right? A lot of times, with custom vulnerabilities that aren’t in a database already, you’re completely writing those from scratch. You can make typos, you can describe something too technically, like we talked about. And the button, you just push it and magic happens, and out comes a description and technical remediation advice. And yes, you have to review it, like everything, but the model is really, really good. And honestly it’s been a lifesaver in writing reports for our stuff.

And then you can take those that you make in — Mike, correct me if I’m wrong — you can add them to the global database that you can use on any of your reports, right? Yeah, so we have, and they work really good. So PlexTrac uses another mechanism called shortcodes. It works really well with shortcodes if you utilize those. But yeah, you can generate those descriptions, those recommendations or entire findings and save those to your contentDB, you can go back and bring in old ones, import them into your report and even use AI to additionally refine those. I can give you a quick run through if you’d like.

So let’s see. But yeah, it’s been really great. I mean, I was a user of PlexTrac before I joined the team, and it already helped me significantly as a tester and as a cybersecurity practitioner. But this, again, just builds on that and evolves it to what we’re looking at. So here we have our server-side code injection vulnerability description. We can either review it and edit it, or, since we have missing recommendations right here, I can hit Use AI and give it a couple of seconds to process, and it will come back with recommendations that blend technical and strategic recommendations for these findings. These kinds of reports are usually consumed first by executives, and so having those high-level recommendations is big.
So as you can see over here, we have the overview of what the vulnerability is, then overall recommendations and strategic recommendations for this vulnerability: the normal input validation, sanitizing input. So that’s based off of a generic kind of finding description. But where this platform really shines is when a pentester takes the raw notes from their test, which are not pretty at best, and puts them in here in their raw form. Our system will do its best to reformat that, pick out the important pieces, and then enrich it with the data that you already have in the platform for that report or for that client, doing all that securely and making sure that it only has access to the data that it should have access to. And then we don’t know. Insert it.

I don’t know if this is spoilers, Mike, but, you know, just as a consumer, I want to know. Right. So let’s say I want to write a whole new section, which I’ve done with you guys before: I’ve built remediation sections that were defensive-oriented. And so I have more sections in each finding. Will I in the future be able to tailor the system prompt to make those sections, you know, customizable?

You’re really putting me on the spot, aren’t you?

I’m sorry, man. I’m sorry.

Without saying too much: yeah, it’s an intent for us to move forward on that kind of avenue, starting out with kind of a set killer feature. Yeah. Yeah. So, already we can kind of add a custom field here and, you know, give it a key. So if we want strategic, if I can spell, insights. Not that one. We can come down here and ask it for strategic insights. Now, obviously, we’re not giving it a value to begin with, so we’re not really giving it much to work off of. It’s just. Yeah, it doesn’t have a lot of context yet. Exactly. And so normally, where we’re heading with it is to learn off of or emulate what you already have in there for different strategic insights, to look at that and do a context evaluation and a mimicry. But we can pretty much put whatever we want into the key and label for our custom fields and add those.

That’s super cool.

Yeah, we do plan on adding some additional functionality pretty quickly.

Very cool. Yeah. So. But yeah, we just, I. As a practitioner, as an old user, I think this is, you know, it’s, it’s going to be great. It’s going to be great for testers to be able to. And that’s, you know, I’m a little biased, of course, but, you know, it’s, we’re really trying to do some cool stuff with this and give back to the community and really enable them to kind of deliver value. So that’s awesome.

I’ll open up the floor, you know, for last-minute questions. I’m also gonna. Sorry, go ahead. I will.
I really want to stop and take questions right now, but we are so short on time. I know we went right up to it, but the conversation was fascinating. And I do want to say we definitely got some really big high-fives from the audience and kudos. People were enjoying the conversation. So I think you got to a lot of the questions that we had, and I know there’s a few that we’re going to have to leave behind. So I do want to assure everyone in the audience that we are going to make sure that PlexTrac gets all of the questions you asked today so they will have a chance to follow up with you. You will get answers back.

And I’m going to kind of sneak in a question here because a few people were asking things like, you know, what kind of training should I do and what’s the, I mean, it’s wonderful. And we got into sort of the foreign and technical side of things, but for somebody that’s maybe looking for those early days learning opportunities and how to set guardrails and kind of get started, I’m wondering if you could give us a little bit of advice for that. And then I think what I’ll ask you to do is transition that into how to reach out to you both and how to continue the conversation with each of you and with PlexTrac as well. Maybe, Jason, we can ask you to kick us off and then, Mike, you can kind of bring us home on that.

I really like that IBM actually does a security series on all things AI, LLM, and security, and securing some of the models. They have some really great YouTube videos that are absolutely free, so you can go look those up. And then if you have any questions or anything, I am on Twitter, @jhaddix, you know, just @ me. And if you have any questions about AI stuff, I run a whole class on this. If you’re interested in the more advanced spectrum of how to use AI for security people, for red teaming, blue teaming or purple teaming, I run a class called Red, Blue, Purple AI. It’s linked on my Twitter profile so you can come grab it.

Super smart, too. Great, great course. I know a couple people that have taken it and it’s amazing. Yeah, we’ve put up a link on here if you want to demo the product and, you know, for the training piece as well. There’s also some additional videos out there for AI processes by hand, or cheat sheets on learning, you know, the different terms and things like that for AI and LLMs. And I think, if you’re just touching ground and trying to get into it, learning those terms and what they mean and how they affect the AI, talking about temperature and top-k and other hyperparameter things, knowing what those are will really give you a better picture of how you’re interacting with it. And then, yeah, if you want to reach out to me, I’m on LinkedIn and I’m also on X, @rtificialmike.

Awesome. I so wish we could just keep going with this conversation. We’ll have to do an hour and a half next time just to get through all of the interesting facets here. But I do want to say thank you for taking a very large topic and kind of biting off a piece and making it really interesting and digestible and just a fascinating hour. I hope that we’ll get to digging a little bit further and keep going with this conversation. I hope we’ll have you back again soon. But I want to thank both of you for coming to talk to us today.