Getting Regex From ChatGPT
Using ChatGPT to help with signature writing.
Setting the Prompt
I will be providing log messages.
When I provide a prompt with:
Provide Fpl Regex for: "message"
Please provide a JSON response with the following format:
```json
{
name: "<Log Type Name>",
pattern: `<regex pattern>`
}
```
Include a short description of the pattern, less than 400 words.
Reply with "I am ready"
Setting this prompt first keeps ChatGPT's responses short and in a consistent format.
It is important to recognize that ChatGPT will sometimes stray from the prompt. Just reissue the prompt to get it back on track. ChatGPT can also drift into other regex styles, such as double-quoted patterns or unnamed capture groups. In that case, correct the mistake and give the corrected response back to ChatGPT in the prompt.
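One way to catch a response that has drifted into unnamed groups is to count them before pasting the pattern anywhere. The following is a minimal sketch in Python, whose `re` module accepts the same `(?P<name>...)` syntax; the helper name `unnamed_groups` and the "bad" pattern are made up for illustration, and Python's regex engine may differ from the one used by the platform in other details.

```python
import re

def unnamed_groups(pattern: str) -> int:
    """Return how many capture groups in the pattern have no name."""
    compiled = re.compile(pattern)
    # .groups counts every capture group; .groupindex holds only named ones
    return compiled.groups - len(compiled.groupindex)

# Pattern in the requested style: every capture group is named
good = r"^seelog internal error: (?P<message>.+)$"
# Hypothetical drifted response: the capture group has lost its name
bad = r"^seelog internal error: (.+)$"

print(unnamed_groups(good))  # 0
print(unnamed_groups(bad))   # 1
```

A non-zero result is a signal to correct the pattern and feed the correction back to ChatGPT.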
Providing a Sample
Using a sample is the best way to make sure the signature is correct. If you used the parser framework, data that is not parsed is tagged with a label in the format Failed <parser_name> Parse.
Searching for this tag in Event Search lists all messages that still need to be parsed.
Searching
Use the "Event Search" to get a list of unparsed messages.
Unparsed messages will show up with the default and then the message. The tag will be visible (in this case "Failed Daemon Parse"). We can just copy and paste this into our ChatGPT prompt.
Note: We could also have displayed the message in JSON format by opening the message and changing the view to JSON.
Using our Prompt
In this case our prompt is:
Provide Fpl Regex for: "seelog internal error: invalid argument"
And ChatGPT responded with:
The copy code section is what we want. The rest is there in case we need to understand the variable groupings while debugging. Click "Copy code".
{
name: "Seelog Internal Error Log",
pattern: `^seelog internal error: (?P<message>.+)$`
}
Now we can insert this into our code as part of the patterns array. Order matters: place common regex at the front of the array and uncommon regex at the end. We can adjust this later by using the facet to see the actual parser distribution amounts.
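To see why order matters, here is a rough Python sketch of an ordered pattern list. It assumes the patterns are tried in array order and that the first match wins, which is an assumption about the parser framework and not something confirmed here. The `PATTERNS` list, the "Catch All" entry, and the `first_match` helper are hypothetical.

```python
import re

# Hypothetical ordered list: most common log types first
PATTERNS = [
    {"name": "Seelog Internal Error Log",
     "pattern": r"^seelog internal error: (?P<message>.+)$"},
    # Illustrative fallback, placed last so it cannot shadow specific rules
    {"name": "Catch All", "pattern": r"^(?P<message>.+)$"},
]

def first_match(message):
    """Try each pattern in order; return (name, fields) for the first hit."""
    for entry in PATTERNS:
        m = re.match(entry["pattern"], message)
        if m:
            return entry["name"], m.groupdict()
    return None

print(first_match("seelog internal error: invalid argument"))
# ('Seelog Internal Error Log', {'message': 'invalid argument'})
```

Under that assumption, putting frequent patterns first means most messages are resolved after one or two attempts, and a broad pattern placed too early would swallow messages meant for a more specific one.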
Testing
To test, go to the Code Editor page.
As a reminder (going clockwise from the top left):
- Editor: This is the processor code
- Input: This is the incoming data to the processor when we are testing. This is the pane we are going to use. Its contents are fed to the processor when we hit the "Run Test" button.
- Output: This is the outgoing result of the code when we hit the "Run Test" Button.
- Console: This is the standard out. This data is not part of the pipe, but instead part of the system output.
Place a frame into the Input
We need to place the message in the form in which it arrives from the data source. When received, a syslog message has a structure like the following:
{
"obj": {
"@facility": "daemon",
"@level": "info",
"@message": "Replace this",
"@parser": "YourParserName",
"@sender": "ip-sample",
"@source": "ip-sample",
"@tags": [],
"@timestamp": 1719234357000,
"@type": "event"
},
"props": {},
"size": 0,
"source": ""
}
Copy and paste this into the Input window.
Test the response from ChatGPT
We can now use this JSON structure and put our message into the @message field. Make sure the @facility matches.
Hit "Run Test", and the Output shows the new Event message, while the Console shows us that the pattern was found.
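If there are many samples to try, the input frame can be generated instead of edited by hand. This is a small Python sketch that fills the sample message into the structure shown earlier; the `make_frame` helper is hypothetical, and the field values are simply copied from the example frame.

```python
import json

def make_frame(message, facility="daemon", parser="YourParserName"):
    """Build a test input frame shaped like the syslog example above."""
    return {
        "obj": {
            "@facility": facility,   # must match the facility being parsed
            "@level": "info",
            "@message": message,     # the unparsed sample goes here
            "@parser": parser,
            "@sender": "ip-sample",
            "@source": "ip-sample",
            "@tags": [],
            "@timestamp": 1719234357000,
            "@type": "event",
        },
        "props": {},
        "size": 0,
        "source": "",
    }

frame = make_frame("seelog internal error: invalid argument")
print(json.dumps(frame, indent=2))
```

The printed JSON can then be pasted into the Input window before hitting "Run Test".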
Refining the Rule
This is where human intelligence is better. We have the rule, but it is for a specific product. The product name should be a variable. Let's be lazy and tell ChatGPT to correct the issue.
We can replace our previous JSON snippet with the new one and rerun the test.
It worked. The output now shows the log type as a variable.
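As an illustration of what such a refinement might look like, the sketch below turns the hard-coded product name into a named group. The exact pattern ChatGPT returned is not reproduced here, so both the pattern and the `log_type` group name are assumptions.

```python
import re

# Hypothetical refinement: the product name is captured instead of hard-coded
GENERAL = r"^(?P<log_type>\w+) internal error: (?P<message>.+)$"

m = re.match(GENERAL, "seelog internal error: invalid argument")
fields = m.groupdict() if m else None
print(fields)
# {'log_type': 'seelog', 'message': 'invalid argument'}
```

The same sample that matched the original rule still matches, but the product now arrives as a field, so the rule is no longer tied to a single product.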
More
This was a simple regex that was easy for ChatGPT. But besides missing the variable name, ChatGPT sometimes makes mistakes in the regex pattern itself. That is why we need to test.
It can also miss the parts of a message that are variable. We can address this too by asking ChatGPT whether there are any other commands and variations. The caveat is that you still need a sample to test against, as ChatGPT often makes up variations that do not exist.
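One way to keep invented variations out of the parser is to flag any suggested pattern that no real sample exercises. This is a minimal Python sketch; the `untested_patterns` helper and the "Seelog Warning Log" entry are hypothetical, and the sample list would come from your own unparsed messages.

```python
import re

def untested_patterns(patterns, samples):
    """Return names of patterns that match none of the real samples."""
    return [
        p["name"]
        for p in patterns
        if not any(re.search(p["pattern"], s) for s in samples)
    ]

# Real sample taken from Event Search
samples = ["seelog internal error: invalid argument"]

suggested = [
    {"name": "Seelog Internal Error Log",
     "pattern": r"^seelog internal error: (?P<message>.+)$"},
    # Hypothetical variation proposed by ChatGPT, with no sample to back it
    {"name": "Seelog Warning Log",
     "pattern": r"^seelog warning: (?P<message>.+)$"},
]

print(untested_patterns(suggested, samples))
# ['Seelog Warning Log']
```

Anything in the returned list has not been verified against real data and is best held back until a matching sample turns up.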