Creating Parsers for Unformatted Text
Many older systems produce unformatted log data: it was intended for people to read, not for systems to process. This section explains how to parse such data with FPL.
How to write parsers for Ingext
Because syslog output comes in many formats, we first need a structure that can recognize several formats and also detect when a message matches none of them. We break the problem into three parts: first, define a list of regular expressions; second, write a function that tests the message against each pattern in that list; and third, update the record with the result.
Part One: A List of Patterns
First, we create a simple array in which each entry gives a rule a name and holds its regex pattern. We enclose each pattern in backticks (`) so that its backslashes (\) do not need to be escaped. In an ordinary double-quoted string, every backslash must be doubled, which is why the patterns in the complete example below are written as \\S+ rather than \S+.
let patterns = [
    {
        name: "pattern name",
        pattern: `^regular expression pattern$`
    }
]
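For experimenting outside Ingext, the same list can be modeled in Python. This is only an illustrative stand-in, not FPL: a Python raw string (r"...") keeps backslashes literal much as an FPL backtick string does, and Python's re module accepts the same (?P<name>...) named-group syntax used throughout this section. The pattern shown is the user_action pattern from the complete example.

```python
import re

# Illustrative Python model of the FPL pattern list (not FPL itself).
# Each entry pairs a rule name with a regex; the raw string r"..." keeps
# backslashes literal, like an FPL backtick string.
patterns = [
    {
        "name": "user_action",
        "pattern": r"^(?P<Module>\S+)\((?P<Context>\S+)\): (?P<Action>.+?) for user (?P<User>\S+)$",
    },
]

# The same pattern in an ordinary string needs every backslash doubled.
escaped = "^(?P<Module>\\S+)\\((?P<Context>\\S+)\\): (?P<Action>.+?) for user (?P<User>\\S+)$"
assert patterns[0]["pattern"] == escaped

# Compiling each pattern up front surfaces regex syntax errors early.
for entry in patterns:
    re.compile(entry["pattern"])
```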
Part Two: Pattern Search
The Fluency Processing Language supports procedural logic, so we can use a for loop. The function checks each pattern in turn against the message. On the first match, it records the pattern's name in the result and returns the result; if no pattern matches, it returns undefined to signal that nothing was detected.
function checkPatterns(patterns, message) {
    for let i = 0; i < len(patterns); i++ {
        let result = regexp(patterns[i].pattern, message)
        if (result) {
            // Record which pattern matched and stop at the first match
            result.parserName = patterns[i].name
            return result
        }
    }
    // No pattern matched
    return undefined
}
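Here is a Python sketch of the same first-match search, again as an illustration rather than FPL. It assumes that FPL's regexp returns the named capture groups of a match, which Python's match.groupdict() approximates, and it uses None where FPL uses undefined. The name check_patterns is a hypothetical Python counterpart of checkPatterns.

```python
import re
from typing import Optional


def check_patterns(patterns: list, message: str) -> Optional[dict]:
    """Return the named groups of the first matching pattern, or None.

    Assumption: FPL's regexp returns named capture groups; groupdict()
    is used here as the Python equivalent.
    """
    for entry in patterns:
        match = re.search(entry["pattern"], message)
        if match:
            result = match.groupdict()
            # Record which rule matched, like result.parserName in FPL.
            result["parserName"] = entry["name"]
            return result  # first match wins; later patterns are not tried
    return None  # stands in for FPL's undefined
```

Because the loop returns on the first match, order matters: place more specific patterns ahead of more general ones.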
Part Three: Update Record with Findings
Lastly, we update the record with the results. There are two possibilities: a pattern matched, or none did. If the result is not undefined, we store it in @fields. Otherwise, we set @tags to record that this parse failed. Because a single parser can search different sets of patterns depending on a condition (in the complete example, the syslog facility), the failure tag should name the branch of the search that failed; replace <parser subname> with that name. If the parser has no sub-branches, you do not need a subname.
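let result = checkPatterns(patterns, obj["@message"])
obj["@parser"] = "LinuxProcessorSyslog"
if (result) {
    obj["@fields"] = result
    return "pass"
}
// Replace <parser subname> with the name of the branch that failed
obj["@tags"] = ["Failed <parser subname> Parse"]
return "pass"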
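The same update step can be written as a Python function. This is a hedged sketch: update_record and its subname parameter are hypothetical, and the record is modeled as a plain dict keyed by the same @-prefixed field names.

```python
from typing import Optional


def update_record(obj: dict, result: Optional[dict], subname: str) -> str:
    """Store parsed fields, or tag the record when no pattern matched."""
    obj["@parser"] = "LinuxProcessorSyslog"
    # Test for None explicitly; an empty dict would also be falsy in Python.
    if result is not None:
        obj["@fields"] = result
        return "pass"
    # The subname identifies which branch of the search failed.
    obj["@tags"] = [f"Failed {subname} Parse"]
    # Both outcomes return "pass", matching the FPL snippet.
    return "pass"
```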
Complete Example
// Data input format: ({ obj, size, source }) or ( doc )
function main({obj, size, source}) {
    // If there is no message, abort and move on.
    let msg = obj["@message"]
    if (!msg) {
        return {"status":"abort"}
    }
    // Output field settings
    obj["@type"] = "event"
    // Use the facility to identify the application that generated the syslog message
    if (obj["@facility"] == "authpriv") {
        // Double-quoted strings need doubled backslashes; backtick strings do not
        let patterns = [
            {
                name: "command",
                pattern: "^(?P<Username>\\S+) : TTY=(?P<TTY>\\S+) ; PWD=(?P<PWD>\\S+) ; USER=(?P<User>\\S+) ; GROUP=(?P<Group>\\S+) ; COMMAND=(?P<Command>\\S+)$"
            },
            {
                name: "user_action",
                pattern: "^(?P<Module>\\S+)\\((?P<Context>\\S+)\\): (?P<Action>.+?) for user (?P<User>\\S+)$"
            }
        ]
        let result = checkPatterns(patterns, obj["@message"])
        obj["@parser"] = "LinuxProcessorSyslog"
        if (result) {
            obj["@fields"] = result
            return "pass"
        }
        obj["@tags"] = ["Failed Authpriv Parse"]
        return "pass"
    }
    if (obj["@facility"] == "daemon") {
        let patterns = [
            {
                name: "daemon log",
                pattern: "^(?P<Timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2} UTC) \\| (?P<Source>[A-Z]+) \\| (?P<LogLevel>[A-Z]+) \\| \\((?P<Location>[^\\)]+)\\) \\| (?P<Message>.+)$"
            },
            {
                name: "Daemon Log",
                pattern: "^(?P<action>(write|read|connect|error|status|config): (?P<message>.+)|(Starting|Stopping) daemon\\.\\.\\.)$"
            }
        ]
        let result = checkPatterns(patterns, obj["@message"])
        obj["@parser"] = "LinuxProcessorSyslog"
        if (result) {
            obj["@fields"] = result
            return "pass"
        }
        obj["@tags"] = ["Failed Daemon Parse"]
        return "pass"
    }
    // No facility branch matched; return abort to send the record to the next processor.
    return "abort"
}
// Global Functions
function checkPatterns(patterns, message) {
    for let i = 0; i < len(patterns); i++ {
        let result = regexp(patterns[i].pattern, message)
        if (result) {
            // Record which pattern matched and stop at the first match
            result.parserName = patterns[i].name
            return result
        }
    }
    // No pattern matched
    return undefined
}
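To try the whole flow end to end, here is a Python model of main. It is a sketch, not a drop-in replacement: it covers only the authpriv branch, returns the string "abort" in both abort cases (the FPL version returns {"status":"abort"} for an empty message), and replaces the chain of if statements with a dictionary keyed by facility. PARSERS, check_patterns, and main are hypothetical Python names, and treating regexp output as a dict of named groups is an assumption.

```python
import re
from typing import Optional

# Hypothetical lookup table: facility -> (tag subname, ordered pattern list).
# Only the authpriv branch of the FPL example is reproduced here.
PARSERS = {
    "authpriv": ("Authpriv", [
        {"name": "command",
         "pattern": r"^(?P<Username>\S+) : TTY=(?P<TTY>\S+) ; PWD=(?P<PWD>\S+) ; "
                    r"USER=(?P<User>\S+) ; GROUP=(?P<Group>\S+) ; COMMAND=(?P<Command>\S+)$"},
        {"name": "user_action",
         "pattern": r"^(?P<Module>\S+)\((?P<Context>\S+)\): (?P<Action>.+?) for user (?P<User>\S+)$"},
    ]),
}


def check_patterns(patterns: list, message: str) -> Optional[dict]:
    """Return the named groups of the first matching pattern, or None."""
    for entry in patterns:
        match = re.search(entry["pattern"], message)
        if match:
            result = match.groupdict()
            result["parserName"] = entry["name"]
            return result
    return None


def main(obj: dict) -> str:
    msg = obj.get("@message")
    if not msg:
        return "abort"  # simplified: the FPL version returns {"status":"abort"}
    obj["@type"] = "event"
    branch = PARSERS.get(obj.get("@facility"))
    if branch is None:
        return "abort"  # unknown facility: leave the record for the next processor
    subname, patterns = branch
    result = check_patterns(patterns, msg)
    obj["@parser"] = "LinuxProcessorSyslog"
    if result is not None:
        obj["@fields"] = result
    else:
        obj["@tags"] = [f"Failed {subname} Parse"]
    return "pass"
```

A lookup table keeps each facility's patterns and failure subname together, so supporting another facility means adding one entry rather than another copy of the match-and-tag code.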