Creating Parsers for Unformatted Text

Older systems often produce unformatted data: output that was intended for people to read, not for systems to process. This section describes how to parse such data using FPL.

How to Write Parsers for Ingext

Because syslog output contains many formats, we first need a structure that can test a message against several formats and detect when none of them matches. We break the problem into three parts. The first is a list of regular expressions. The second is a function that walks this list and compares each expression to the message. The third handles the result.

Part One: A List of Patterns

First, we create a simple array in which each entry gives a rule a name and holds its regex pattern. We write the pattern inside backticks (`) so that the backslash (\), the regex escape character, does not itself have to be escaped. In a double-quoted string each backslash must be doubled, for example \\S+ instead of \S+.

let patterns = [
  {
    name: "pattern name",
    pattern: `^regular expression pattern$`
  }
]
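
The backtick advice has a direct analogue in other languages. As a hedged illustration outside FPL, the Python sketch below builds the same kind of name/pattern list and shows that a raw string and a doubly escaped ordinary string denote the same regex. The "login" rule and the sample message are hypothetical, and it is an assumption that Python's re module handles (?P<name>...) groups the same way FPL's regexp does.

```python
import re

# Hypothetical rule for illustration; it is not one of the document's patterns.
escaped = "^(?P<User>\\S+) logged in from (?P<Host>\\S+)$"  # ordinary string: backslashes doubled
raw = r"^(?P<User>\S+) logged in from (?P<Host>\S+)$"       # raw string: backslashes written once

# Both spellings produce exactly the same pattern text.
same_pattern = escaped == raw

# The list mirrors the FPL structure: one entry per rule, each with a name and a pattern.
patterns = [
    {"name": "login", "pattern": raw},
]

# Apply the first rule to a made-up message and collect the named groups.
match = re.match(patterns[0]["pattern"], "alice logged in from 10.0.0.5")
fields = match.groupdict() if match else None
```

Keeping each rule as a small record makes it easy to add formats later without touching the matching logic.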

Part Two: Pattern Search

The Fluency Processing Language (FPL) allows us to use logic in the analysis, which means we can use a for loop. To check the patterns, we go through each one and compare it to the message. If there is a match, the function returns the result; otherwise it returns undefined, meaning that no pattern matched.

function checkPatterns(patterns, message) {
  // Try each pattern in order and return the first match.
  for let i = 0; i < len(patterns); i++ {
    let result = regexp(patterns[i].pattern, message)
    if (result) {
      // Record which pattern matched.
      result.parserName = patterns[i].name
      return result
    }
  }
  // No pattern matched.
  return undefined
}
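
To try patterns before deploying them, the same first-match loop can be sketched in Python. This is an illustration under assumptions, not FPL: check_patterns is a hypothetical stand-in for checkPatterns, None stands in for undefined, the sample messages are made up, and the result of FPL's regexp is assumed to resemble a dictionary of named groups.

```python
import re

def check_patterns(patterns, message):
    """Return the named groups of the first matching pattern, or None."""
    for entry in patterns:
        match = re.search(entry["pattern"], message)
        if match:
            result = match.groupdict()
            # Record which rule produced the match.
            result["parserName"] = entry["name"]
            return result
    # No pattern matched.
    return None

patterns = [
    {
        "name": "user_action",
        "pattern": r"^(?P<Module>\S+)\((?P<Context>\S+)\): (?P<Action>.+?) for user (?P<User>\S+)$",
    },
]

hit = check_patterns(patterns, "pam_unix(sshd:session): session closed for user alice")
miss = check_patterns(patterns, "unrelated text")
```

Because the loop returns on the first match, the order of the list matters: more specific patterns should come before more general ones.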

Part Three: Update Record with Findings

Finally, we update the record with the results. There are two possible outcomes: a pattern matched, or none did. If the result is defined, we store it in the record's @fields field and return. Otherwise the if block is skipped and we set a tag indicating that this parse failed. Because a parser can contain several branches, each testing a different condition with its own pattern list, the failure tag should name the branch that was being searched. If the parser has no sub-branches, the subname can be omitted.

obj["@parser"] = "LinuxProcessorSyslog"

if (result) {
  obj["@fields"] = result
  return "pass"
}

obj["@tags"] = ["Failed <parser subname> Parse"]
return "pass"
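
The two outcomes can be made concrete with a small Python sketch. The helper apply_result and the sample records are hypothetical; only the field names (@parser, @fields, @tags) and the "pass" return value come from the FPL snippet above.

```python
def apply_result(obj, result, subname):
    """Store a match in @fields, or tag the record when parsing failed."""
    obj["@parser"] = "LinuxProcessorSyslog"
    if result:
        obj["@fields"] = result
    else:
        # Name the branch so a failed parse can be traced to its pattern list.
        obj["@tags"] = [f"Failed {subname} Parse"]
    # Both outcomes pass the record on; only the annotations differ.
    return "pass"

# A record whose message matched a pattern.
matched = {"@message": "pam_unix(sshd:session): session closed for user alice"}
status_ok = apply_result(matched, {"User": "alice", "parserName": "user_action"}, "Authpriv")

# A record for which no pattern matched.
unmatched = {"@message": "unrelated text"}
status_fail = apply_result(unmatched, None, "Authpriv")
```

Passing the record along in both cases keeps unparsed messages visible downstream, where the tag can be used to find formats that still need a pattern.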

Complete Example

// Data input format: ({ obj, size, source }) or ( doc )
function main({obj, size, source}) {
  // If no message, abort and move on.
  let msg = obj["@message"]
  if (!msg){
      return {"status":"abort"}
  }
  
  // Output field settings
  obj["@type"] = "event"

  // Look at the facility to see the application generating the syslog message
  if (obj["@facility"] == "authpriv") {
    let patterns = [
      {
        name: "command",
        pattern: "^(?P<Username>\\S+) : TTY=(?P<TTY>\\S+) ; PWD=(?P<PWD>\\S+) ; USER=(?P<User>\\S+) ; GROUP=(?P<Group>\\S+) ; COMMAND=(?P<Command>\\S+)$"
      },
      {
        name: "user_action",
        pattern: "^(?P<Module>\\S+)\\((?P<Context>\\S+)\\): (?P<Action>.+?) for user (?P<User>\\S+)$"
      }
    ]

    let result = checkPatterns(patterns, obj["@message"])
    obj["@parser"] = "LinuxProcessorSyslog"

    if (result) {
      obj["@fields"] = result
      return "pass"
    }

    obj["@tags"] = ["Failed Authpriv Parse"]
    return "pass"
  }
  
  if (obj["@facility"] == "daemon") {
    let patterns = [
      {
        name: "daemon log",
        pattern: "^(?P<Timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2} UTC) \\| (?P<Source>[A-Z]+) \\| (?P<LogLevel>[A-Z]+) \\| \\((?P<Location>[^\\)]+)\\) \\| (?P<Message>.+)$"
      },
      {
        name: "Daemon Log",
        pattern: "^(?P<action>:(write|read|connect|error|status|config): (?P<message>.+)|(Starting|Stopping) daemon\\.\\.\\.)$"
      }
    ]

    let result = checkPatterns(patterns, obj["@message"])
    obj["@parser"] = "LinuxProcessorSyslog"

    if (result) {
      obj["@fields"] = result
      return "pass"
    }

    obj["@tags"] = ["Failed Daemon Parse"]
    return "pass"
  }
    
  // No subparsers matched, return abort to send to next processor.
  return "abort"
}

// Global Functions
function checkPatterns(patterns, message) {
  // Try each pattern in order and return the first match.
  for let i = 0; i < len(patterns); i++ {
    let result = regexp(patterns[i].pattern, message)
    if (result) {
      // Record which pattern matched.
      result.parserName = patterns[i].name
      return result
    }
  }

  // No pattern matched.
  return undefined
}
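
Before relying on the authpriv patterns, it can help to run them against sample lines. The Python sketch below is one way to do that under stated assumptions: the patterns are copied from the example and rewritten as raw strings, the three sample messages are invented for illustration, first_match is a hypothetical helper, and Python's re module is assumed to agree with FPL's regexp for these constructs.

```python
import re

# Patterns copied from the authpriv branch above, written as Python raw strings.
AUTHPRIV_PATTERNS = [
    {
        "name": "command",
        "pattern": r"^(?P<Username>\S+) : TTY=(?P<TTY>\S+) ; PWD=(?P<PWD>\S+) ; "
                   r"USER=(?P<User>\S+) ; GROUP=(?P<Group>\S+) ; COMMAND=(?P<Command>\S+)$",
    },
    {
        "name": "user_action",
        "pattern": r"^(?P<Module>\S+)\((?P<Context>\S+)\): (?P<Action>.+?) for user (?P<User>\S+)$",
    },
]

def first_match(patterns, message):
    """Return the name of the first pattern that matches, or None."""
    for entry in patterns:
        if re.search(entry["pattern"], message):
            return entry["name"]
    return None

# Invented sample messages; real syslog lines may differ.
SIMPLE_COMMAND = "alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; GROUP=root ; COMMAND=/usr/bin/whoami"
COMMAND_WITH_ARGS = "alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; GROUP=root ; COMMAND=/bin/ls -la"
SESSION_CLOSED = "pam_unix(sshd:session): session closed for user alice"

outcomes = {
    "simple": first_match(AUTHPRIV_PATTERNS, SIMPLE_COMMAND),
    "with_args": first_match(AUTHPRIV_PATTERNS, COMMAND_WITH_ARGS),
    "session": first_match(AUTHPRIV_PATTERNS, SESSION_CLOSED),
}
```

In this sketch the command line with arguments matches nothing, because \S+ followed by $ cannot span the space before the argument. A message like that would receive the "Failed Authpriv Parse" tag, which is the kind of gap the failure tag is meant to surface.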