
# Processing Log Lines

A detailed look at how to set up Promtail to process your log lines, including extracting metrics and labels.

## Pipeline

Pipeline stages implement the following interface:

```go
type Stage interface {
  Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}
```

Any Stage is capable of modifying the labels, extracted data, time, and/or entry, though generally a Stage should only modify one of those things to reduce complexity.
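
To make that contract concrete, here is a minimal sketch of a hypothetical stage (not one that ships with Promtail) which only promotes a `level` key from the extracted map into the label set; the `levelStage` name and the sample values in `main` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/common/model"
)

// Stage is the interface shown above.
type Stage interface {
	Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}

// levelStage is a hypothetical stage: it copies the "level" key from the
// extracted map into the label set and leaves the time and entry untouched.
type levelStage struct{}

func (levelStage) Process(labels model.LabelSet, extracted map[string]interface{}, t *time.Time, entry *string) {
	if v, ok := extracted["level"].(string); ok {
		labels[model.LabelName("level")] = model.LabelValue(v)
	}
}

func main() {
	labels := model.LabelSet{"app": "promtail"}
	extracted := map[string]interface{}{"level": "info"}
	ts := time.Now()
	entry := `level=info msg="ready"`

	levelStage{}.Process(labels, extracted, &ts, &entry)
	fmt.Println(labels) // the "level" label has been added from the extracted map
}
```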

Typical pipelines start with a `regex` or `json` stage to extract data from the log line. Any combination of other stages can then follow, using the data in the extracted map. It is also common to see a `match` stage at the start of a pipeline to selectively apply stages based on labels.

The example below gives a good glimpse of what you can achieve with a pipeline:

```yaml
scrape_configs:
- job_name: kubernetes-pods-name
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{name="promtail"}'
      stages:
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
      - labels:
          level:
          component:
      - timestamp:
          format: RFC3339Nano
          source: timestamp
  - match:
      selector: '{name="nginx"}'
      stages:
      - regex:
          expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
      - output:
          source: output
  - match:
      selector: '{name="jaeger-agent"}'
      stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
- job_name: kubernetes-pods-app
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{app=~"grafana|prometheus"}'
      stages:
      - regex:
          expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
      - labels:
          level:
          component:
  - match:
      selector: '{app="some-app"}'
      stages:
      - regex:
          expression: ".*(?P<panic>panic: .*)"
      - metrics:
          panic_total:
            type: Counter
            description: "total count of panic"
            source: panic
            config:
              action: inc
```

In the first job:

The first `match` stage will only run if a label named `name` has the value `promtail`. It then applies a `regex` stage to parse the line, followed by stages that set two labels (`level` and `component`) and the timestamp from the extracted data.

The second `match` stage will only run if a label named `name` has the value `nginx`. It then parses the log line with a `regex` stage and extracts `output`, which the `output` stage then sets as the log line sent to Loki.

The third `match` stage will only run if a label named `name` has the value `jaeger-agent`. It then parses the log line as JSON, extracting `level`, which is then set as a label.

In the second job:

The first `match` stage will only run if a label named `app` has the value `grafana` or `prometheus`. It then parses the log line with a `regex` stage and sets two new labels, `level` and `component`, from the extracted data.

The second `match` stage will only run if a label named `app` has the value `some-app`. It then parses the log line and creates an extracted key named `panic` if it finds `panic:` in the log line. A `metrics` stage then increments a counter whenever the `panic` key is present in the extracted map; the sketch below shows how a named capture group populates that map.
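
As a rough illustration of how a named capture group ends up in the extracted map, the sketch below applies the same panic expression with Go's regexp package (the RE2 syntax Promtail uses); the sample log line is invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Same expression as the some-app match stage above.
	re := regexp.MustCompile(`.*(?P<panic>panic: .*)`)
	line := "2024-01-01T12:00:00Z panic: runtime error: index out of range"

	// Copy each named capture group into an extracted map, the way a
	// regex stage feeds later stages such as metrics.
	extracted := map[string]interface{}{}
	if m := re.FindStringSubmatch(line); m != nil {
		for i, name := range re.SubexpNames() {
			if name != "" {
				extracted[name] = m[i]
			}
		}
	}
	fmt.Println(extracted) // map[panic:panic: runtime error: index out of range]
}
```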

More info on each field in the interface:

### labels

A set of Prometheus-style labels that will be sent with the log line and indexed by Loki.
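
In Go terms this parameter is simply a map of label names to values; a minimal sketch using the github.com/prometheus/common/model package referenced by the interface above, with placeholder values:

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	// A label set as passed to Process; each unique combination of labels
	// identifies a separate stream in Loki.
	labels := model.LabelSet{
		"job":   "kubernetes-pods-name",
		"name":  "promtail",
		"level": "info",
	}
	fmt.Println(labels)
}
```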