Tokenizer Transform
The Vector `tokenizer` transform accepts log events and tokenizes a field's value by splitting it on whitespace, zipping the resulting tokens into ordered field names.
Configuration
```toml
[transforms.my_transform_id]
# General
type = "tokenizer" # required
inputs = ["my-source-or-transform-id"] # required
drop_field = true # optional, default
field = "message" # optional, default
field_names = ["timestamp", "level", "message", "parent.child"] # required

# Types
types.status = "int" # example
types.duration = "float" # example
types.success = "bool" # example
types.timestamp_iso8601 = "timestamp|%F" # example
types.timestamp_custom = "timestamp|%a %b %e %T %Y" # example
types.parent.child = "int" # example
```
- optional, bool
  `drop_field`
  If `true`, the field will be dropped after parsing.
  - Default: `true`
- optional, string
  `field`
  The log field to tokenize.
  - Default: `"message"`
- required, [string]
  `field_names`
  The log field names assigned to the resulting tokens, in order.
- optional, table
  `types`
Key/value pairs representing mapped log field names and types. This is used to coerce log fields into their proper types.
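As a rough illustration of how the `types` table behaves, the coercions above could be sketched in Python. This is an assumption-laden re-implementation (the function name is invented, and the set of accepted boolean spellings is a guess); Vector's actual coercion is implemented in Rust, with `timestamp|...` formats following strptime-style specifiers:

```python
from datetime import datetime

def coerce(value: str, type_spec: str):
    """Hypothetical sketch of coercing a tokenized string per a `types` entry."""
    if type_spec == "int":
        return int(value)
    if type_spec == "float":
        return float(value)
    if type_spec == "bool":
        # assumed set of truthy spellings; Vector's real rules may differ
        return value.lower() in ("true", "t", "1")
    if type_spec.startswith("timestamp|"):
        # text after "|" is a strptime-style format string
        _, fmt = type_spec.split("|", 1)
        return datetime.strptime(value, fmt)
    return value  # "string" or unknown: leave the token as-is

print(coerce("201", "int"))
print(coerce("19/06/2019:17:20:49 -0400", "timestamp|%d/%m/%Y:%H:%M:%S %z"))
```

For example, with `types.status = "int"` the token `"201"` becomes the integer `201` in the output event, as shown in the Examples section below.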
Output
Telemetry
This component provides the following metrics that can be retrieved through
the internal_metrics source. See the
metrics section in the
monitoring page for more info.
- counter
processing_errors_total
The total number of processing errors encountered by this component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `error_type` - The type of the error.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
- counter
processed_events_total
The total number of events processed by this component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `file` - The file that produced the error.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
- counter
processed_bytes_total
The total number of bytes processed by the component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
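These metrics are only exposed if the Vector instance also runs an `internal_metrics` source wired to a metrics sink. A minimal sketch (the source/sink names and the address are placeholders, not part of this transform's configuration):

```toml
[sources.internal]
type = "internal_metrics"

[sinks.prometheus_out]
type = "prometheus"
inputs = ["internal"]
address = "0.0.0.0:9598"
```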
Examples
Given the following Vector event:
```json
{
  "log": {
    "message": "5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
  }
}
```
And the following configuration:
```toml
[transforms.tokenizer]
type = "tokenizer"
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]
types.timestamp = "timestamp"
types.status = "int"
types.bytes = "int"
```
The following Vector log event will be output:
```json
{
  "remote_addr": "5.86.210.12",
  "user_id": "zieme4647",
  "timestamp": "19/06/2019:17:20:49 -0400",
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": 201,
  "bytes": 20574
}
```
How It Works
Blank Values
Both " " and "-" are considered blank values and their mapped fields will
be set to null.
Special Characters
To extract raw values and remove wrapping characters, the tokenizer treats certain characters as special. These characters will be discarded:
- `"..."` - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes are discarded.
- `[...]` - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets are discarded.
- `\` - Escapes the above characters; Vector will treat escaped characters as literal.
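Putting the splitting, wrapping, and blank-value rules together, the behavior can be sketched as a small character-by-character parser. This is an illustrative Python re-implementation under the rules stated above, not Vector's actual code (which is Rust); edge-case behavior may differ:

```python
def tokenize(line: str):
    """Split on whitespace, honoring "..." / [...] wrapping and \\ escapes.
    Blank tokens ("-" or whitespace-only) become None."""
    tokens, buf = [], []
    closer = None      # character that ends the current wrapped phrase
    escaped = False
    for ch in line:
        if escaped:
            buf.append(ch)          # escaped character kept literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif closer:
            if ch == closer:
                closer = None       # end of phrase; wrapper is discarded
            else:
                buf.append(ch)      # spaces inside the phrase are preserved
        elif ch == '"':
            closer = '"'
        elif ch == "[":
            closer = "]"
        elif ch == " ":
            if buf:
                tokens.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
    if buf:
        tokens.append("".join(buf))
    # "-" and whitespace-only tokens are blank values -> None
    return [None if t == "-" or not t.strip() else t for t in tokens]

print(tokenize('5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] "GET /x" 201'))
```

Running this on the log line from the Examples section yields one token per `field_names` entry, with the `-` ident token mapped to `None` (null), matching the blank-value rule above.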