Architecture
Introduction
vhs is a network traffic utility that works by chaining modules loaded from plugins. Its high-performance, concurrent
architecture routes data and executes modules, and it lets users configure and extend vhs for a variety of purposes,
including traffic recording, replay, and live metrics collection.
Concepts
The architecture of vhs is built around the concept of a data flow, a directed graph that represents the routing of a
stream of data through software components that act on that data stream. In a data flow graph, nodes represent software
components that originate, terminate, or modify the data stream passing through them, and edges represent the data
stream passing between components.
In vhs, the data flow graph looks something like this:
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
Each node represents a particular type of software component, and each edge represents a connection between those
components. The following two subsections describe the components and connections that make up the vhs data flow in
more detail.
Nodes: VHS Components
Each node in the graph represents a concurrently executed software component. In vhs, these components fall into four
categories:
source: source components originate data streams. A source brings data into vhs from somewhere else. This could mean capturing data from a network interface, reading from cloud storage, reading from a local file, etc. More information on the internal architecture of source components can be found here: source architecture. Information about the sources currently available in vhs can be found here: sources.
modifier: modifier components modify the data passing through them in its raw format (stream of bytes). Modifiers may exist in either the input chain or the output chain of the vhs data flow, and input and output modifiers are implemented separately. More information about the architecture of modifiers can be found here: modifier architecture. Information about the input and output modifiers currently available in vhs can be found here: input modifiers and output modifiers.
format: format components modify or interpret the data passing through them by imposing a format on it, usually in terms of native Go datatypes. Like modifiers, formats may exist in either the input chain or output chain, and input and output formats are implemented separately. More information about the architecture of formats can be found here: format architecture. Information about the input and output formats currently available in vhs can be found here: input formats and output formats.
sink: sink components terminate data streams. A sink provides a way for data to leave vhs. This data could be written to a file, stored on cloud storage, transmitted to a network location, etc. More information on the internal architecture of sink components can be found here, and information about the sinks currently available in vhs can be found here.
Additionally, vhs provides an optional facility for middleware. The middleware facility allows
users to place their own external modifier code into the vhs data flow. If used, the middleware is placed into the
data flow between the output format and the output modifier as shown in the diagram below. This external middleware
will receive formatted data in the form of stringified JSON from the chosen output format on stdin and must write
modified data to stdout.
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
mware[[Middleware]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt -.-> mware
mware -.-> out_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
style mware fill:#D55E00
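The middleware contract described above (JSON in on stdin, modified data out on stdout) can be sketched as a small standalone Go program. This is only an illustration under stated assumptions: it assumes each message is a JSON object and that messages arrive as a stream of consecutive JSON values, and the seen_by_middleware field and the transformString helper are hypothetical names that do not come from vhs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// transform reads a stream of JSON objects from r, adds a hypothetical
// "seen_by_middleware" field to each one, and writes the modified objects
// to w, one per line.
func transform(r io.Reader, w io.Writer) error {
	dec := json.NewDecoder(r)
	enc := json.NewEncoder(w)
	for {
		var msg map[string]interface{}
		if err := dec.Decode(&msg); err == io.EOF {
			return nil // input closed: nothing more to process
		} else if err != nil {
			return err
		}
		if msg == nil {
			continue // a JSON null carries no fields to modify
		}
		msg["seen_by_middleware"] = true
		if err := enc.Encode(msg); err != nil {
			return err
		}
	}
}

// transformString is a convenience wrapper for trying transform on a literal.
func transformString(in string) (string, error) {
	var out strings.Builder
	err := transform(strings.NewReader(in), &out)
	return out.String(), err
}

func main() {
	// As an external middleware process: read stdin, write stdout.
	if err := transform(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "middleware:", err)
		os.Exit(1)
	}
}
```

Keeping the transformation in a function that takes an io.Reader and an io.Writer makes the middleware logic easy to exercise without spawning a process.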
Edges: Connections Between Components
The edges of the data flow graph represent the data streams that pass between the components described in the previous
section. These connections are implemented using channels, a facility for communication between concurrent software
components provided by the Go language. Communication between components in vhs follows two basic strategies. Where raw
data streams are needed, vhs uses types from the io package in the Go standard library, specifically the io.ReadCloser
interface. Where structured data needs to be passed between components, the empty interface type interface{} is used
for maximum flexibility.
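The two strategies can be illustrated with a minimal, self-contained sketch. It is not taken from the vhs codebase: produceRaw, parseLines, and collect are hypothetical names, and the line-based parsing is just a stand-in for whatever a real format component would do.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// produceRaw plays the role of an upstream component that emits raw bytes.
// It returns the read side of an in-memory pipe as an io.ReadCloser.
func produceRaw(payload string) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		// Closing the write side signals EOF to the reader.
		defer pw.Close()
		io.Copy(pw, strings.NewReader(payload))
	}()
	return pr
}

// parseLines plays the role of a format component: it consumes a raw stream
// and emits one structured value per line on a channel of interface{}.
func parseLines(r io.ReadCloser) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		defer r.Close()
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			out <- sc.Text()
		}
	}()
	return out
}

// collect drains the channel and returns the string values it carried.
func collect(in <-chan interface{}) []string {
	var got []string
	for v := range in {
		if s, ok := v.(string); ok {
			got = append(got, s)
		}
	}
	return got
}

func main() {
	for _, s := range collect(parseLines(produceRaw("GET /a\nGET /b\n"))) {
		fmt.Println(s)
	}
}
```

The consumer has to use a type assertion to recover a concrete type from interface{}, which is the price paid for the flexibility of the structured channel.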
Metadata
Sometimes it is useful to pass descriptive information about a data stream between two connected components. For
example, the tcp source tracks the source and destination IP addresses and ports of the TCP streams it captures. This
information may be useful to components downstream in the vhs data flow, so vhs provides a key-value metadata facility
for recording this type of information and passing it between components. This facility takes the form of a construct
called Meta that is implemented in core/meta.go.
To pass Meta between components, it is wrapped together with an io.ReadCloser into a struct. For example, the
InputReader interface is used as the connection between a source and an input modifier. It is defined as follows:
// InputReader is an input reader.
type InputReader interface {
	io.ReadCloser
	Meta() *Meta
}
Metadata is not currently supported on the output chain of the vhs data flow, so the corresponding output interface
OutputWriter is much simpler:
// OutputWriter is an output writer.
type OutputWriter interface {
	io.WriteCloser
}
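To make the pairing of a raw stream with its metadata concrete, here is a minimal sketch of a type that satisfies InputReader. The Meta type below is a hypothetical stand-in, since the contents of core/meta.go are not shown here; its Set and Get methods, the metaReader struct, the newReader and describe helpers, and the "source.addr" key are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// Meta is a hypothetical key-value store standing in for core/meta.go.
type Meta struct {
	mu     sync.RWMutex
	values map[string]interface{}
}

// NewMeta returns an empty Meta.
func NewMeta() *Meta { return &Meta{values: make(map[string]interface{})} }

// Set records a value under key.
func (m *Meta) Set(key string, value interface{}) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[key] = value
}

// Get returns the value stored under key, if any.
func (m *Meta) Get(key string) (interface{}, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.values[key]
	return v, ok
}

// InputReader mirrors the interface shown above.
type InputReader interface {
	io.ReadCloser
	Meta() *Meta
}

// metaReader is a hypothetical struct pairing a raw stream with its metadata.
type metaReader struct {
	io.ReadCloser
	meta *Meta
}

// Meta returns the metadata attached to the stream.
func (r *metaReader) Meta() *Meta { return r.meta }

// newReader wraps payload in an InputReader tagged with a source address.
func newReader(payload, srcAddr string) InputReader {
	m := NewMeta()
	m.Set("source.addr", srcAddr)
	return &metaReader{ReadCloser: io.NopCloser(strings.NewReader(payload)), meta: m}
}

// describe acts as a downstream component: it reads the whole stream and
// prefixes it with the source address found in the metadata.
func describe(r InputReader) (string, error) {
	defer r.Close()
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	addr, _ := r.Meta().Get("source.addr")
	return fmt.Sprintf("%v: %s", addr, data), nil
}

func main() {
	out, err := describe(newReader("hello", "10.0.0.1:4242"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Embedding io.ReadCloser in the struct means the wrapper inherits Read and Close, so only the Meta accessor has to be written by hand.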
Putting this all together, the components and their connections are shown in the following data flow graph:
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
mware[[Middleware]]
out_mod[[Output Modifier]]
sink[Sink]
src -- InputReader --> in_mod
in_mod -- InputReader --> in_fmt
in_fmt -. JSON string .-> mware
mware -. JSON string .-> out_fmt
in_fmt -- "interface{}" --> out_fmt
out_fmt -- OutputWriter --> out_mod
out_mod -- OutputWriter --> sink
style mware fill:#D55E00
More details on the implementation of each component and the connections between them can be found on their pages in this section.
Supporting Infrastructure
vhs provides several software constructs to implement and manage its data flow and modules.
flow: Flow is the highest level construct. It contains the input chain and output chains and manages the execution of all the modules for a given vhs session. Defined in flow/flow.go.
input: The input construct contains and manages the input chain of a vhs data flow. vhs supports a single input chain per session. Defined in flow/input.go.
output: The output construct contains and manages the output chain(s) of a vhs data flow. vhs supports multiple output chains per session. Defined in flow/output.go.
The conceptual arrangement of these constructs is shown in the figure below. In most cases, it should not be necessary
to modify these portions of the codebase when adding new data flow components, but a general understanding of these
constructs should be helpful for both vhs developers and end users.
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
subgraph in [Input]
src
in_mod
in_fmt
end
subgraph out [Output]
out_fmt
out_mod
sink
end
subgraph flow [Flow]
in
out
end
style in fill:#F0E442
style flow fill:#009E73
style out fill:#56B4E9
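The arrangement of one input chain feeding several output chains can be pictured as a fan-out over channels. The sketch below is a conceptual illustration only and does not reflect how flow/flow.go is actually written; runFlow and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runFlow is a hypothetical sketch of a flow: a single input chain produces
// values, and every value is delivered to each of numOutputs output chains.
// It returns what each output chain received.
func runFlow(values []string, numOutputs int) [][]string {
	input := make(chan interface{})
	outputs := make([]chan interface{}, numOutputs)
	results := make([][]string, numOutputs)

	// Start one goroutine per output chain.
	var wg sync.WaitGroup
	for i := range outputs {
		outputs[i] = make(chan interface{})
		wg.Add(1)
		go func(i int, in <-chan interface{}) {
			defer wg.Done()
			for v := range in {
				results[i] = append(results[i], fmt.Sprint(v))
			}
		}(i, outputs[i])
	}

	// The single input chain.
	go func() {
		defer close(input)
		for _, v := range values {
			input <- v
		}
	}()

	// The flow itself: copy each input value to every output chain.
	for v := range input {
		for _, out := range outputs {
			out <- v
		}
	}
	for _, out := range outputs {
		close(out)
	}
	wg.Wait()
	return results
}

func main() {
	for i, got := range runFlow([]string{"a", "b"}, 2) {
		fmt.Printf("output %d received %v\n", i, got)
	}
}
```

Each output goroutine writes only to its own slot in results, and the WaitGroup ensures all of them have finished before the results are read.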
Parser: Specifying VHS data flows
vhs data flows are defined at runtime with command line flags as described on the reference page
(Inputs and Outputs). Internally, these flags are processed by a
parser. This parser reads the specified input and output chain descriptions and instantiates a Flow that contains the
specified components. All components must register with the parser by calling the appropriate Load... function with
the identifying token and the constructor for the component. The parser is implemented in
flow/parser.go and component registration with the
default Parser is done in cmd/vhs/main.go.
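The registration pattern described above, in which a token is associated with a constructor, can be sketched roughly as follows. This is not the code in flow/parser.go: the parser, component, and constructor types, the load method, the "|" separator, and the "example" token are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// component is a hypothetical stand-in for a source, modifier, format, or sink.
type component struct{ name string }

// constructor builds a component; a real one would likely take configuration.
type constructor func() component

// parser maps identifying tokens to the constructors registered for them.
type parser struct {
	ctors map[string]constructor
}

func newParser() *parser { return &parser{ctors: make(map[string]constructor)} }

// load registers a constructor under a token.
func (p *parser) load(token string, c constructor) { p.ctors[token] = c }

// parse splits a chain description on "|" (an assumed separator) and
// instantiates each named component in order.
func (p *parser) parse(chain string) ([]component, error) {
	var out []component
	for _, token := range strings.Split(chain, "|") {
		token = strings.TrimSpace(token)
		ctor, ok := p.ctors[token]
		if !ok {
			return nil, fmt.Errorf("unknown component %q", token)
		}
		out = append(out, ctor())
	}
	return out, nil
}

// defaultParser registers a couple of illustrative components.
func defaultParser() *parser {
	p := newParser()
	p.load("tcp", func() component { return component{name: "tcp source"} })
	p.load("example", func() component { return component{name: "example modifier"} })
	return p
}

func main() {
	comps, err := defaultParser().parse("tcp|example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range comps {
		fmt.Println(c.name)
	}
}
```

Looking components up by token keeps the parser independent of any particular component: adding a new one only requires another registration call.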