Architecture
Introduction
vhs is a network traffic utility that works by chaining modules loaded from plugins. It offers a high-performance concurrent architecture for routing data and executing modules, which lets users configure and extend vhs for a variety of purposes, including traffic recording, replay, live metrics collection, and many others.
Concepts
The architecture of vhs is built around the concept of a data flow: a directed graph that represents the routing of a stream of data through software components that act on that data stream. In a data flow graph, nodes represent software components that originate, terminate, or modify the data stream passing through them, and edges represent the data stream passing between components.
In vhs, the data flow graph looks something like this:
```mermaid
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
```
Each node represents a particular type of software component, and each edge represents a connection between two components. The following two subsections describe the components and connections that make up the vhs data flow in more detail.
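As a rough illustration of the data flow idea, the Go sketch below models a linear flow as a chain of goroutines connected by channels. The names `stage`, `mapStage`, `source`, and `runFlow` are hypothetical and do not come from the vhs codebase; the string transformations merely stand in for real modifiers and formats.

```go
package main

import (
	"fmt"
	"strings"
)

// stage is a hypothetical node in a linear data flow: it reads values from
// its input channel and returns a channel carrying the transformed values.
type stage func(in <-chan string) <-chan string

// mapStage builds a stage that applies f to every value in its own goroutine.
func mapStage(f func(string) string) stage {
	return func(in <-chan string) <-chan string {
		out := make(chan string)
		go func() {
			defer close(out) // closing out tells the next stage the stream ended
			for v := range in {
				out <- f(v)
			}
		}()
		return out
	}
}

// source originates a stream from a fixed list of values.
func source(values ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, v := range values {
			out <- v
		}
	}()
	return out
}

// runFlow chains source -> stages... -> sink and returns what reached the sink.
func runFlow(values []string, stages ...stage) []string {
	ch := source(values...)
	for _, s := range stages {
		ch = s(ch)
	}
	var results []string // the sink: collects everything that reaches the end
	for v := range ch {
		results = append(results, v)
	}
	return results
}

func main() {
	out := runFlow(
		[]string{"get /a", "get /b"},
		mapStage(strings.ToUpper), // stands in for an input modifier
		mapStage(func(s string) string { return "[" + s + "]" }), // stands in for a format
	)
	fmt.Println(out) // [[GET /A] [GET /B]]
}
```

Because each stage closes its output channel once its input is exhausted, the end of the stream propagates down the chain without any extra signaling.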
Nodes: VHS Components
Each node in the graph represents a concurrently executed software component. In vhs, these components fall into four categories:
source
: source components originate data streams. A source brings data into vhs from somewhere else. This could mean capturing data from a network interface, reading from cloud storage, reading from a local file, etc. More information on the internal architecture of source components can be found here: source architecture. Information about the sources currently available in vhs can be found here: sources.

modifier
: modifier components modify the data passing through them in its raw format (a stream of bytes). Modifiers may exist in either the input chain or the output chain of the vhs data flow, and input and output modifiers are implemented separately. More information about the architecture of modifiers can be found here: modifier architecture. Information about the input and output modifiers currently available in vhs can be found here: input modifiers and output modifiers.

format
: format components modify or interpret the data passing through them by imposing a format on it, usually in terms of native Go datatypes. Like modifiers, formats may exist in either the input chain or the output chain, and input and output formats are implemented separately. More information about the architecture of formats can be found here: format architecture. Information about the input and output formats currently available in vhs can be found here: input formats and output formats.

sink
: sink components terminate data streams. A sink provides a way for data to leave vhs. This data could be written to a file, stored on cloud storage, transmitted to a network location, etc. More information on the internal architecture of sink components can be found here, and information about the sinks currently available in vhs can be found here.
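The modifier description above can be made concrete with a small sketch. Assuming a modifier simply wraps the upstream io.ReadCloser (the type `upperReadCloser` and the helper `modify` are hypothetical, not vhs types), an input modifier that rewrites raw bytes as they stream through might look like this:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// upperReadCloser is a hypothetical input modifier. It wraps the upstream
// io.ReadCloser and upper-cases ASCII letters as bytes flow through, without
// imposing any structure on the stream.
type upperReadCloser struct {
	src io.ReadCloser
}

// Compile-time check: the modifier is itself an io.ReadCloser, so it can be
// handed to the next component exactly like the stream it wraps.
var _ io.ReadCloser = (*upperReadCloser)(nil)

func (u *upperReadCloser) Read(p []byte) (int, error) {
	n, err := u.src.Read(p)
	for i := 0; i < n; i++ {
		if c := p[i]; c >= 'a' && c <= 'z' {
			p[i] = c - 'a' + 'A'
		}
	}
	return n, err
}

// Close propagates to the upstream stream.
func (u *upperReadCloser) Close() error { return u.src.Close() }

// modify runs a string through the modifier and returns the result.
func modify(s string) (string, error) {
	rc := &upperReadCloser{src: io.NopCloser(strings.NewReader(s))}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	out, err := modify("get /index.html")
	fmt.Println(out, err) // GET /INDEX.HTML <nil>
}
```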
Additionally, vhs provides an optional middleware facility that allows users to place their own external modifier code into the vhs data flow. If used, the middleware is placed into the data flow between the input format and the output format, as shown in the diagram below. The external middleware receives formatted data from the input format as stringified JSON on stdin and must write the modified data to stdout.
```mermaid
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
mware[[Middleware]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt -.-> mware
mware -.-> out_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
style mware fill:#D55E00
```
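The sketch below shows what a standalone middleware program might look like. It assumes, purely for illustration, that each message arrives as one JSON object per line on stdin; the framing vhs actually uses should be checked against the middleware documentation. The added `middleware` field and the helper names `process` and `processString` are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// process reads one JSON object per line from r, modifies it, and writes it
// to w. Line-delimited framing is an assumption made for this sketch.
func process(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // allow large messages
	enc := json.NewEncoder(w)                           // Encode appends a newline per value
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "" {
			continue // skip blank lines
		}
		var msg map[string]interface{}
		if err := json.Unmarshal(sc.Bytes(), &msg); err != nil {
			return fmt.Errorf("decode: %w", err)
		}
		msg["middleware"] = "example" // the hypothetical modification
		if err := enc.Encode(msg); err != nil {
			return err
		}
	}
	return sc.Err()
}

// processString is a convenience wrapper for exercising process in memory.
func processString(in string) (string, error) {
	var sb strings.Builder
	err := process(strings.NewReader(in), &sb)
	return sb.String(), err
}

func main() {
	if err := process(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```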
Edges: Connections Between Components
The edges of the data flow graph represent data streams that pass between components. In vhs, these edges are the connections between the components described in the previous section. These connections are implemented using channels, a facility for communication between concurrent software components provided by the Go language. At the most basic level, communication between components in vhs uses two strategies. Where raw data streams are needed, vhs uses types from the io package in the Go standard library, specifically the io.ReadCloser interface (and, on the output chain, io.WriteCloser). Where structured data needs to be passed between components, the empty interface type interface{} is used for maximum flexibility.
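To illustrate the two strategies, the hypothetical sketch below has an input-format-like function read a raw io.ReadCloser and send structured values over a `chan interface{}`, while an output-format-like function recovers the concrete types with a type switch. The `request` type, the function names, and the line-based parsing are assumptions for the example, not vhs code.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// request is a hypothetical structured value produced by an input format.
type request struct {
	Method, Path string
}

// inputFormat reads a raw byte stream and emits structured values as interface{}.
func inputFormat(rc io.ReadCloser, out chan<- interface{}) {
	defer close(out)
	defer rc.Close()
	sc := bufio.NewScanner(rc)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 {
			out <- request{Method: fields[0], Path: fields[1]}
		} else {
			out <- sc.Text() // values of any type can share the same channel
		}
	}
}

// outputFormat receives interface{} values and recovers their concrete types.
func outputFormat(in <-chan interface{}) []string {
	var lines []string
	for v := range in {
		switch x := v.(type) {
		case request:
			lines = append(lines, fmt.Sprintf("%s %s", x.Method, x.Path))
		case string:
			lines = append(lines, "raw: "+x)
		default:
			lines = append(lines, fmt.Sprintf("unknown %T", x))
		}
	}
	return lines
}

// run connects the two formats with a channel of empty-interface values.
func run(raw string) []string {
	ch := make(chan interface{})
	go inputFormat(io.NopCloser(strings.NewReader(raw)), ch)
	return outputFormat(ch)
}

func main() {
	for _, l := range run("GET /a\nhello\n") {
		fmt.Println(l)
	}
}
```

The flexibility of interface{} comes at the cost of compile-time checking: the receiving component has to handle types it does not expect, as the `default` case does here.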
Metadata
Sometimes it is useful to pass descriptive information about a data stream between two connected components. For example, the tcp source tracks the source and destination IP addresses and ports of the TCP streams it captures. This information may be useful to components downstream in the vhs data flow, so vhs provides a key-value metadata facility for recording this type of information and passing it between components. This metadata facility takes the form of a construct called Meta, which is implemented in core/meta.go.
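The real Meta type is defined in core/meta.go; the following is only a hypothetical approximation of a key-value metadata store, guarded by a mutex because components run concurrently. The method names `Set`, `Get`, and `GetString`, the constructor `NewMeta`, and the example keys are assumptions, not the vhs API.

```go
package main

import (
	"fmt"
	"sync"
)

// Meta is a hypothetical stand-in for the key-value metadata construct.
type Meta struct {
	mu     sync.RWMutex
	values map[string]interface{}
}

// NewMeta returns an empty metadata store.
func NewMeta() *Meta { return &Meta{values: make(map[string]interface{})} }

// Set records a value under key, replacing any previous value.
func (m *Meta) Set(key string, v interface{}) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[key] = v
}

// Get returns the value stored under key and whether it was present.
func (m *Meta) Get(key string) (interface{}, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.values[key]
	return v, ok
}

// GetString is a convenience accessor that also checks the stored type.
func (m *Meta) GetString(key string) (string, bool) {
	v, ok := m.Get(key)
	if !ok {
		return "", false
	}
	s, ok := v.(string)
	return s, ok
}

func main() {
	m := NewMeta()
	m.Set("src_ip", "10.0.0.1") // e.g. recorded by a TCP-capturing source
	m.Set("dst_port", 8080)
	ip, _ := m.GetString("src_ip")
	port, _ := m.Get("dst_port")
	fmt.Println(ip, port) // 10.0.0.1 8080
}
```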
To pass Meta between components, it is bundled together with an io.ReadCloser behind a single interface. For example, the InputReader interface, which is used as the connection between a source and an input modifier, is defined as follows:
```go
// InputReader is an input reader.
type InputReader interface {
	io.ReadCloser
	Meta() *Meta
}
```
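A concrete type can satisfy InputReader by embedding an io.ReadCloser, which supplies Read and Close, and adding a Meta method. In the sketch below, Meta is a minimal hypothetical stand-in for the type in core/meta.go, and `inputReader`, `newInputReader`, and `describe` are illustrative names only.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Meta is a minimal hypothetical stand-in for the real metadata type.
type Meta struct{ values map[string]string }

// InputReader mirrors the interface shown above.
type InputReader interface {
	io.ReadCloser
	Meta() *Meta
}

// inputReader satisfies InputReader: the embedded io.ReadCloser supplies
// Read and Close, and the Meta method exposes the attached metadata.
type inputReader struct {
	io.ReadCloser
	meta *Meta
}

func (r *inputReader) Meta() *Meta { return r.meta }

// Compile-time check that *inputReader implements InputReader.
var _ InputReader = (*inputReader)(nil)

func newInputReader(data string, meta map[string]string) InputReader {
	return &inputReader{
		ReadCloser: io.NopCloser(strings.NewReader(data)),
		meta:       &Meta{values: meta},
	}
}

// describe shows a downstream component consuming both bytes and metadata.
func describe(r InputReader) (string, error) {
	defer r.Close()
	b, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s from %s", b, r.Meta().values["src"]), nil
}

func main() {
	s, _ := describe(newInputReader("payload", map[string]string{"src": "10.0.0.1:5000"}))
	fmt.Println(s) // payload from 10.0.0.1:5000
}
```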
Metadata is not currently supported on the output chain of the vhs data flow, so the corresponding output interface, OutputWriter, is much simpler:
```go
// OutputWriter is an output writer.
type OutputWriter interface {
	io.WriteCloser
}
```
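Because OutputWriter is just an io.WriteCloser, output modifiers can compose by wrapping the next writer in the chain. As a hypothetical example (not necessarily how vhs implements its own output modifiers), a compressing modifier could wrap its downstream OutputWriter using compress/gzip from the Go standard library. `gzipModifier`, `bufferSink`, and `roundTrip` are illustrative names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// OutputWriter mirrors the interface shown above.
type OutputWriter interface {
	io.WriteCloser
}

// gzipModifier is a hypothetical output modifier: bytes written to it are
// compressed and forwarded to the next OutputWriter in the chain.
type gzipModifier struct {
	gz   *gzip.Writer
	next OutputWriter
}

func newGzipModifier(next OutputWriter) *gzipModifier {
	return &gzipModifier{gz: gzip.NewWriter(next), next: next}
}

func (g *gzipModifier) Write(p []byte) (int, error) { return g.gz.Write(p) }

// Close flushes the gzip footer first, then closes the downstream writer.
func (g *gzipModifier) Close() error {
	if err := g.gz.Close(); err != nil {
		_ = g.next.Close()
		return err
	}
	return g.next.Close()
}

// bufferSink is a hypothetical in-memory sink that records being closed.
type bufferSink struct {
	bytes.Buffer
	closed bool
}

func (b *bufferSink) Close() error { b.closed = true; return nil }

// roundTrip writes s through the modifier into the sink, then decompresses it.
func roundTrip(s string) (string, bool, error) {
	sink := &bufferSink{}
	var w OutputWriter = newGzipModifier(sink)
	if _, err := io.WriteString(w, s); err != nil {
		return "", false, err
	}
	if err := w.Close(); err != nil {
		return "", false, err
	}
	zr, err := gzip.NewReader(&sink.Buffer)
	if err != nil {
		return "", false, err
	}
	out, err := io.ReadAll(zr)
	return string(out), sink.closed, err
}

func main() {
	s, closed, err := roundTrip("hello vhs")
	fmt.Println(s, closed, err) // hello vhs true <nil>
}
```

The ordering in Close matters: the gzip writer has to be closed first so that its footer reaches the downstream writer before that writer is closed.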
Putting this all together, we can see the components and their connections on the following data flow graph:
```mermaid
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
mware[[Middleware]]
out_mod[[Output Modifier]]
sink[Sink]
src -- InputReader --> in_mod
in_mod -- InputReader --> in_fmt
in_fmt -. JSON string .-> mware
mware -. JSON string .-> out_fmt
in_fmt -- "interface{}" --> out_fmt
out_fmt -- OutputWriter --> out_mod
out_mod -- OutputWriter --> sink
style mware fill:#D55E00
```
More details on the implementation of each component and the connections between them can be found on their pages in this section.
Supporting Infrastructure
vhs provides several software constructs to implement and manage its data flow and modules:
flow
: Flow is the highest-level construct. It contains the input chain and the output chains and manages the execution of all the modules for a given vhs session. Defined in flow/flow.go.

input
: The input construct contains and manages the input chain of a vhs data flow. vhs supports a single input chain per session. Defined in flow/input.go.

output
: The output construct contains and manages the output chain(s) of a vhs data flow. vhs supports multiple output chains per session. Defined in flow/output.go.
The conceptual arrangement of these constructs is shown in the figure below. In most cases, it should not be necessary to modify these portions of the codebase when adding new data flow components, but a general understanding of these constructs should be helpful for both vhs developers and end users.
```mermaid
graph LR
src[Source]
in_mod[[Input Modifier]]
in_fmt[[Input Format]]
out_fmt[[Output Format]]
out_mod[[Output Modifier]]
sink[Sink]
src --> in_mod
in_mod --> in_fmt
in_fmt --> out_fmt
out_fmt --> out_mod
out_mod --> sink
subgraph in [Input]
src
in_mod
in_fmt
end
subgraph out [Output]
out_fmt
out_mod
sink
end
subgraph flow [Flow]
in
out
end
style in fill:#F0E442
style flow fill:#009E73
style out fill:#56B4E9
```
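The relationship between one input chain and several output chains can be sketched as a fan-out. In this hypothetical simplification (`Flow`, `Output`, `Run`, and `demo` are illustrative and are not the types in flow/flow.go), every value produced by the input is delivered to each output chain in turn:

```go
package main

import (
	"fmt"
	"sync"
)

// Output is a hypothetical output chain that consumes values from its own channel.
type Output struct {
	Name string
	in   chan interface{}
	got  []interface{}
}

// Flow is a hypothetical simplification: one input chain, many output chains.
type Flow struct {
	Input   <-chan interface{}
	Outputs []*Output
}

// Run fans every input value out to all outputs and waits for them to drain.
func (f *Flow) Run() {
	var wg sync.WaitGroup
	for _, o := range f.Outputs {
		o.in = make(chan interface{})
		wg.Add(1)
		go func(o *Output) {
			defer wg.Done()
			for v := range o.in {
				o.got = append(o.got, v)
			}
		}(o)
	}
	for v := range f.Input {
		for _, o := range f.Outputs {
			o.in <- v // each output chain receives the same value
		}
	}
	for _, o := range f.Outputs {
		close(o.in) // lets each output goroutine finish
	}
	wg.Wait()
}

// demo feeds n values through a flow with the named outputs and counts deliveries.
func demo(n int, names ...string) map[string]int {
	in := make(chan interface{})
	go func() {
		defer close(in)
		for i := 0; i < n; i++ {
			in <- i
		}
	}()
	f := &Flow{Input: in}
	for _, name := range names {
		f.Outputs = append(f.Outputs, &Output{Name: name})
	}
	f.Run()
	counts := make(map[string]int)
	for _, o := range f.Outputs {
		counts[o.Name] = len(o.got)
	}
	return counts
}

func main() {
	fmt.Println(demo(3, "json-file", "metrics")) // map[json-file:3 metrics:3]
}
```

With unbuffered channels, one slow output chain stalls delivery to all the others; a real implementation might buffer per-output channels or drop values instead, a trade-off this sketch does not address.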
Parser: Specifying VHS data flows
vhs data flows are defined at runtime with command line flags, as described on the reference page (Inputs and Outputs). Internally, these flags are processed by a parser, which reads the specified input and output chain descriptions and instantiates a Flow that contains the specified components. All components must register with the parser by calling the appropriate Load... function with the component's identifying token and its constructor. The parser is implemented in flow/parser.go, and component registration with the default Parser is done in cmd/vhs/main.go.
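The registration pattern described above can be sketched as a token-to-constructor registry. The `Parser`, `Register`, and `ParseChain` names, the `|` separator, and the example tokens are assumptions for this illustration; the real parser in flow/parser.go uses its own Load... functions and chain syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// Component is a hypothetical placeholder for any data flow component.
type Component interface{ Name() string }

// named is a trivial Component used for the example.
type named string

func (n named) Name() string { return string(n) }

// Constructor builds a component; real constructors would take configuration.
type Constructor func() (Component, error)

// Parser is a hypothetical registry keyed by the token used on the command line.
type Parser struct {
	constructors map[string]Constructor
}

func NewParser() *Parser { return &Parser{constructors: make(map[string]Constructor)} }

// Register associates a token with a constructor, rejecting duplicates.
func (p *Parser) Register(token string, c Constructor) error {
	if _, dup := p.constructors[token]; dup {
		return fmt.Errorf("token %q already registered", token)
	}
	p.constructors[token] = c
	return nil
}

// ParseChain turns a description such as "tcp|http" into components, in order.
// The "|" separator is an assumption made for this sketch.
func (p *Parser) ParseChain(desc string) ([]Component, error) {
	var chain []Component
	for _, tok := range strings.Split(desc, "|") {
		tok = strings.TrimSpace(tok)
		c, ok := p.constructors[tok]
		if !ok {
			return nil, fmt.Errorf("unknown component %q", tok)
		}
		comp, err := c()
		if err != nil {
			return nil, fmt.Errorf("building %q: %w", tok, err)
		}
		chain = append(chain, comp)
	}
	return chain, nil
}

// demo registers a few illustrative tokens and renders a parsed chain.
func demo(desc string) (string, error) {
	p := NewParser()
	for _, tok := range []string{"tcp", "http", "json", "stdout"} {
		tok := tok // capture per iteration
		if err := p.Register(tok, func() (Component, error) { return named(tok), nil }); err != nil {
			return "", err
		}
	}
	chain, err := p.ParseChain(desc)
	if err != nil {
		return "", err
	}
	names := make([]string, len(chain))
	for i, c := range chain {
		names[i] = c.Name()
	}
	return strings.Join(names, " -> "), nil
}

func main() {
	fmt.Println(demo("tcp|http")) // tcp -> http <nil>
	fmt.Println(demo("tcp|bogus"))
}
```

Rejecting unknown tokens at parse time means a misspelled component fails before any part of the flow starts running.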