Reference
Introduction
vhs
is designed for flexibility and operates on the concept of a data flow that originates with a source
and terminates with one or more sinks. Sources may capture network data, read data from files, etc. Sinks may
write data to cloud or local storage, standard output, or send data to other destinations. Along the way, data may pass
through a series of input modifiers and formats and output
modifiers and formats that transform the data. For more information on the
technical implementation of vhs
, see the architecture overview.
Example vhs
Command
./vhs --inputs "tcp|http" --outputs "json|stdout" --address 0.0.0.0:8080 --capture-response
This command captures two-way TCP data from the network on address 0.0.0.0
and port 8080
, extracts HTTP requests
and responses from the TCP data, formats them as JSON, and prints them to the standard output.
Specifying Inputs and Outputs
The core command line flags for vhs
are focused on defining the data flow that vhs
will use for a recording/replay
session. Inputs and outputs are specified in terms of a simple domain specific language that will be detailed in
the following sections.
Inputs
--input "<source|modifier(s)|format>"
Inputs are specified in a pipe-delimited (|
), double-quoted string following the --input
command line flag.
This string must begin with a source, optionally contain modifiers, and end with an
input format. The specified source originates a data stream that is modified by the
specified modifiers and then formatted, or interpreted by the specified input format.
In the example command given above, the input specifier is --inputs "tcp|http"
where tcp
specifies the TCP source
and http
specifies the HTTP input format. This example does not use any input modifiers.
Only one input definition can be specified in a vhs
session.
Sources
The following sources are currently available:
tcp
file
gcs
(Google cloud storage)s3compat
(S3 compatible cloud storage)
tcp
The tcp
source captures live TCP/IP network data. It uses the following additional command line flags for
configuration:
--address <ip address:port>
Required. Specifies the address and port on whichvhs
will listen.--capture-response
Optional. If set,vhs
captures requests and responses (2-way traffic).
file
The file
source reads data from a file on the local filesystem. It requires the following command line flag
for configuration. This source reads a file from the filesystem and emits a raw stream of bytes to the
modifiers and formats in the specified input chain. These modifiers and formats
implement specific support for various file formats. JSON and gzipped-JSON files are currently supported by the
available input modifiers and formats.
--input_file <path to input file>
Required. Specifies the path to the input file to be read.
gcs
The gcs
source reads data from a Google Cloud Storage object. It requires the following command line flags for
configuration. Note that the GCS source also requires Google Cloud authentication credentials to be present
on the machine or in the container where vhs
is run. The GOOGLE_APPLICATION_CREDENTIALS
environment variable can be
used to specify the location of the credentials file. For more information on GCS authentication, see Google’s
documentation here.
--gcs-bucket-name <GCS bucket name>
Required. Name of bucket that contains the object to be read.--gcs-object-name <object name>
Required. Name of object to be read.
Note that this source also requires a JSON key file containing Google Cloud authentication credentials.
s3compat
The s3compat
source reads from an object in an S3-compatible cloud storage location. It requires the following command
line flags for configuration.
--s3-compat-access-key <access key>
Required. Access key for S3 compatible storage.--s3-compat-secret-key <secret key>
Required. Secret key for S3 compatible storage.--s3-compat-token <token>
Required. Session token for S3 compatible storage.--s3-compat-secure
Optional. This flag specifies encrypted transport (HTTPS). Default istrue
.--s3-compat-endpoint <S3 URL>
Required. URL for S3-compatible storage.--s3-compat-bucket-name <bucket name>
Required. Name of bucket that contains the object to be read.--s3-compat-object-name <object name>
Required. Name of object to be read.
Input Modifiers
The following input modifiers are currently available in vhs
:
gzip
gzip
The gzip
input modifier uncompresses data that has been compressed in the gzip format. It is primarily for use with
the file
, gcs
, and s3compat
sources, enabling the reading of compressed files.
Input Formats
The following input formats are currently available in vhs
:
http
json
http
The http
input format decodes the incoming data stream into HTTP requests and responses. This format is primarily
intended for use with the tcp
source.
json
The json
input format interprets the incoming data stream as JSON. It is primarily intended for use with the
file
and cloud storage sources (gcs
and s3compat
) for processing data stored
in a JSON file.
Outputs
--output "<format|modifier(s)|sink>"
Outputs are specified in a pipe-delimited (|
), double-quoted string following the --output
command line flag.
The first element in the string must specify an output format, followed by a pipe character, followed
by zero or more modifiers separated by pipe characters, followed by another pipe character, and
ending with a sink specifier. The output chain works similarly to the input chain. The input format receives
the data stream from the end of the input chain and formats or interprets the data. This data can then be modified by
an output modifier before it leaves vhs
through a sink.
In the example command given above, the output specifier is --outputs "json|stdout"
where json
specifies the JSON
output format and stdout
specifies the standard output sink. This example does not use any output modifiers.
The next sections will detail the currently available output format, modifiers,
and sink in vhs
. Each format, modifier, and sink may require additional configuration in the form of
additional command line flags. These will be described where applicable.
vhs
supports an arbitrary number of outputs for any given session. Each output will receive the same data from
the input chain.
Output Formats
The following output formats are currently available in vhs
:
har
(HTTP archive)json
har
The har
output format receives incoming data in the form of a stream of HTTP requests and responses and encodes
it into the HTTP Archive (HAR) format. The output of this format can be saved to cloud storage or printed to standard
output depending on the sink chosen by the user. For more information on the HTTP Archive format see
the specification.
json
The json
output format receives incoming data in the form of a stream of HTTP requests and responses and serializes
those requests and responses to the JSON format. The output of this format can be saved to cloud storage or printed to
standard output depending on the sink chosen by the user.
Output Modifiers
The following output modifiers are currently available in vhs
:
gzip
gzip
The gzip
output modifier compresses the data passing through it into the gzip format. This can be used in conjunction
with the stdout
or cloud storage (gcs
or s3compat
) sinks to save compressed
output data from vhs
.
Sinks
The following sinks are currently available in vhs
:
gcs
(Google cloud storage)s3compat
(S3-compatible cloud storage)stdout
discard
gcs
The gcs
sink writes data to a Google Cloud Storage object. It requires the following command line flags for
configuration. Note that the GCS sink also requires Google Cloud authentication credentials to be present
on the machine or in the container where vhs
is run. For more information on GCS authentication, see Google’s
documentation here.
--gcs-bucket-name <GCS bucket name>
Required. Bucket name that contains the GCS object to be written to.--gcs-object-name <object name>
Required. Name of object to be read.
s3compat
The s3compat
sink writes to an object in an S3-compatible cloud storage location. It requires the following command
line flags for configuration.
--s3-compat-access-key <access key>
Required. Access key for S3 compatible storage.--s3-compat-secret-key <secret key>
Required. Secret key for S3 compatible storage.--s3-compat-token <token>
Required. Session token for S3 compatible storage.--s3-compat-secure
Optional. This flag specifies encrypted transport (HTTPS). Default istrue
.--s3-compat-endpoint <S3 URL>
Required. URL for S3-compatible storage.--s3-compat-bucket-name <bucket name>
Required. Name of bucket that contains the object to be written.--s3-compat-object-name <object name>
Required. Name of object to be written.
stdout
The stdout
sink writes the data stream it receives to the standard output. This sink can be used in conjunction with
shell redirection to save the output of vhs
to a file on the local filesystem.
discard
The discard
sink silently discards the data that is sent to it.
Middleware
--middleware <path to middleware executable>
vhs
optionally supports the use of user-supplied middleware to modify data as it passes through vhs
. As an example,
user supplied middleware could be utilized to remove sensitive user credentials from recorded HTTP data before saving
it to cloud storage. Middleware, if used, is placed in the vhs
data flow in the output chain between the output format
and the output modifiers. It is implemented as a separate binary that will receive formatted data on the standard input
and must write modified data on the standard output. A simple example middleware can be found
here in the vhs
repository.
Prometheus metrics
--prometheus-address <ip adddress:port>
vhs
supports calculating metrics on HTTP exchanges captured live from the network. This facility can be used to
non-invasively gather metrics for services utilizing HTTP. Specifying the --prometheus-address
flag enables metrics
support. Metrics support is implemented internally as an output format, and requires an input chain that includes
the http
input format. A typical command for utilizing the metrics support looks like this:
./vhs --input "tcp|http" --address 0.0.0.0:80 --capture-response --prometheus-address 0.0.0.0:8080
This command will capture all http traffic on port 80, calculate metrics, and make them available at a /metrics
endpoint on port 8080 of the machine/vm/container running vhs
.
The provided metrics include measures of request rate, error rate, and request duration, sufficient for implementing the RED method of microservice monitoring. Metrics are supplied on a Prometheus endpoint. Request counts are available in a counter vector labeled with HTTP method, status code, and path. Errors counts are available by querying for HTTP error status codes, and timeouts are counted with empty status codes. Request durations are available in a summary vector with quantiles given in the table below. Durations are also labeled with HTTP method, status code, and path.
Quantile | Error (+ / -) |
---|---|
50% | 5% |
75% | 1% |
90% | 0.5% |
95% | 0.5% |
99% | 0.1% |
99.9% | 0.01% |
99.99% | 0.001% |
Complete Command Line Flag Reference
Command line flag | Description |
---|---|
–help, -h | Show brief help for VHS. |
–address string | Address VHS will use to capture traffic. (default “0.0.0.0:80”) |
–buffer-output | Buffer output until the end of the flow. |
–capture-response | Capture the responses. |
–debug | Emit debug logging. |
–debug-http-messages | Emit all parsed HTTP messages as debug logs. |
–debug-packets | Emit all packets as debug logs. |
–flow-duration duration | The length of the running command. (default 10s) |
–gcs-bucket-name string | Bucket name for Google Cloud Storage |
–gcs-object-name string | Object name for Google Cloud Storage |
–http-timeout duration | A length of time after which an HTTP request is considered to have timed out. (default 30s) |
–input string | Input description. |
–input-drain-duration duration | A grace period to allow for a inputs to drain. (default 2s) |
–input-file string | Path to an input file |
–middleware string | A path to an executable that VHS will use as middleware. |
–output strings | Output description. |
–profile-http-address string | Expose profile data on this address. |
–profile-path-cpu string | Output CPU profile to this path. |
–profile-path-memory string | Output memory profile to this path. |
–prometheus-address string | Address for Prometheus metrics HTTP endpoint. |
–s3-compat-access-key string | Access key for S3-compatible storage. |
–s3-compat-bucket-name string | Bucket name for S3-compatible storage. |
–s3-compat-endpoint string | URL for S3-compatible storage. |
–s3-compat-object-name string | Object name for S3-compatible storage. |
–s3-compat-secret-key string | Secret key for S3-compatible storage. |
–s3-compat-secure | Encrypt communication for S3-compatible storage. (default true) |
–s3-compat-token string | Security token for S3-compatible storage. |
–shutdown-duration duration | A grace period to allow for a clean shutdown. (default 2s) |
–tcp-timeout duration | A length of time after which unused TCP connections are closed. (default 5m0s) |