Getting Started
The quickest way to see vhs
in action is to use the development Docker image in this repo to run a demo.
Prerequisites
The first step is to clone the VHS repository.
git clone https://github.com/rename-this/vhs.git
You will need a working Docker installation to successfully execute the following commands. You should be able to install Docker from your favorite package manager, or you can see the Docker site for more details.
Once you have Docker set up and working, you can build the development container by changing directory into the repository you cloned and running:
$ make dev
This command will build and run a Docker container called vhs_dev
that contains a standardized development environment
for vhs
. It includes the Go toolchain as well as other useful utilities such as curl
and jq
, among others.
The vhs
source tree that you cloned will be mounted inside the development container at /go/vhs
for convenience.
Additionally, if present, your Google Cloud SDK configuration folder (~/.config/gcloud
) will be mounted in the
container at /root/.config/gcloud
and the GOOGLE_APPLICATION_CREDENTIALS
environment variable will be set to
/root/.config/gcloud/application_default_credentials.json
. You may need to ensure that this file exists on your
system or change this environment variable to point to your own GCS service account credentials before using the GCS
functionality of vhs
.
Demo Setup
Open a bash session on the container by running the following command in a terminal:
docker exec -it vhs_dev bash
In your new bash session, run this script to start a simple echo server and a script that will make an HTTP request to the server once every second using curl.
cd testdata && ./echo.bash
This will generate some local HTTP traffic for vhs
to capture.
Run The Demo
Open another bash session on the container in a new terminal:
docker exec -it vhs_dev bash
In this new session, run the following command to build vhs
and run it to capture our local HTTP traffic.
go build ./cmd/vhs && ./vhs --input "tcp|http" --output "json|stdout" --capture-response --address 0.0.0.0:1111 --middleware ./testdata/http_middleware.bash | jq -R "fromjson | .body" 2> /dev/null
The output should look something like this:
"hello, world 1594678195 [[hijacked 0]]"
"hello, world 1594678195 [[hijacked 1]]"
"hello, world 1594678196 [[hijacked 0]]"
"hello, world 1594678196 [[hijacked 1]]"
"hello, world 1594678197 [[hijacked 0]]"
"hello, world 1594678197 [[hijacked 1]]"
"hello, world 1594678198 [[hijacked 0]]"
"hello, world 1594678198 [[hijacked 1]]"
"hello, world 1594678199 [[hijacked 0]]"
"hello, world 1594678199 [[hijacked 1]]"
"hello, world 1594678200 [[hijacked 0]]"
Explanation of the Demo
We can break down the demo command we just ran to get a better feel for how vhs
works and what it can do:
go build ./cmd/vhs && ./vhs
This portion of the command builds the vhs
from the source tree and executes the resulting binary
--input "tcp|http"
This portion of the command defines input portion of the data flow for this vhs
session. Currently, only one source
can be specified for a given vhs
session.
In this case:
tcp
specifies the TCP data source. This source will capture TCP data off the network.http
specifies the HTTP input format. This input format will extract HTTP requests and responses from the captured TCP data streams.
--output "json|stdout"
This portion of the command specifies the output portion of the data flow for this vhs
session. A vhs
session may
have any number of outputs.
In this case:
json
specifies the JSON output format. This output format will serialize the HTTP requests and responses into JSON.stdout
specifies the standard output sink. This sink will print the data to the console.
--capture-response
This flag tells the TCP input source to capture two-way network traffic (requests and responses) instead of one-way (requests only).
--address 0.0.0.0:1111
This flag specifies the address and port from which vhs
will capture traffic, in the form <IP address>:<port>
.
--middleware ./testdata/http_middleware.bash
This flag specifies the middleware to run for this vhs
session. Middleware enables users to modify or rewrite data
as it passes through vhs
from source to sink. The middleware specified here appends
" [[hijacked <HTTP_MESSAGE_TYPE>]]"
to the end of the http request or response body as it passes through vhs
. More
information on middleware can be found here.
| jq -R "fromjson | .body"
This portion of the command pipes the output of vhs
to jq
for filtering and pretty printing. This functionality is
external to vhs
. You can find more information on jq
here.
2> /dev/null
Discards stderr to keep the demo output clean.
Common Use Cases
Capture live HTTP traffic to cloud storage
./vhs --input "tcp|http" --output "json|gzip|gcs" --address 0.0.0.0:80 --capture-response --gcs-bucket-name <some_bucket> --gcs-object-name <some_object> --flow-duration 2m
The above command will capture live HTTP traffic on port 80 for 2 minutes and save the captured data as gzipped JSON to Google Cloud Storage (GCS) in the specified bucket and object.
Generate HAR file from saved HTTP data
./vhs --input "gcs|gzip|json --output "har|stdout" --gcs-bucket-name <some_bucket> --gcs-object-name <some_object> --flow-duration 15s > harfile.json
The above command will “replay” saved HTTP data in gzipped JSON format from the specified GCS bucket and object and
produce an HTTP Archive (HAR) file called harfile.json
on the local
filesystem.
Provide live HTTP metrics
./vhs --input "tcp|http" --address 0.0.0.0:80 --capture-response --prometheus-address 0.0.0.0:8888 --flow-duration 60m
The above command will capture live HTTP traffic on port 80 and calculate
RED metrics for the captured
data in real time. These include metrics on request rate, error rate, and request duration. These metrics will be
available on a Prometheus endpoint at port 8888. For more information on metrics see the
vhs
reference entry on metrics support. The command will run for 60 minutes.