Docker_docs

This is my place for random, collected musings/observations/discoveries related to Docker. Opinions are my own :) I didn't have time to make it shorter ;)

Update: I might need to start looking at runc rootless containers in place of running a Docker daemon as suggested by this article: https://www.cyphar.com/blog/post/20160627-rootless-containers-with-runc

Pull request here: https://github.com/opencontainers/runc/pull/774

First of all, just a bit about Docker, the problems it aims to solve, and perhaps some of its idiosyncrasies... It's important to remember that Docker containers are more or less "lightweight virtual machines" which, in contrast to more traditional VMs, don't attempt to virtualize hardware and instead focus on virtualizing user space only. As such, they are also generally meant to be "ephemeral", meaning (perhaps even in a literal sense..) that you shouldn't get too attached to them. The "cattle not pets" analogy is probably more apt here than in any other software-defined approach to abstracting/managing infrastructure.

The basic idea is that you have a container (which essentially takes advantage of some "virtualization" features already built into the Linux kernel) that you spin up with a docker command (via the docker client). That might begin with a docker pull (much like a combination of git clone and git pull) and/or a docker run command, which might include additional flags/options for how the container should run (i.e. "interactive" vs. "detached"), for defining a log driver, or for mapping a volume in the container to a location on the host system (i.e. your home directory on the machine where you're running the Docker daemon, etc.).
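
For example (the image name, host path, and port here are just placeholders for illustration), a typical pull-then-run might look something like this:

docker pull nginx:latest

# Run detached, give the container a name, use the default json-file log driver,
# map a host directory into the container, and publish a port:
docker run -d --name web --log-driver json-file -v "$HOME/site":/usr/share/nginx/html -p 8080:80 nginx:latest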

As far as installation goes, there were some things I found less straightforward/more inconsistent than I would have expected. While I was eventually able to cobble together a reasonable Ansible role for installation on CentOS 7/Ubuntu 16.04, I initially installed Docker by hand on CentOS 7 and was promptly met with an error that seemed to be related to this issue: https://dcos.io/docs/1.7/administration/installing/custom/system-requirements/install-docker-centos/ - the "fix" for which then later itself needed fixing, as noted here: https://github.com/moby/moby/issues/27661.

Again, typically (though not always, as may be the case with some of the approaches to logging I'll discuss later...) each container would ideally run as a "microservice" - which might be better thought of as a process that is "fired once", terminating in the delivery of its "payload", which arrives in the form of JSON output (the default) to stderr/stdout (i.e. your screen). While this is the more typical example, there isn't necessarily any rigidly upheld "orthodoxy" as to how you might choose to use containers (in fact, depending on the use case you might even run a container within a single VM, or conversely use containers to "contain" or additionally boundary one or more VMs). Regardless, the concept of "microservices" is an important one to understand, as it has been key to both the genesis and continuing evolution of the Docker "ecosystem".

It then naturally follows that there are several different ways you might approach configuring and maintaining your Docker infrastructure. I'm not even talking about Kubernetes yet! Before leap-frogging into the orchestration layer, I think it's important to develop at least a competent awareness of the Docker "plumbing", configured minimally with the least number of moving parts, and then to iterate on this basic structure where necessary to scale to additional use cases.

At present, I've explored just sourcing a bash dotfile (still my favored approach) as well as using docker-compose to essentially script my functions for pulling and running my containers. I've been a bit resistant to using docker-compose for this purpose simply due to its python/pip dependency, which I often find less straightforward to install and maintain than I'd generally prefer, especially on CentOS (i.e. yum's dependency on older/insufficient versions of python, and the fact that I'd rather not have to build/maintain multiple packages unless I have a compelling reason to do so...). At the moment, bash seems to be working just fine for me, so it seems appropriate to scale in a more "event-driven" manner for now.
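
As a rough sketch of the dotfile approach (function names, image, and port here are invented, not my actual setup), the sourced file might contain something like:

# ~/.docker_funcs - source this from .bashrc; image/ports are placeholders
myapp_pull() { docker pull myorg/myapp:latest; }
myapp_run()  { docker run -d --name myapp -p 8080:8080 myorg/myapp:latest; }
myapp_logs() { docker logs -f myapp; }
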
One specific question that results from spinning up ephemeral container instances as microservices is how to take what exists only as stderr/stdout and either ship it somewhere it can be meaningfully represented, or effectively "log" it for more persistent reference. This is where the concept of "logging" specific to Docker emerges as an important consideration. Much as you would expect, logging in Docker follows a fairly conventional pattern where "log" output is directed, or perhaps redirected, to a system designated for storage/additional processing/relay. Where this idea conceptually differs is that, contrary to a VM or bare-metal installation where a conventional log file might be expected to exist/persist, "log" output for a Docker microservice typically exists mainly as output to stderr/stdout, since after process invocation and termination, by design all data is flushed and the container is effectively "reset" so that it can again be run from a configured baseline. However, this log data need not remain as stderr/stdout, and can instead be "handed off" in a number of ways, for example to an external logging service (either on the host itself, or another container instance capable of consuming this data and converting it to something meaningful, i.e. a container running Graylog or Fluentd, or more traditionally as rsyslog output to /var/log/messages on the host machine).
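
As a quick, hedged illustration (addresses and image names are placeholders), that hand-off is mostly just a matter of flags on docker run - or a daemon-wide default in /etc/docker/daemon.json:

# Send this container's output to a syslog endpoint instead of the default json-file driver:
docker run -d --log-driver syslog --log-opt syslog-address=udp://127.0.0.1:514 myorg/myapp:latest

# Or hand it to a Fluentd collector (e.g. one running in another container):
docker run -d --log-driver fluentd --log-opt fluentd-address=127.0.0.1:24224 myorg/myapp:latest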

This is where it becomes particularly important to settle upon a suitable logging strategy to match your use case, and understand the implications of this choice. This is also where it becomes important to begin understanding the various nuances and differences between the Docker daemon and client, as well as perhaps taking an at least cursory initial inventory of the various endpoints offered by the Docker API.

I'll be following up at some point with a brief assessment of each of the built-in plugins/log drivers offered by Docker, as well as (hopefully soon) an "index" of some generally useful curl commands you might use to interact with the API.
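
In the meantime, here's the general shape of those curl commands (assuming the default unix socket and a curl new enough to support --unix-socket; your install may also want an explicit /v1.xx version prefix):

# List running containers via the Engine API:
curl --unix-socket /var/run/docker.sock http://localhost/containers/json

# Daemon-wide info, roughly what "docker info" shows:
curl --unix-socket /var/run/docker.sock http://localhost/info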

Begin Notes Here:

First, if you are running docker in a non-production, testing environment, you're going to get tired of having to preface each and every command with sudo. To make this go away, you can do the following:

  • Add the docker group if it doesn't already exist:
sudo groupadd docker
  • Next, add the connected user "$USER" to the docker group. Change the user name to match your preferred user if you do not want to use your current user:
sudo gpasswd -a $USER docker
  • Either do a newgrp docker or log out/in to activate the changes to groups.
  • Next, you can use:
docker run hello-world

to check if you can run docker without sudo.

Here's another important observation I feel I can add now after a few months of Docker use:

Storage driver availability (the storage driver being the thing that allows containers to write to disk in a more scalable manner than having to specifically mount (and share) manually created disk volumes for each container) depends wholly on which version of the Linux kernel you are using. For instance, if you are using kernel version 3.10.x (the kernel version currently featured in CentOS 7.x), you will only be able to run the overlay storage driver, not the (Docker-recommended) updated and more performant overlay2 driver. Update: with the latest version of Docker, this has now been changed to allow CentOS 7 with kernel 3.10.x full use of overlay2 (which is now, I believe, the default on at least CentOS 7.4).
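
To check which storage driver your daemon is actually using, and to pin one explicitly (the daemon.json path below is the usual systemd/CentOS default - adjust for your distro):

# Show the storage driver currently in use:
docker info --format '{{.Driver}}'

# Pin it explicitly by putting { "storage-driver": "overlay2" } in /etc/docker/daemon.json,
# then restart the daemon:
sudo systemctl restart docker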

Update - the following instructions/observations are, I believe, mostly relevant to older versions of the Docker daemon, and I think they have mostly been fixed in the most recent version (I haven't completely confirmed this just yet...).

This becomes a rather important distinction/consideration once you start seeing your Docker containers consuming surprisingly large amounts of disk space. This condition is currently only mitigated by:

a) Adding more disk (the Docker argument here has been "disk is cheap", so the Docker team have apparently opted for "ease of deployment" over "efficient use of disk space"...however, more recently, with kernel 4.x and the appearance of overlay2, this issue might be "solved"...)

b) Upgrading your kernel to 4.x (fwiw, Ubuntu 16.04 uses a 4.x kernel by default..) and choosing the overlay2 storage driver.

c) Scheduling/scripting some pretty drastic "cleanup" measures, including (and perhaps not limited to) the following list of tasks (roughly sketched as a script after the list):

  1. Disabling any existing/previously scheduled tasks (i.e. user and system-wide crontabs for anything that could be potentially disruptive, i.e. any existing Docker/Puppet runs, etc...).
  2. Scheduling the backup and proper storage of your containers - Note: this should obviously be done before executing any of the following steps, i.e. with the docker save command.
  3. Stopping the Docker daemon.
  4. Unfortunately - as this is the only thing that appears to work in getting rid of the "detritus" left behind by old stopped/removed containers - issuing an rm -rf /var/lib/docker command (again, make sure your containers are being backed up correctly and successfully before executing/scheduling this task).
  5. Restarting the Docker daemon with sudo systemctl start docker - and capturing the exit code status.
  6. Upon the successful restart of the Docker daemon, re-loading each of your backed-up containers using docker load -i <container_name.tar>
  7. Waiting awhile (especially if container sizes number in the gigs...) for Docker to load your images.
  8. Starting all of your containers with the appropriate flags/options, and checking to make sure they are running with docker ps, etc.
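
Roughly sketched out as a script (untested - image names and paths are placeholders, and you should obviously verify your backups before letting anything like this loose):

#!/bin/bash
# Rough sketch of the "nuke and reload" cleanup described above. Use with care.
set -e
backup_dir=/srv/docker-backups
mkdir -p "$backup_dir"

# Step 2: save the images you care about before touching anything:
for img in myorg/myapp:latest myorg/otherapp:latest; do
  docker save -o "$backup_dir/$(echo "$img" | tr '/:' '__').tar" "$img"
done

# Steps 3-4: stop the daemon and clear out /var/lib/docker:
sudo systemctl stop docker
sudo rm -rf /var/lib/docker

# Step 5: restart the daemon (set -e aborts here if it fails to come back):
sudo systemctl start docker

# Step 6: reload the saved images, then re-run your containers as usual:
for tarball in "$backup_dir"/*.tar; do
  docker load -i "$tarball"
done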

Also a quick note on backing up some of your larger/more persistent images:

  1. Use docker commit -p <container id> <new name:tag> (Note that this should target a running container id rather than an image id) to copy the container to an image stored locally. (also look at the instructions here for other things you might do with docker commit: https://docs.docker.com/engine/reference/commandline/commit/#commit-a-container-with-new-cmd-and-expose-instructions)
  2. Use docker save -o <new name>.tar <image> to save it to a .tar file
  3. Restore with: docker load -i <new name>.tar
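
Strung together with made-up names (the container id, image tag, and filename are all placeholders):

# 1. Snapshot a running container to a local image (pausing it during the commit):
docker commit -p 3f2b6c9a1d4e myapp-backup:2018-01-15

# 2. Write that image out to a tarball:
docker save -o myapp-backup.tar myapp-backup:2018-01-15

# 3. ...and later, load it back in on the same (or another) host:
docker load -i myapp-backup.tar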

If you, say, don't like what happens in the entrypoint script for a particular container image you just pulled from Docker Hub - change it!

First create a new directory, and cd into it. Then:

  1. copy or create a file called script.sh in it
  2. Create Dockerfile with the following content:
FROM repo/image
ADD script.sh /
ENTRYPOINT /script.sh
  3. Next, docker build -t="myimage" .
  4. Then finally, docker run myimage. Presto! You've made a new container image from another.
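
For reference, script.sh can be as simple as you like - something along these lines (entirely made up; just make sure the file is executable, or add a RUN chmod +x /script.sh line to the Dockerfile):

#!/bin/sh
# Replacement entrypoint: do whatever setup you want, then start the real process.
echo "running my entrypoint instead of the image default"
exec my-long-running-process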

Docker also allows you to pass arguments to your build command.

The current list of predefined args is available here -- however these appear to be all proxy-related at present. For example, if you wanted to set some environment variables for proxies, you could do this:

$ docker build -t some_image:latest --build-arg=http_proxy="http://some.proxy.url" --build-arg=https_proxy="https://some.ssl.proxy.url" .
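
The predefined proxy args work without any ARG declaration in the Dockerfile; for your own build-time variables you declare them with ARG. A minimal sketch (names and values are arbitrary):

# Dockerfile
FROM debian:jessie
ARG APP_VERSION=0.0.1
RUN echo "building version $APP_VERSION"

Then override the default at build time with:

$ docker build -t some_image:latest --build-arg APP_VERSION=1.2.3 .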

Additional Notes:

  • When running the container (step 4), it's no longer necessary to specify --entrypoint since we have it defaulted in the Dockerfile.
  • It's really that simple; no need to sign up to docker hub or any such thing (although it's of course recommended in time ;-)
  • One pretty frustrating caveat: It looks as though there is still a lingering dependency on the image you build from if you are using a local image - meaning that you can't simply docker rmi -f <old image>, as any attempt at removal throws "dependency" errors...I'm still looking at whether this is a "bug" or a "feature"...

Oh! This is also something to note: If you have a container image that was built for a specific user, i.e. "dev" for a generic dev user, etc., and this user doesn't have sudo, Docker provides a very easy/straightforward workaround:

docker exec -u 0 -it <container name> bash

This should drop you straight into an interactive shell as the root user :)

Use this command to clean things up:

docker system prune -a -f

Another thing: once your bash scripts/.yml/python code etc. starts piling up, you'll quite prudently want to get this stuff checked into a git repo (before, or later in addition to, something slightly more advanced like an internal Docker registry - which is basically your own internally-hosted version of something like the public Docker Hub registry...). Also, once the need to version and track changes is felt more acutely, you might start looking at adding a "Makefile" at the root of your repo, potentially along with some other text files that spell out meaningful metadata you may be using to govern your deployments (for instance dev vs. prod behavior in something like an env.txt file, etc.). I'll add more related to this effort later - once I've established something workable for my own use case(s).

One other thing that I'm finding pretty routine: The need for a summarized, topical reference for a whole lotta bash stuffs, especially as related to Docker and/or the Linux kernel.

Since Docker has a particular dependency on the Linux kernel (I'll just inject a parenthetical "apology" here in advance for my lack of sophistication in describing this relationship) - and if we accept the analogy of Docker as sort of an "API" for meaningfully interacting with underlying Linux kernel virtualization features (more info: look up LXC, and/or have a look at the Docker FAQ here: https://docs.docker.com/engine/faq/) - then arguably the default Linux shell (Bash) starts to feel like the least-abstracted and most immediately accessible tool for interacting with Docker. Obviously it's pretty trivial to work around anything approaching a firm Bash requirement when it comes to Docker (in more ways than I'd care to know...), and the Docker client itself (i.e. "docker <run/inspect/info>", etc.) provides a considerable, if not wholly sufficient, number of command-line utilities and options to satisfy the majority of use cases. However, based simply on my own (very subjective/idiosyncratic) personal preference, I've lately been finding a more persistent need for a rapidly/easily accessed "touchstone" for memorization/repetition of Bash scripting syntax and conventions (interesting how things have come a bit "full circle", right??).

While this: http://tiswww.case.edu/php/chet/bash/bashref.html#SEC31 along with this (extremely comprehensive repo!): https://github.com/rodtreweek/awesome-shell - have proven amazingly helpful, I'll be attempting to distill much of what I've found helpful down to the "cheat sheet" level here:

Some General Notes on Debugging or: "Just because it's abstracted doesn't mean that you are at last free to drift toward the light of blissful kernel/host OS ignorance..."

While transparency to the host OS has certainly been key to the very nature and evolution of containers, and obviously important to de-coupling workloads from what was once a disruptive - at times even halting - friction resulting from dependencies, it doesn't take much more than a trip through the Docker release notes to get an idea of the rapidly changing complexity of this relationship.

Despite the now common use of terms such as "serverless" to describe a distinctly hardware/OS "agnostic" approach to systems design/development - one featuring an emphasis on code "functions" executed as "microservices" and embedded within the service fabric of the cloud provider (i.e. AWS Lambda, API Gateway, Azure Functions, etc.) - this steady march toward "utility" infrastructure hasn't quite yet provided the "power-grid" level of stability sufficient to exclude the need for more "classic" troubleshooting of the host system and its more discrete interactions among containers - aka that set of skills located somewhere near the bottom of the resume under "Linux Systems Administrator" :)

First, Docker has decided (not unreasonably) to disallow the unrestricted use of a variety of system-level commands, including ltrace/strace. You can read more about this here, but basically you're going to need to pass something extra to your docker run command if you want to trace system calls and libraries. This will differ slightly depending on whether you are using AppArmor or the newer seccomp profiles, but for seccomp it might look something like the below:

docker run --rm -it --security-opt seccomp=unconfined debian:jessie

Alternatively, you might choose to enable this more conservatively by using the --cap-add SYS_PTRACE option, which is specific to ptrace, but for some reason I wasn't able to make this work, and really, for one-off troubleshooting, the above seems fine to use ad hoc.
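
For reference (and again, I haven't had much luck with it myself), the more targeted variant looks something like the following - granting just the ptrace capability rather than disabling the whole seccomp profile:

docker run --rm -it --cap-add SYS_PTRACE debian:jessie

# ...then inside the container:
apt-get update && apt-get install -y strace
strace -f ls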

Bash Cheat Sheet

In Bash, test and [ are builtins.

The double bracket enables additional functionality. For example, you can use && and || instead of -a and -o and there's a regular expression matching operator =~.
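
For instance (the filename is arbitrary):

$ name="error.log.OLD"; [[ "$name" == error.* && "$name" =~ \.OLD$ ]] && echo "looks like a rotated log"
looks like a rotated log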

The braces (these things: {} ), in addition to delimiting a variable name, are used for parameter expansion - so you can do things like:

Truncate the contents of a variable:

$ var="abcde"; echo ${var%d*} abc

Make substitutions similar to sed:

$ var="abcde"; echo ${var/de/12} abc12

Use a default value:

$ default="hello"; unset var; echo ${var:-$default} hello

Also, brace expansions create lists of strings which are typically iterated over in loops:

$ echo f{oo,ee,a}d
food feed fad

$ mv error.log{,.OLD}
(error.log is renamed to error.log.OLD, because the brace expression expands to "mv error.log error.log.OLD")

$ for num in {000..2}; do echo "$num"; done
000
001
002

$ echo {00..8..2}
00 02 04 06 08

$ echo {D..T..4}
D H L P T

Note that the leading zero and increment features weren't available before Bash 4.

Double parentheses are used for arithmetic operations:

((a++))

((meaning = 42))

for ((i=0; i<10; i++))

echo $((a + b + (14 * c)))

Arithmetic contexts also allow you to omit the dollar signs on integer and array variables, and to include spaces around operators for readability.

Single brackets are also used for array indices:

array[4]="hello"

element=${array[index]}

Curly braces are required for array element references on the right-hand side.

Parentheses are also used for subshells (commands grouped in parentheses run in a separate child process, so variable assignments and directory changes made inside them don't affect the "parent" script...more info here: http://tldp.org/LDP/abs/html/subshells.html). They are also used to create arrays:

array=(1 2 3)
echo ${array[1]}
2
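
And, circling back to subshells for a second, a quick illustration of that isolation:

$ x="outer"; (x="inner"; cd /tmp; echo "inside: $x"); echo "outside: $x"
inside: inner
outside: outer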

Handling output streams:

Please note that the n.e. in the syntax column means "not existing".

| Syntax | StdOut (visible in terminal) | StdErr (visible in terminal) | StdOut (visible in file) | StdErr (visible in file) | file (existing) |
| --- | --- | --- | --- | --- | --- |
| > | no | yes | yes | no | overwrite |
| >> | no | yes | yes | no | append |
| 2> | yes | no | no | yes | overwrite |
| 2>> | yes | no | no | yes | append |
| &> | no | no | yes | yes | overwrite |
| &>> | no | no | yes | yes | append |
| \| tee | yes | yes | yes | no | overwrite |
| \| tee -a | yes | yes | yes | no | append |
| n.e. (*) | yes | yes | no | yes | overwrite |
| n.e. (*) | yes | yes | no | yes | append |
| \|& tee | yes | yes | yes | yes | overwrite |
| \|& tee -a | yes | yes | yes | yes | append |

List:

command > output.txt

  • The standard output stream will be redirected to the file only, it will not be visible in the terminal. If the file already exists, it gets overwritten.

command >> output.txt

  • The standard output stream will be redirected to the file only, it will not be visible in the terminal. If the file already exists, the new data will get appended to the end of the file.

command 2> output.txt

  • The standard error stream will be redirected to the file only, it will not be visible in the terminal. If the file already exists, it gets overwritten.

command 2>> output.txt

  • The standard error stream will be redirected to the file only, it will not be visible in the terminal. If the file already exists, the new data will get appended to the end of the file.

command &> output.txt

  • Both the standard output and standard error stream will be redirected to the file only, nothing will be visible in the terminal. If the file already exists, it gets overwritten.

command &>> output.txt

  • Both the standard output and standard error stream will be redirected to the file only, nothing will be visible in the terminal. If the file already exists, the new data will get appended to the end of the file.

command | tee output.txt

  • The standard output stream will be copied to the file, it will still be visible in the terminal. If the file already exists, it gets overwritten.

command | tee -a output.txt

  • The standard output stream will be copied to the file, it will still be visible in the terminal. If the file already exists, the new data will get appended to the end of the file.

command |& tee output.txt

  • Both the standard output and standard error streams will be copied to the file while still being visible in the terminal. If the file already exists, it gets overwritten.

command |& tee -a output.txt

  • Both the standard output and standard error streams will be copied to the file while still being visible in the terminal. If the file already exists, the new data will get appended to the end of the file.

Stay tuned for more!

Some links:
