A little while ago I wrote a cool little project called kif to scratch an itch I had. Kif can encrypt, compress, and transfer files to a remote server over SSH. It keeps a searchable SQLite database of these files, along with their original names and checksums, which are used for fetching and integrity checking. Pretty neat!

The heart of kif looks like this:

LOCAL_CHECKSUM=$(gpg --encrypt --recipient "${KIF_GPGKEY}" -o - "${FILENAME}" |\
gzip |\
tee >(
  ssh ${KIF_HOST} "cat > ${REMOTE_PATH}"
) | shasum -a 256 | awk '{print $1}')

What the hell is going on here?! Let's break it down line by line (or pipe by pipe...).

gpg --encrypt --recipient "${KIF_GPGKEY}" -o - "${FILENAME}"

Pretty simple: the file ${FILENAME} is encrypted with gpg for whatever key is named in ${KIF_GPGKEY}. The option -o - means the encrypted data is written to standard output instead of to a file.

gzip

The encrypted stream is compressed.

tee >(ssh ${KIF_HOST} "cat > ${REMOTE_PATH}")

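Now comes the tricky part. tee copies its standard input to its standard output and to every file named in its arguments. Here the only argument is >(ssh ${KIF_HOST} "cat > ${REMOTE_PATH}"), which is a bash process substitution: bash runs the command inside >(...) and hands tee a file name that is connected to that command's standard input.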
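To see the split without a server, here is a minimal local sketch. It assumes bash, since process substitution is not available in plain sh. The wc -c and tr commands are stand-ins chosen only for illustration: wc -c plays the role of the ssh upload and tr plays the role of the checksum step.

```shell
#!/usr/bin/env bash
# tee writes its input twice: to the pipe that bash creates for >(...),
# and to its own standard output, which continues down the pipeline.
# wc -c stands in for the ssh upload; its count goes to stderr so it
# does not mix with the data flowing on to tr.
upper=$(printf 'kif\n' | tee >(wc -c >&2) | tr 'a-z' 'A-Z')

echo "${upper}"
```

The byte count shows up on stderr while the upper-cased text ends up in the variable, so both consumers saw the same input.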

ssh ${KIF_HOST} "cat > ${REMOTE_PATH}"

This does the actual uploading. The encrypted and compressed data is streamed into ssh, which runs cat on the remote host to write the data to a file. You can try this yourself!

$ echo "Hello world." | ssh server "cat > ~/hello"
$ ssh server "cat ~/hello"
Hello world.

The rest is pretty straightforward. The output of tee (exactly the same as what went in) is piped to shasum, and awk keeps only the hash from its output:

shasum -a 256 | awk '{print $1}'
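Here is a small sketch of what that last step produces. The helper sha256_stream is hypothetical and not part of kif; it falls back to sha256sum only because shasum is not installed everywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer shasum (as kif does), else use sha256sum.
sha256_stream() {
  if command -v shasum >/dev/null 2>&1; then
    shasum -a 256
  else
    sha256sum
  fi
}

# When reading from stdin, both tools print the hash followed by "-"
# as a placeholder file name. awk keeps only the first field, the hash.
checksum=$(printf 'Hello world.\n' | sha256_stream | awk '{print $1}')

echo "${#checksum} characters: ${checksum}"
```

A SHA-256 hash is always 64 hexadecimal characters, which makes it easy to store alongside the file name.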

So in conclusion, we have:

  1. Our input file is encrypted.
  2. A stream of this encrypted data is compressed.
  3. The data is streamed to a remote server and saved to a file.
  4. The SHA-256 checksum of the streamed data is computed and stored in a variable.

Since it's all done with streaming, no temporary files are written to the local disk. Great!

Kif uses the shasum to check that the ssh upload was successful.
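How the comparison is done isn't shown here, so the following is only a sketch of one way it could work. checksums_match is a hypothetical helper, and the commented-out remote command assumes shasum is installed on the server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both checksums are non-empty and equal.
checksums_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Assumed usage, not run here because it needs a real server:
# REMOTE_CHECKSUM=$(ssh "${KIF_HOST}" "shasum -a 256 ${REMOTE_PATH}" | awk '{print $1}')

# Stand-in values so the sketch runs locally.
LOCAL_CHECKSUM="abc123"
REMOTE_CHECKSUM="abc123"

if checksums_match "${LOCAL_CHECKSUM}" "${REMOTE_CHECKSUM}"; then
  result="upload verified"
else
  result="checksum mismatch"
fi
echo "${result}"
```

Rejecting empty values matters: if the pipeline fails before producing a hash, two empty strings should not count as a match.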

We can also do the reverse operation, although it's much simpler because we have no need for the checksum:

ssh ${KIF_HOST} "cat ${KIF_LOCATION}/store/${HASH}" | gzip -d -c | gpg --decrypt > "${FILENAME}"
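Because the download has no checksum, the exit status is the only sign that something went wrong, and by default bash reports only the status of the last command in a pipeline. Here is a small sketch of the difference, with false standing in for a failed ssh. This is an illustration of a bash option, not something kif is known to use.

```shell
#!/usr/bin/env bash
# Default behaviour: the pipeline's status is that of the last command (cat).
set +o pipefail
status_default=0
false | cat || status_default=$?

# With pipefail, the pipeline fails if any command in it fails.
set -o pipefail
status_pipefail=0
false | cat || status_pipefail=$?
set +o pipefail

echo "default: ${status_default}, pipefail: ${status_pipefail}"
# prints "default: 0, pipefail: 1"
```

With set -o pipefail at the top of a script, a failed ssh or gzip in the fetch pipeline makes the whole pipeline return a non-zero status.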

I have tested this method with files up to 250MB and haven't experienced any problems.