
Find interesting referers in access.log
A tiny bit of awk logic is all that’s needed:
host=jay.gooby.org; zcat -f /path/to/access.log.gz | awk -v host="$host" '{if (($7 !~ /\.[A-Za-z0-9]+$/ || $7 ~ /\.html$/) && $11 != "\"-\"" && $11 !~ host) {print $7 " " $11}}' | sort -u
Decompress a potentially compressed access log (the -f ensures that zcat will also work with uncompressed files) and pipe it to awk.
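To see the effect of -f for yourself, here is a small sketch that assumes GNU gzip's zcat. The file names are made up and live in a throwaway directory:

```shell
# Work in a temporary directory so no real logs are touched.
tmpdir=$(mktemp -d)
printf 'line one\nline two\n' > "$tmpdir/plain.log"
gzip -c "$tmpdir/plain.log" > "$tmpdir/rotated.log.gz"

# With -f, zcat copies input that is not gzip data through unchanged,
# so the plain and the compressed file produce the same text.
plain_out=$(zcat -f "$tmpdir/plain.log")
gz_out=$(zcat -f "$tmpdir/rotated.log.gz")

[ "$plain_out" = "$gz_out" ] && echo "identical output"
rm -rf "$tmpdir"
```

This is handy with rotated logs, where the current file is usually plain text and the older ones are gzipped.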
awk looks at the requested URLs ($7) in your log, ignoring any that end with a file extension (.png, .js, etc.), apart from .html, which it keeps.
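The extension rule can be tried on its own. This is a minimal sketch: keep_pages is a hypothetical helper name, the paths are made-up samples, and the anchored pattern is my choice for the illustration:

```shell
# Hypothetical helper: keep paths that have no file extension,
# or whose extension is exactly .html.
keep_pages() {
  awk '$1 !~ /\.[A-Za-z0-9]+$/ || $1 ~ /\.html$/'
}

# Made-up sample paths, one per line.
printf '%s\n' /about /index.html /logo.png /app.js /notes.htm | keep_pages
# prints:
# /about
# /index.html
```

A bracket expression such as [^html] matches a single character that is not h, t, m or l, so it cannot express "anything but html". The sketch uses two explicit tests instead: one for "has an extension" and one for "that extension is .html".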
It also ignores any requests that have an empty referer ($11) or that come from internal links on your own $host, since those aren’t “interesting”. What remains is reduced to the unique combinations of URL and referer:
/2021/09/30/remove-the-dst-root-ca-x3-crt-from-ubuntu-14-04-lts "https://jira.astraia.com/"
/2021/09/30/remove-the-dst-root-ca-x3-crt-from-ubuntu-14-04-lts "https://one.zoho.com/"
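To check the referer rules without touching a real log, the filter can be run over a few invented lines. In this sketch, the log lines, the host blog.example.org and the helper names sample_log and interesting are all made up, and the extension test is written as an anchored pattern:

```shell
# Invented log lines: a duplicate external referer, an empty referer,
# an internal referer, and a stylesheet request.
sample_log() {
  printf '%s\n' \
    '203.0.113.5 - - [01/Oct/2021:10:00:00 +0000] "GET /a-post HTTP/1.1" 200 512 "https://news.example.com/" "UA"' \
    '203.0.113.6 - - [01/Oct/2021:10:00:05 +0000] "GET /a-post HTTP/1.1" 200 512 "https://news.example.com/" "UA"' \
    '203.0.113.7 - - [01/Oct/2021:10:01:00 +0000] "GET /a-post HTTP/1.1" 200 512 "-" "UA"' \
    '203.0.113.8 - - [01/Oct/2021:10:02:00 +0000] "GET /a-post HTTP/1.1" 200 512 "https://blog.example.org/about" "UA"' \
    '203.0.113.9 - - [01/Oct/2021:10:03:00 +0000] "GET /style.css HTTP/1.1" 200 512 "https://news.example.com/" "UA"'
}

# Hypothetical helper: $1 is the site's own host name.
interesting() {
  awk -v host="$1" '
    ($7 !~ /\.[A-Za-z0-9]+$/ || $7 ~ /\.html$/) && $11 != "\"-\"" && $11 !~ host {
      print $7 " " $11
    }' | sort -u
}

sample_log | interesting blog.example.org
# prints: /a-post "https://news.example.com/"
```

Only one line survives. The duplicate collapses under sort -u, and the empty referer, the internal referer and the .css request are all filtered out.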
If your log isn’t in common access log format (strictly, the “combined” variant, which appends the referer and user agent) or uses some additional custom fields, you’ll probably need to adjust the $7 and $11 awk fields so they match where your URL and referer fields occur in the log format.
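One way to find the right field numbers is to print the first log line with every field numbered. Here is a sketch, where show_fields is a hypothetical helper name and the log line is an invented sample:

```shell
# Hypothetical helper: number each whitespace-separated field
# of the first input line, then stop reading.
show_fields() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }'
}

# Invented sample line in the combined format.
printf '%s\n' '203.0.113.5 - - [01/Oct/2021:10:00:00 +0000] "GET /a-post HTTP/1.1" 200 512 "https://news.example.com/" "UA"' | show_fields
```

Against a real log this would be zcat -f /path/to/access.log.gz | show_fields. Whichever numbers appear next to the request path and the quoted referer are the ones to use in place of $7 and $11.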
The one-liner is also available as interesting-referers, wrapped up as a script with a small amount of usage help.
All links, in order of mention:
- awk: https://www.gnu.org/software/gawk/manual/gawk.html#Getting-Started
- common access log format: https://en.wikipedia.org/wiki/Common_Log_Format
- interesting-referers: https://gist.github.com/jaygooby/6b57ad9d28b91c7d7faef3636d6ae2f1