A tiny bit of awk logic is all that’s needed:

host=jay.gooby.org; zcat -f /path/to/access.log.gz | awk -v host="$host" '{if (($7 !~ /\.[^\/.]+$/ || $7 ~ /\.html(\?|$)/) && $11 != "\"-\"" && $11 !~ host) {print $7 " " $11}}' | sort -u

Decompress a potentially compressed access log (the -f ensures zcat also works with uncompressed files) and pipe it to awk.
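To see the -f behaviour in isolation, here is a minimal sketch, assuming GNU gzip's zcat (other implementations may behave differently). The temporary directory and the plain.log and packed.log.gz file names are made up for the demo:

```shell
# Throwaway demo files: the same text stored plain and gzip-compressed.
tmpdir=$(mktemp -d)
printf 'line one\nline two\n' > "$tmpdir/plain.log"
gzip -c "$tmpdir/plain.log" > "$tmpdir/packed.log.gz"

# With -f, zcat passes uncompressed input through unchanged...
plain_out=$(zcat -f "$tmpdir/plain.log")
# ...and still decompresses gzip input as usual.
packed_out=$(zcat -f "$tmpdir/packed.log.gz")

echo "$plain_out"
echo "$packed_out"
rm -rf "$tmpdir"
```

Both commands should print the same two lines, which is why the one-liner can be pointed at either a rotated .gz log or the live, uncompressed one.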

awk looks at the URLs ($7) in your log, ignoring any that end with an extension (.png, .js, etc.), apart from .html, which it keeps.

It also ignores any requests that have an empty referer ($11), or whose referer is your own $host and is therefore not “interesting”, then displays the remaining unique combinations of URL and referer:

/2021/09/30/remove-the-dst-root-ca-x3-crt-from-ubuntu-14-04-lts "https://jira.astraia.com/"
/2021/09/30/remove-the-dst-root-ca-x3-crt-from-ubuntu-14-04-lts "https://one.zoho.com/"
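The two referer checks can be tried on their own against a few sample lines. This is only a sketch: the three log lines below are made up (documentation IP addresses, example.com as the external site) and it deliberately leaves out the extension test on $7:

```shell
host=jay.gooby.org

# Three made-up log lines: external referer, empty referer, internal referer.
sample_log=$(cat <<'EOF'
203.0.113.7 - - [30/Sep/2021:10:15:32 +0000] "GET /some/page HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0"
203.0.113.8 - - [30/Sep/2021:10:16:01 +0000] "GET /some/page HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
203.0.113.9 - - [30/Sep/2021:10:17:44 +0000] "GET /other/page HTTP/1.1" 200 2048 "https://jay.gooby.org/some/page" "Mozilla/5.0"
EOF
)

# Keep only lines whose referer is neither "-" nor from $host.
kept=$(printf '%s\n' "$sample_log" \
  | awk -v host="$host" '$11 != "\"-\"" && $11 !~ host {print $7 " " $11}' \
  | sort -u)
echo "$kept"
```

Only the first line survives: the second has an empty referer and the third was referred from $host itself.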

If your log isn’t in combined access log format (the common log format extended with referer and user agent), or uses some additional custom fields, you’ll probably need to adjust the $7 and $11 awk fields so they match where the URL and referer occur in your log format.
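One way to find the right field numbers is to have awk number every whitespace-separated field of a single log line. A small sketch, using a made-up line in combined log format (swap in a line from your own log, for example the first line of the zcat -f output):

```shell
# Made-up sample line in combined log format (not from a real log).
sample='203.0.113.7 - - [30/Sep/2021:10:15:32 +0000] "GET /some/page HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'

# Print each whitespace-separated field prefixed with its awk field number.
numbered=$(printf '%s\n' "$sample" | awk '{for (i = 1; i <= NF; i++) print i ": " $i}')
echo "$numbered"
```

For this sample the URL shows up as field 7 and the quoted referer as field 11. If your format puts them elsewhere, use whichever numbers are printed next to them.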

Here it is, interesting-referers, wrapped up as a script, with a small amount of usage help: