Extracting and archiving my #music-we-like Slack chats
At work, we have a #music-we-like Slack channel, and it’s full of gems. I thought I’d see how hard it would be to extract my messages, and then archive them to a page on this blog.
Pretty straightforward as it turns out, thanks to Seb Seager’s slack-exporter.
I followed the setup instructions, got an OAuth token and could slurp down the messages from the channel.
Because I only want to archive and post my own messages, I was hoping there’d be a user filter option as well as the --ch channel one. There isn’t, but no worries, as there’s also the --json option. I made a quick PR so the JSON it outputs is valid (objects were printed as single-quoted Python data structures, which are not valid JSON), and now I can pipe to jq to extract just my messages:
python3 ~/src/slack-exporter/exporter.py --json -c --ch CHANNELID \
  | jq '[.[] | select(.user=="USERID") | {id: .client_msg_id, timestamp: .ts, text: .text, attachments: .attachments}]' \
  | sed -E 's,<(https?://[^>]*)>,<a href=\\"\1\\">\1</a>,g' \
  > ~/src/blog/_data/music-i-like.json
The sed portion of the pipe converts the URLs in Slack messages that look like <https://example.com> into HTML links, <a href="https://example.com">https://example.com</a>, with the quotes backslash-escaped so the file stays valid JSON.
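If you'd rather not chain jq and sed, the same filtering can be sketched in a few lines of Python. This is only a rough equivalent, not what I actually ran: the function names are made up for illustration, it only rewrites links in the text field (the sed above runs over the whole JSON file), and it also handles Slack's labelled <url|label> link form, which the sed expression does not treat specially.

```python
import json
import re

# Slack wraps URLs in angle brackets, optionally with a label: <url|label>.
SLACK_LINK = re.compile(r"<(https?://[^>|]+)(?:\|([^>]*))?>")


def linkify(text):
    """Turn Slack-style <url> or <url|label> into an HTML anchor."""
    def to_anchor(match):
        url = match.group(1)
        label = match.group(2) or url  # fall back to the URL when there is no label
        return f'<a href="{url}">{label}</a>'
    return SLACK_LINK.sub(to_anchor, text)


def extract_my_messages(messages, user_id):
    """Keep one user's messages, with the same four fields as the jq filter."""
    return [
        {
            "id": m.get("client_msg_id"),
            "timestamp": m.get("ts"),
            "text": linkify(m.get("text", "")),
            "attachments": m.get("attachments"),
        }
        for m in messages
        if m.get("user") == user_id
    ]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the exporter's JSON output.
    sample = [
        {"user": "USERID", "ts": "1700000000.000200", "text": "<https://example.com>"},
        {"user": "SOMEONE_ELSE", "ts": "1700000001.000100", "text": "not mine"},
    ]
    # json.dumps emits double-quoted, valid JSON (unlike printing a dict directly).
    print(json.dumps(extract_my_messages(sample, "USERID"), indent=2))
```

Because the links are rewritten on parsed strings rather than on raw JSON text, there is no need to backslash-escape the quotes by hand; json.dumps takes care of that when writing the file.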
Then we need a Markdown page to render the _data/music-i-like.json file:
---
layout: page
title: Music I've liked
permalink: /music-i-like/
---
{% for message in site.data.music-i-like %}
<article>
{{ message.text }}
{% for attachment in message.attachments %}
{{ attachment.audio_html }}
{% endfor %}
</article>
{% endfor %}
This works, but the page is huge. Jekyll does paginate, but only index.html pages. There’s jekyll-paginate-v2, but this doesn’t support paginating data pages.
Time for another PR. We can turn pagination on for a single page in the front matter like this:
pagination:
  enabled: true
  # data: music-i-like
That commented-out last line is how I was initially thinking about triggering the data pagination, specifying which site.data entry to use, but this is already implicit in your page (which is named the same as your data file), so it felt unnecessary. Adding this to a data page should be enough to trigger pagination:
pagination:
  enabled: true
You can use all the other features of the jekyll-paginate-v2 gem, e.g. per_page:, trail:, limit:, sort_reverse:, and so on. Sorting is limited to date, as the data entries are coerced into fake posts with no other attributes.
Your markdown page needs some changes so the paginated content is displayed correctly:
<!-- Slice the data array to match the pagination. data_start sets the offset
of the data array and this_page_data is the data for the page -->
{% assign data_start = page.pagination_info.curr_page | minus: 1 | times: paginator.per_page %}
{% assign this_page_data = site.data.music-i-like | slice: data_start, paginator.per_page %}
<!-- loop over the data that's just for the paginated page -->
{% for message in this_page_data %}
<article id="{{ message.id }}">
<time datetime="{{ message.timestamp | floor | date: "%Y-%m-%d" }}">{{ message.timestamp | floor | date: "%b %d %Y" }}</time>
{{ message.text }}
{% for attachment in message.attachments %}
{{ attachment.audio_html }}
{% endfor %}
</article>
<hr>
{% endfor %}
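To sanity-check the arithmetic in that template outside of Jekyll, here is a small Python sketch of the two calculations it relies on: the page offset and the Slack timestamp formatting. The function names are hypothetical, and formatting in UTC is an assumption (Jekyll's date filter follows the site's own timezone setting).

```python
from datetime import datetime, timezone


def page_slice(items, curr_page, per_page):
    """Return the entries for a 1-indexed page, mirroring the Liquid slice."""
    data_start = (curr_page - 1) * per_page  # offset of the first entry on this page
    return items[data_start:data_start + per_page]


def format_slack_ts(ts):
    """Format a Slack 'ts' string such as '1700000000.000200' as e.g. 'Nov 14 2023'."""
    seconds = int(float(ts))  # drop the fractional part, like Liquid's floor for positive values
    # Assumption: UTC; a real site would use its configured timezone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%b %d %Y")
```

With ten entries and three per page, page 2 covers entries 3 to 5 (zero-indexed) and page 4 holds the single leftover entry, which is the behaviour the slice in the template should give.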
All links, in order of mention:
- archive them to a page on this blog: /music-i-like
- slack-exporter: https://github.com/sebseager/slack-exporter
- PR so the JSON it outputs is valid: https://github.com/sebseager/slack-exporter/pull/9
- jekyll-paginate-v2: https://github.com/sverrirs/jekyll-paginate-v2
- doesn’t support: https://github.com/sverrirs/jekyll-paginate-v2/issues/86
- paginating data: https://github.com/sverrirs/jekyll-paginate-v2/issues/96
- another PR: https://github.com/sverrirs/jekyll-paginate-v2/pull/232