
Exporting YouTube Subscriptions to OPML and Watching via RSS


This post describes how I exported my 500+ YouTube subscriptions to an OPML file so that I could import them into my RSS reader. I go into fine detail about the scripts and tools I used. If you just want to see the end result, the code is in this repository, which describes the steps needed to run it.

I was previously a YouTube Premium subscriber but I cancelled it when they jacked up the already high prices. Since then I’ve been watching videos in NewPipe on my Android tablet or via an Invidious instance on real computers.

To import my subscriptions into NewPipe I was able to use the subscriptions.csv file included in the Google Takeout dump of my YouTube data. This worked fine initially but imposed some friction when adding new subscriptions.

If I only subscribed to new channels in NewPipe they were only accessible on my tablet. If I added them to YouTube then I had to remember to also add them in NewPipe, which was inconvenient if I wasn’t using the tablet at the time. Inevitably the subscriptions would drift out of sync and I would have to periodically re-import the subscriptions from YouTube into NewPipe. This was cumbersome as it doesn’t seem to have a way to do this incrementally. Last time I had to nuke all its data in order to re-import.

To solve these problems I wanted to manage my subscriptions in my RSS reader, Feedbin. This way Feedbin would track my subscriptions and new/viewed videos in a way that would sync between all my devices. Notably this is possible because Google actually publishes an RSS feed for each YouTube channel.

To do that I needed to export all my subscriptions to an OPML file that Feedbin could import. I opted to do this without requesting another Google Takeout dump, as those take a long time to generate and result in multiple gigabytes of archives to download (the dump includes all the videos I’ve uploaded to my personal account), just to get at the subscriptions.csv file within.

Generating OPML

I started by visiting my subscriptions page and using some JavaScript to generate a JSON array of all the channels I am subscribed to:

copy(JSON.stringify(Array.from(new Set(Array.prototype.map.call(document.querySelectorAll('a.channel-link'), (link) => link.href))).filter((x) => !x.includes('/channel/')), null, 2))

This snippet:

- finds all the channel links on the page with document.querySelectorAll('a.channel-link'),
- maps each link element to its href,
- de-duplicates the URLs by passing them through a Set,
- filters out the /channel/ form of the URLs, keeping only the @handle form,
- serialises the resulting array to pretty-printed JSON, and
- copies the result to the clipboard with the dev tools copy() helper.

With the list of channel URLs on my clipboard I pasted it into a subscriptions.json file. The challenge now was that these were channel page URLs like:

https://www.youtube.com/@mooretech

but the RSS URL of a channel is like:

https://www.youtube.com/feeds/videos.xml?channel_id=<CHANNEL_ID>,

which means I needed to determine the channel id for each page. To do that without futzing around with Google API keys and APIs I needed to download the HTML of each channel page.

First I generated a config file for curl from the JSON file:

jaq --raw-output '.[] | (split("/") | last) as $name | "url \(.)\noutput \($name).html"' subscriptions.json > subscriptions.curl

jaq is an alternative implementation of jq that I use. This jaq expression does the following:

- iterates over each URL in the top-level array with .[],
- splits the URL on "/" and binds the last segment (the @handle) to $name,
- emits two lines per URL: a url line with the channel page URL, and an output line naming the file to save it to ($name.html).

This results in lines like this for each entry in subscriptions.json, output to subscriptions.curl:

url https://www.youtube.com/@mooretech
output @mooretech.html

I then ran curl against this file to download all the pages:

curl --location --output-dir html --create-dirs --rate 1/s --config subscriptions.curl

Now that I had the HTML for each channel I needed to extract the channel id from it. While I was at it I also extracted the channel title for use later. I ran the following script, which I called generate-json-opml, on each HTML file:

#!/bin/sh

set -eu

URL="$1"
NAME=$(echo "$URL" | awk -F / '{ print $NF }')
HTML="html/${NAME}.html"
CHANNEL_ID=$(scraper -a content 'meta[property="og:url"]' < "$HTML" | awk -F / '{ print $NF }')
TITLE=$(scraper -a content 'meta[property="og:title"]' < "$HTML")
XML_URL="https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL_ID}"

json_escape() {
  echo "$1" | jaq --raw-input .
}

JSON_TITLE=$(json_escape "$TITLE")
JSON_XML_URL=$(json_escape "$XML_URL")
JSON_URL=$(json_escape "$URL")

printf '{"title": %s, "xmlUrl": %s, "htmlUrl": %s}\n' "$JSON_TITLE" "$JSON_XML_URL" "$JSON_URL" > json/"$NAME".json

Let’s break that down:

- it takes the channel page URL as its only argument,
- derives NAME from the last path segment of the URL (the @handle), which is also the base name of the downloaded HTML file,
- uses scraper to pull the content attribute of the og:url meta tag, the last path segment of which is the channel id,
- uses scraper again to pull the channel title from the og:title meta tag,
- builds the RSS feed URL from the channel id,
- JSON-escapes the values by piping them through jaq --raw-input, and
- writes a small JSON object with the title, feed URL, and channel page URL to json/<NAME>.json.
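For the example channel above, the resulting json/@mooretech.json would look something like this (values matching the OPML output shown later):

{"title": "MooreTech", "xmlUrl": "https://www.youtube.com/feeds/videos.xml?channel_id=UCLi0H57HGGpAdCkVOb_ykVg", "htmlUrl": "https://www.youtube.com/@mooretech"}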

Update: Stephen pointed out on Mastodon that the HTML contains the usual <link rel="alternate"> tag for RSS auto-discovery. I did check for that initially but I think the Firefox dev tools were having a bad time with the large size of the YouTube pages and didn’t show me any matches at the time. Anyway, that could have been used to find the feed URL directly instead of building it from the og:url.
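For the example channel the auto-discovery tag would look something like this (the exact attributes follow the usual auto-discovery pattern, not copied from the actual page):

<link rel="alternate" type="application/rss+xml" title="RSS" href="https://www.youtube.com/feeds/videos.xml?channel_id=UCLi0H57HGGpAdCkVOb_ykVg">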

Ok, almost there. That script had to be run for each of the channel URLs. First I generated a file with just a plain text list of the channel URLs:

jaq --raw-output '.[]' subscriptions.json > subscriptions.txt
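This produces one URL per line, like:

https://www.youtube.com/@mooretech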

Then I used xargs to process them in parallel:

xargs -n1 --max-procs=$(nproc) --arg-file subscriptions.txt --verbose ./generate-json-opml

This does the following:

- -n1 passes one URL to each invocation of the script,
- --max-procs=$(nproc) runs as many invocations in parallel as there are CPU cores,
- --arg-file subscriptions.txt reads the arguments from the file instead of standard input, and
- --verbose prints each command before it is run.

Finally all those JSON files needed to be turned into an OPML file. For this I used Python:

#!/usr/bin/env python

import email.utils
import glob
import json
import xml.etree.ElementTree as ET

opml = ET.Element("opml")

head = ET.SubElement(opml, "head")
title = ET.SubElement(head, "title")
title.text = "YouTube Subscriptions"
dateCreated = ET.SubElement(head, "dateCreated")
dateCreated.text = email.utils.formatdate(timeval=None, localtime=True)

body = ET.SubElement(opml, "body")
youtube = ET.SubElement(body, "outline", {"title": "YouTube", "text": "YouTube"})

for path in glob.glob("json/*.json"):
    with open(path) as f:
        info = json.load(f)
        ET.SubElement(youtube, "outline", info, type="rss", text=info["title"])

ET.indent(opml)
print(ET.tostring(opml, encoding="unicode", xml_declaration=True))

This generates an OPML file (which is XML) using the ElementTree library. The OPML file has this structure:

<?xml version='1.0' encoding='utf-8'?>
<opml>
  <head>
    <title>YouTube Subscriptions</title>
    <dateCreated>Sun, 05 May 2024 15:57:23 +1000</dateCreated>
  </head>
  <body>
    <outline title="YouTube" text="YouTube">
      <outline title="MooreTech" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCLi0H57HGGpAdCkVOb_ykVg" htmlUrl="https://www.youtube.com/@mooretech" type="rss" text="MooreTech" />
    </outline>
  </body>
</opml>

It does the following:

- creates the opml root element,
- adds a head element containing the title and a dateCreated element holding the current date in RFC 2822 format (via email.utils.formatdate),
- adds a body element with a single outline element that groups all the channels under “YouTube”,
- reads each of the JSON files generated earlier and adds an outline element per channel, using the title, xmlUrl, and htmlUrl values as attributes along with type="rss" and text,
- indents the tree for readability and prints it with an XML declaration.

Whew, that was a lot! With the OPML file generated I was finally able to import all my subscriptions into Feedbin.

All the code is available in this repository. In practice I used a Makefile to run the various commands so that I didn’t have to remember them.

Watching videos from Feedbin

Now that Feedbin is the source of truth for my subscriptions, how do I actually watch the videos? I set up the FeedMe app on my Android tablet. In the settings I enabled the NewPipe integration and set it to open the video page when tapped:

Screenshot of the FeedMe integration settings, with many apps listed and the entry for NewPipe turned on.

Now when viewing an item in FeedMe there is a NewPipe button that I can tap to watch it:

Screenshot of FeedMe viewing a video item, with a NewPipe button in the top left that opens the video in NewPipe when tapped.

Closing Thoughts & Future Work

Could I have done all the processing to generate the OPML file with a single Python file? Yes, but I rarely write Python so I preferred to just cobble things together from tools I already knew.

Should I ever become a YouTube Premium subscriber again I can continue to use this workflow and watch the videos from the YouTube embeds that Feedbin generates, or open the item in the YouTube app instead of NewPipe.

At some point I’d like to work out how to get Feedbin to filter out YouTube Shorts. It can automatically filter out items matching a query in its supported search syntax, but I’m not sure if Shorts are easily identifiable.

Update 6 June 2024: Feedbin has a media_duration search term. I was able to use that in an action to filter out YouTube items less than 90 seconds long, successfully filtering out Shorts.

Screenshot of the Feedbin settings UI, showing a new action named “Filter out YouTube Shorts” with the search term media_duration:<90 and “YouTube” ticked under “Article is in Tag”.

Lastly, what about desktop usage? When I’m on a real computer I read my RSS via the Feedbin web app. It supports custom sharing integrations. In order to open a video on an Invidious instance I need to rewrite the video URL from one like:

https://www.youtube.com/watch?v=u1wfCnRINkE

to one like:

https://invidious.perennialte.ch/watch?v=u1wfCnRINkE

I can’t do that directly with a Feedbin custom sharing service definition but it would be trivial to set up a little redirector application to do it. I even published a video on building a very similar thing last year. Alternatively I could install a redirector browser plugin, although that would require setup on each of the computers and OS installs I use, so I prefer the former option.
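As a rough sketch of the redirector idea (just an illustration, not the app from that video; it assumes a plain Python http.server process and my Invidious instance’s hostname):

#!/usr/bin/env python

from http.server import BaseHTTPRequestHandler, HTTPServer

INVIDIOUS = "https://invidious.perennialte.ch"

class Redirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path keeps the path and query string (e.g. /watch?v=u1wfCnRINkE),
        # so redirecting to the same path on the Invidious host is enough.
        self.send_response(302)
        self.send_header("Location", INVIDIOUS + self.path)
        self.end_headers()

HTTPServer(("127.0.0.1", 8000), Redirect).serve_forever()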
