Astro RSS Feeds with Full MDX Content
RSS is pretty cool. We can pick our favorite websites, magazines, blogs, really anything that offers a public feed, add them to our aggregator of choice, and get notified about new content. I use NetNewsWire, because it has a great UI and lets me sync my reading across Apple devices, but there are many other great options available.
This blog is built with Astro. I initially used the recommended @astrojs/rss library to generate an RSS feed, but now I want my feed to include the full post content, not just title and description. Astro has a guide for that, but there is one major problem with that approach.
“Note: this will not process components or JSX expressions in MDX files.”
… I’m using MDX. There hasn’t been a clear solution for this problem until recently, when Astro 4.9 was released. In this article, I show you exactly how I use the new Astro Container API to render the full article content when using MDX.
The Astro Container API
This new API, still marked as experimental at the time of writing, lets us render a single Astro component to a string. That’s perfect for generating RSS feeds, because we only need the HTML of the post content, not the whole document structure. Let’s look at the code.
// getPostsWithContent.ts
import type { APIContext } from 'astro';
import { experimental_AstroContainer as AstroContainer } from 'astro/container';
import { loadRenderers } from 'astro:container';
import { getContainerRenderer as getMDXRenderer } from '@astrojs/mdx';
import { render } from 'astro:content';
import { rehype } from 'rehype';
// Project helpers; the import paths below are assumptions, adjust them to your layout.
import getSiteURL from './getSiteURL';
import getSortedBlogPosts from './getSortedBlogPosts';
import sanitizeHTML from './sanitizeHTML';

export default async function getPostsWithContent(context: APIContext) {
  const siteURL = getSiteURL(context);

  // Create a container that knows how to render MDX content
  const container = await AstroContainer.create({
    renderers: await loadRenderers([getMDXRenderer()]),
  });

  const posts = await getSortedBlogPosts();

  return Promise.all(
    posts.map(async post => {
      // Render the post to an Astro component, then to an HTML string
      const { Content } = await render(post);
      const rawContent = await container.renderToString(Content);

      // Parse the HTML as a fragment, sanitize it, and serialize it again
      const file = await rehype()
        .data('settings', {
          fragment: true,
        })
        .use(sanitizeHTML, { siteURL })
        .process(rawContent);

      return {
        post,
        content: String(file),
      };
    })
  );
}
This function fetches the post metadata and renders each post’s content to a sanitized string. The siteURL comes from Astro’s APIContext, which is available in all API functions. The function will be used inside a GET request handler, but I’ll get to that later.
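The getSiteURL helper isn’t shown in this article. As a rough sketch of what such a helper could look like, assuming it prefers the site value from the Astro config and falls back to the origin of the current request (SiteContext is a hypothetical stand-in for the two APIContext fields involved, not the author’s actual code):

```typescript
// Minimal stand-in for the two APIContext fields this sketch relies on.
interface SiteContext {
  site?: URL; // the `site` option from astro.config, undefined if not configured
  url: URL; // the URL of the current request
}

// Hypothetical helper: prefer the configured site, fall back to the request origin.
function getSiteURL(context: SiteContext): string {
  return (context.site ?? new URL(context.url.origin)).href;
}
```

An APIContext can be passed to this function directly, since it carries both fields.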
I’m using MDX for my blog posts, so the renderer I need to load for the Astro container is the MDX renderer (there are more renderers available, and you can also write your own). Once the container is created, I load the posts from the content collection and render each of them to an Astro component.
At this point I could add the <Content /> component to a .astro page, but since this is an API function, I pass it to container.renderToString() instead, which renders the Astro component as HTML to a string.
I could stop here and put this string into the RSS post content, but there are some issues with the HTML output that I have to fix first.
Sanitizing the Output
Astro, being built for websites, rendered the post for a web page. Links to pages and images of the website use relative paths. Unfortunately, that won’t work in RSS readers. To fix this, I need to prefix each path with the site URL. Doing this properly takes three steps:
- parse the HTML into AST format
- modify some of the nodes
- render the modified AST back into a string
This sounds like a lot. Parsing and rendering HTML is far from trivial. Lucky for us, there are great tools available. I chose unified, or rather rehype, because it’s well-documented and widely used. In fact, it’s used by Astro internally for rendering Markdown.
I added a single plugin to the processing chain, sanitizeHTML. The rehype processor runs it between rehypeParse, which turns the HTML string into an abstract syntax tree (AST), and rehypeStringify, which serializes the AST back into HTML. The fragment: true setting tells the parser to treat the input as an HTML fragment, so the output isn’t wrapped in <html>, <head> and <body> tags.
Let me show you the sanitizing plugin.
// sanitizeHTML.ts
import type { Element, Root } from 'hast';
import type { Plugin } from 'unified';
import { visitParents } from 'unist-util-visit-parents';
// Helper not shown in this article; the import path is an assumption.
import removeElementNode from './removeElementNode';

interface SanitizeHTMLOptions {
  siteURL: string;
}

const sanitizeHTML: Plugin<[SanitizeHTMLOptions], Root> = ({ siteURL }) => {
  return tree => {
    visitParents(tree, (node, parents) => {
      if (node.type !== 'element') {
        return;
      }

      // Remove all style tags
      if (node.tagName === 'style') {
        return removeElementNode(node, parents);
      }

      // Remove all script tags
      if (node.tagName === 'script') {
        return removeElementNode(node, parents);
      }

      // Remove all spans inside code tags
      if (
        node.tagName === 'span' &&
        parents.some(parent => parent.type === 'element' && parent.tagName === 'code')
      ) {
        return removeElementNode(node, parents, true);
      }

      // Fix relative link URLs
      if (node.tagName === 'a' && typeof node.properties.href === 'string') {
        node.properties.href = new URL(node.properties.href, siteURL).href;
      }

      // Drop target attributes from links
      if (node.tagName === 'a' && 'target' in node.properties) {
        delete node.properties.target;
      }

      // Fix relative image URLs
      if (node.tagName === 'img' && typeof node.properties.src === 'string') {
        node.properties.src = new URL(node.properties.src, siteURL).href;
      }

      // Drop all style attributes
      if ('style' in node.properties) {
        delete node.properties.style;
      }

      // Drop all class attributes
      if ('className' in node.properties) {
        delete node.properties.className;
      }

      // Remove Astro's data-astro-cid-... attributes
      for (const key of Object.keys(node.properties)) {
        if (key.startsWith('dataAstroCid')) {
          // eslint-disable-next-line @typescript-eslint/no-dynamic-delete
          delete node.properties[key];
        }
      }
    });
  };
};

export default sanitizeHTML;
I’m using unist-util-visit-parents because it gives me access to the ancestors of each visited node. removeElementNode is a simple helper function that removes a node from the tree; when its third argument is true, the node is replaced with its children instead. The plugin performs the following sanitization steps.
- Remove all <style> tags. RSS readers take care of text styling and an RSS feed shouldn’t come with styles attached.
- Remove all <script> tags, for similar reasons. RSS feeds should provide only text content and semantics and most RSS readers will ignore inline scripts.
- Remove all <span> tags inside <code> tags. Astro’s syntax highlighting plugin adds a lot of <span> tags for styling purposes, but without styles they only add bloat.
- Fix relative link URLs. For example, a link to /blog/view-transitions will be converted to https://prass.tech/blog/view-transitions
- Fix relative image URLs for similar reasons.
- Drop inline style attributes.
- Drop all class attributes.
- Remove data-astro-cid-... attributes. Those are used for styling in Astro.
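The URL fixing relies on the standard URL constructor, which resolves its first argument against a base URL. A small standalone sketch of that behavior (toAbsoluteURL is a hypothetical name used only for illustration):

```typescript
// Resolve a possibly relative href or src against the site URL.
function toAbsoluteURL(pathOrURL: string, siteURL: string): string {
  return new URL(pathOrURL, siteURL).href;
}

const exampleSiteURL = 'https://prass.tech';

// Relative paths get the site URL as a prefix:
// 'https://prass.tech/blog/view-transitions'
const internalLink = toAbsoluteURL('/blog/view-transitions', exampleSiteURL);

// Absolute URLs pass through unchanged:
// 'https://example.com/page'
const externalLink = toAbsoluteURL('https://example.com/page', exampleSiteURL);

// Fragment-only links resolve against the base, not the post:
// 'https://prass.tech/#section'
const fragmentLink = toAbsoluteURL('#section', exampleSiteURL);
```

Note the last case: a link to a heading inside the same post ends up pointing at the home page. If a blog uses such links, resolving against the post’s own URL instead of the site URL would keep them intact.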
This will produce minimal, clutter-free HTML output.
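The removeElementNode helper used by the plugin isn’t shown in this article. Here is one way it might be written, as a sketch under assumptions: the third parameter (called keepChildren here) is taken to mean “replace the node with its children”, and the returned index is meant to be handed back to visitParents, which treats a number as the sibling index to continue with. The Hast... types are small local stand-ins for the real types from the hast package, so that the example runs on its own.

```typescript
// Local stand-ins for the hast node types (the real ones come from 'hast').
interface HastText {
  type: 'text';
  value: string;
}
interface HastElement {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: HastChild[];
}
interface HastRoot {
  type: 'root';
  children: HastChild[];
}
type HastChild = HastElement | HastText;
type HastParent = HastRoot | HastElement;

// Hypothetical helper: remove `node` from its direct parent (the last ancestor).
// With keepChildren = true the node is replaced by its children (unwrapped).
// Returns the index where the node used to be, or undefined if nothing changed.
function removeElementNode(
  node: HastElement,
  parents: HastParent[],
  keepChildren = false
): number | undefined {
  const parent = parents[parents.length - 1];
  if (!parent) return undefined;

  const index = parent.children.indexOf(node);
  if (index === -1) return undefined;

  // Splice the node out, optionally inserting its children in its place.
  parent.children.splice(index, 1, ...(keepChildren ? node.children : []));
  return index;
}
```

With this shape, <style> and <script> elements disappear together with their contents, while highlighted <span> tokens inside <code> are unwrapped so their text survives.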
Creating RSS, Atom and JSON Feeds
@astrojs/rss has no built-in support for Atom feeds. That’s why I decided to use the popular feed library. It can handle RSS 2.0, Atom 1.0 and also JSON Feed 1.0. Perfect!
In Astro, an endpoint file that exports a GET handler can return an XML or JSON file. I added the following files to the /pages/feed folder: atom.ts, json.ts and rss.ts. The generateFeed function contains the shared logic to create a new Feed().
// generateFeed.ts
import { SITE } from '@config';
import getPostsWithContent from '@utils/feed/getPostsWithContent';
import type { APIContext } from 'astro';
import { Feed, type Author } from 'feed';
import getSiteURL from './getSiteURL';

export async function generateFeed(context: APIContext): Promise<Feed> {
  const siteURL = getSiteURL(context);

  const author: Author = {
    name: SITE.author,
    email: SITE.email,
    link: SITE.website,
  };

  // Feed-level metadata shared by all three formats
  const feed = new Feed({
    id: SITE.website,
    link: siteURL,
    language: SITE.language,
    title: SITE.title,
    description: SITE.desc,
    favicon: new URL('/favicon.ico', siteURL).toString(),
    copyright: SITE.license,
    author,
    feedLinks: {
      json: new URL('/feed/json', siteURL).toString(),
      atom: new URL('/feed/atom', siteURL).toString(),
      rss: new URL('/feed/rss', siteURL).toString(),
    },
  });

  // One feed item per post, with the sanitized HTML as content
  const postsWithContent = await getPostsWithContent(context);
  for (const { post, content } of postsWithContent) {
    const link = new URL(`/blog/${post.id}/`, siteURL).toString();
    feed.addItem({
      id: link,
      link,
      title: post.data.title,
      description: post.data.description,
      published: post.data.pubDate,
      content,
      date: post.data.updatedDate ?? post.data.pubDate,
      category: post.data.tags.map(tag => ({
        name: tag,
        term: tag.toLowerCase(),
        domain: new URL(`/tags/${tag.toLowerCase()}/`, siteURL).toString(),
      })),
    });
  }

  return feed;
}
In each of the three endpoint files, I export a GET function that returns the feed output in a Response. This is rss.ts:
import { generateFeed } from '@utils/feed/generateFeed';
import type { APIContext } from 'astro';

export async function GET(context: APIContext) {
  const feed = await generateFeed(context);
  return new Response(feed.rss2(), {
    headers: {
      'Content-Type': 'application/xml',
    },
  });
}
NOTE: The Content-Type header for feed.json1() is application/json and for feed.atom1() it’s application/atom+xml.
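Since the three endpoints differ only in the serializer call and the Content-Type header, the header choice could be pulled into a tiny shared helper. This is just a sketch; FeedFormat, FEED_CONTENT_TYPES and createFeedResponse are hypothetical names that aren’t part of my actual setup, and the content types are the ones mentioned above.

```typescript
type FeedFormat = 'rss' | 'atom' | 'json';

// Content types as used in this article, one per feed format.
const FEED_CONTENT_TYPES: Record<FeedFormat, string> = {
  rss: 'application/xml',
  atom: 'application/atom+xml',
  json: 'application/json',
};

// Wrap a serialized feed in a Response with the matching Content-Type header.
function createFeedResponse(body: string, format: FeedFormat): Response {
  return new Response(body, {
    headers: {
      'Content-Type': FEED_CONTENT_TYPES[format],
    },
  });
}
```

With such a helper, atom.ts would return createFeedResponse(feed.atom1(), 'atom') and json.ts would return createFeedResponse(feed.json1(), 'json').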
Discoverability
One more small change is needed to make the feeds discoverable from every page of my website. I’m using a shared layout component, where I added the following lines inside of the <head> tag.
<link rel="alternate" type="application/rss+xml" title={`${SITE.shortTitle} RSS Feed`} href="/feed/rss" />
<link rel="alternate" type="application/json" title={`${SITE.shortTitle} JSON Feed`} href="/feed/json" />
<link rel="alternate" type="application/atom+xml" title={`${SITE.shortTitle} Atom Feed`} href="/feed/atom" />
And that’s it. Everyone can now read the full article content inside their favorite RSS/Atom reader.
Curious to see the live result?
- RSS: https://prass.tech/feed/rss
- Atom: https://prass.tech/feed/atom
- JSON: https://prass.tech/feed/json
I hope you enjoyed this little excursion into the world of RSS and HTML parsing.