Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSS feed is missing description/article content for posts #5132

Open
nschonni opened this issue Mar 15, 2023 · 30 comments
Open

RSS feed is missing description/article content for posts #5132

nschonni opened this issue Mar 15, 2023 · 30 comments
Labels
bug on hold website redesign Issue/PR part of the Node.js Website Redesign

Comments

@nschonni
Copy link
Member

URL:

https://nodejs.org/en/feed/blog.xml

Browser Name:

Firefox

Browser Version:

Operating System:

Windows 11

How to reproduce the issue:

The previous plugin would include the article. The current feed https://nodejs.org/en/feed/blog.xml just includes the title/link but no other content.
Similar to https://github.com/nodejs/node/releases.atom, it should contain the full post content

@nschonni nschonni added the bug label Mar 15, 2023
@ovflowd
Copy link
Member

ovflowd commented Mar 15, 2023

The problem is that including the full content of the article is quite problematic. First, because the original plugin wouldn't parse the markdown (if I recall correctly?)

I think we can update the current plugin to also add the parsed content of each blog post, I think.

@ovflowd
Copy link
Member

ovflowd commented Mar 15, 2023

Anyhow, thanks for reporting, @nschonni :)

@ovflowd
Copy link
Member

ovflowd commented Mar 15, 2023

@azu if you feel into tackling this PR, the place where we have the logic is done here https://github.com/nodejs/nodejs.org/blob/main/scripts/next-data/generatePreBuildFiles.mjs#L46

You can debug console.log(post) to see what comes out of it (pretty sure the body should be there).

And the documentation to the feed package is here: https://github.com/jpmonette/feed, you might simply need to add a content field on each item :)

@ovflowd ovflowd added website redesign Issue/PR part of the Node.js Website Redesign requires-ssr labels Mar 27, 2023
@ovflowd
Copy link
Member

ovflowd commented Mar 27, 2023

Update: I've suggested Nextra implement buil-tin RSS support. Let's see what they say.

@kkkrist
Copy link

kkkrist commented Jun 28, 2023

Some time during the last ca. 24h the whole blog feed (and the other two advertised feeds) went missing (404).

Could you please take a look?

@ovflowd
Copy link
Member

ovflowd commented Jun 28, 2023

Hey @kkkrist indeed that sounds like a bug. Yesterday we changed the approach of generating the RSS feeds during pre-build towards Next.js's App Router Dynamic Routes (https://github.com/nodejs/nodejs.org/blob/main/app/en/feed/%5Bfeed%5D/route.ts)

I think the solution here would be to find a way to tell Next.js to build these on static exports.

@ovflowd
Copy link
Member

ovflowd commented Jun 29, 2023

cc #5458 fixes #5456 and @kkkrist comment.

@Harkunwar
Copy link
Contributor

@ovflowd If we want to restore the original functionality and include the post content as well, we would need to be mindful of the RSS feed size. Currently our RSS feed contains all the previous posts, however most RSS feeds that do include content, limit their feed to some number, usually around 10.

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

@ovflowd If we want to restore the original functionality and include the post content as well, we would need to be mindful of the RSS feed size. Currently our RSS feed contains all the previous posts, however most RSS feeds that do include content, limit their feed to some number, usually around 10.

Well, I guess we can definitely open another issue to investigate what our RSS feed should have or not, how many blog posts it should have, etc.

For the content, it would blow up the RSS feed if we added the content for every single post.

We would need to revise these aspects, as we would need to change the Serverless Function that generates feeds if we want to include their contents.

We would also need to be able to extract X paragraphs and append a "Read more" button. All of that sounds a little bit overkill.

I'm strongly leaning towards not implementing this as I'm not sure how worth it is, and so far no one has been complaining about missing "description" for the RSS feeds.

What do y'all think @nodejs/website

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

The built-in release feed contains the information except the download links that were only available on the website feed. If @nodejs/releasers wants to modify the process for the release posts to include the missing information it would add back part of the missing functionality.

I don't think ignoring breaking core parts of the open web like RSS is a great approach though.

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

I'm strongly leaning towards not implementing this as I'm not sure how worth it is, and so far no one has been complaining about missing "description" for the RSS feeds.

How would anyone else complain? There is an open ticket, and others piled on when the route changes broke the feed entirely. Normally people look for open tickets and don't open duplicates

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

The built-in release feed contains the information except the download links that were only available on the website feed.

What built-in release feed? What sort of information was only available on the website feed? Could you please point out what is being done differently? Because this is definitely before my time here and I might be missing context.

I don't think ignoring breaking core parts of the open web like RSS is a great approach though.

Sorry, what are we breaking here? Could you give examples? The description is not a required field of the RSS spec (https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt) either a title or description should be provided.

The release blog posts are too long to include as a description, for example. And what I'm debating here is how valuable it is to add a description here, given the technicalities we would have to face.

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

How would anyone else complain? There is an open ticket, and others piled on when the route changes broke the feed entirely. Normally people look for open tickets and don't open duplicates

People give +1 if they agree that something is missing, even tho if they don't open more issues. I haven't seen any interest in the addition of description besides yours.

I might be wrong, but again, since our blog posts were never designed in a way starting with a short description (introduction) of sorts, the description field even if containing one or two paragraphs would still not bring that much value, in my opinion. I'm more than happy to debate over this.

I haven't even seen interest on the @nodejs/releasers about this blog feed. Again, I'm debating here how much value it is to add a description field, knowing that they can simply click on the link to the blog posts on their RSS readers.

@ovflowd

This comment was marked as off-topic.

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

What built-in release feed? What sort of information was only available on the website feed? Could you please point what is being done differently?

The link from the description https://github.com/nodejs/node/releases.atom

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

The release blog posts are too long to include as a description, for example. And what I'm debating here is how valuable it is to add a description here, given the technicalities we would have to face.

The feeds previously did include everything (most of the time), and that made the posts readable and actionable (Eg: downloading a release from the links directly from an RSS client).

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

The link from the description nodejs/node/releases.atom

I have genuinely never seen this file before. Could I get some context on what this releases.atom file is used for, and who even uses them? Was this being used before here?

I think we could definitely source that releases.atom file and incorporate it here. But that ATOM feed seems to be independent of the website.

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

The feeds previously did include everything (most of the time), and that made the posts readable and actionable (Eg: downloading a release from the links directly from an RSS client).

From what I saw in the source code, what you're referring to was never done. Could you please point me to where we sourced the releases.atom file? I'm looking all over the old build of the Metalsmith Website, and I see no reference.

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

The GitHub feed was never sourced. It's an automatic feed created/updated when releases are created on the nodejs/node repo.
What I meant was that previously, even with the truncation, most release posts were fully published since their size was usually inside the character count. People following the feeds could read the notes and get the download link without clicking through to the page/website

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

The GitHub feed was never sourced. It's an automatic feed created/updated when releases are created on the nodejs/node repo. What I meant was that previously, even with the truncation, most release posts were fully published since their size was usually inside the character count. People following the feeds could read the notes and get the download link without clicking through to the page/website

I see, then I misunderstood you.

Going back to my original point, I would still like to debate the value of adding the description field. I'd argue that we can add it again because this was done on the old Website. It got omitted from the Next.js transition precisely because of some technical challenges (that back then) we would have had by adding this.

The Website RSS feeds containing entire blog posts are way bigger than your average RSS feed.

Most RSS feeds, I know, don't put the entire page on the description field. The RSS spec itself even states that the description is meant to be a synopsis. I argue that the old behaviour was in relaxed accordance with the RSS spec.

I am still leaning against implementing this, but that doesn't mean we can't. But we should consider stripping down the number of entries on the generated feed to 10.

It's also important to mention we would need to parse the Markdown into HTML and append it as CDATA, if we're going the easy route. But I'm against attaching the whole blog page on a description field.

RSS readers can embed the actual blog post within the RSS entry. For example, we could add support to AMP over here to generate minimalistic versions of the feeds.

I'm open to suggestions.

@richardlau
Copy link
Member

I haven't even seen interest on the @nodejs/releasers about this blog feed. Again, I'm debating here how much value it is to add a description field, knowing that they can simply click on the link to the blog posts on their RSS readers.

I suspect most @nodejs/releasers are not even aware the nodejs.org blog feed existed. I knew of it, but until I looked just now had no idea how it was generated -- for reference, it used to be done via metalsmith-feed,

nodejs.org/build.js

Lines 167 to 189 in f34b913

// Generates the feed XML files from their respective collections which were
// defined earlier on.
.use(
feed({
collection: 'blog',
destination: 'feed/blog.xml',
title: 'Node.js Blog'
})
)
.use(
feed({
collection: 'blogReleases',
destination: 'feed/releases.xml',
title: 'Node.js Blog: Releases'
})
)
.use(
feed({
collection: 'blogVulnerability',
destination: 'feed/vulnerability.xml',
title: 'Node.js Blog: Vulnerability Reports'
})
)
.

But we should consider stripping down the number of entries on the generated feed to 10.

The old feed had 20 entries, e.g. https://web.archive.org/web/20230103023604/https://nodejs.org/en/feed/releases.xml from the beginning of the year.

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

You're probably on to something with the sourcing/delegating of the GitHub release feed. It would just need buy-in to harmonize the missing content from the template https://github.com/nodejs/nodejs.org/blob/main/scripts/release-post/template.hbs (download links, sha, etc...)

RSS readers can embed the actual blog post within the RSS entry. For example, we could add support to AMP over here to generate minimalistic versions of the feeds.

I'm not sure i'd bet on AMP. Since Google removed the search incentive, I believe larger platforms have been dropping support, so I don't think it's future looks healthy

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

The old feed had 20 entries, e.g. web.archive.org/web/20230103023604/https://nodejs.org/en/feed/releases.xml from the beginning of the year.

That is fair. Apparently, the metalsmith-feed plugin has a default limit of 20 entries. As stated in their readme. I believe we could also adopt this limit.

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

I'm not sure i'd bet on AMP. Since Google removed the search incentive, I believe larger platforms have been dropping support, so I don't think it's future looks healthy

AMP is probably dead. I'm trying to find alternatives that wouldn't require us to build markdown again. But if we limit the results to 20 entries, I feel more compelled to do this. Yet, I still need to figure out how to embed full blog posts. I'm also not sure about generating tailored description fields for the releases blog posts 🤔

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

Anyhow, let's sulk this in for a while, and get some more opinions from @nodejs/releasers and @nodejs/website folks and see if we find an agreement, something that we feel like it's feasible and meets popular demand.

I just wanted to clarify that I'm fine by adding a description field, but I'm arguing over what should be there and how complex that could be. 😅

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

Yup, you've raised some points that may simplify things:

  • Delegate the Release feeds/pages to the nodejs/node releases. That would skip @nodejs/releasers needing to do the PR to this repo, but would required the few extra bits that are currently added in the template.
  • I see from a quick scan, a few instruction posts mention a post.exerpt, so maybe there is a built-in piece that might handle the old truncation. That might be a good solution for the non-release posts

@nschonni
Copy link
Member Author

nschonni commented Jul 2, 2023

I'm now remembering that the GitHub release page does have a character limit that was hit in the past. Not sure if that is still an issue or something that GitHub addressed

@ovflowd
Copy link
Member

ovflowd commented Jul 2, 2023

I see from a quick scan, a few instruction posts mention a post.exerpt, so maybe there is a built-in piece that might handle the old truncation. That might be a good solution for the non-release posts

Well, the excerpt on this tutorial is a frontmatter field, not an actual "feature".

Our RSS feeds are generated by a Serverless-Function (that also is statically generated during build-time for all the RSS feeds we support).

At that point, I'd recall, we read the generated blog-post-data.json contents. This JSON file was introduced to allow Dynamically Generated pages to peek into Blog Data at any point without requiring the file system to be accessed.

So I would argue that adding the contents to this JSON isn't a good approach, but, that doesn't mean our serverless function couldn't peek at the source Markdown file (all source pages are uploaded and part of the Vercel deployment and also available statically on our DigitalOcean server).

So now the complication goes to -> read each Markdown file (through either node:fs) and then again compile the Markdown to HTML through remark.

This seems quite an exhaustive process to go through each file, even tho this is a one-time process... Computational-wise, it shouldn't be bad to access only the last 20 blog post entries for the aforementioned categories.

But it's still something that requires FS access for each one of these files. Hence my concerns here 😅

@velu-murugesan
Copy link

I am new to open source. Can you please give me a roadmap for open source

@AugustinMauroy
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug on hold website redesign Issue/PR part of the Node.js Website Redesign
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants