Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: use file hashes to download the same asset only once #12080

Closed
baobabKoodaa opened this issue Feb 25, 2019 · 11 comments
Closed
Labels
help wanted Issue with a clear description that the community can help with. stale? Issue that may be closed soon due to the original author not responding any more. status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. type: bug An issue or pull request relating to a bug in Gatsby

Comments

@baobabKoodaa
Copy link
Contributor

baobabKoodaa commented Feb 25, 2019

Summary

If a static file is downloaded and cached in the browser, it shouldn't be downloaded again. The browser should just serve it from cache. Currently this is not working as it should, because separate GraphQL queries generate separate sets of assets - even when the output files are exactly same. This means that the exact same file will be generated into different paths, so the browser has to download the same file from multiple paths.

More information

I've put up a live demo at https://stoic-raman-037869.netlify.com/

When I go to the front page, my browser fetches this image. When I click on the image and go to the article page, the browser again fetches the exact same image from a different path.

These are the exact same file:

/public/static/be21c7dd0d11c74c18530fc39aefa922/4b90c$ sha256sum mailboxes.webp 
cc5c80f43d8a4e21f2488f2356ecf9a3adaeeb1b722b19dbe4de6e4f4eb98cf4  mailboxes.webp
/public/static/be21c7dd0d11c74c18530fc39aefa922/6a184$ sha256sum mailboxes.webp 
cc5c80f43d8a4e21f2488f2356ecf9a3adaeeb1b722b19dbe4de6e4f4eb98cf4  mailboxes.webp

An easy way to fix this issue would be to place all generated static files into the same folder and make the file hash a part of the filename. This way the second generation of this image would simply overwrite the first generation. And any page which needs this image would reference the same file, allowing the browser to cache it efficiently and avoid downloading the same asset multiple times.

In addition: I'm all ears if anybody knows how I can make this work for me now without forking Gatsby?

@freiksenet freiksenet added help wanted Issue with a clear description that the community can help with. type: feature or enhancement labels Feb 26, 2019
@freiksenet
Copy link
Contributor

We generate file names in gatsby-plugin-sharp, it contains file internal digest (which is the same for your images) and sorted picture resize argument digest (which is different). Unless arguments that are used to process this image are different, then this is a bug somewhere. Could you link to the source of the blog so I can investigate more?

@freiksenet freiksenet added type: bug An issue or pull request relating to a bug in Gatsby status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. labels Feb 26, 2019
@baobabKoodaa
Copy link
Contributor Author

It looks like it's not possible for me to set the exact same arguments in both places.

Here is my gatsby-config.js for "how to transform <re-img> images inside markdown files":

resolve: `gatsby-remark-rehype-images`,
options: {
              tag: 're-img',
              maxWidth: 800,
              toFormat: 'WEBP',
              quality: 90,
              tracedSVG: { color: '#f9ebd2' },
              generateTracedSVG: true
}

Here is the comparable portion of my GraphQL query to generate images for my front page (not from markdown):

... on ImageSharp {
                  fluid(maxWidth: 800, toFormat: WEBP, quality: 90, traceSVG: { color: "#f9ebd2" }) {
                    ...GatsbyImageSharpFluid_withWebp_tracedSVG
                  }
}

So we have 2 differences:

  1. Markdown image transform arguments have tag: 're-img'. If I remove this argument, then my Markdown images don't get transformed. If I add the same argument to my GraphQL query, I get an error.

  2. Markdown image transform arguments have generateTracedSVG: true. If I remove this argument, then my Markdown images don't get tracedSVG placeholders. If I add the same argument to my GraphQL query, I get an error.

(Of course I don't know how those config arguments are passed down to plugin-sharp, maybe the issue is not either of those ones but something else entirely. For example, I'm unable to define the return types in gatsby-config, so they might be different.)

Source for a demo: https://github.com/baobabKoodaa/blog/tree/temp384

Here is a demo consistent with that source: https://happy-euler-ccaeea.netlify.com/dont-send-your-own-email/

@baobabKoodaa
Copy link
Contributor Author

Even if it's possible to fix this by modifying gatsby-remark-rehype-images or plugin-sharp, I think a better, more general fix, would be to generate asset paths using file hashes. That way the same issue is not going to re-occur with many different plugins, etc.

@oorestisime
Copy link
Contributor

Well gatsby-transformer-sharp is generating the tracedSVG transformations in a different way this is why the generateTracedSVG is not available there. This can get fixed in a 3.x.x version since would be kind of a breaking change. More info on this #11981 and #11912

This also extends to cache (meaning it should avoid transforming those twice) because cache in gatsby is not global but a per plugin level one. So if a plugin generates an image and then another plugin generates the same image the cache won't actually see it.

I am not sure what the correct solution would be here.

@oorestisime
Copy link
Contributor

Ok so this in the end is more tightly related to my plugin and recent changes in sharp.

proposed solution is to

  • strip generateTracedSVG from the hec calculation so that transformer-sharp and bare sharp images are using the same arguments.
  • update my plugin in order to strip the tag before going in the sharp processing

@polarathene
Copy link
Contributor

I think my issue(closed by @gatsbot but still valid afaik) is related to this.

I had added a feature to support different sized base64 images, however nothing else about the queries were different, yet the build seems to process the images a 2nd time instead of realizing by cache that it's been done before and only needs to handle the base64 images that have different dimensions.

Ok so this in the end is more tightly related to my plugin and recent changes in sharp.

I'm not aware of the recent changes since. I'll have to test it again and see if it's been fixed or if my feature was implemented incorrectly too causing the behaviour.

@oorestisime
Copy link
Contributor

This should be solved by PR #12129

pieh pushed a commit that referenced this issue Mar 27, 2019
…g args (#12129)

So continuing on the work to make gatsby-plugin-sharp friendly on third-party packages (oorestisime/gatsby-remark-rehype-images#2) we now face this issue #12080

TL;DR images generated from transformer-sharp or and other plugins are the same (same sha256) but with different hash (depends on the available options). The problem is that all the plugins leak their options into the hash generation and even ones that are unrelated to the resizing processing (base64, generateTraceSVG, traceSVG etc)

I've been discussing this with @pieh for the last 2-3 hours :D
Proposed solution was to sanitize the args based on a whitelist, which is implemented on the PR. Which leads to the first question:

- is it ok if after this release _all_ image hashes are changing? 

By doing so it can also potentially help  lazy image processing @wardpeet by getting rid of unrelated options when persisting jobs.

Anyway current state of the PR is that it kind of works, i ve tested this in websites using my plugin and other that not. The issue here is that there are some differences in the number of images created when this modification is on or not.

I also verified that public folder doesn't contain duplicate images:

`find public -regex ".*\.\(jpg\|webp\|png\|jpeg\)" -type f -exec sha256sum {} \; | cut -d ' ' -f 1 |sort| uniq -d | wc -l`
@baobabKoodaa
Copy link
Contributor Author

Still experiencing this issue.

Steps to reproduce:

  1. Checkout branch temp384 of this repo
  2. npm i gatsby-plugin-sharp@2.0.32
  3. Create .env file with content POSTS_FOLDER=posts
  4. Clear cache and public, build and serve
  5. Open the page in browser with network tools on, copy the path of the loaded mailbox image
  6. Click the mailbox image to get to a different page with the same mailbox image. Again, copy the path of the loaded image in devtools.
  7. Compare the paths for the images and notice how they are different.

@gatsbot
Copy link

gatsbot bot commented Apr 17, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Apr 17, 2019
@baobabKoodaa baobabKoodaa added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Apr 17, 2019
@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Oct 11, 2019
@gatsbot
Copy link

gatsbot bot commented Oct 11, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot
Copy link

gatsbot bot commented Oct 22, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed Oct 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue with a clear description that the community can help with. stale? Issue that may be closed soon due to the original author not responding any more. status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. type: bug An issue or pull request relating to a bug in Gatsby
Projects
None yet
Development

No branches or pull requests

5 participants