Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add support for gatsby-source-graphql #32

Closed
wants to merge 1 commit into from
Closed

fix: add support for gatsby-source-graphql #32

wants to merge 1 commit into from

Conversation

wKovacs64
Copy link
Contributor

BREAKING CHANGE: See the migration guide for details.

Closes #7

@wKovacs64
Copy link
Contributor Author

Sorry for the size of this PR, @graysonhicks - it was just stuck in my head and I had to get it out. 😆

Check it out, see what you think...

@graysonhicks
Copy link
Owner

Sorry this has taken me a while to get back, but this looks great and I am leaning towards accepting it. Basic thought is that being able to work with gatsby-source-graphql is more important for now than the nested arrays behavior. I believe the nested arrays behavior could be added in the future. So I would definitely do this as a major version bump and add very documentation for those in version 2.

@wKovacs64
Copy link
Contributor Author

No worries. Obviously, it's up to you whether you want to go the direction of this PR or not. I could not dream up a use-case where separating the image source URL field from the newly created File node would matter, but there's a real possibility I just haven't considered something. The implementation in this PR works in every scenario I could find to test, with nested arrays just requiring some config and query updates (no schema changes). 🤷‍♂

Do you have a concrete example of a schema in mind that this PR would not support (that the current version does support)? Just curious.

@graysonhicks
Copy link
Owner

No, I don't have the schema in mind. I just know that was one of the first features that issues started popping up for. After thinking of it more and reading the recent #33 I do think this PR is the way to go. I can do a merge and new 3.0.0 release after the Christmas/New Year holidays.

Also, I think ignore the conflict in here for now since it is conflicting with #34 which is a v2 fix.

@wKovacs64
Copy link
Contributor Author

I just rebased, should be a clean merge when it's time.

@kilburn
Copy link

kilburn commented Jan 8, 2020

I have tried using this in my largeish project (3k pages) instead of the currently released version.

Benefits: it actually works. The current implementation is very fragile especially in development mode (I've had issues with the caching part, stale images when nodes are updated and stuff like that)

Problems:

  1. This crashes the build when the url field is empty for some node. I think this is a common situation that should be handled (I just added an escape hatch where the resolve function returns undefined in this case and everything seems fine)

  2. More importantly, our build times have tripled with this. This happens because images are downloaded during the query running phase instead of the sourcing phase. Gatsby has a hardcoded limit of 4 concurrent queries. Combining both these things, our build is now spending a lot of time during the query running phase just waiting for images to download (4 queries are launched concurrently, a few images are discovered and start downloading, and until all images for a query are downloaded gatsby is just waiting. After a bit one query finishes, another gets run, a few more images are discovered and start downloading, etc.).

This is not this plugin's fault, but we could try to improve.the situation.

On thw one hand, we could consider asking gatsby to allow to modify the query concurrency with an env variable or something. I don't know what other unintended consequences that may have though...

On the other hand, we've decided we'll be using an ad-hoc solution for our case. We use a CMS that changes the image urls whenever an image is modified, making our image urls effectively immutable. Hence, we are going to implement an extra caching layer, separate from gatsby's cache (that gets cleared in many situations), and just copy the images from there once they've been downloaded at some point in the past. If we see good results from that I will opem a separate ticket to see if there's interest in integrating this possibility as a plugin option or something like that.

@wKovacs64
Copy link
Contributor Author

wKovacs64 commented Jan 8, 2020

Great feedback, thanks @kilburn.

I completely agree we should handle the first issue. I'll update the PR. I've updated the PR.

The second issue is a bummer and I'm not sure how to best handle it. If I understand it right, the time in the life cycle when this (Gatsby schema customization) runs is by design, so I'm not sure how to get around the slowness other than a Gatsby change similar to what you suggested.

BREAKING CHANGE: See the migration guide for details.
@graysonhicks
Copy link
Owner

Yea, thanks for the feedback @kilburn . In general, image processing and downloading is by far the slowest part of the Gatsby build processes. I'm curious if you are saying this plugin in particular is going slower than getting the same images with the same settings using another source plugin?

@kilburn
Copy link

kilburn commented Jan 10, 2020

I'm afraid I have to come back with bad news. I said before that build times had tripled with this new approach. This is true, but only in build mode. It turns out that develop mode with a cold gatsby cache is unbearably slow (it went from taking ~15min. with the current version of the plugin to ~90min. with the current one).

What's more, we've implemented the "out-of-gatsby" caching I explained before. This reduced our build time significantly but doesn't noticeably reduce the develop time.

That is, there's something really wrong with the develop pipeline when using the code in this PR. I will keep investigating and report back here (I'll try to prepare a reproduction repository) but in the meantime I would not recommend you to release this yet.

@wKovacs64
Copy link
Contributor Author

Interesting. Perhaps this is the wrong approach after all - or we're missing something?

@kilburn
Copy link

kilburn commented Jan 10, 2020

I've done a few more experiments. Here are the timings I'm getting now (using latest versions of all gatsby plugins and our out-of-gatsby caching of images):

| build   | hot .img-cache | hot public  | hot .cache  |  192.49 sec
| build   | hot .img-cache | hot public  | cold .cache |  378.46 sec
| build   | hot .img-cache | cold public | hot .cache  |  367.85 sec
| build   | hot .img-cache | cold public | cold .cache |  764.55 sec

| develop | hot .img-cache | hot public  | hot .cache  |  412.76 sec
| develop | hot .img-cache | hot public  | cold .cache | 3132.76 sec
| develop | hot .img-cache | cold public | cold .cache | 3758.23 sec
| develop | hot .img-cache | cold public | hot .cache  |   90.01 sec*

* This is extremely fast but doesn't work. When you try to open the site the gatsby console shows an error: "Error loading a result for the page query in "/". Query was not run and no cached result was found".

Observations:

  • If I start with a cold cache, it is faster to do gatsby build && gatsby develop (~20min) than simply gatsby develop (~62min). This is strange and seems to point to a problem in gatsby, not something caused by this plugin itself.

  • Thumbnail generation does not seem to block queries from completing. Gatsby finishes the "run-queries" phase even while thumbnails are still being generated, but doesn't proceed to the next phase (building html pages) until they are all done.
    In contrast, this PR blocks queries from finishing until the images are downloaded. I don't understand why this difference and think it would make for a nice improvement if we could somehow replicate what transformer-sharp does here.

  • With all the fiddling, I think I've spotted one of the main problems with the old code. Warning: The following reasoning reflects my current understanding, which may very well be wrong!

    When we create nodes using createRemoteFileNode, we tell that function the parentNodeId for the created FileNode. However, in some cases (many in our case) we request the same image (url) for several different source nodes. Unfortunately gatsby-source-filesystem either reuses (if it is currently being downloaded) or overwrites (by creating a new node with the same id) the FileNode for a given url every time we call it after the first one.

    In the end, there exists a single FileNode in the store, but "referenced" by multiple source nodes. This is good because we want to reuse the image. BUT, that single FileNode points to a single one of the sources nodes as its parent. This causes issues later on when, for instance, that single parent gets deleted and gatsby automatically deletes the FileNode (because it is one of its children). When this happened, the other source nodes would now refer to a non-existant FileNode and would error when queried.

    With the current code this doesn't happen because FileNodes are recreated every time they are queried, and hence they always seem to exist.

I'll keep researching the issue, but for the moment our decision is to keep using the code in this PR combined with our out-of-gatsby image cache. This code makes gatsby develop not crash with the live updating of source info which is the most important part for us right now. If we have to delete the gatsby's cache for some reason we'll just run gatsby build before gatsby develop which is bearable because it doesn't happen often.

@wKovacs64 wKovacs64 mentioned this pull request Feb 1, 2020
@graysonhicks
Copy link
Owner

Gonna close this as gatsby-source-graphql is undergoing a major overhaul.
https://github.com/vladar/gatsby-graphql-toolkit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to using plugin with gatsby-source-graphql
3 participants