fix: add support for gatsby-source-graphql #32

wKovacs64 · 2019-12-07T21:46:36Z

BREAKING CHANGE: See the migration guide for details.

Closes #7

wKovacs64 · 2019-12-07T21:47:42Z

Sorry for the size of this PR, @graysonhicks - it was just stuck in my head and I had to get it out. 😆

Check it out, see what you think...

graysonhicks · 2019-12-19T14:17:21Z

Sorry this has taken me a while to get back, but this looks great and I am leaning towards accepting it. Basic thought is that being able to work with gatsby-source-graphql is more important for now than the nested arrays behavior. I believe the nested arrays behavior could be added in the future. So I would definitely do this as a major version bump and add very documentation for those in version 2.

wKovacs64 · 2019-12-19T19:50:07Z

No worries. Obviously, it's up to you whether you want to go the direction of this PR or not. I could not dream up a use-case where separating the image source URL field from the newly created File node would matter, but there's a real possibility I just haven't considered something. The implementation in this PR works in every scenario I could find to test, with nested arrays just requiring some config and query updates (no schema changes). 🤷‍♂

Do you have a concrete example of a schema in mind that this PR would not support (that the current version does support)? Just curious.

graysonhicks · 2019-12-23T15:12:15Z

No, I don't have the schema in mind. I just know that was one of the first features that issues started popping up for. After thinking of it more and reading the recent #33 I do think this PR is the way to go. I can do a merge and new 3.0.0 release after the Christmas/New Year holidays.

Also, I think ignore the conflict in here for now since it is conflicting with #34 which is a v2 fix.

wKovacs64 · 2019-12-23T16:09:47Z

I just rebased, should be a clean merge when it's time.

kilburn · 2020-01-08T06:30:19Z

I have tried using this in my largeish project (3k pages) instead of the currently released version.

Benefits: it actually works. The current implementation is very fragile especially in development mode (I've had issues with the caching part, stale images when nodes are updated and stuff like that)

Problems:

This crashes the build when the url field is empty for some node. I think this is a common situation that should be handled (I just added an escape hatch where the resolve function returns undefined in this case and everything seems fine)
More importantly, our build times have tripled with this. This happens because images are downloaded during the query running phase instead of the sourcing phase. Gatsby has a hardcoded limit of 4 concurrent queries. Combining both these things, our build is now spending a lot of time during the query running phase just waiting for images to download (4 queries are launched concurrently, a few images are discovered and start downloading, and until all images for a query are downloaded gatsby is just waiting. After a bit one query finishes, another gets run, a few more images are discovered and start downloading, etc.).

This is not this plugin's fault, but we could try to improve.the situation.

On thw one hand, we could consider asking gatsby to allow to modify the query concurrency with an env variable or something. I don't know what other unintended consequences that may have though...

On the other hand, we've decided we'll be using an ad-hoc solution for our case. We use a CMS that changes the image urls whenever an image is modified, making our image urls effectively immutable. Hence, we are going to implement an extra caching layer, separate from gatsby's cache (that gets cleared in many situations), and just copy the images from there once they've been downloaded at some point in the past. If we see good results from that I will opem a separate ticket to see if there's interest in integrating this possibility as a plugin option or something like that.

wKovacs64 · 2020-01-08T17:24:00Z

Great feedback, thanks @kilburn.

I completely agree we should handle the first issue. ~~I'll update the PR.~~ I've updated the PR.

The second issue is a bummer and I'm not sure how to best handle it. If I understand it right, the time in the life cycle when this (Gatsby schema customization) runs is by design, so I'm not sure how to get around the slowness other than a Gatsby change similar to what you suggested.

BREAKING CHANGE: See the migration guide for details.

graysonhicks · 2020-01-08T17:58:15Z

Yea, thanks for the feedback @kilburn . In general, image processing and downloading is by far the slowest part of the Gatsby build processes. I'm curious if you are saying this plugin in particular is going slower than getting the same images with the same settings using another source plugin?

kilburn · 2020-01-10T06:59:04Z

I'm afraid I have to come back with bad news. I said before that build times had tripled with this new approach. This is true, but only in build mode. It turns out that develop mode with a cold gatsby cache is unbearably slow (it went from taking ~15min. with the current version of the plugin to ~90min. with the current one).

What's more, we've implemented the "out-of-gatsby" caching I explained before. This reduced our build time significantly but doesn't noticeably reduce the develop time.

That is, there's something really wrong with the develop pipeline when using the code in this PR. I will keep investigating and report back here (I'll try to prepare a reproduction repository) but in the meantime I would not recommend you to release this yet.

wKovacs64 · 2020-01-10T15:18:46Z

Interesting. Perhaps this is the wrong approach after all - or we're missing something?

kilburn · 2020-01-10T17:20:11Z

I've done a few more experiments. Here are the timings I'm getting now (using latest versions of all gatsby plugins and our out-of-gatsby caching of images):

| build   | hot .img-cache | hot public  | hot .cache  |  192.49 sec
| build   | hot .img-cache | hot public  | cold .cache |  378.46 sec
| build   | hot .img-cache | cold public | hot .cache  |  367.85 sec
| build   | hot .img-cache | cold public | cold .cache |  764.55 sec

| develop | hot .img-cache | hot public  | hot .cache  |  412.76 sec
| develop | hot .img-cache | hot public  | cold .cache | 3132.76 sec
| develop | hot .img-cache | cold public | cold .cache | 3758.23 sec
| develop | hot .img-cache | cold public | hot .cache  |   90.01 sec*

* This is extremely fast but doesn't work. When you try to open the site the gatsby console shows an error: "Error loading a result for the page query in "/". Query was not run and no cached result was found".

Observations:

If I start with a cold cache, it is faster to do gatsby build && gatsby develop (~20min) than simply gatsby develop (~62min). This is strange and seems to point to a problem in gatsby, not something caused by this plugin itself.
Thumbnail generation does not seem to block queries from completing. Gatsby finishes the "run-queries" phase even while thumbnails are still being generated, but doesn't proceed to the next phase (building html pages) until they are all done.
In contrast, this PR blocks queries from finishing until the images are downloaded. I don't understand why this difference and think it would make for a nice improvement if we could somehow replicate what transformer-sharp does here.
With all the fiddling, I think I've spotted one of the main problems with the old code. Warning: The following reasoning reflects my current understanding, which may very well be wrong!

When we create nodes using createRemoteFileNode, we tell that function the parentNodeId for the created FileNode. However, in some cases (many in our case) we request the same image (url) for several different source nodes. Unfortunately gatsby-source-filesystem either reuses (if it is currently being downloaded) or overwrites (by creating a new node with the same id) the FileNode for a given url every time we call it after the first one.

In the end, there exists a single FileNode in the store, but "referenced" by multiple source nodes. This is good because we want to reuse the image. BUT, that single FileNode points to a single one of the sources nodes as its parent. This causes issues later on when, for instance, that single parent gets deleted and gatsby automatically deletes the FileNode (because it is one of its children). When this happened, the other source nodes would now refer to a non-existant FileNode and would error when queried.

With the current code this doesn't happen because FileNodes are recreated every time they are queried, and hence they always seem to exist.

I'll keep researching the issue, but for the moment our decision is to keep using the code in this PR combined with our out-of-gatsby image cache. This code makes gatsby develop not crash with the live updating of source info which is the most important part for us right now. If we have to delete the gatsby's cache for some reason we'll just run gatsby build before gatsby develop which is bearable because it doesn't happen often.

graysonhicks · 2020-07-06T17:00:13Z

Gonna close this as gatsby-source-graphql is undergoing a major overhaul.
https://github.com/vladar/gatsby-graphql-toolkit

wKovacs64 mentioned this pull request Dec 21, 2019

Impossible to add two image fields in the same node #33

Closed

fix: add support for gatsby-source-graphql

e217a9d

BREAKING CHANGE: See the migration guide for details.

wKovacs64 mentioned this pull request Feb 1, 2020

Array of images #37

Closed

graysonhicks closed this Jul 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add support for gatsby-source-graphql #32

fix: add support for gatsby-source-graphql #32

wKovacs64 commented Dec 7, 2019

wKovacs64 commented Dec 7, 2019

graysonhicks commented Dec 19, 2019

wKovacs64 commented Dec 19, 2019

graysonhicks commented Dec 23, 2019

wKovacs64 commented Dec 23, 2019

kilburn commented Jan 8, 2020

wKovacs64 commented Jan 8, 2020 •

edited

Loading

graysonhicks commented Jan 8, 2020

kilburn commented Jan 10, 2020 •

edited

Loading

wKovacs64 commented Jan 10, 2020

kilburn commented Jan 10, 2020 •

edited

Loading

graysonhicks commented Jul 6, 2020

fix: add support for gatsby-source-graphql #32

fix: add support for gatsby-source-graphql #32

Conversation

wKovacs64 commented Dec 7, 2019

wKovacs64 commented Dec 7, 2019

graysonhicks commented Dec 19, 2019

wKovacs64 commented Dec 19, 2019

graysonhicks commented Dec 23, 2019

wKovacs64 commented Dec 23, 2019

kilburn commented Jan 8, 2020

wKovacs64 commented Jan 8, 2020 • edited Loading

graysonhicks commented Jan 8, 2020

kilburn commented Jan 10, 2020 • edited Loading

wKovacs64 commented Jan 10, 2020

kilburn commented Jan 10, 2020 • edited Loading

graysonhicks commented Jul 6, 2020

wKovacs64 commented Jan 8, 2020 •

edited

Loading

kilburn commented Jan 10, 2020 •

edited

Loading

kilburn commented Jan 10, 2020 •

edited

Loading