TRAP Caching: Add timeouts to upload/download operations #1280
LGTM, just some minor comments.
Will we get sufficient telemetry to tell how often the timeout is reached?
Our telemetry will record an upload/download time that's very close to the timeout, so we should be able to pick out runs that timed out fairly easily by filtering for telemetry rows near that value. A few runs that finished just before the timeout might be mistakenly counted as timeouts this way. If that becomes an issue, we can add an extra telemetry field that records whether we hit a timeout. As long as timeouts are fairly rare, though (which I expect them to be, since the uploads and downloads we've seen so far finish well within the timeout), I don't think it warrants an extra field and the complexity of threading it through the code.
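A minimal sketch of the filtering heuristic described above. The row shape, the field name `downloadDurationMs`, the 120-second timeout, and the 1-second tolerance are hypothetical placeholders, not the Action's real telemetry schema or timeout value. The sample data also shows the false positive mentioned: a run that finished just before the timeout is flagged too.

```typescript
// Hypothetical shape of one telemetry row; the real schema may differ.
interface TelemetryRow {
  runId: string;
  downloadDurationMs: number;
}

/**
 * Heuristic: flag rows whose recorded duration lies within `toleranceMs`
 * of the configured timeout. Runs that genuinely finished just before the
 * timeout are flagged as well (false positives).
 */
function likelyTimedOut(
  rows: TelemetryRow[],
  timeoutMs: number,
  toleranceMs: number
): TelemetryRow[] {
  return rows.filter(
    (row) => Math.abs(row.downloadDurationMs - timeoutMs) <= toleranceMs
  );
}

const rows: TelemetryRow[] = [
  { runId: "a", downloadDurationMs: 4200 }, // normal, fast download
  { runId: "b", downloadDurationMs: 120050 }, // hit the (hypothetical) timeout
  { runId: "c", downloadDurationMs: 119400 }, // finished just in time
];

// Flags "b" and "c"; "c" is the false positive discussed above.
console.log(likelyTimedOut(rows, 120000, 1000).map((r) => r.runId));
```

Widening the tolerance catches timeouts whose measured time overshoots the limit more, at the cost of more false positives like run `c`.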
IIRC we measure the aggregate upload/download time over all languages, so I think we could tell whether we timed out when analyzing a single language, but not when analyzing multiple languages (unless they all timed out). That should be an unlikely case, though, so I wouldn't personally push for extra telemetry fields at this time.
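To illustrate why an aggregate hides a single-language timeout, here is a small sketch. It assumes, hypothetically, that the per-language operations run one after another and that telemetry stores only the sum of their durations; the 120-second timeout is likewise a placeholder.

```typescript
// Assumption: telemetry stores only the sum of per-language durations.
function aggregateDurationMs(perLanguageMs: number[]): number {
  return perLanguageMs.reduce((sum, ms) => sum + ms, 0);
}

const TIMEOUT_MS = 120000; // hypothetical timeout value

// One language that timed out: the total sits at the timeout, so it stands out.
const single = aggregateDurationMs([TIMEOUT_MS]);
// Two languages, only the first timed out.
const oneOfTwo = aggregateDurationMs([TIMEOUT_MS, 30000]);
// Two languages, neither timed out, but both were slow.
const bothSlow = aggregateDurationMs([90000, 60000]);

console.log(single, oneOfTwo, bothSlow); // 120000 150000 150000
```

The last two totals are identical, so the aggregate alone cannot tell them apart. When every language times out, the total is a multiple of the timeout, which is only recognizable if the number of languages is also known.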
```typescript
() => {
  logger.info(
    `Timed out waiting for TRAP cache for ${language} to upload, will continue without uploading`
  );
}
```
Minor, optional: You can remove the curly braces here and in other calls like this. I'm surprised the linter doesn't complain about it.
Let's leave them in. I personally prefer how it reads with the curly braces, since they make it clearer that this is the body of the callback. If we ever want to add a second statement here (e.g. to record a telemetry field saying we timed out), keeping the braces will also make that diff cleaner.
Yes, it will be a little trickier to infer from the timings whether timeouts occurred when there's more than one language. We may want to consider adding an extra field at that point, although I hope this will be rare enough not to warrant it.
Per @henrymercer's suggestion, after he observed one instance where a TRAP cache download hung and caused a whole run to fail, this PR adds timeouts to both the download and upload operations.
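The PR text does not show how the timeouts are implemented, so the following is only a sketch of one common approach: racing the operation against a timer with `Promise.race`. The `withTimeout` helper, its signature, and the `Logger` interface are hypothetical stand-ins; the callback mirrors the one quoted in the review above. Note that the underlying operation is not cancelled, it simply stops being awaited, which matches the "will continue without uploading" behaviour.

```typescript
// Minimal stand-in for the Action's logger (hypothetical).
interface Logger {
  info(message: string): void;
}

/**
 * Resolves with the result of `promise`, or with `undefined` if it has not
 * settled within `timeoutMs`, in which case `onTimeout` is called once.
 * The underlying operation keeps running; it just stops being awaited.
 */
async function withTimeout<T>(
  timeoutMs: number,
  promise: Promise<T>,
  onTimeout: () => void
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => {
      onTimeout();
      resolve(undefined);
    }, timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so a finished operation doesn't keep the process alive.
    clearTimeout(timer);
  }
}

const logger: Logger = { info: (m) => console.log(m) };
const language = "javascript";

// Hypothetical download that never finishes, simulating the observed hang.
const hangingDownload = new Promise<string>(() => {});

withTimeout(50, hangingDownload, () => {
  logger.info(
    `Timed out waiting for TRAP cache for ${language} to download, will continue without it`
  );
}).then((result) => {
  console.log(result === undefined ? "continuing without cache" : result);
});
```

Resolving to `undefined` rather than rejecting lets callers treat a timeout the same way as a cache miss, so the rest of the run proceeds without the TRAP cache.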
Merge / deployment checklist