I have an interesting one for you. I noticed that whenever I run the BTF pipeline, with the same patient and same data, the hash on the DMP is never the same - even though the size of the files might be identical. See the DMP HACKATHON_TEST dataset, for example the upload that is 6.33KB (or 6.32 sometimes!).
A little diff told me that the data files are identical, but - thinking back - the meta.json files are logically not. In particular, these files include the URL pointing to the download file. These links change each time you query the BTF API to retrieve them for download.
I do not foresee any immediate issue - but if WP5 is implementing a safety feature to reject duplicate uploads it will not be triggered for any of the BTF uploads. I believe DRM meta.json files do not include temporary URIs, so no problem there. If we want to be able to compare/detect duplicates in the future and consider our current uploads, it would be worth tackling this now.
My suggestion is to delete or replace the "rawdata" key value pair before storing it in the meta.json. Thoughts?
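As a rough sketch of the suggestion above (not the pipeline's actual code), the "rawdata" key could be dropped before meta.json is written. The thread does not show the meta.json layout or how the DMP computes its hash, so the nested-dict shape, the helper names `strip_volatile_keys` and `stable_digest`, the example payloads, and the use of SHA-256 over sorted JSON are all assumptions made for illustration.

```python
import hashlib
import json


def strip_volatile_keys(meta, volatile_keys=("rawdata",)):
    """Return a copy of `meta` without the volatile keys, at any nesting level."""
    if isinstance(meta, dict):
        return {
            key: strip_volatile_keys(value, volatile_keys)
            for key, value in meta.items()
            if key not in volatile_keys
        }
    if isinstance(meta, list):
        return [strip_volatile_keys(item, volatile_keys) for item in meta]
    return meta


def stable_digest(meta):
    """Hash a canonical JSON serialization (assumed stand-in for the DMP hash)."""
    serialized = json.dumps(meta, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Hypothetical payloads: identical except for the temporary download link.
first_run = {"recordingId": "abc", "rawdata": "https://example.invalid/dl?token=111"}
second_run = {"recordingId": "abc", "rawdata": "https://example.invalid/dl?token=222"}

same_after_strip = (
    stable_digest(strip_volatile_keys(first_run))
    == stable_digest(strip_volatile_keys(second_run))
)
```

With the key removed, two runs over the same data serialize to the same bytes, so any hash computed over the stored file would match as well.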
davidverweij changed the title from "Byteflies DMP payload will never be the same when rerunning the pipeline" to "[ByteFlies] DMP hash will never be the same when rerunning the pipeline" on Mar 26, 2021
> My suggestion is to delete or replace the "rawdata" key value pair before storing it in the meta.json. Thoughts?
@davidverweij, agreed that this would be a good solution so we can begin pushing BTF data to DMP. I would replace all links with an empty string, assuming that is the only characteristic of the response that changes.
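A minimal sketch of the "replace all links with an empty string" variant, which keeps the keys in place so the structure of meta.json is unchanged. The response shape, the helper names `is_link` and `blank_links`, and the rule that a "link" is any string with an http(s) scheme and a host are assumptions for illustration, not something the BTF API documents here.

```python
from urllib.parse import urlparse


def is_link(value):
    """Treat a string as a link if it parses as an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def blank_links(node):
    """Return a copy of `node` with every link value replaced by ''."""
    if isinstance(node, dict):
        return {key: blank_links(value) for key, value in node.items()}
    if isinstance(node, list):
        return [blank_links(item) for item in node]
    return "" if is_link(node) else node


# Hypothetical API response fragment.
response = {
    "id": 1,
    "signals": [{"type": "ECG", "rawdata": "https://example.invalid/a?sig=1"}],
}
cleaned = blank_links(response)
```

Compared with deleting the key, this keeps a visible placeholder where the link used to be, at the cost of also blanking any stable URL that happens to appear in the response.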
For reference, this also means that if any of the APIs we use change, historical data can no longer be compared on DMP: we store meta.json, which such an API change would alter, and that in turn changes the hash.