[ByteFlies] DMP hash will never be the same when rerunning the pipeline #54

Open
davidverweij opened this issue Mar 26, 2021 · 1 comment
Labels
bug Oh my byteflies FS Device data-transfer Data Transfer Protocol

Comments

@davidverweij
Member

I have an interesting one for you. I noticed that whenever I run the BTF pipeline with the same patient and the same data, the hash on the DMP is never the same, even though the file sizes may be identical. See the DMP HACKATHON_TEST dataset, for example the upload that is 6.33 KB (or sometimes 6.32 KB!).

A quick diff showed that the data files are identical, but, on reflection, the meta.json files logically are not. In particular, these files include the URL pointing to the download file, and these links change each time you query the BTF API to retrieve the files for download.

```json
{
    "id": "uuid",
    "signals": [
        {
            "rawData": "unique_download_link_with_temp_auth"
        }
    ]
}
```

I do not foresee any immediate issue, but if WP5 implements a safety feature that rejects duplicate uploads, it will never be triggered for any of the BTF uploads. I believe the DRM meta.json files do not include temporary URIs, so they are not affected. If we want to compare or detect duplicates in the future, including among our current uploads, it would be worth tackling this now.
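To make the duplicate-detection concern concrete, here is a minimal sketch of a hypothetical WP5-style check that rejects an upload whose hash has been seen before. It assumes the DMP hash is SHA-256 over the serialized meta.json bytes; the real algorithm and serialization are not stated here, and `DuplicateGuard`, `btf_meta`, and the example URLs are all hypothetical.

```python
import hashlib
import json


def file_hash(payload: dict) -> str:
    """Hash the serialized meta.json bytes (SHA-256 is an assumption)."""
    data = json.dumps(payload, indent=4).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class DuplicateGuard:
    """Hypothetical duplicate check: reject any upload whose hash was seen before."""

    def __init__(self) -> None:
        self.seen = set()

    def accept(self, payload: dict) -> bool:
        digest = file_hash(payload)
        if digest in self.seen:
            return False  # duplicate: reject the upload
        self.seen.add(digest)
        return True


def btf_meta(download_link: str) -> dict:
    """Same patient, same data; only the temporary download link differs (hypothetical URLs)."""
    return {"id": "uuid", "signals": [{"rawData": download_link}]}


guard = DuplicateGuard()
first = guard.accept(btf_meta("https://example.org/file?token=aaa"))
rerun = guard.accept(btf_meta("https://example.org/file?token=bbb"))
print(first, rerun)  # True True: the rerun is not recognized as a duplicate
```

Only a byte-identical meta.json would be caught, so every BTF rerun slips through because its `rawData` link is fresh.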

My suggestion is to delete or replace the "rawData" key-value pair before storing it in the meta.json. Thoughts?
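A minimal sketch of the "delete" option, assuming the response has the shape shown above and that the DMP hash is SHA-256 over the meta.json bytes (both assumptions). `strip_raw_data` and the example URLs are hypothetical names for illustration.

```python
import copy
import hashlib
import json


def strip_raw_data(meta: dict) -> dict:
    """Return a copy of meta with the temporary "rawData" link removed from each signal."""
    cleaned = copy.deepcopy(meta)  # leave the original API response untouched
    for signal in cleaned.get("signals", []):
        signal.pop("rawData", None)  # no error if a signal has no link
    return cleaned


def meta_json_bytes(meta: dict) -> bytes:
    """Serialize meta as it might be written to meta.json (formatting is an assumption)."""
    return json.dumps(meta, indent=4).encode("utf-8")


# Two pipeline runs for the same data; only the temporary links differ
run_1 = {"id": "uuid", "signals": [{"rawData": "https://example.org/file?token=aaa"}]}
run_2 = {"id": "uuid", "signals": [{"rawData": "https://example.org/file?token=bbb"}]}

h1 = hashlib.sha256(meta_json_bytes(strip_raw_data(run_1))).hexdigest()
h2 = hashlib.sha256(meta_json_bytes(strip_raw_data(run_2))).hexdigest()
print(h1 == h2)  # True
```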

@davidverweij davidverweij added bug Oh my byteflies FS Device data-transfer Data Transfer Protocol labels Mar 26, 2021
@davidverweij davidverweij changed the title Byteflies DMP payload will never be the same when rerunning the pipeline [ByteFlies] DMP hash will never be the same when rerunning the pipeline Mar 26, 2021
@jawrainey
Member

jawrainey commented Mar 26, 2021

> My suggestion is to delete or replace the "rawData" key-value pair before storing it in the meta.json. Thoughts?

@davidverweij, agreed that this would be a good solution, so we can begin pushing BTF data to DMP. I would replace all links with an empty string, assuming that is the only part of the response that changes.
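The "replace with an empty string" variant keeps every key in place, so the meta.json schema stays the same and only the volatile value is blanked. This is a sketch under the same assumptions as before (SHA-256 over `json.dumps` output, `rawData` being the only link field); `blank_links`, `meta_hash`, and the URLs are hypothetical. Comparing the hashes of two blanked runs also serves as a check on the assumption that nothing else in the response changes.

```python
import copy
import hashlib
import json


def blank_links(meta: dict, link_keys: tuple = ("rawData",)) -> dict:
    """Return a copy of meta where every link field in every signal is set to ""."""
    cleaned = copy.deepcopy(meta)
    for signal in cleaned.get("signals", []):
        for key in link_keys:
            if key in signal:
                signal[key] = ""  # keep the key so the schema stays the same
    return cleaned


def meta_hash(meta: dict) -> str:
    """SHA-256 of the serialized meta.json (algorithm and formatting are assumptions)."""
    return hashlib.sha256(json.dumps(meta, indent=4).encode("utf-8")).hexdigest()


# Two pipeline runs with two signals each; only the temporary links differ
run_1 = {"id": "uuid", "signals": [{"rawData": "https://example.org/a?token=1"},
                                   {"rawData": "https://example.org/b?token=1"}]}
run_2 = {"id": "uuid", "signals": [{"rawData": "https://example.org/a?token=2"},
                                   {"rawData": "https://example.org/b?token=2"}]}

cleaned_1, cleaned_2 = blank_links(run_1), blank_links(run_2)
# If any other field also changed between runs, these hashes would still differ
print(meta_hash(cleaned_1) == meta_hash(cleaned_2))  # True
```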

For reference, this also means that if any of the APIs we use change, historical data cannot be compared on DMP: we store meta.json, which would change along with the API response and thus change the hash.
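A small sketch of that limitation: even with the links blanked, any change to the stored response alters the hash. The extra `apiVersion` field below is purely hypothetical and stands in for any API change; SHA-256 over `json.dumps` output is again an assumption about how the DMP hash is computed.

```python
import hashlib
import json


def meta_hash(meta: dict) -> str:
    """SHA-256 of the serialized meta.json (algorithm and formatting are assumptions)."""
    return hashlib.sha256(json.dumps(meta, indent=4).encode("utf-8")).hexdigest()


# Same recording with links already blanked, before and after a hypothetical API change
before = {"id": "uuid", "signals": [{"rawData": ""}]}
after = {"id": "uuid", "signals": [{"rawData": ""}], "apiVersion": "2"}  # hypothetical new field

print(meta_hash(before) == meta_hash(after))  # False
```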

Development

No branches or pull requests

2 participants