Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvesting : Fix DDI Import for otherId #10772

Merged
merged 4 commits into from
Nov 5, 2024

Conversation

luddaniel
Copy link
Contributor

What this PR does / why we need it:

This PR fixes issue while harvesting DDI with multiple otherId
This fix mirrors citation.tsv otherId configuration : allowmultiples = TRUE

otherId Other Identifier Another unique identifier for the Dataset (e.g. producer's or another repository's identifier) none 4 : FALSE FALSE TRUE FALSE FALSE FALSE citation

@luddaniel luddaniel changed the title Fix MultipleCompoundField for otherId Fix DDI Import for otherId Aug 14, 2024
@coveralls
Copy link

coveralls commented Aug 14, 2024

Coverage Status

coverage: 21.224%. remained the same
when pulling 483e99f on Recherche-Data-Gouv:fix_importDDI_otherId
into 6a00ce5 on IQSS:develop.

@pdurbin pdurbin added the Size: 3 A percentage of a sprint. 2.1 hours. label Aug 20, 2024
@pdurbin pdurbin added the Type: Bug a defect label Oct 9, 2024
@cmbz cmbz added FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) labels Oct 23, 2024
@landreev landreev self-assigned this Oct 24, 2024
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems non-controversial - just something we forgot to do when otherId was made multiple.
Thank you for the fix.

@ofahimIQSS
Copy link
Contributor

@luddaniel Can you please provide additional details on what this PR is doing and how we can test it.

@landreev landreev removed their assignment Oct 29, 2024
@luddaniel
Copy link
Contributor Author

@ofahimIQSS This PR fixes the edu.harvard.iq.dataverse.util.json.JsonParseException: incorrect multiple for field otherId error when DDI harvested data contains multiple ortherId.
This can be reproduced by harvesting the following repo: https://data.progedo.fr/oai

{
  "nickName": "progedo",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "default",
  "harvestUrl": "https://data.progedo.fr/oai",
  "archiveUrl": "https://data.progedo.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "oai_ddi25",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}

curl -H "X-Dataverse-key: 600e69df-f046-490a-824e-33f3430b9476" -H "Content-Type: application/json" -X POST "http://localhost:8080/api/harvest/clients/progedo" --upload-file "client.json"

This cannot be reproduced by Importing a Dataset into a Dataverse Installation with a DDI file as this code line is properly coded :

citation.addField(FieldDTO.createMultipleCompoundFieldDTO("otherId", otherIds));

@pdurbin @landreev Is there a way to not waste too much time for this type of PR? Finding the problem and making a PR took me 1 hour whereas providing a detailed explanation, material to test took me 4 hours of headache.

@luddaniel luddaniel changed the title Fix DDI Import for otherId Harvesting : Fix DDI Import for otherId Oct 30, 2024
@pdurbin
Copy link
Member

pdurbin commented Oct 30, 2024

@luddaniel sorry about your headache. I'm not sure what would help. Do you have any suggestions?

@scolapasta scolapasta assigned ofahimIQSS and unassigned luddaniel Oct 31, 2024
@ofahimIQSS
Copy link
Contributor

@luddaniel Thank you for providing the testing details.

After uploading the client.json file with the same data provided, I ran the harvest job and received Success with "1660 failed" in results. I am testing in my local environment. Harvest Log/server log can be found below:

image
harvest_cleanup_progedo_2024-11-04T18-18-49.txt
harvest_progedo_2024-11-04T18-18-49.log
server.log

@luddaniel
Copy link
Contributor Author

@ofahimIQSS My bad, to fix Specified metadatalanguage not allowed. I forgot to give you this instruction :

curl http://localhost:8080/api/admin/settings/:MetadataLanguages -X PUT -d '[{"locale":"en","title":"English"},{"locale":"fr","title":"Français"}]'

This should give you SUCCESS; 1072 harvested, 0 deleted, 588 failed.
Without this PR you should see edu.harvard.iq.dataverse.util.json.JsonParseException: incorrect multiple for field otherId using develop branch for example.

Updated with develop as it contained an harvesting regression fixed by #10990

@ofahimIQSS
Copy link
Contributor

@luddaniel Thanks again - that did the trick. Testing Complete - Merging PR

Testing of 10772.docx

@ofahimIQSS ofahimIQSS merged commit e3b3fea into IQSS:develop Nov 5, 2024
11 checks passed
@ofahimIQSS ofahimIQSS removed their assignment Nov 5, 2024
@pdurbin pdurbin added this to the 6.5 milestone Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) Size: 3 A percentage of a sprint. 2.1 hours. Type: Bug a defect
Projects
Status: 🚀 Done (Recherche Data Gouv)
Development

Successfully merging this pull request may close these issues.

6 participants