This repository was archived by the owner on Nov 17, 2022. It is now read-only.

Extract data from Schemas #303

Open
hayfield opened this issue Mar 19, 2018 · 0 comments
Labels
api Changes to the pyIATI API. enhancement Some sort of new functionality (rather than fixing or tweaking something that already existed). missing-feature A major feature that should exist, but does not. parent-issue An issue that makes reference to a number of other issues that split the large task into parts. schemas Relating to IATI Schemas.

Comments

@hayfield
Contributor

Schemas represent the structure that IATI XML is expected to follow. They define a number of elements and attributes, each of which carries information that would be useful to extract: descriptions, occurrence properties, and the XPaths at which they occur. Following research into this area, there does not appear to be a standard method for doing this with open tooling.

#64 provides an initial attempt at extracting this information. However, it uses tools that aren't really designed for the job. The result is hundreds of lines of confusing code that doesn't handle all the cases it needs to and would be a challenge to maintain.

It is therefore proposed to implement this functionality using a two-stage process:

  • Utilise XSLT to transform the Schema into an Intermediate Representation (IR) that has the information structured in an easy-to-query format
  • Have capabilities available within the schemas module to access the information presented in the IR through a defined Python API
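
As a rough illustration of the second stage only, here is a minimal Python sketch that loads an already-generated IR and exposes it as a lookup keyed by XPath. Everything in it is an assumption made for illustration: the IR layout (`<ir>`, `<node>`, `min-occurs`, `max-occurs`), the `IRNode` class and the `load_ir` function are hypothetical and are not part of pyIATI, and the example descriptions are placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical IR document, standing in for the output of the XSLT stage.
# The element and attribute names are assumptions, not a pyIATI format.
EXAMPLE_IR = """
<ir>
  <node xpath="/iati-activities" min-occurs="1" max-occurs="1">
    <description>Placeholder description of the root element.</description>
  </node>
  <node xpath="/iati-activities/@version" min-occurs="1" max-occurs="1">
    <description>Placeholder description of a required attribute.</description>
  </node>
  <node xpath="/iati-activities/iati-activity" min-occurs="1" max-occurs="unbounded">
    <description>Placeholder description of a repeatable element.</description>
  </node>
</ir>
"""


@dataclass(frozen=True)
class IRNode:
    """One element or attribute described by the IR (hypothetical shape)."""
    xpath: str
    min_occurs: int
    max_occurs: Optional[int]  # None stands for "unbounded"
    description: str


def load_ir(ir_xml: str) -> Dict[str, IRNode]:
    """Parse the IR and return a mapping from XPath to IRNode."""
    root = ET.fromstring(ir_xml.strip())
    nodes: Dict[str, IRNode] = {}
    for el in root.findall("node"):
        raw_max = el.get("max-occurs", "1")
        xpath = el.get("xpath")
        nodes[xpath] = IRNode(
            xpath=xpath,
            min_occurs=int(el.get("min-occurs", "1")),
            max_occurs=None if raw_max == "unbounded" else int(raw_max),
            description=(el.findtext("description") or "").strip(),
        )
    return nodes


ir = load_ir(EXAMPLE_IR)
version = ir["/iati-activities/@version"]
print(version.min_occurs, version.max_occurs)
```

Keeping the IR as a flat list of nodes means the Python side needs no knowledge of XSD constructs; all of that complexity would stay in the transform.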

Based on preliminary investigation, the IR will likely:

  • Treat elements and attributes as equivalents
    • i.e. an optional attribute would become: min_occurs = 0 and max_occurs = 1
  • Be designed such that the primary key is an XPath
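
To make the two IR properties above concrete, here is a small sketch that builds an XPath-keyed mapping in which attributes get the same `(min_occurs, max_occurs)` pair as elements. Note that it walks a toy schema directly with Python's `xml.etree.ElementTree` instead of using XSLT, purely to keep the example self-contained. The `build_ir` function and the example schema are hypothetical, and the walk only handles inline `xs:complexType` / `xs:sequence` declarations; it ignores `ref=`, named types, `xs:choice` and other constructs that a real Schema would need.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema for illustration only; it is not an IATI Schema.
EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="parent">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="child" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="required-attr" use="required"/>
      <xs:attribute name="optional-attr"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# (min_occurs, max_occurs); a max_occurs of None stands for "unbounded".
Occurs = Tuple[int, Optional[int]]


def build_ir(xsd_text: str) -> Dict[str, Occurs]:
    """Return a mapping from XPath to occurrence bounds (hypothetical IR)."""
    ir: Dict[str, Occurs] = {}

    def visit(element: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{element.get('name')}"
        raw_max = element.get("maxOccurs", "1")
        ir[path] = (
            int(element.get("minOccurs", "1")),
            None if raw_max == "unbounded" else int(raw_max),
        )
        complex_type = element.find(f"{XS}complexType")
        if complex_type is None:
            return
        # Attributes are stored like elements: required -> (1, 1),
        # optional (the XSD default) -> (0, 1).
        for attr in complex_type.findall(f"{XS}attribute"):
            required = attr.get("use", "optional") == "required"
            ir[f"{path}/@{attr.get('name')}"] = (1 if required else 0, 1)
        for child in complex_type.findall(f"{XS}sequence/{XS}element"):
            visit(child, path)

    schema = ET.fromstring(xsd_text.strip())
    for top in schema.findall(f"{XS}element"):
        visit(top, "")
    return ir


for xpath, occurs in build_ir(EXAMPLE_XSD).items():
    print(xpath, occurs)
```

Because every entry has the same shape, a caller can ask "is this required?" of an element or an attribute with the same check on `min_occurs`.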