Standards landing pages generated by build script #29

Merged Mar 3, 2020 (16 commits)

baskaufs
Contributor

The Python script build.py uses metadata from the CSV tables in the rs.tdwg.org repo to create standards landing pages that conform to the Standards Documentation Specification (SDS). To run it, download the "stds-pages" branch of the repo to a local drive and run the script. The script generates folders and Markdown files that mirror the structure of the standards directory of the website repo. The generated index.md files have been pushed to GitHub here so that they can be viewed in rendered form from their respective standards directories rather than as raw Markdown.
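
For readers who have not opened build.py, the general pattern described above (one folder and one index.md per row of a CSV table) can be sketched roughly as follows. This is only an illustration, not the actual script: the column names `slug` and `label`, the sample rows, and the function name `build_landing_pages` are assumptions made for the example.

```python
import csv
import io
from pathlib import Path

# Sample data only. The column names "slug" and "label" are assumptions;
# the real standards.csv may use different columns.
SAMPLE_STANDARDS_CSV = """slug,label
dwc,Darwin Core
example-std,Example Standard
"""


def build_landing_pages(csv_text, out_root):
    """Write <out_root>/<slug>/index.md for each row and return the paths."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        page_dir = Path(out_root) / row["slug"]
        # Create the per-standard folder, mirroring a standards directory.
        page_dir.mkdir(parents=True, exist_ok=True)
        page = page_dir / "index.md"
        page.write_text(f"# {row['label']}\n", encoding="utf-8")
        written.append(page)
    return written
```

Calling `build_landing_pages(SAMPLE_STANDARDS_CSV, "standards")` would create `standards/dwc/index.md` and `standards/example-std/index.md`, each holding only a title line in this simplified version.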

There are two compelling reasons for generating the pages by script rather than by hand. One is simply that it would be a lot of extra work to manually create all of the pages for the old standards, given that nearly all of the information about the standards is already present in the standards.csv, docs.csv, and docs-authors.csv files. But the other, more compelling reason is that the SDS implies that all representations of an abstract resource (such as a standard or document) should contain substantively the same metadata about that resource. The part of the standards landing page that is strictly controlled by the SDS (the header section) should provide exactly the same information as is included in machine-readable serializations. The way to ensure that is to generate the header section from the same information (the tables in the rs.tdwg.org repo) that is used to generate the machine-readable metadata.
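
The single-source argument can be illustrated with a small sketch in which one metadata record feeds both a machine-readable serialization and the human-readable header, so the two cannot drift apart. The record, its field names, and both function names are hypothetical and are not taken from build.py or the rs.tdwg.org tables.

```python
import json

# Hypothetical record. Field names and values are placeholders,
# not the real table columns.
record = {
    "iri": "http://example.org/standard/1",
    "label": "Example Standard",
    "publisher": "Example Publisher",
}


def to_json(rec):
    """Machine-readable form of the record."""
    return json.dumps(rec, sort_keys=True)


def to_markdown_header(rec):
    """Human-readable form, built from the very same record."""
    return "\n".join(f"**{key}:** {value}" for key, value in sorted(rec.items()))
```

Because both functions read the same dictionary, any correction made to the record shows up in both outputs the next time they are generated.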

There are several key requirements of Section 3.1 of the SDS (regarding landing pages for standards) that this script satisfies that are not currently found in many of the existing landing pages:

  • text indicating that the HTTP IRI is the IRI that should be cited (section 3.1.2)
  • TDWG listed as the publisher, with a link to the TDWG website (section 3.1.3)
  • an abstract for every standard, taken from the dcterms:description property of the standard (section 3.1.4)
  • a preferred citation for every standard (section 3.1.6)
  • a list of all parts of the standard, with links to the parts (section 3.1.7)
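
As a rough sketch of how a header covering the five items above might be assembled from one standard's metadata and its list of parts: the dictionary keys, the display wording, and the `render_header` function are assumptions for illustration. They are neither the phrasing required by the SDS nor the code in build.py.

```python
def render_header(std, parts):
    """Return a Markdown header section for one standard.

    `std` and `parts` use hypothetical keys chosen for this sketch.
    """
    lines = [
        f"# {std['label']}",
        "",
        f"Cite and link to this IRI: <{std['iri']}>",  # section 3.1.2
        "",
        f"Publisher: [{std['publisher']}]({std['publisher_url']})",  # 3.1.3
        "",
        f"Abstract: {std['description']}",  # 3.1.4
        "",
        f"Preferred citation: {std['citation']}",  # 3.1.6
        "",
        "Parts of the standard:",  # 3.1.7
        "",
    ]
    lines += [f"- [{part['label']}]({part['iri']})" for part in parts]
    return "\n".join(lines) + "\n"


# Placeholder values, except for the publisher name and website.
example_standard = {
    "label": "Example Standard",
    "iri": "http://example.org/standard/1",
    "publisher": "Biodiversity Information Standards (TDWG)",
    "publisher_url": "https://www.tdwg.org/",
    "description": "A placeholder abstract.",
    "citation": "Example Group. Example Standard. http://example.org/standard/1",
}
example_parts = [
    {"label": "Part A", "iri": "http://example.org/doc/a"},
    {"label": "Part B", "iri": "http://example.org/doc/b"},
]
```

`render_header(example_standard, example_parts)` returns the header text with one list entry per part, which is the piece that would be hard to keep accurate by hand for standards with many documents.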

The last item is the major feature that the script enables. It is critical in two ways: it makes clear which documents are part of a standard (and, by exclusion, which are not), and it disambiguates standards from the documents that compose them. For some modern standards that include a single document, the distinction may not seem important, but for many of the older standards, TDWG is not the publisher of the document, even though by the act of ratification it is the publisher of the standard. It is also important to distinguish between the standard and its documents because some standards contain many documents that may have different contributors, who should be acknowledged independently, as well as different publication dates (which may differ from the ratification date of the standard itself). Taxonomic Literature, Edition 2 and its Supplements is a notable example of this.

Note: using this build script doesn't require that all of the generated pages actually be used on the TDWG website. For landing pages of standards that are actively managed (DwC or AC, for example), it would probably be better to use manually built Markdown. One could compare the manual page with the script-generated page to make sure that the header section is consistent, but create the rest of the page (e.g. links to documents included in the standard rather than full descriptions) manually. But for many of the older standards, where it would be a pain to build the page manually (tl-2 is a notable example), the script-generated pages would be very useful. I did make an attempt to include manually generated content as part of the page: I simply copied the extra Markdown from the existing landing pages and pasted it into the "other" column of the pageInfo.csv file, newlines and all. So most of the manually edited content will still appear in the automatically generated index.md files (although this may not be the best way to manage this content).
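
Pasting multi-line Markdown into a CSV cell works because a quoted CSV field may contain newlines, and Python's `csv` module returns them intact. A minimal sketch, assuming a `slug` key column next to the "other" column (the key column name, the sample row, and the function name are hypothetical):

```python
import csv
import io

# One data row whose quoted "other" cell spans several lines.
SAMPLE_PAGE_INFO = (
    "slug,other\n"
    'example-std,"## Resources\n\n- first link\n- second link"\n'
)


def load_extra_markdown(csv_text):
    """Map each slug to the free-form Markdown stored in its "other" cell."""
    # newline="" leaves embedded newlines untouched; when reading a real
    # file, open(path, newline="") serves the same purpose.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {row["slug"]: row["other"] for row in reader}


extra = load_extra_markdown(SAMPLE_PAGE_INFO)
# The hand-written Markdown would be appended below the generated header.
page = "# Example Standard\n\n" + extra["example-std"] + "\n"
```

The cell comes back as a single string with its line breaks preserved, so it can be appended to the generated header unchanged.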

I have requested reviews by @peterdesmet (as Jekyll/website guru), @stanblum (as infrastructure czar), and @chicoreus (as fearless TAG leader). I've assigned the various issues below to the three of you based on my guess about which of your experience would be most relevant. The issues below have all been assigned to the "deploy standards landing pages generated by build script" milestone.

The following items need to be resolved before this can be fully implemented:

It would be desirable, but not required, to resolve the following issues before implementation:

This issue should be addressed after implementation:

@baskaufs baskaufs merged commit 8607a35 into master Mar 3, 2020
@baskaufs baskaufs deleted the stds-pages branch March 3, 2020 17:10