Standards landing pages generated by build script #29

baskaufs · 2019-05-24T19:16:40Z

The Python script build.py uses metadata from the CSV tables in the rs.tdwg.org repo to create standards landing pages that conform to the Standards Documentation Specification (SDS). To run it, one downloads the "stds-pages" branch of the repo to a local drive, then runs the script. The script generates folders and Markdown files that mirror the structure of the standards directory of the website repo. The index.md files generated by the script have been pushed to GitHub here so that they can be viewed in rendered form from their respective standards directories rather than as raw Markdown.
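
The core loop of a script like this can be sketched as below. This is a minimal illustration, not the actual build.py: the column names `slug`, `label` and `description` and the functions `read_table`, `plan_pages` and `write_pages` are hypothetical. Each row of a standards table becomes an `index.md` inside a folder named for the standard, mirroring the layout of the standards directory.

```python
import csv
from pathlib import Path
from typing import Iterable


def read_table(path: Path) -> list:
    """Read a CSV table (e.g. standards.csv) into a list of row dicts."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def plan_pages(rows: Iterable[dict]) -> dict:
    """Map 'slug/index.md' -> Markdown text, one entry per standard row.

    The column names 'slug', 'label' and 'description' are hypothetical.
    """
    pages = {}
    for row in rows:
        body = f"# {row['label']}\n\n{row['description']}\n"
        pages[f"{row['slug']}/index.md"] = body
    return pages


def write_pages(pages: dict, out_root: Path) -> None:
    """Create one folder per standard under out_root and write its index.md."""
    for rel_path, text in pages.items():
        target = out_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
```

Usage would look something like `write_pages(plan_pages(read_table(Path("standards.csv"))), Path("standards"))`, with both paths pointing into a local checkout. Separating the planning step from the writing step makes it easy to inspect or diff the generated Markdown before anything touches disk.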

There are two reasons for generating the pages by script rather than by hand. The first is simply that manually creating pages for all of the old standards would be a lot of extra work, given that nearly all of the information about them is already present in the standards.csv, docs.csv, and docs-authors.csv files. The second, more compelling reason is that the SDS implies that all representations of an abstract resource (such as a standard or document) should contain substantively the same metadata about that resource. The part of the standards landing page that is strictly controlled by the SDS (the header section) should provide exactly the same information as the machine-readable serializations. The way to ensure this is to generate the header section from the same source (the tables in the rs.tdwg.org repo) that is used to generate the machine-readable metadata.
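
The single-source idea can be illustrated with a small sketch: one normalized record feeds both the machine-readable output and the Markdown header, so the two cannot drift apart. The column names (`iri`, `label`, `description`) and function names are hypothetical, and the JSON here merely borrows JSON-LD-style keys; it is not a complete JSON-LD document or the serialization rs.tdwg.org actually produces.

```python
import json


def metadata_record(row: dict) -> dict:
    """Normalize one CSV row (hypothetical columns 'iri', 'label',
    'description') into the single record both outputs are built from."""
    return {
        "@id": row["iri"],
        "dcterms:title": row["label"],
        "dcterms:description": row["description"],
    }


def to_json(record: dict) -> str:
    """Machine-readable serialization (plain JSON for illustration)."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_markdown_header(record: dict) -> str:
    """Human-readable header section built from the very same record."""
    return (
        f"**Title:** {record['dcterms:title']}\n\n"
        f"**IRI:** {record['@id']}\n\n"
        f"**Abstract:** {record['dcterms:description']}\n"
    )


def header_matches(record: dict) -> bool:
    """Check that every value in the JSON output also appears in the header."""
    data = json.loads(to_json(record))
    header = to_markdown_header(record)
    return all(value in header for value in data.values())
```

A check like `header_matches` could also be run against a hand-edited landing page's header to confirm it still agrees with the tables.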

This script satisfies several key requirements of Section 3.1 of the SDS (regarding landing pages for standards) that are not met by many of the existing landing pages:

includes text indicating that the HTTP IRI is the IRI that should be cited (section 3.1.2)
lists TDWG as the publisher and links to the TDWG website (section 3.1.3)
provides an abstract for every standard using the dcterms:description property of the standard (section 3.1.4)
includes a preferred citation for every standard (section 3.1.6)
lists all parts of the standard with links to the parts (section 3.1.7)
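
A header section covering the five items above might be assembled as follows. This is a sketch only: the dictionary keys (`iri`, `description`, `citation`, and each part's `label` and `url`), the function name and the exact wording are hypothetical, not the text build.py emits. The publisher link points at the TDWG website.

```python
def sds_header(std: dict, parts: list) -> str:
    """Build a header section covering SDS items 3.1.2, 3.1.3, 3.1.4,
    3.1.6 and 3.1.7.

    Keys of 'std' ('iri', 'description', 'citation') and of each part
    ('label', 'url') are hypothetical; the wording is illustrative only.
    """
    lines = [
        f"**Permanent IRI:** {std['iri']}",
        "",
        "**Note:** this HTTP IRI is the one that should be cited.",  # 3.1.2
        "",
        "**Publisher:** [Biodiversity Information Standards (TDWG)](https://www.tdwg.org/)",  # 3.1.3
        "",
        f"**Abstract:** {std['description']}",  # 3.1.4, from dcterms:description
        "",
        f"**Preferred citation:** {std['citation']}",  # 3.1.6
        "",
        "**Parts of this standard:**",  # 3.1.7
        "",
    ]
    # One Markdown list item per document that is part of the standard.
    lines += [f"- [{p['label']}]({p['url']})" for p in parts]
    return "\n".join(lines) + "\n"
```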

The last item is the major feature that the script enables. It is critical in two ways: it makes clear which documents are part of a standard (and, by exclusion, which are not), and it disambiguates standards from the documents that compose them. For modern standards that consist of a single document, the distinction may not seem important, but for many of the older standards TDWG is not the publisher of the document, even though by the act of ratification it is the publisher of the standard. It is also important to distinguish between a standard and its documents because some standards contain many documents that may have different contributors, who should be acknowledged independently, and different publication dates (which may differ from the ratification date of the standard itself). Taxonomic Literature, Edition 2 and its Supplements is a notable example of this.
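
The standard/document distinction amounts to a join between a documents table and a document-contributors table, grouped by the standard each document belongs to. A minimal sketch, assuming hypothetical column names (`doc_iri`, `standard_iri`, `label`, `issued`, `contributor`) rather than the actual headers of docs.csv and docs-authors.csv:

```python
from collections import defaultdict


def parts_by_standard(docs: list, doc_authors: list) -> dict:
    """Group document records under the standard they belong to.

    Each document keeps its own contributors and issue date, which may
    differ from the standard's ratification date. Column names are
    hypothetical.
    """
    # Collect contributors per document, preserving table order.
    authors = defaultdict(list)
    for row in doc_authors:
        authors[row["doc_iri"]].append(row["contributor"])

    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["standard_iri"]].append({
            "iri": doc["doc_iri"],
            "label": doc["label"],
            "issued": doc["issued"],
            "contributors": authors.get(doc["doc_iri"], []),
        })
    # ISO 8601 date strings sort chronologically as plain strings.
    return {std: sorted(parts, key=lambda p: p["issued"])
            for std, parts in grouped.items()}
```

Keeping contributors and dates on each document record, rather than on the standard, is what lets a multi-part standard credit each part separately.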

Note: using this build script doesn't require that all of the generated pages actually be used on the TDWG website. For landing pages of standards that are actively managed (DwC or AC, for example), it would probably be better to use manually built Markdown. One could compare the manual page with the script-generated page to make sure that the header section is consistent, but create the rest of the page (e.g. links to the documents included in the standard rather than their full descriptions) manually. But for many of the older standards, where building the page manually would be a pain (tl-2 is a notable example), the script-generated pages would be very useful. I did make an attempt to include manually generated content as part of the page: I simply copied the extra Markdown from the existing landing pages and pasted it, newlines and all, into the "other" column of the pageInfo.csv file. So most of the manually edited content will still appear in the automatically generated index.md files (although this may not be the best way to manage this content).
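
Storing Markdown in the "other" column works because CSV allows quoted fields to contain newlines, and Python's `csv` module preserves them when the input is read with `newline=""`. A sketch of extracting that Markdown and appending it after the generated header; the `page` key column and both function names are hypothetical:

```python
import csv
import io


def extra_markdown(page_info_csv_text: str) -> dict:
    """Return {page key: raw Markdown} from the 'other' column.

    Quoted CSV fields may span several lines; csv.DictReader keeps the
    embedded newlines intact. The 'page' key column name is hypothetical.
    """
    reader = csv.DictReader(io.StringIO(page_info_csv_text, newline=""))
    return {row["page"]: row["other"] for row in reader if row["other"].strip()}


def assemble_page(header: str, extra: str) -> str:
    """Generated header first, then any hand-written Markdown verbatim."""
    if not extra:
        return header
    return header.rstrip("\n") + "\n\n" + extra.strip("\n") + "\n"
```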

I have requested reviews by @peterdesmet (as Jekyll/website guru), @stanblum (as infrastructure czar), and @chicoreus (as fearless TAG leader). I've assigned the various issues below to the three of you based on my guess about whose experience would be most relevant to each. The issues below have all been assigned to the "deploy standards landing pages generated by build script" milestone.

The following items need to be resolved before this can be fully implemented:

#21 (Sort out titles in Jekyll header, header section, and citations): page vs. standards titles @peterdesmet @stanblum
#22 (harmonize summary in Jekyll header and abstract in header section): summary in Jekyll header vs. standard abstract @peterdesmet @stanblum
#23 (Should document abstracts in standards landing pages be abbreviated?): should document abstracts be abbreviated when present on standards landing pages? @peterdesmet @stanblum
#8 (Figure out what documents should actually be considered to be part of a standard): verify which documents should actually be included in standards @stanblum @chicoreus
#9 (Browser-viewable distributions should be generated for every standards document): some standards documents can't be viewed in a browser @peterdesmet
#27 (documents metadata header in script-generated standards landing pages aren't styled): document metadata in generated landing pages is not styled @peterdesmet

It would be desirable, but not required, to resolve the following issues before implementation:

#24 (Sort out how we acknowledge contributors in citations and elsewhere): sort out how we acknowledge contributors in citations of standards and documents @stanblum @chicoreus
#25 (Ratification dates for standards and publication dates for docs are a mess): ratification and publication dates are unclear for old standards @stanblum
#26 (Including version information in citations): deal with versions and version information for documents @peterdesmet @chicoreus

This issue should be addressed after implementation:

#28 (redirect permanent standards IRIs to the standards landing pages): redirect "permanent URLs" of standards to the landing pages @peterdesmet

Steve Baskauf added 11 commits May 22, 2019 20:12

remove duplicate standards parts

8237062

incremental work on standards landing pages build script

e373f08

completed updating of tables to generate metadata

9f32319

clean up and comment build script

ed6c7fe

Fix hyperlinking for doc URLs to avoid parentheses Markdown problem

85ad86b

push output to Github to check Markdown rendering

6b7e941

add additional fields in SDS-required header section

ad0b2c2

escape parens in DOI and return to standard Markdown link generation

5e4c012

fix reversed link/text

d2c5c66

modify script to add addtional Markdown (if any)

892da7b

escape spaces in page anchor

0754335
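
Several of the commits above deal with Markdown link syntax: DOIs can contain parentheses, which (when unbalanced) end an inline link destination early, and page anchors containing spaces break links. A minimal sketch of one way to handle both, using a hypothetical helper `md_link` rather than the code in build.py. It assumes a CommonMark-style renderer, where backslash escapes are honored inside link destinations; whether Jekyll's kramdown renderer treats them identically is worth confirming against the rendered pages.

```python
def md_link(text: str, url: str) -> str:
    """Markdown inline link whose destination survives parentheses and spaces.

    Backslash-escaping ( and ) and percent-encoding spaces keeps a
    CommonMark parser from ending the link destination early.
    """
    dest = url.replace("(", r"\(").replace(")", r"\)").replace(" ", "%20")
    return f"[{text}]({dest})"
```

Percent-encoding the parentheses (`%28`, `%29`) would be an alternative that does not rely on the renderer honoring backslash escapes.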

baskaufs added the stds landing pg build script label May 24, 2019

baskaufs added this to the deploy standards landing pages generated by build script milestone May 24, 2019

baskaufs requested review from peterdesmet, chicoreus and stanblum May 24, 2019 19:16

Steve Baskauf added 5 commits May 24, 2019 15:39

add two missing ISBNs to citations

8cc3485

regenerate pages with updated ISBNs

0805751

updated docs.csv based on corrections

1a1ebaa

Update Audubon Core modification date

4c4ffc9

Merge branch 'master' into stds-pages

37f664c

baskaufs merged commit 8607a35 into master Mar 3, 2020

baskaufs deleted the stds-pages branch March 3, 2020 17:10
