This repository was archived by the owner on Jan 3, 2018. It is now read-only.

Quick feedback on Git lesson on 'open science' #712

Closed
12 tasks
cboettig opened this issue Sep 12, 2014 · 4 comments

@cboettig

Hi SWC,

Was just reading through the Open Science lesson under git and thought I would make a few notes to myself about things that could be improved. I'll try to get around to a pull request on this, so this is somewhat of a to-do list of things I might tackle in it. Other suggestions, background, or push-back are (of course) welcome! They will help guide me before I make too many foolish edits.

This lesson appears to be about open source licensing and hosting, not about "Open Science" per se. (With the exception of the opening vignette that compares two workflows, highlighting all kinds of issues that will not actually be covered in this lesson.) Perhaps the title could be revised to reflect the focus on software licensing and hosting issues (which fits most naturally under the "git" section anyway). Consider:

  • changing the title
  • focusing the introductory vignette to highlight the value of open source licenses

Licensing

This section jumps in a bit too quickly for me. I'd suggest first

  • defining copyright,
  • stating that copyright exists independent of any license or claim about it,
  • and then making it clear that open source licensing is a process of waiving certain rights. This section must also first address the issue of who holds the copyright. For instance, in an academic setting, faculty and students typically own their copyrights, while staff researchers do not. These terms are set by the individual's contract, and they may limit which licenses the individual can apply (for instance, many universities have restrictions on GPL v3 but not v2). The point is not to get into the weeds, but merely to signpost key issues.
  • The discussion of Creative Commons licenses is too detailed. (In my experience, students who see tables like the one shown try to memorize them and may lose sight of the more important context.) The section does not mention that Creative Commons licenses are not appropriate for code, which is the clear focus of the discussion. I would remove this table and emphasize that CC licenses should be considered for publications and other creative works (e.g., blog posts), not for data or code.
  • If it's desirable to discuss CC licenses, this should reflect the Budapest definition of Open Access, highlighting the fact that, of the CC licenses, only CC BY meets that strict definition of open access publishing, though some publishers would like to define it otherwise.
  • Likewise, it is generally held that CC licenses are not appropriate for data. The CC0 declaration, recommended for data by the Panton Principles and required by Dryad, is not mentioned at all. Instead, the section mentions PD (public domain), which glosses over the fact that Creative Commons found it far from simple to make a multinational public domain declaration and created a dedicated tool for it, CC0, with a lengthy legal text (technically a 'declaration' rather than a license).

Hosting

It's not entirely clear whether this section is talking about distributing code or about more general work. Since we're not covering data repositories, preprint servers, etc. in this section, it should be made clearer that the focus is on software hosting. I think this section could be much more concise and potentially more prescriptive: "While researchers frequently distribute code and software by hosting it on their own websites (either on a university or private server), hosting on a dedicated code repository has several advantages." It could then briefly mention link rot as the main issue, as well as the versioning and issue/bug tracking tools available on the repositories you list (see, e.g., the JORS code repository requirements).

Conclusions

  • "People who incorporate GPL'd software into theirs must make theirs open": I feel this could easily lead a new user to believe that if they use GPL software, they have to share their code openly online. The clause only affects them if they want to redistribute their code (e.g., as a binary).
  • I would edit the comment about the Creative Commons family of licenses to simply say that they do not apply to code (or data).
  • "Projects can be hosted on university servers, on personal domains, or on public forges" -> "projects hosted on public repositories are more likely to be still available in the future"
  • I would add a conclusion that: "open licenses are a way of choosing what rights you waive" and "one should always be clear who owns the copyright in the first place"
@wking
Contributor

wking commented Sep 12, 2014

On Fri, Sep 12, 2014 at 10:53:27AM -0700, Carl Boettiger wrote:

  • "Projects can be hosted on university servers, on personal
    domains, or on public forges" -> "projects hosted on public
    repositories are more likely to be still available in the future"

I'd stick to “forges” to avoid confusion between Git repositories
(which hold a single project) and hosting sites (e.g. SourceForge,
GitHub, ...; which host many projects). The students just made it
through their first Git lesson, where we repeatedly use “repository” in
the single-project sense.

Other than that, these all sound like great changes to me :).

@cboettig
Author

@wking Good point. I was just using "repository" in the same sense as JORS or data archiving, but I see how that would be confusing given the context of the other lessons. I just don't find the term "forge" particularly intuitive (not that the suffixes "Bucket" or "Hub" are much better). Any suggestions?

@wking
Contributor

wking commented Sep 12, 2014

On Fri, Sep 12, 2014 at 11:17:56AM -0700, Carl Boettiger wrote:

I just don't find the term "Forge" particularly intuitive (Not that
the suffixes "Bucket" or "Hub" are much better). Any suggestions?

I think “forge” is OK (and Wikipedia at least recognizes it [1]), but
I agree that it is jargon-y. There's also the more explicit (but less
romantic) “hosting site”, although that doesn't imply auxiliary
services such as issue tracking. Finally, Wikipedia sometimes uses
the unwieldy “collaborative software development management system”.
In this case (novice students), it's probably best to bite the bullet
and use the wordy form, linking it to Wikipedia's software forge
article so the curious can learn the jargon ;).

@rgaiacs

rgaiacs commented Sep 12, 2014

Carl,

I agree with your suggestions. If you need help using git just ask.

I suggest splitting your suggestions into two or three pull requests to help
us review them, but it won't be a problem if you create only one pull request.

5 participants