(Re-)Add content on generative AI #697

Merged: 8 commits, Apr 1, 2025

Member

@tobyhodges tobyhodges commented Mar 14, 2025

[This repeats the changes made in #695 (then reverted in #696). I am sorry for the confusion! 😅 The pull request will stay open for at least a week, to give community members time to provide feedback and suggest improvements. Thanks for your patience @vahtras and other Maintainers ❤️ ]

This adds a new section to Built-in Functions and Help, titled "Other ways to get help" that discusses searching the internet, StackOverflow, talking to another person, and generative AI chatbots e.g. ChatGPT as possible ways to get more help when faced with errors while coding.

Some notes to guide feedback:

  • I have tried to keep this as concise as possible, sticking to what I consider to be the most essential information only. But concede that it is still pretty wordy!
  • These changes are guided by conversations within the community over recent months, including but not limited to the community discussion sessions summarised in a couple of recent blog posts (The Ethics of Teaching LLMs in Carpentries Workshops and Essential Knowledge and Misconceptions).

Verified

This commit was signed with the committer’s verified signature.
tobyhodges Toby Hodges

github-actions bot commented Mar 14, 2025

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/swcarpentry/python-novice-gapminder/compare/md-outputs..md-outputs-PR-697

The following changes were observed in the rendered markdown documents:

 04-built-in.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 md5sum.txt     |  2 +-
 2 files changed, 59 insertions(+), 1 deletion(-)
What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-03-28 15:02:55 +0000

@tobyhodges tobyhodges changed the title re-add genAI content (Re-)Add content on generative AI Mar 14, 2025
@tobyhodges
Member Author

Quoting the comment received before #695 was merged. @rowleya wrote:

I personally think this is a very reasoned and reasonable approach and I think what you have written here relating both to AI and getting help certainly represents how I do things and also how I think about AI especially when learning. Good job!

github-actions bot pushed a commit that referenced this pull request Mar 14, 2025
@bkmgit

bkmgit commented Mar 14, 2025

A difficult topic. Licensing concerns are tricky.

It may be worth exploring how such tools are used ethically in educational settings, particularly for programming, and whether in future they could be incorporated into Carpentries lessons. One need not use LLMs; machine learning models and editor add-ons can also be helpful. These might be explored in a separate lesson piloted through the Incubator first, though.

@tobyhodges
Member Author

Thanks @bkmgit. Indeed I would like to help get some lessons like you described through the Incubator. E.g. I think we could do a really nice follow-up to DC Image Processing, exploring ML methods for image analysis.

Contributor

@brownsarahm brownsarahm left a comment

do we want to link to sources on any of this? I can help provide them, but do not want to put them in if the goal is to not link out

more broadly (and this might not fit here), part of why learning to write code to do analyses is important is that LLMs do not reliably answer mathematical questions, and that it protects data privacy (as a counter to the idea a person might have of uploading their data to a chatbot and asking it to do the analysis)

Contributor

@dpshelio dpshelio left a comment

Very nice @tobyhodges! Thanks!

tobyhodges and others added 2 commits March 18, 2025 11:20

Verified

This commit was created on github.com and signed with GitHub’s verified signature.
Co-authored-by: David Pérez-Suárez <dps.helio@gmail.com>
Co-authored-by: Sarah Brown <brownsarahm@uri.edu>

github-actions bot pushed a commit that referenced this pull request Mar 18, 2025
github-actions bot pushed a commit that referenced this pull request Mar 18, 2025
@tobyhodges
Member Author

@brownsarahm wrote

do we want to link to sources on any of this?

I considered this, but came down on the side of not including sources. If nothing else, just for the sake of keeping the focus on what we want to say in workshops, as opposed to which sources are the best fit for each point. But I could be easily persuaded in the opposite direction! If you or anyone else feels strongly, I can put some links in.

Verified

This commit was created on github.com and signed with GitHub’s verified signature.
Co-authored-by: Federica Gazzelloni (she/her) <61802414+Fgazzelloni@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Mar 18, 2025
@mrawls

mrawls commented Mar 18, 2025

I think you've threaded the needle reasonably well here. I am pretty strongly in the "don't use generative AI" camp, and I appreciate that you've incorporated context about pros, cons, and real world considerations alongside the blanket request to not use AI when you are learning to code because it defeats the purpose.

Contributor

@drammock drammock left a comment

very nice work @tobyhodges. Appreciate all your efforts to include the community in this change.

These tools sometimes generate plausible but incorrect or misleading information, so (just as with an answer found on the internet) it is essential to verify their accuracy.
You need the knowledge and skills to be able to understand these responses, to judge whether or not they are accurate, and to fix any errors in the code it offers you.

In addition to asking for help, programmers can use generative AI tools to generate code from scratch; extend, improve and reorganise existing code; translate code between programming languages; figure out what terms to use in a search of the internet; and more.
Contributor

this feels like more of a "sidebar" comment, since it's not about getting help when you're stuck. I wonder if it makes more sense to put it at the end of this section?

Member Author

I intended this to aid flow into the next paragraph. Roughly, something like:

  • This is how the thing we are talking about now is similar to the things we talked about just before
  • These are some of the ways in which people use this thing beyond what we already talked about
  • But these are some ways in which that could be considered problematic

Contributor

I see. In my view, the ethical problems apply regardless of whether you're using an LLM to get help when stuck, or using it to write whole programs from scratch (or refactor, translate code, etc). So I don't see why it needs to be right here in that regard.

You need the knowledge and skills to be able to understand these responses, to judge whether or not they are accurate, and to fix any errors in the code it offers you.

In addition to asking for help, programmers can use generative AI tools to generate code from scratch; extend, improve and reorganise existing code; translate code between programming languages; figure out what terms to use in a search of the internet; and more.
However, there are drawbacks that you should be aware of.
Contributor

If you take my suggestion of making the prior paragraph a "sidebar", then this should probably change to something like:

Suggested change
- However, there are drawbacks that you should be aware of.
+ Additionally, there are drawbacks that you should be aware of.

Member

@alee alee left a comment

Thank you for this thoughtful and important addition @tobyhodges (and all the comments from the community have been really excellent as well). @swcarpentry/python-novice-gapminder-maintainers getting a lot of pings on this one 🤣

I'm running a clinic at the next CSDMS Annual Meeting on using LLMs for computational modeling so all of the points raised here are also helpful in thinking about how to refine and tailor that stream of work.

It would be great to include some summaries of the mailing list discussion here as well, or perhaps just a link to that entire thread?

The section on generative AI is intended to be concise but Instructors may choose to devote more time to the topic in a workshop.
Depending on your own level of experience and comfort with talking about and using these tools, you could choose to do any of the following:

* Explain how large language models work and are trained, and/or the difference between generative AI, other forms of AI that currently exist, and the concept of artificial general intelligence.
Member

I think it may be important to emphasize that current LLMs do not appear to be "reasoning" in any sense of the word. They are statistical engines trained on enormous corpora of data (internet+) that are very good at producing plausible, grammatically correct sentences that are statistically relevant to the given input (which you explain clearly later on). Would it be useful to be clear that current LLMs are nothing like AGI (which, even if it did exist, should also not be trusted unequivocally)?

Though there is very interesting work being done by Google DeepMind in the area of neurosymbolic AI (AlphaGeometry etc)...

I still find LLMs to be quite useful at search, summarization, explaining new concepts, rubber ducking, generating first drafts of code snippets, text, images, tests, build scaffolding and scripts, Dockerfiles, Apptainer recipes, k8s configuration, etc.


**We recommend that you avoid getting help from generative AI during the workshop** for several reasons:

1. For most problems you will encounter at this stage, help and answers can be found among the first results returned by searching the internet.
Member

@alee alee Mar 19, 2025

I don't necessarily agree with the recommendation that learners not use LLMs during a workshop, but this is a pretty soft position. Perhaps a brief interlude could have the instructor use an LLM live and demonstrate the back-and-forth assessment of the responses provided, along with techniques for effectively constraining and prompting an LLM. I have found tools like the latest ChatGPT, Perplexity, NotebookLM, GitHub Copilot, v0.dev, etc. to be valuable aids to my work, but have also seen first-hand the effects of blind application of LLM responses without critical assessment. Regardless, people are using and will continue to use these tools, so it may be important to demo some subset of good practices around them.

There was a "cooking" / "recipe" analogy raised on the mailing list that I don't think quite fits either; the main concern, in my opinion, is that LLMs are statistical regurgitation-of-our-past-knowledge black boxes. https://garymarcus.substack.com/p/decoding-and-debunking-hard-forks does a good job of articulating these issues...

For a resources section I think this YouTube video (by Andrej Karpathy - worth following IMO) might be worth including; it is an excellent introduction to understanding what the current crop of LLMs actually do and how to use them more effectively:

https://www.youtube.com/watch?v=EWvNQjAaOHw

tobyhodges and others added 2 commits March 19, 2025 14:25

Partially verified

This commit was created on github.com and signed with GitHub’s verified signature.
We cannot verify signatures from co-authors, and some of the co-authors attributed to this commit require their commits to be signed.
Co-authored-by: Daniel McCloy <dan@mccloy.info>

github-actions bot pushed a commit that referenced this pull request Mar 19, 2025
github-actions bot pushed a commit that referenced this pull request Mar 19, 2025
@tobyhodges
Member Author

Thanks to @alee for the suggestion to link to the ongoing discussion on the mailing list. For reasons that remain unclear to me, the thread is split into three (so far?) on the TopicBox site, but here they are:

  1. Part 1
  2. Part 2
  3. Part 3

@mhagdorn
Contributor

@tobyhodges I also think this is a very considerate addition. I fully agree with the recommendations.


This is a fast-moving technology.
If you are preparing to teach this section and you feel it has become outdated, please open an issue on the lesson repository to let the Maintainers know and/or a pull request to suggest updates and improvements.

Contributor

Suggested change
If you are comfortable, demonstrating how to work with an LLM, especially through the built-in tools, could be a good way to end the workshop, in addition to pointing out other resources.

motivated by @alee's comment

Member Author

Can you elaborate on how this differs from the point already included on line 284 of the diff?

Demonstrate how you recommend that learners use generative AI.

My aim with this guidance to Instructors was to actively encourage different approaches based on the individual's level of expertise and comfort with using the technology.

tobyhodges and others added 2 commits March 28, 2025 16:00

Partially verified

This commit was created on github.com and signed with GitHub’s verified signature.
We cannot verify signatures from co-authors, and some of the co-authors attributed to this commit require their commits to be signed.
Co-authored-by: Daniel McCloy <dan@mccloy.info>
Co-authored-by: Sarah Brown <brownsarahm@uri.edu>

github-actions bot pushed a commit that referenced this pull request Mar 28, 2025
@tobyhodges
Member Author

Thanks everyone for your latest contributions. I really appreciate the feedback.

A heads-up that I plan to ask the Maintainers to merge this on Monday next week, so this is a last call for any "dealbreaking" reviews and suggestions!

@tobyhodges
Member Author

@swcarpentry/python-novice-gapminder-maintainers I think this is ready to merge now, if you are happy to do so. Thanks for your patience and support 🙌

Member

@alee alee left a comment

Thanks all for the stimulating and thoughtful discussion around this PR! 🥂

@alee alee merged commit 0b6744f into swcarpentry:main Apr 1, 2025
3 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 1, 2025
Auto-generated via `{sandpaper}`
Source  : 0b6744f
Branch  : main
Author  : Allen Lee <alee@users.noreply.github.com>
Time    : 2025-04-01 20:27:24 +0000
Message : Merge pull request #697 from tobyhodges/llm-assistants

(Re-)Add content on generative AI
github-actions bot pushed a commit that referenced this pull request Apr 1, 2025
Auto-generated via `{sandpaper}`
Source  : 5c93536
Branch  : md-outputs
Author  : GitHub Actions <actions@github.com>
Time    : 2025-04-01 20:28:21 +0000
Message : markdown source builds
github-actions bot pushed a commit to NIH-GREI/python-novice-gapminder that referenced this pull request Apr 2, 2025
Auto-generated via `{sandpaper}`
Source  : 0b6744f
Branch  : main
Author  : Allen Lee <alee@users.noreply.github.com>
Time    : 2025-04-01 20:27:24 +0000
Message : Merge pull request swcarpentry#697 from tobyhodges/llm-assistants

(Re-)Add content on generative AI
github-actions bot pushed a commit to NIH-GREI/python-novice-gapminder that referenced this pull request Apr 2, 2025
Auto-generated via `{sandpaper}`
Source  : 9220ce4
Branch  : md-outputs
Author  : GitHub Actions <actions@github.com>
Time    : 2025-04-02 18:08:30 +0000
Message : markdown source builds
8 participants