Add seed data and instructions #80
Thank you @russtoku for the great instructions!
Apologies for the large number of changes:
These are working now:
I'd like to suggest that we merge things at this point to give people something to work with. The seeding of all 201 public bodies will take a few more days. If I finish that before this is merged, I'll add it.
the public body information from a CSV file.

```
$ python manage.py loaddata data/seed/2024-03-15-classification.json
```
These filenames need to be updated
- Then, on the `Public Body` page (Home > Public Body > Public Bodies), scroll
  down to the bottom of the page to where there is a `Choose File` button next
  to the `Import Public Bodies` button.
Have you tried running this?
python manage.py import_csv data/seed/2024-03-24-public-bodies-fixed.csv
Docs: https://froide.readthedocs.io/en/latest/importpublicbodies/#importing-via-command-line
I gave it a quick shot, and it seemed to work:
➜ uipa git:(pr/80) python manage.py import_csv data/seed/test-public-bodies.csv
/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django_elasticsearch_dsl/documents.py:178: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security.
response = bulk(client=self._get_connection(), actions=actions, **kwargs)
Import done.
Left a few nitpick comments, but with this PR I was able to seed the portal with all the public bodies! 🚀
data/seed/Seeding.md
```
$ python manage.py loaddata data/seed/2024-03-24-categories.json
```
- On the Public Bodies page of the Admin website, upload the
Weirdly, the previous import_csv command fails here:
➜ uipa git:(pr/80) python manage.py import_csv data/seed/2024-03-24-public-bodies-fixed.csv
/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django_elasticsearch_dsl/documents.py:178: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security.
response = bulk(client=self._get_connection(), actions=actions, **kwargs)
Traceback (most recent call last):
File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/manage.py", line 13, in <module>
execute_from_command_line(sys.argv)
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
utility.execute()
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
self.execute(*args, **cmd_options)
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
output = self.handle(*args, **options)
File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/management/commands/import_csv.py", line 25, in handle
importer.import_from_file(f)
File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/csv_import.py", line 47, in import_from_file
self.import_row(row)
File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/csv_import.py", line 90, in import_row
row["parent"] = PublicBody._default_manager.get(slug=slugify(parent))
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/db/models/query.py", line 637, in get
raise self.model.DoesNotExist(
froide.publicbody.models.PublicBody.DoesNotExist: PublicBody matching query does not exist.
but going through the Web UI with the same file seems to work, so 🤷
I'm not quite sure why it does that. I get the "PublicBody matching query does not exist." message when I upload via the Admin website. I get 113 public bodies loaded but the CSV file has 201.
The error message is saying that the parent of the public body being loaded doesn't exist in the Public Body table yet. So things stop at this point. It's a data issue. I'll need to clean up the data a bit.
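One way to spot this kind of data issue before importing is to scan the CSV for rows whose parent has not appeared earlier in the file. The sketch below is only an illustration under stated assumptions: the column names `name` and `parent__name` are guesses about the CSV layout, `simple_slugify` is a rough stand-in for Django's `slugify` (it does not do Unicode normalization), and `find_missing_parents` is a hypothetical helper, not part of froide. It also assumes the Public Body table is empty before the import, and that parents are matched on the slug of their name, as the `slugify(parent)` lookup in the traceback suggests.

```python
import csv
import io
import re


def simple_slugify(value):
    """Rough stand-in for Django's slugify (assumption, not the real function)."""
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def find_missing_parents(csv_file, name_col="name", parent_col="parent__name"):
    """Return (row_number, name, parent) for rows whose parent has not
    appeared earlier in the file. Column names are assumptions."""
    seen = set()
    problems = []
    # Row numbering starts at 2 because line 1 of the CSV is the header.
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        name = (row.get(name_col) or "").strip()
        parent = (row.get(parent_col) or "").strip()
        if parent and simple_slugify(parent) not in seen:
            problems.append((row_number, name, parent))
        seen.add(simple_slugify(name))
    return problems


# Made-up sample data: the first child row comes before its parent.
sample = io.StringIO(
    "name,parent__name\n"
    "Child Office,Parent Dept\n"
    "Parent Dept,\n"
    "Other Office,Parent Dept\n"
)
print(find_missing_parents(sample))
```

To run it against a real file, open the CSV with `open(path, newline="")` and pass the file object in. Any rows it reports would need to be moved below their parent row, or have the parent name corrected.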
👍 - I'll merge in #81 then in the meantime
The data/seed/2024-03-24-public-bodies-fixed.csv file appears to be good because after several attempts all 201 public bodies are loaded.
In my working directory, there appears to be something flaky going on with src/froide/froide/publicbody/csv_import.py in my virtual environment. If I add a few print statements, all 201 public bodies will load. I'm using Python 3.10.13 in a virtual environment created using the venv module. When only 118 or so public bodies are loaded, it is usually due to the name and the slug not matching up. This causes public bodies with parent bodies to fail to load.
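A quick check for name and slug mismatches might look like the sketch below. It rests on assumptions: that the CSV has `name` and `slug` columns (the column names are parameters so they can be changed), and that `simple_slugify` is close enough to Django's `slugify` for these names (it skips Unicode normalization). `find_slug_mismatches` is a hypothetical helper for illustration only.

```python
import csv
import io
import re


def simple_slugify(value):
    """Rough stand-in for Django's slugify (assumption, not the real function)."""
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def find_slug_mismatches(csv_file, name_col="name", slug_col="slug"):
    """Return (row_number, name, slug) for rows whose slug differs from the
    slugified name. Rows with an empty slug are skipped."""
    mismatches = []
    # Row numbering starts at 2 because line 1 of the CSV is the header.
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        name = (row.get(name_col) or "").strip()
        slug = (row.get(slug_col) or "").strip()
        if slug and slug != simple_slugify(name):
            mismatches.append((row_number, name, slug))
    return mismatches


# Made-up sample data: the second row has a slug that does not match its name.
sample = io.StringIO(
    "name,slug\n"
    "Dept of Health,dept-of-health\n"
    "Office of Info,office-information\n"
)
print(find_slug_mismatches(sample))
```

Since the traceback shows the parent being looked up with `slug=slugify(parent)`, a parent whose stored slug differs from its slugified name would not be found, so rows flagged here are likely candidates for the failed child imports.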
I can't find a permanent fix for loading public bodies from a CSV file. I've updated the I'd like to see more people getting involved so hopefully the seeding of the database helps make the set-up process easier.
I discovered that deleting and reloading the public bodies from a CSV file several times will eventually load all 201 public bodies correctly. I've added this as a note in
```
$ python extract_sets.py ../2024-03-15-Hawaii_UIPA_Public_Bodies_All.csv
```
Preliminary information to seed a development database. This should be enough to get anyone started and provide a base to move forward from.
I'm still working on a Python script to create a category.json fixture and CSV file for uploading public body data from the CSV extract from the production UIPA.org website.