DOC/TST: Validate documentation pages #24216

Closed

FHaase wants to merge 9 commits into pandas-dev:master from FHaase:validate_documentation

Contributor

FHaase commented Dec 10, 2018

closes DOC: Implement script to validate list indentation in docs #21520
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

FHaase added 3 commits

December 10, 2018 18:45


          Validate documentation

534eb70

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>


          Match style of validate_docstrings

7aa12ad

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>


          Fix flake8 issues

93974fb

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

pep8speaks commented Dec 10, 2018

Hello @FHaase! Thanks for updating the PR.

There are no PEP8 issues in the file doc/make.py !
There are no PEP8 issues in the file scripts/tests/test_validate_documentation.py !
There are no PEP8 issues in the file scripts/validate_documentation.py !

Comment last updated on December 10, 2018 at 23:59 Hours UTC

datapythonista reviewed

Member

datapythonista left a comment

Thanks for taking care of this @FHaase

Added some comments. If possible, would be nice to add it to ci/code_checks.py in this PR, even if it's for a single document.

Btw, I think you're based in Germany. Not sure if you're subscribed to the pandas-dev list, but in case you're not and you may be interested: https://mail.python.org/pipermail/pandas-dev/2018-December/000862.html

doc/make.py Outdated

@@ -21,6 +21,14 @@
 import webbrowser
 import jinja2
+sys.path.insert(0, os.path.abspath('../scripts'))

Member

datapythonista Dec 11, 2018

I'm -1 on importing anything in scripts/ from other places. When we make changes to the scripts, we don't need to worry about breaking other parts of the code at the moment. I like to keep it this way.

I think this part is not essential, is it?

scripts/validate_documentation.py

		@@ -0,0 +1,227 @@
		import argparse

Member

datapythonista Dec 11, 2018

We should have a comment at the beginning of the script explaining what it does and how to use it.

scripts/validate_documentation.py

		@@ -0,0 +1,227 @@
		import argparse
		from fnmatch import fnmatch

Member

datapythonista Dec 11, 2018

I haven't used fnmatch, but when a module contains a member with the same name (like datetime), I prefer to import the module itself: import fnmatch. Then, when you see fnmatch.fnmatch in the code, there is no ambiguity about whether a name refers to the module or to the member inside it.
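To illustrate the import style suggested above, here is a minimal sketch. The page names and the pattern are made up for the example and are not taken from the PR.

```python
import fnmatch

# Hypothetical page names, only for illustration.
pages = ["io.rst", "api.rst", "whatsnew/v0.24.0.rst"]

# With "import fnmatch", the call site reads fnmatch.fnmatch(...), so it is
# clear that the function inside the fnmatch module is being called.
matched = [page for page in pages if fnmatch.fnmatch(page, "whatsnew/*")]
print(matched)
```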

scripts/validate_documentation.py Outdated

+              import re
+              import sys
+              import docutils.nodes

Member

datapythonista Dec 11, 2018

docutils is part of the standard library, right? why the blank line if that's the case?

scripts/validate_documentation.py Outdated

+              DOCUMENTATION_SOURCE = os.path.join(os.curdir, '../doc/source')
+              DOCTREE_PATH = '../doc/build/doctrees/{}.doctree'
+              RST_PATH = '../doc/source/{}.rst'

Member

datapythonista Dec 11, 2018

I think you are assuming here that the current directory (pwd) is scripts/, but the script can also be called as ./scripts/validate_documentation.py, and I think those paths will be wrong in that case. It's usually good practice to first define a BASE_PATH that points to the root of the project, and use it as the base for all of those paths. You should be able to find code for this in the other scripts.

Also, better to use os.path.join instead of the hardcoded /, so this can also run on Windows.
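A minimal sketch of the BASE_PATH idea, assuming the script lives in a scripts/ directory directly under the project root. The helper name project_paths and the dictionary keys are hypothetical and not part of the PR.

```python
import os


def project_paths(script_file):
    """Build documentation paths relative to the project root.

    Assumption: script_file is located in <project root>/scripts/.
    """
    # Two dirname() calls: strip the file name, then the scripts/ directory.
    base_path = os.path.dirname(os.path.dirname(os.path.abspath(script_file)))
    return {
        "source": os.path.join(base_path, "doc", "source"),
        "doctree": os.path.join(base_path, "doc", "build", "doctrees",
                                "{}.doctree"),
        "rst": os.path.join(base_path, "doc", "source", "{}.rst"),
    }


# Inside the real script this would be called as project_paths(__file__).
paths = project_paths(os.path.join("repo", "scripts",
                                   "validate_documentation.py"))
print(paths["rst"].format("io"))
```

Because the paths are derived from the location of the script file rather than from the current working directory, they stay the same no matter where the script is invoked from.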

scripts/validate_documentation.py



		class DocumentChecker(object):

Member

datapythonista Dec 11, 2018

Maybe a short docstring for the class? If you explain what the script does at the beginning of the file there is not much to say here, but a comment on what this class does, or how it is expected to be used, can be useful.

scripts/validate_documentation.py

+                      issue.append((self.find_line(match), kwargs))
+                  def find_line(self, match):
+                      if not match:

Member

datapythonista Dec 11, 2018

can you add docstrings to those too? It's not obvious what they do

scripts/validate_documentation.py Outdated

+                  result = {}
+                  for root, dirs, files in os.walk(DOCUMENTATION_SOURCE):
+                      _, base_dir = root.split('../doc/source')

Member

datapythonista Dec 11, 2018

use os.path.join
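One possible way to avoid splitting on a hardcoded '../doc/source' string is os.path.relpath, sketched below. This is an alternative to what the diff does, not the code from the PR; the function name page_name is hypothetical.

```python
import os


def page_name(rst_path, source_dir):
    """Return the page name relative to source_dir, without the extension."""
    relative = os.path.relpath(rst_path, source_dir)
    name, _extension = os.path.splitext(relative)
    # Normalize to forward slashes so page names look the same on Windows.
    return name.replace(os.sep, "/")


source = os.path.join("doc", "source")
print(page_name(os.path.join(source, "whatsnew", "v0.24.0.rst"), source))
```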

scripts/validate_documentation.py Outdated

		return self.issues


		def report(reports, output_format='default', errors=None):

Member

datapythonista Dec 11, 2018

I don't find report descriptive enough

scripts/validate_documentation.py Outdated

+                           'a single document)')
+                  add('--exclude', default=None,
+                      help='comma separated '
+                           'patterns of pages to exclude. By default it '

Member

datapythonista Dec 11, 2018

From the help message I don't understand how I should use this. --exclude=io.rst,api.rst?
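As a sketch of how a comma-separated --exclude option could be parsed and documented, assuming glob-style patterns matched against page names without the .rst extension. That assumption, the helper names, and the example patterns are hypothetical; the PR's actual semantics are not shown in this thread.

```python
import argparse
import fnmatch


def parse_exclude(value):
    """Split a comma-separated option value into a list of patterns."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def is_excluded(page, patterns):
    """Return True if the page name matches any of the glob patterns."""
    return any(fnmatch.fnmatch(page, pattern) for pattern in patterns)


parser = argparse.ArgumentParser(description="hypothetical docs validator")
parser.add_argument(
    "--exclude", default=None, metavar="PATTERNS",
    help="comma-separated glob patterns of pages to skip, "
         "e.g. --exclude='whatsnew/*,api'")

# Parse an explicit argument list so the example does not depend on sys.argv.
args = parser.parse_args(["--exclude", "whatsnew/*,api"])
patterns = parse_exclude(args.exclude)
print(patterns)
```

Putting a concrete example into the help text answers the question of how the option is meant to be typed.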

datapythonista added Docs CI labels

datapythonista self-assigned this


          Review of @datapythonista

5e7bca4

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

codecov bot commented Dec 11, 2018

Codecov Report

Merging #24216 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #24216   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         162      162           
  Lines       51763    51763           
=======================================
  Hits        47733    47733           
  Misses       4030     4030

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`43% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91802fb...5e7bca4. Read the comment docs.

codecov bot commented Dec 11, 2018

Codecov Report

Merging #24216 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #24216      +/-   ##
==========================================
+ Coverage   92.21%   92.21%   +<.01%     
==========================================
  Files         162      162              
  Lines       51763    51763              
==========================================
+ Hits        47733    47734       +1     
+ Misses       4030     4029       -1

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`43.01% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/util/testing.py	`87.41% <0%> (-0.1%)`	⬇️
pandas/core/generic.py	`96.65% <0%> (ø)`	⬆️
pandas/io/json/json.py	`93.09% <0%> (+0.47%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91802fb...f21a465. Read the comment docs.


          Review of @datapythonista

08e7546

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

Contributor Author

FHaase commented Dec 11, 2018

I've added the checks to ./ci/build_docs.sh as the doctrees need to exist when running the script.

FHaase and others added 4 commits

December 11, 2018 21:40


          Fix travis build

3e0c233

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>


          Make validate_documentation executable

907c587

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>


          Skip pages missing corresponding rst-file

1ffa59a

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>


          Move linting execution to run_tests.sh

f21a465

Signed-off-by: pandas-docs-bot <pandas-docs-bot@localhost.foo>

Contributor Author

FHaase commented Dec 14, 2018

@datapythonista The ci integration is in ./ci/run_tests.sh and generates errors in 'JOB="3.6, doc"'

Member

datapythonista commented Dec 14, 2018

Nice, lots of stuff to fix :)

I think you mentioned it somewhere, but I can't find it now. What's the reason to run it in run_tests.sh instead of code_checks.py? Can't we validate the .rst files directly?

And why add it to make.py?

datapythonista mentioned this pull request

DOC: Removing tailing whitespaces in .rst files #24281

Merged

4 tasks

Member

datapythonista commented Dec 14, 2018

I opened #24281 to fix the trailing whitespace problems. We can create a separate issue to fix the lists, but I guess that will be harder to automate and we may need to do it manually.

Btw, you can see how lists are rendered with the new theme here: https://pandas-dev.github.io/pandas-sphinx-theme/pr-datapythonista_base/generated/pandas.DataFrame.head.html#pandas.DataFrame.head

I guess this won't fix the problem with the Parameters section. But you can check an example with 2 spaces, and one with 4, and see if they are rendered correctly in that url.

Contributor Author

FHaase commented Dec 14, 2018

Adding it to make.py prints the issues at the end when running make.py html --single <something>.
So when building one page, those issues are easily seen. Running without --single <something> does not output anything.
Basically, I assumed that when building one page the next step would be to validate the issues anyway. I found this more convenient than having to type two commands and having to make sure that the page had been built by sphinx beforehand.

That last point is also the reason why it could not be implemented in ./ci/code_checks.sh, as that script is run on CircleCI without generating the docs first. In order to check whether a bullet_list is within a block_quote, I load the *.doctree file in ./doc/build/doctrees, which exists only after the documentation has been built.
The advantage is that all kinds of constraints on the doctree can be checked.

> I guess this won't fix the problem with the Parameters section. But you can check an example with 2 spaces, and one with 4, and see if they are rendered correctly in that url.

I don't really check the number of spaces in the file, but literally whether there is a <blockquote> generated around a <ul>.
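The PR inspects the docutils doctree for this check. As a rough, standard-library-only analogue of the same idea, the sketch below scans rendered HTML for a <ul> nested inside a <blockquote>. The class and function names are hypothetical, and this is not the code from the PR.

```python
from html.parser import HTMLParser


class BlockquoteListFinder(HTMLParser):
    """Collect the positions of <ul> tags that sit inside a <blockquote>."""

    # Tags without a closing tag; they are never pushed on the stack.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.hits = []   # (line, column) of each offending <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and "blockquote" in self.stack:
            self.hits.append(self.getpos())
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching tag, tolerating unclosed <p> or <li>.
            while self.stack.pop() != tag:
                pass


def find_quoted_lists(html):
    """Return the positions of all <ul> tags wrapped in a <blockquote>."""
    finder = BlockquoteListFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


indented = "<p>x</p>\n<blockquote>\n<div><ul><li>a</li></ul></div>\n</blockquote>"
plain = "<p>x</p>\n<ul><li>a</li></ul>"
print(len(find_quoted_lists(indented)), len(find_quoted_lists(plain)))
```

Note that this flags every list inside a block quote, including intentional ones, so the results would still need a manual look.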

Contributor Author

FHaase commented Dec 14, 2018

BTW, since the docstrings of all classes and functions are also rendered during the documentation build, those constraints get applied to them as well.

A possible solution for #22900 would be to run flake8-rst on the docs/source/generated files. I think that should be sufficient.

Member

datapythonista commented Dec 14, 2018

I see. My concern is in not making things too complex. There are lots of scripts in pandas, a lot of complexity in building the documentation... Linting the docs calling a script from ci/code_checks.py is very easy to understand and to keep under control. And it works in the same way as validate_docstrings.py, so when you know how one works, you know about the other.

Having something similar, but importing scripts/validate_documentation.py from docs/make.py, and having to call it in the docs build instead of in ci/code_checks.py, makes things more complex to learn, to remember and to maintain.

If it needs to run after (or during) the docs build, would it make sense to implement this as a sphinx extension? I think it's generic enough to be independent of pandas and be used by other projects too. What do you think? Does it make sense?

Member

datapythonista commented Dec 14, 2018

I added validation for trailing whitespace in all files (not only documentation) in #24286. I think that may simplify the work here, and it was trivial to validate with a grep there.
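For reference, a Python analogue of such a trailing-whitespace check could look like the sketch below. The check in #24286 is described as a grep; this function and its name are only an illustration of the same idea.

```python
import re

# One or more spaces or tabs directly before the end of a line.
TRAILING_WS = re.compile(r"[ \t]+$")


def trailing_whitespace_lines(text):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if TRAILING_WS.search(line)]


print(trailing_whitespace_lines("a\nb  \n\t\nc"))
```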

Member

datapythonista commented Dec 30, 2018

I don't think we'll merge this as it is. I'm not sure the value it adds justifies the complexity. But in any case, if we follow this approach, we should implement this as a sphinx extension, probably in a different package, and install it via conda.

datapythonista closed this
