DOC/TST: Validate documentation pages #24216

Closed
wants to merge 9 commits into from

@FHaase FHaase commented Dec 10, 2018

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

pep8speaks commented Dec 10, 2018

Hello @FHaase! Thanks for updating the PR.

Comment last updated on December 10, 2018 at 23:59 Hours UTC

Member

@datapythonista datapythonista left a comment

Thanks for taking care of this @FHaase

Added some comments. If possible, it would be nice to add it to ci/code_checks.sh in this PR, even if it's for a single document.

Btw, I think you're based in Germany. Not sure if you're subscribed to the pandas-dev list, but in case you're not and you may be interested: https://mail.python.org/pipermail/pandas-dev/2018-December/000862.html

doc/make.py Outdated
@@ -21,6 +21,14 @@
import webbrowser
import jinja2

sys.path.insert(0, os.path.abspath('../scripts'))
Member

I'm -1 on importing anything in scripts/ from other places. At the moment, when we make changes to the scripts, we don't need to worry about breaking other parts of the code. I'd like to keep it that way.

I think this part is not essential, is it?

@@ -0,0 +1,227 @@
import argparse
Member

We should have a comment at the beginning of the script explaining what it does and how to use it.

@@ -0,0 +1,227 @@
import argparse
from fnmatch import fnmatch
Member

I haven't used fnmatch, but for modules that contain a member with the same name as the module (like datetime), I prefer import fnmatch. Then, when you see fnmatch.fnmatch in the code, there is no ambiguity about whether fnmatch refers to the module or to the function inside it.
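
A minimal sketch of the import style suggested here. The page names are made up for the example and are not taken from the PR:

```python
import fnmatch

# Hypothetical page names, used only to exercise the calls.
pages = ["io.rst", "api.rst", "whatsnew/v0.24.0.rst"]

# With the module imported, fnmatch.fnmatch is clearly the function
# living inside the fnmatch module.
rst_pages = [page for page in pages if fnmatch.fnmatch(page, "*.rst")]
whatsnew_pages = [page for page in pages if fnmatch.fnmatch(page, "whatsnew/*")]
```

Note that `*` in an fnmatch pattern also matches path separators, so `*.rst` matches pages in subdirectories too.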

import re
import sys

import docutils.nodes
Member

docutils is part of the standard library, right? why the blank line if that's the case?


DOCUMENTATION_SOURCE = os.path.join(os.curdir, '../doc/source')
DOCTREE_PATH = '../doc/build/doctrees/{}.doctree'
RST_PATH = '../doc/source/{}.rst'
Member

I think you are assuming here that the current directory (pwd) is scripts/, but the script can also be called as ./scripts/validate_documentation.py, and I think those paths will be wrong in that case. It's usually good practice to first define a BASE_PATH that is the root of the project, and use it as the base for all of those. You should be able to find code for this in other scripts.

Also, better to use os.path.join instead of the hardcoded /, so this can also run on Windows.
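
One way this could look, as a sketch only. The helper name `build_paths` and the dictionary keys are hypothetical, and it assumes the script sits one level below the project root (for example in scripts/):

```python
import os


def build_paths(script_file):
    """Return doc paths anchored at the project root, not the current directory.

    Assumption: script_file lives one level below the root (e.g. scripts/).
    """
    base_path = os.path.dirname(os.path.dirname(os.path.abspath(script_file)))
    return {
        "source": os.path.join(base_path, "doc", "source"),
        "doctree": os.path.join(base_path, "doc", "build", "doctrees", "{}.doctree"),
        "rst": os.path.join(base_path, "doc", "source", "{}.rst"),
    }
```

Inside the script itself this would be called as `build_paths(__file__)`, so the result no longer depends on where the script is invoked from.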



class DocumentChecker(object):

Member

Maybe a short docstring for the class? If you explain what the script does at the beginning of the file there is not much to say here, but a comment on what this class does, or how it is expected to be used, can be useful.

issue.append((self.find_line(match), kwargs))

def find_line(self, match):
if not match:
Member

Can you add docstrings to those too? It's not obvious what they do.


result = {}
for root, dirs, files in os.walk(DOCUMENTATION_SOURCE):
_, base_dir = root.split('../doc/source')
Member

use os.path.join
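
As a sketch of a separator-independent alternative: instead of splitting `root` on a hardcoded string, the relative directory can be computed with `os.path.relpath`. This swaps in a different function from the one named in the comment, and the helper name is hypothetical:

```python
import os


def relative_doc_dirs(source_dir):
    """Yield each directory under source_dir as a path relative to it."""
    for root, _dirs, _files in os.walk(source_dir):
        # relpath uses the platform's separator, so no '/' is hardcoded.
        yield os.path.relpath(root, source_dir)
```

The top-level directory itself is yielded as `os.curdir` (`.`).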

return self.issues


def report(reports, output_format='default', errors=None):
Member

I don't find the name report descriptive enough.

'a single document)')
add('--exclude', default=None,
help='comma separated '
'patterns of pages to exclude. By default it '
Member

From the help message I don't understand how I should use this. --exclude=io.rst,api.rst?
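
A sketch of how the option could document itself with a concrete example in the help text. The pattern syntax (fnmatch-style globs matched against page names without the .rst suffix) and both function names are assumptions for illustration, not the PR's actual behaviour:

```python
import argparse
import fnmatch


def parse_args(argv):
    """Parse the command line; only the --exclude option is sketched here."""
    parser = argparse.ArgumentParser(description="Validate documentation pages")
    parser.add_argument(
        "--exclude",
        default="",
        help="comma-separated glob patterns of pages to skip, "
             'e.g. --exclude="whatsnew/*,io"',
    )
    return parser.parse_args(argv)


def is_excluded(page, exclude):
    """Return True if page matches any of the comma-separated patterns."""
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    return any(fnmatch.fnmatch(page, pattern) for pattern in patterns)
```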

@datapythonista datapythonista added Docs CI Continuous Integration labels Dec 11, 2018
@datapythonista datapythonista self-assigned this Dec 11, 2018
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

codecov bot commented Dec 11, 2018

Codecov Report

Merging #24216 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #24216   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         162      162           
  Lines       51763    51763           
=======================================
  Hits        47733    47733           
  Misses       4030     4030

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.61% <ø> (ø) ⬆️ |
| #single | 43% <ø> (ø) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 91802fb...5e7bca4.


codecov bot commented Dec 11, 2018

Codecov Report

Merging #24216 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #24216      +/-   ##
==========================================
+ Coverage   92.21%   92.21%   +<.01%     
==========================================
  Files         162      162              
  Lines       51763    51763              
==========================================
+ Hits        47733    47734       +1     
+ Misses       4030     4029       -1

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.61% <ø> (ø) ⬆️ |
| #single | 43.01% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/util/testing.py | 87.41% <0%> (-0.1%) ⬇️ |
| pandas/core/generic.py | 96.65% <0%> (ø) ⬆️ |
| pandas/io/json/json.py | 93.09% <0%> (+0.47%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 91802fb...f21a465.

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
@FHaase
Contributor Author

FHaase commented Dec 11, 2018

I've added the checks to ./ci/build_docs.sh as the doctrees need to exist when running the script.

FHaase and others added 4 commits December 11, 2018 21:40
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
Signed-off-by: Fabian Haase <haase.fabian@gmail.com>
Signed-off-by: pandas-docs-bot <pandas-docs-bot@localhost.foo>
@FHaase
Contributor Author

FHaase commented Dec 14, 2018

@datapythonista The CI integration is in ./ci/run_tests.sh and generates errors in the 'JOB="3.6, doc"' build.

@datapythonista
Member

Nice, lots of stuff to fix :)

I think you mentioned it somewhere, but I can't find it now. What's the reason to run it in run_tests.sh instead of code_checks.sh? Can't we validate the .rst files directly?

And why add it to make.py?

@datapythonista
Member

I opened #24281 to fix the trailing whitespace problems. We can create a separate issue to fix the lists, but I guess that will be harder to automate and we may need to do it manually.

Btw, you can see how lists are rendered with the new theme here: https://pandas-dev.github.io/pandas-sphinx-theme/pr-datapythonista_base/generated/pandas.DataFrame.head.html#pandas.DataFrame.head

I guess this won't fix the problem with the Parameters section. But you can check an example with 2 spaces, and one with 4, and see if they are rendered correctly in that url.

@FHaase
Contributor Author

FHaase commented Dec 14, 2018

Adding it to make.py prints the issues at the end of a make.py html --single <something> run.
So when building a single page, those issues are easy to see. Running without --single <something> does not output anything.
Basically, I assumed that after building one page the next step would be to validate it anyway. I found this more convenient than typing two commands and having to make sure that the page had been built by Sphinx beforehand.

The latter is also the reason why it could not be implemented in ./ci/code_checks.sh: that script runs on CircleCI without generating the docs first. To check whether a bullet_list is within a block_quote, I load the *.doctree file in ./doc/build/doctrees, which exists only after the documentation has been built.
The advantage is that all kinds of constraints on the doctree can be checked.

> I guess this won't fix the problem with the Parameters section. But you can check an example with 2 spaces, and one with 4, and see if they are rendered correctly in that url.

I don't really check the number of spaces in the file, but literally whether there is a <blockquote> generated around a <ul>.
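
The doctree check described above can be sketched roughly as follows. This is not the PR's code: for brevity it parses an RST string with docutils instead of loading a built *.doctree file, and the function name is hypothetical.

```python
import docutils.core
import docutils.nodes


def find_indented_lists(rst_source):
    """Return the bullet_list nodes whose direct parent is a block_quote."""
    doctree = docutils.core.publish_doctree(rst_source)
    # Node.findall exists in newer docutils releases; older ones only
    # provide Node.traverse, so fall back to it when needed.
    find = getattr(doctree, "findall", None) or doctree.traverse
    return [
        node
        for node in find(docutils.nodes.bullet_list)
        if isinstance(node.parent, docutils.nodes.block_quote)
    ]


# An indented list becomes a block quote wrapping the list...
INDENTED = "Some text:\n\n    * item one\n    * item two\n"
# ...while a list at the left margin does not.
FLUSH = "Some text:\n\n* item one\n* item two\n"
```

Here `find_indented_lists(INDENTED)` finds one offending list and `find_indented_lists(FLUSH)` finds none, independent of how many spaces were used for the indentation.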

@FHaase
Contributor Author

FHaase commented Dec 14, 2018

BTW, since the documentation build also generates pages from the docstrings of classes and functions, those constraints get applied to them as well.

A possible solution for #22900 would be to run flake8-rst on the doc/source/generated files. I think that should be sufficient.

@datapythonista
Member

I see. My concern is not making things too complex. There are lots of scripts in pandas, and a lot of complexity in building the documentation... Linting the docs by calling a script from ci/code_checks.sh is very easy to understand and to keep under control. And it works in the same way as validate_docstrings.py, so when you know how one works, you know about the other.

Having something similar, but importing scripts/validate_documentation.py from doc/make.py, and having to call it in the docs build instead of ci/code_checks.sh, makes things more complex to learn, to remember and to maintain.

If it needs to run after (or during) the docs build, would it make sense to implement this as a Sphinx extension? And I think it's generic enough to be independent of pandas and be used by other projects too. What do you think? Does it make sense?

@datapythonista
Member

I added validation for trailing whitespace in all files (not only documentation) in #24286. I think that may simplify the work here, and it was trivial to validate with a grep there.
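
For comparison, the same kind of check is only a few lines in Python. This is an illustrative equivalent, not the grep command used in #24286, and the names are hypothetical:

```python
import re

# One or more spaces or tabs right before the end of a line.
TRAILING_WHITESPACE = re.compile(r"[ \t]+$")


def trailing_whitespace_lines(text):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if TRAILING_WHITESPACE.search(line)
    ]
```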

@datapythonista
Member

I don't think we'll merge this as it is. I'm not sure whether it's too much complexity for the value it adds. But in any case, if we follow this approach, we should implement it as a Sphinx extension, probably in a separate package installed via conda.

Labels
CI Continuous Integration Docs
Development

Successfully merging this pull request may close these issues.

DOC: Implement script to validate list indentation in docs
3 participants