Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Java dependency analysis types and launcher using javaparser library. #12890

Merged
merged 3 commits into from
Sep 21, 2021

Conversation

patricklaw
Copy link
Member

@patricklaw patricklaw commented Sep 15, 2021

This PR adds the core functionality for inferring from Java source:

  • imports
  • declared package
  • top level types that could potentially be imported by other sources from the provided source

It uses the com.github.javaparser:javaparser-symbol-solver Java library to do the heavy lifting. We have a custom launcher for calling into this library, and that launcher source is compiled on-the-fly by Pants (the source itself is loaded using pkg_resources). This is a novel way of calling into Pants-provided JVM code that will probably act as a template for similar situations, like custom JUnit launchers.

[ci skip-rust]
[ci skip-build-wheels]

@patricklaw patricklaw requested a review from stuhood September 15, 2021 00:55
@patricklaw
Copy link
Member Author

@stuhood Can you take a look at src/python/pants/backend/java/dependency_inference/java_parser.py in this PR at a high level? I'm particularly trying to figure out:

  • Should I make this result "fallible" like you did with javac? My inclination is yes.
  • Does the pkg_resources hack for on-the-fly compilation of Java seem reasonable?
  • Is there a cleaner way to dump the in-memory Java source into a "synthetic" target so I can just request the classfiles from the regular javac rules, rather than reimplement javac here?
  • Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

@Eric-Arellano
Copy link
Contributor

Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

See https://docs.google.com/document/d/1HpJn2jTWf5sKob6zhe4SqHZ7KWBlX4a6OJOwnuw-IHo/edit#. Yes, this is okay thanks to "file targets". That concept will soon be easier to understand as we'll have dedicated java_source targets.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

  • Should I make this result "fallible" like you did with javac? My inclination is yes.

Yea, I believe so. Failing to extract imports from a file is something that we should either ignore or warn for (if we can be confident that the parser only fails for truly invalid syntax).

  • Does the pkg_resources hack for on-the-fly compilation of Java seem reasonable?

Yes, very: for a few reasons. The big one is that it's somewhat likely that as with Python we'll need multiple versions of the parsing library to support new syntax introduced by new JVM versions (over time: not now). Maybe the syntax is stable enough that having a "very modern version" of the parsing library can deal with all previous syntaxes: but if the parser version is JVM dependent, we might not be able to rely on the very modern version of the parsing library working on, say: JDK8.

  • Is there a cleaner way to dump the in-memory Java source into a "synthetic" target so I can just request the classfiles from the regular javac rules, rather than reimplement javac here?

There is not currently... as it stands, any "synthetic targets" (basically only file targets, but see Eric's comment) that exist are associated-with/parented-by a BUILD file target. Allowing for a target without any concrete BUILD target as its parent would require adding the big "is_synthetic" property that v1 had, which was a bit of a landmine for goals (since it wasn't typesafe that the target wasn't a "real" target).

We might be able to solve this in the medium term: but for now, some code duplication will be necessary.

  • Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

No, operating on a single file is totally appropriate. nailgun should make it fine from a performance perspective, and it's the right thing to do in general.


Also, as an aside: empty __init__.py files shouldn't be necessary: did something break without them?
src/python/pants/backend/java/dependency_inference/__init__.py

from pants.engine.rules import Get, collect_rules, rule
from pants.engine.target import Sources


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ignore import_parser*.py for now, since I don't see any blockers for using the java-based parser.

],
input_digest=merged_digest,
output_directories=("classfiles",),
description=f"Compile {LAUNCHER_BASENAME} import processors with javac",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: import parser would align with the Python side.

argv=[
coursier.coursier.exe,
"java",
"--system-jvm", # TODO(#12293): use a fixed JDK version from a subsystem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javac subsystem should probably be renamed, and this and junit should probably consume the flag for this. Could probably cheat and consume it even with its current name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need a jvm subsystem?



@rule
async def parse_java_package(request: ParseJavaPackageRequest) -> ParsedJavaPackage:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the medium (possibly even short?) term, it would be great to have an option that asserts that filenames will match packages so that we can skip parsing, and directly use the filename (by stripping source roots).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a restriction we're willing to add? In general, Java packages don't need to match source directories; that only matters for classfiles on the classpath (or in a jar). See the various unit tests in the Java subsystem that exercise this for examples

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behind an option? Yea, I think so. Well behaved monorepos are very likely to follow it. But because it wouldn't be the default, it's probably not a priority.

@patricklaw
Copy link
Member Author

I pushed another big update to this PR. I've got first party source analysis to a place I'm pretty happy with:

  • Both package and top level types are parsed from source, instead of only package. This means that package-private types are correctly handled, so importing a non-public type from the same package will do the right thing.
  • Instead of being hierarchical "tree" oriented, all dependencies are rooted on packages (and then top level types within those packages). The apparent hierarchy of Java packages is an illusion, so it removes a bunch of degenerate cases to never assuming that import org.Foo should fall back to depending on everything that has a package prefix of org (the old behavior, which obviously would cause things to silently go off the rails by implicitly depending on most of the world).

However I still have a few major issues I'm concerned about, and I'd like your thoughts, Stu:

  • I'm not convinced that I can assume that all of my inputs are source targets with exactly 1 file. You can check out this branch and run src/python/pants/backend/java/dependency_inference/rules_test.py::test_dependencies_from_inferred_deps to see a concrete example of where I don't do anything particularly crazy, but I trigger a scenario where my code has to raise an exception because of multiple sources in an input. Also, what happens if I'm requesting SourceFiles with codegen enabled and a codegen protocol generates multiple output sources? And so on. It doesn't feel like the restriction to a single source is represented in the types that I'm working with, so I'm not surprised that I end up unexpectedly encountering violations of that assumption.
  • How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build? If we do want to just eagerly fail the build, how do we expose the error to the user in a way that is consistent with how we would otherwise show a compilation error? In effect, the import/export parser acts as a pre-compiler, so any syntax error anywhere in the repo will always be caught by it first. (This is also how buildgen worked back in the day, and we sort of leveraged it as a "feature" since it was much faster than normal Scala compilation, but it wasn't intended to work like that in general).

@Eric-Arellano
Copy link
Contributor

How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build?

Hm, @stuhood , note that we do fail for Python import parsing errors.

It doesn't feel like the restriction to a single source is represented in the types that I'm working with, so I'm not surprised that I end up unexpectedly encountering violations of that assumption.

Indeed, it is not! But it very soon can be: #12957. We can do that same PR for Java, and because Java is still experimental, we don't need to do that weird hack where both the atom target and generator target use the same symbol name: we can have java_source vs java_sources. Your module mapping rule will be updated to only work with the file-based java_source and java_test etc, as that PR does for Shell. Does that make sense?

@tdyas
Copy link
Contributor

tdyas commented Sep 20, 2021

I'm not convinced that I can assume that all of my inputs are source targets with exactly 1 file.

You could if we switch the Java support to use target generation and something similar to #12957.

@patricklaw patricklaw marked this pull request as ready for review September 20, 2021 23:01
@patricklaw patricklaw changed the title Add javaparser based import and package inference for Java source Add Java dependency analysis types and launcher using javalauncher library. Sep 20, 2021
@patricklaw patricklaw changed the title Add Java dependency analysis types and launcher using javalauncher library. Add Java dependency analysis types and launcher using javaparser library. Sep 20, 2021
@patricklaw patricklaw added the backend: JVM JVM backend-related issues label Sep 20, 2021
@patricklaw patricklaw requested a review from stuhood September 20, 2021 23:35
Copy link
Contributor

@tdyas tdyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm just some Java style comments plus the need to match current API of coursier_lockfile.

argv=[
coursier.coursier.exe,
"java",
"--system-jvm", # TODO(#12293): use a fixed JDK version from a subsystem.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need a jvm subsystem?

…brary.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…analysis (to satisfy MyPy)

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@patricklaw patricklaw merged commit 9c12fc6 into pantsbuild:main Sep 21, 2021
@stuhood
Copy link
Member

stuhood commented Sep 21, 2021

I pushed another big update to this PR. I've got first party source analysis to a place I'm pretty happy with:

@patricklaw : Awesome stuff. Thanks!

How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build?

Hm, @stuhood , note that we do fail for Python import parsing errors.

That might be reasonable for Python. But it's overkill on the JVM I think, because the compiler will fail and give you a much better error message about invalid syntax than the parsing library is likely to.

So, my recommendation would be to either ignore parsing errors entirely (if we aren't confident that the parsing library won't ever fail to parse something that javac succeeds at), or to warn for them, but to still allow javac to render the final error. We should avoid warning for things that the user cannot control: so we should not warn unless we are highly confident that the parser will only fail in situations where javac would too.

I don't think that we should error on import extraction until we're very confident in the parser's ability to parse all Java code it encounters in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: JVM JVM backend-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants