Add Java dependency analysis types and launcher using javaparser library. #12890

patricklaw · 2021-09-15T00:18:09Z

This PR adds the core functionality for inferring from Java source:

imports
declared package
top level types that could potentially be imported by other sources from the provided source

It uses the com.github.javaparser:javaparser-symbol-solver Java library to do the heavy lifting. We have a custom launcher for calling into this library, and that launcher source is compiled on-the-fly by Pants (the source itself is loaded using pkg_resources). This is a novel way of calling into Pants-provided JVM code that will probably act as a template for similar situations, like custom JUnit launchers.

[ci skip-rust]
[ci skip-build-wheels]

patricklaw · 2021-09-15T01:02:53Z

@stuhood Can you take a look at src/python/pants/backend/java/dependency_inference/java_parser.py in this PR at a high level? I'm particularly trying to figure out:

Should I make this result "fallible" like you did with javac? My inclination is yes.
Does the pkg_resources hack for on-the-fly compilation of Java seem reasonable?
Is there a cleaner way to dump the in-memory Java source into a "synthetic" target so I can just request the classfiles from the regular javac rules, rather than reimplement javac here?
Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

Eric-Arellano · 2021-09-15T01:12:28Z

Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

See https://docs.google.com/document/d/1HpJn2jTWf5sKob6zhe4SqHZ7KWBlX4a6OJOwnuw-IHo/edit#. Yes, this is okay thanks to "file targets". That concept will soon be easier to understand as we'll have dedicated java_source targets.

stuhood

Looks great!

Should I make this result "fallible" like you did with javac? My inclination is yes.

Yea, I believe so. Failing to extract imports from a file is something that we should either ignore or warn for (if we can be confident that the parser only fails for truly invalid syntax).

Does the pkg_resources hack for on-the-fly compilation of Java seem reasonable?

Yes, very: for a few reasons. The big one is that it's somewhat likely that as with Python we'll need multiple versions of the parsing library to support new syntax introduced by new JVM versions (over time: not now). Maybe the syntax is stable enough that having a "very modern version" of the parsing library can deal with all previous syntaxes: but if the parser version is JVM dependent, we might not be able to rely on the very modern version of the parsing library working on, say: JDK8.

Is there a cleaner way to dump the in-memory Java source into a "synthetic" target so I can just request the classfiles from the regular javac rules, rather than reimplement javac here?

There is not currently... as it stands, any "synthetic targets" (basically only file targets, but see Eric's comment) that exist are associated-with/parented-by a BUILD file target. Allowing for a target without any concrete BUILD target as its parent would require adding the big "is_synthetic" property that v1 had, which was a bit of a landmine for goals (since it wasn't typesafe that the target wasn't a "real" target).

We might be able to solve this in the medium term: but for now, some code duplication will be necessary.

Do you see any issues with requiring that this always operate over a single source? I'm concerned that this is going to make using it awkward, but it seems like the whole idea here is that we can always guarantee that java source is reduced to a single source file, even if it later gets grouped back together by other rules.

No, operating on a single file is totally appropriate. nailgun should make it fine from a performance perspective, and it's the right thing to do in general.

Also, as an aside: empty __init__.py files shouldn't be necessary: did something break without them?
src/python/pants/backend/java/dependency_inference/__init__.py

stuhood · 2021-09-15T17:26:18Z

src/python/pants/backend/java/dependency_inference/import_parser.py

+from pants.engine.rules import Get, collect_rules, rule
+from pants.engine.target import Sources
+
+


I'll ignore import_parser*.py for now, since I don't see any blockers for using the java-based parser.

stuhood · 2021-09-15T17:41:10Z

src/python/pants/backend/java/dependency_inference/java_parser.py

+            ],
+            input_digest=merged_digest,
+            output_directories=("classfiles",),
+            description=f"Compile {LAUNCHER_BASENAME} import processors with javac",


Nit: import parser would align with the Python side.

stuhood · 2021-09-15T17:45:15Z

src/python/pants/backend/java/dependency_inference/java_parser.py

+        argv=[
+            coursier.coursier.exe,
+            "java",
+            "--system-jvm",  # TODO(#12293): use a fixed JDK version from a subsystem.


The javac subsystem should probably be renamed, and this and junit should probably consume the flag for this. Could probably cheat and consume it even with its current name.

Maybe we need a jvm subsystem?

stuhood · 2021-09-15T17:49:00Z

src/python/pants/backend/java/dependency_inference/package_parser.py

+
+
+@rule
+async def parse_java_package(request: ParseJavaPackageRequest) -> ParsedJavaPackage:


In the medium (possibly even short?) term, it would be great to have an option that asserts that filenames will match packages so that we can skip parsing, and directly use the filename (by stripping source roots).

Is that a restriction we're willing to add? In general, Java packages don't need to match source directories; that only matters for classfiles on the classpath (or in a jar). See the various unit tests in the Java subsystem that exercise this for examples

Behind an option? Yea, I think so. Well behaved monorepos are very likely to follow it. But because it wouldn't be the default, it's probably not a priority.

patricklaw · 2021-09-19T23:38:51Z

I pushed another big update to this PR. I've got first party source analysis to a place I'm pretty happy with:

Both package and top level types are parsed from source, instead of only package. This means that package-private types are correctly handled, so importing a non-public type from the same package will do the right thing.
Instead of being hierarchical "tree" oriented, all dependencies are rooted on packages (and then top level types within those packages). The apparent hierarchy of Java packages is an illusion, so it removes a bunch of degenerate cases to never assuming that import org.Foo should fall back to depending on everything that has a package prefix of org (the old behavior, which obviously would cause things to silently go off the rails by implicitly depending on most of the world).

However I still have a few major issues I'm concerned about, and I'd like your thoughts, Stu:

I'm not convinced that I can assume that all of my inputs are source targets with exactly 1 file. You can check out this branch and run src/python/pants/backend/java/dependency_inference/rules_test.py::test_dependencies_from_inferred_deps to see a concrete example of where I don't do anything particularly crazy, but I trigger a scenario where my code has to raise an exception because of multiple sources in an input. Also, what happens if I'm requesting SourceFiles with codegen enabled and a codegen protocol generates multiple output sources? And so on. It doesn't feel like the restriction to a single source is represented in the types that I'm working with, so I'm not surprised that I end up unexpectedly encountering violations of that assumption.
How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build? If we do want to just eagerly fail the build, how do we expose the error to the user in a way that is consistent with how we would otherwise show a compilation error? In effect, the import/export parser acts as a pre-compiler, so any syntax error anywhere in the repo will always be caught by it first. (This is also how buildgen worked back in the day, and we sort of leveraged it as a "feature" since it was much faster than normal Scala compilation, but it wasn't intended to work like that in general).

Eric-Arellano · 2021-09-20T17:41:11Z

How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build?

Hm, @stuhood , note that we do fail for Python import parsing errors.

It doesn't feel like the restriction to a single source is represented in the types that I'm working with, so I'm not surprised that I end up unexpectedly encountering violations of that assumption.

Indeed, it is not! But it very soon can be: #12957. We can do that same PR for Java, and because Java is still experimental, we don't need to do that weird hack where both the atom target and generator target use the same symbol name: we can have java_source vs java_sources. Your module mapping rule will be updated to only work with the file-based java_source and java_test etc, as that PR does for Shell. Does that make sense?

tdyas · 2021-09-20T22:01:00Z

I'm not convinced that I can assume that all of my inputs are source targets with exactly 1 file.

You could if we switch the Java support to use target generation and something similar to #12957.

tdyas

lgtm just some Java style comments plus the need to match current API of coursier_lockfile.

src/python/pants/backend/java/dependency_inference/PantsJavaParserLauncher.java

src/python/pants/backend/java/dependency_inference/java_parser.py

tdyas · 2021-09-21T00:42:52Z

src/python/pants/backend/java/dependency_inference/java_parser.py

+        argv=[
+            coursier.coursier.exe,
+            "java",
+            "--system-jvm",  # TODO(#12293): use a fixed JDK version from a subsystem.


Maybe we need a jvm subsystem?

src/python/pants/backend/java/dependency_inference/java_parser_launcher.py

src/python/pants/backend/java/dependency_inference/java_parser_test.py

…brary. # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

…analysis (to satisfy MyPy) # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

stuhood · 2021-09-21T03:46:46Z

I pushed another big update to this PR. I've got first party source analysis to a place I'm pretty happy with:

@patricklaw : Awesome stuff. Thanks!

How does "fallibility" work in practice here? How do we effectively let the user know that parsing a source failed without eagerly failing the build?

Hm, @stuhood , note that we do fail for Python import parsing errors.

That might be reasonable for Python. But it's overkill on the JVM I think, because the compiler will fail and give you a much better error message about invalid syntax than the parsing library is likely to.

So, my recommendation would be to either ignore parsing errors entirely (if we aren't confident that the parsing library won't ever fail to parse something that javac succeeds at), or to warn for them, but to still allow javac to render the final error. We should avoid warning for things that the user cannot control: so we should not warn unless we are highly confident that the parser will only fail in situations where javac would too.

I don't think that we should error on import extraction until we're very confident in the parser's ability to parse all Java code it encounters in the wild.

patricklaw requested a review from stuhood September 15, 2021 00:55

stuhood reviewed Sep 15, 2021

View reviewed changes

patricklaw force-pushed the java_dep_inf branch from a0bf6f5 to b501a3f Compare September 19, 2021 23:23

patricklaw force-pushed the java_dep_inf branch from b501a3f to b8c2f6a Compare September 20, 2021 22:59

patricklaw requested review from chrisjrn and tdyas September 20, 2021 23:00

patricklaw marked this pull request as ready for review September 20, 2021 23:01

patricklaw changed the title ~~Add javaparser based import and package inference for Java source~~ Add Java dependency analysis types and launcher using javalauncher library. Sep 20, 2021

patricklaw force-pushed the java_dep_inf branch from b8c2f6a to bfdd658 Compare September 20, 2021 23:33

patricklaw changed the title ~~Add Java dependency analysis types and launcher using javalauncher library.~~ Add Java dependency analysis types and launcher using javaparser library. Sep 20, 2021

patricklaw added the backend: JVM JVM backend-related issues label Sep 20, 2021

patricklaw requested a review from stuhood September 20, 2021 23:35

tdyas approved these changes Sep 21, 2021

View reviewed changes

patricklaw added 3 commits September 20, 2021 21:03

Add Java dependency analysis types and launcher using javalauncher li…

3703000

…brary. # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

Change FallibleJavaSourceDependencyAnalysisResult to not encapsulate …

2722e6b

…analysis (to satisfy MyPy) # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

Address PR review feedback

14d5126

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

patricklaw force-pushed the java_dep_inf branch from c6a49a0 to 14d5126 Compare September 21, 2021 01:03

patricklaw enabled auto-merge (squash) September 21, 2021 01:04

tdyas approved these changes Sep 21, 2021

View reviewed changes

patricklaw merged commit 9c12fc6 into pantsbuild:main Sep 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Java dependency analysis types and launcher using javaparser library. #12890

Add Java dependency analysis types and launcher using javaparser library. #12890

patricklaw commented Sep 15, 2021 •

edited

Loading

patricklaw commented Sep 15, 2021

Eric-Arellano commented Sep 15, 2021

stuhood left a comment •

edited

Loading

stuhood Sep 15, 2021

stuhood Sep 15, 2021

stuhood Sep 15, 2021

tdyas Sep 21, 2021

stuhood Sep 15, 2021

patricklaw Sep 17, 2021

stuhood Sep 20, 2021

patricklaw commented Sep 19, 2021

Eric-Arellano commented Sep 20, 2021

tdyas commented Sep 20, 2021

tdyas left a comment

tdyas Sep 21, 2021

stuhood commented Sep 21, 2021 •

edited

Loading

		from pants.engine.rules import Get, collect_rules, rule
		from pants.engine.target import Sources



		@rule
		async def parse_java_package(request: ParseJavaPackageRequest) -> ParsedJavaPackage:

Add Java dependency analysis types and launcher using javaparser library. #12890

Add Java dependency analysis types and launcher using javaparser library. #12890

Conversation

patricklaw commented Sep 15, 2021 • edited Loading

patricklaw commented Sep 15, 2021

Eric-Arellano commented Sep 15, 2021

stuhood left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

patricklaw commented Sep 19, 2021

Eric-Arellano commented Sep 20, 2021

tdyas commented Sep 20, 2021

tdyas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood commented Sep 21, 2021 • edited Loading

patricklaw commented Sep 15, 2021 •

edited

Loading

stuhood left a comment •

edited

Loading

stuhood commented Sep 21, 2021 •

edited

Loading