-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-15187: [Java] Made reader
initialization lazy and added new getTransferPair()
function that takes in a Field
type
#34424
Conversation
…ion that takes in a Field type
|
@@ -71,7 +73,11 @@ public BigIntVector(String name, FieldType fieldType, BufferAllocator allocator) | |||
*/ | |||
public BigIntVector(Field field, BufferAllocator allocator) { | |||
super(field, allocator, TYPE_WIDTH); | |||
reader = new BigIntReaderImpl(BigIntVector.this); | |||
reader = () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not thread safe, but I am not sure if adding the Google core libraries dependency to get Suppliers::memoize
is allowed. I don't know of other ways to make this thread safe without adding any throws
clause.
I tried adding apache.commons.lang3#LazyInitializer
but that I think requires that I add throws ConcurrentException
to the getReader
function, doesn't seem that great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to liberate the necessary code from either of these libraries? I'm not sure how much baggage they would drag with them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this rtadepalli. Instead of the above logic, should we add a new constructor with a new boolean parameter createReader. We create the reader based on the boolean flag. And in the getReader if the reader is null, we throw an exception. Callers who dont need the reader can call the new constructor, and if they try accessing the reader then they get an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashanthbdremio I think adding a new constructor would not be backwards compatible, unless we just default the boolean to true
in the BigIntVector(field, allocator);
constructor (and constructors in other classes). If we do this, I guess we're not really changing much, since the reader will be eagerly created anyways. If you think that is fine, I can make the change.
@lwhite1 I have made some changes to the way the reader is being initialized -- please take another look. Thanks.
Right now, this only covers derived classes for |
reader
initialization lazy and added new getTransferPair() function that takes in a Field
typereader
initialization lazy and added new getTransferPair()
function that takes in a Field
type
|
@lwhite1 or @davisusanibar are you able to take a first look at this? |
Ah, I forgot to reply. I think this is OK. We can make a follow up for the rest of the vector classes. |
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
I am not sure I understand -- there are other PRs open against |
@rtadepalli sorry about that, some contributors decided to perform some spring cleaning and a few PRs got caught up in that. I reopened the PR. |
@rtadepalli As you are planning to continue working on this one I converted it to a Draft until you consider it complete. Feel free to tag people when you think it's ready for further review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Covered both fixed and variable length vectors here. cc @lidavidm.
@@ -71,7 +73,11 @@ public BigIntVector(String name, FieldType fieldType, BufferAllocator allocator) | |||
*/ | |||
public BigIntVector(Field field, BufferAllocator allocator) { | |||
super(field, allocator, TYPE_WIDTH); | |||
reader = new BigIntReaderImpl(BigIntVector.this); | |||
reader = () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashanthbdremio I think adding a new constructor would not be backwards compatible, unless we just default the boolean to true
in the BigIntVector(field, allocator);
constructor (and constructors in other classes). If we do this, I guess we're not really changing much, since the reader will be eagerly created anyways. If you think that is fine, I can make the change.
@lwhite1 I have made some changes to the way the reader is being initialized -- please take another look. Thanks.
@@ -130,6 +130,12 @@ public TransferPair makeTransferPair(ValueVector target) { | |||
return underlyingVector.makeTransferPair(target); | |||
} | |||
|
|||
@Override | |||
protected Class<? extends FieldReader> getReaderImplClass() { | |||
throw new UnsupportedOperationException("Readers for extension types depend on the underlying vector, " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing an exception here because getReader
is delegating to the underlying vector, which to my understanding would always be a subclass of BaseValueVector
. This means that there'd be a concrete implementation of getReaderImplClass
that'd be available, so delegating to the getReader
method of the underlying vector is fine and there's no need of an implementation here.
Please let me know if this is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclasses of ExtensionTypeVector should override this if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be fine I think -- should I add a comment explicitly saying that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on second thought: maybe remove the override for getReader here, and put the delegation here instead? That way, users can override this and get the expected behavior. Otherwise they have to override both methods which is a bit of a mess.
if (fieldReader == null) { | ||
try { | ||
fieldReader = | ||
(FieldReader) Class.forName(getReaderImplClass().getName()).getConstructor(getClass()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getReaderImplClass already returns a Class, why do we have to look it up again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: why can't getReaderImplClass return a factory function instead of a class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify what you mean by return a factory function? I can change the usage to getReaderImplClass().getInstance
if that is preferable. I just used Class.forName
since I am more familiar with that option is all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't these all be Function<FieldVector, FieldReader>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well returning a FieldReader
instance would mean that I'd need to construct it. I want to do that in the synchronized
block so I am returning the class type to construct an instance out of.
Out of curiosity, what is the advantage of returning a Function
type over the current implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getReaderImplClass
can return a Function
which you then call.
Explicit reflection/metaprogramming isn't necessary here over just a factory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh I guess the construction would only happen when I call .apply
which would in the synchronized block. That does seem nicer...
Any chance I could clean that up in a follow up ticket? Or would you prefer I make the changes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, looks like we commented the same thing at the same time, :jinx:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not include reflection here that isn't strictly necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized we don't need really need the ValueVector
as an argument to the Function
, so I decided to return a Supplier
instead.
|
||
/** | ||
* Default implementation to create a reader for the vector. Depends on the individual vector | ||
* class' implementation of `getReaderImpl()` to initialize the reader appropriately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be {@link #getReaderImplClass}
? JavaDoc does not use Markdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though it turns out the supplier is not necessary either (sorry)
} | ||
synchronized (this) { | ||
if (fieldReader == null) { | ||
fieldReader = getReaderImpl().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just point out...if you don't actually need to pass anything, this can just be getReaderInstance()
and you can directly return the constructed reader. The whole thing with the class or even the function/supplier isn't necessary to make it lazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh... don't know why I didn't realize this last night. Will change.
@@ -130,6 +130,12 @@ public TransferPair makeTransferPair(ValueVector target) { | |||
return underlyingVector.makeTransferPair(target); | |||
} | |||
|
|||
@Override | |||
protected Class<? extends FieldReader> getReaderImplClass() { | |||
throw new UnsupportedOperationException("Readers for extension types depend on the underlying vector, " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclasses of ExtensionTypeVector should override this if needed.
@lidavidm ready for another look, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! And sorry for all the back-and-forth.
FYI, if it's ready, can you take this out of draft mode? |
Oh I thought I didn't have permissions to take it out of draft mode but then I realized that I was looking at this PR while logged out on my phone. Done 👍 |
Conbench analyzed the 6 benchmark runs on commit There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. |
…w `getTransferPair()` function that takes in a `Field` type (apache#34424) This PR closes apache#15187. `FieldReader` is being allocated directly in the constructor today, and this PR changes it such that the initialization becomes lazy. Additionally, a new function `getTransferPair(Field, Allocator)` is introduced so that a new `Field` method is not constructed each time `getTransferPair` is called on the Vector. 1. Introduce a new `getTransferPair` method. 2. Make initializing `FieldReader` lazy. Yes, some tests have been added to verify these changes. I am not 100% sure if there are any user facing changes. There should not be any breaking changes. * Closes: apache#15187 Authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com>
* apacheGH-15187: [Java] Made `reader` initialization lazy and added new `getTransferPair()` function that takes in a `Field` type (apache#34424) This PR closes apache#15187. `FieldReader` is being allocated directly in the constructor today, and this PR changes it such that the initialization becomes lazy. Additionally, a new function `getTransferPair(Field, Allocator)` is introduced so that a new `Field` method is not constructed each time `getTransferPair` is called on the Vector. 1. Introduce a new `getTransferPair` method. 2. Make initializing `FieldReader` lazy. Yes, some tests have been added to verify these changes. I am not 100% sure if there are any user facing changes. There should not be any breaking changes. * Closes: apache#15187 Authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com> * Fix import * Fix imports --------- Signed-off-by: David Li <li.davidm96@gmail.com> Co-authored-by: rtadepalli <105760760+rtadepalli@users.noreply.github.com>
Rationale for this change
This PR closes #15187.
FieldReader
is being allocated directly in the constructor today, and this PR changes it such that the initialization becomes lazy. Additionally, a new functiongetTransferPair(Field, Allocator)
is introduced so that a newField
method is not constructed each timegetTransferPair
is called on the Vector.What changes are included in this PR?
getTransferPair
method.FieldReader
lazy.Are these changes tested?
Yes, some tests have been added to verify these changes.
Are there any user-facing changes?
I am not 100% sure if there are any user facing changes.
There should not be any breaking changes.