name/title system #430
Comments
It seems very useful and rather consistent. The use of names seems Pythonic. My only complaint is the use of title instead of label. I would prefer label. Some thoughts: why not go one step further and allow an axis named x to be accessed as h.axes.x?
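A minimal sketch of the attribute-access idea raised above, assuming each axis object carries a name string. `Axis` and `NamedAxes` here are hypothetical stand-ins for illustration, not boost-histogram or Hist classes.

```python
class Axis:
    """Hypothetical axis carrying only a name (illustration only)."""

    def __init__(self, name):
        self.name = name


class NamedAxes(tuple):
    """Hypothetical tuple of axes that also resolves axis names as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so tuple
        # methods such as .index and .count keep working unchanged.
        for axis in self:
            if axis.name == key:
                return axis
        raise AttributeError(key)


axes = NamedAxes([Axis("x"), Axis("y")])
assert axes.x is axes[0]    # access by name
assert axes[1].name == "y"  # access by position still works
```

One consequence of this design is that an axis named like an existing tuple method (for example "index" or "count") could not be reached by attribute, only by position.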
I'm happy with either, I just picked up "title" from ROOT. We've already had one person (at least) comment that
Sure, that would be friendly/reasonable; that's how Pandas and NumPy's record arrays work. We can even add
I was looking over the talk video and I literally called it
Ok label it is :) Apart from being closer to matplotlib, I think it also makes more sense word-wise. Things have labels. Books have titles. Edit: I know these words are synonymous, I am referring to typical use.
I thought some more about this, and then changed my mind. This feature is perfect for hist, but shouldn't be backported to boost-histogram. It doesn't really fit in, because boost-histogram does not assign any meaning to metadata, and that should stay that way.
It's up to you, but just to be clear - metadata would be completely untouched from Python. The C++ struct would be touched, but that's an internal detail that is not visible to Python. From Python, there would be a metadata slot that acts exactly as it does now, and two new slots, a label and a name. You are free to use strings in metadata for names, and nothing changes. You don't have to use If we don't have this, then we lose a very Pythonic named axis system - all downstream libraries will have to reinvent this like hist does if they want it. It would be very nice to have And we lose the ability to have a universal "Histogram" API that can be produced by Uproot4, modified in boost-histogram, and then plotted in mplhep, since Hist has to assign meaning to the metadata if it wants to put a name and title in, since an Axis only has the single metadata slot available. All plotting libraries, like mplhep and histoprint, would have to check to see if there was a "label" attribute (Hist), then check to see if Unlike normal Python libraries, subclasses are not allowed to add arbitrary Python properties, since they don't get passed through manipulations in C++; this forces metadata in Hist to not be the same structure as in boost-histogram. Currently, the conversion I (briefly) showed between Hist and boost-histogram doesn't work if you assign anything that's not a dict to boost-histogram's metadata. There's no reason to make a rushed decision - after the last couple of weeks, I'm taking time off working on histograms until at least Tuesday. In fact, I tried not to open my computer today. Also, if you don't want both PS: The problem mentioned above - that we can't make and pass through arbitrary members - is one we've discussed before -
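To make the plotting-library concern above concrete, here is a rough sketch of the kind of fallback lookup a plotter might need if there is no dedicated label slot. The helper `axis_label` is hypothetical, and the conventions it checks inside metadata (a dict with a "label" key, or a bare string) are assumptions for illustration; the comment above does not say which conventions would actually be checked.

```python
from types import SimpleNamespace


def axis_label(axis, default=""):
    """Best-effort label lookup (hypothetical helper, not a real API)."""
    # 1. A dedicated attribute, as a Hist-style axis would expose.
    label = getattr(axis, "label", None)
    if label:
        return label
    # 2. Otherwise guess at a convention inside the free-form metadata slot.
    metadata = getattr(axis, "metadata", None)
    if isinstance(metadata, dict):
        return metadata.get("label", default)
    if isinstance(metadata, str):
        return metadata
    return default


# Stand-in axis objects; real axes would come from a histogram library.
with_label = SimpleNamespace(label="pT [GeV]", metadata=None)
with_dict = SimpleNamespace(metadata={"label": "eta"})

assert axis_label(with_label) == "pT [GeV]"
assert axis_label(with_dict) == "eta"
```

Every plotting library would need its own copy of this guesswork, which is the duplication a shared label slot would avoid.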
Actually, I think our discussion in #400 was too restrictive - we were considering replacing Then I would make the case, completely unrelated to I would then separately make the case that having the
I understand all this, but boost-histogram should stay close to the metal. It's the library where we have a bare-metal histogram without convenience features, with a very small code base. It is the library where we innovate to make histogramming faster and add new exciting accumulators. I am ok with downstream libraries re-inventing more convenient ways to access axes. Your proposal is not only about that; you also want to enable keyword-based filling. All this is useful, but goes against this bare-metal idea. If other libraries want these conveniences, they can just build on hist instead of boost-histogram.
Boost-histogram was originally conceived as a thin wrapper over C++, so that it is easy to innovate on the C++ side and then add the corresponding features to Python. The more complex the Python wrapper is, the harder this is. Regarding metadata and
The metadata class in C++ can handle the Python dict. The axis constructor in Python could pass all keywords not recognized to the dict, so writing
The problem with that design, of course, is that it is open to conflicts between fields that already exist on the axis or histogram, which was why funneling everything through the metadata field is the cleaner solution.
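A small sketch of the keyword-forwarding idea and the conflict concern discussed above. The `Axis` class and its `size` property are hypothetical, chosen only to show a user keyword that shares a name with an existing field; this is not the boost-histogram constructor.

```python
class Axis:
    """Hypothetical axis for illustration; not the boost-histogram constructor."""

    def __init__(self, bins, start, stop, **extra):
        self.bins = bins
        self.start = start
        self.stop = stop
        # Unrecognized keywords are funneled into a single metadata dict.
        self.metadata = dict(extra)

    @property
    def size(self):
        # An existing field that a user-chosen keyword might collide with.
        return self.bins


ax = Axis(10, 0.0, 1.0, name="x", size="large")

# Funneled through metadata, the user's "size" and the axis's own
# size property live side by side without conflict.
assert ax.size == 10
assert ax.metadata == {"name": "x", "size": "large"}
```

Had the extra keywords been set directly as attributes on the axis instead, the user's "size" would have collided with the existing `size` property.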
It is fine if boost-histogram doesn't have an explicit metadata field. The metadata is optional in C++ anyway. Its whole purpose is to allow adding runtime metadata to an axis. We changed many other things to make it more Pythonic. The natural way in Python to add arbitrary metadata is the
boost-histogram's scope is to be the best filling and manipulation library for histograms in Python. Hist's scope is to add plotting, shortcuts, simple access to other libraries, dependencies etc. Of all the current and planned features for Hist,
This is non-trivial, and Hist has to integrate deeply with boost-histogram, since it has to modify the internal caching system to make NamedAxesTuple the item that gets cached instead of AxesTuple. It would be much easier to support, and would be much simpler code, if boost-histogram had the idea of names built-in. It should be telling that I had to add this and it caused a PR or two in boost-histogram, while @LovelyBuggies was able to do the other parts. I'm perfectly fine with dropping name-based filling from boost-histogram, actually, and making that Hist only - that's a tiny bit magical, and that's easy for a downstream library to add. It's using names in the three places where integers currently already can be used (AxesTuple, projection, and dict indexing) that would be nice to upstream.
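As a rough illustration of "names wherever integers are already accepted", the sketch below normalizes a key that may be either a position or a name. `axis_index` is a hypothetical helper operating on a plain list of names; it does not reflect how boost-histogram or Hist implement this.

```python
def axis_index(names, key):
    """Map an axis key (int position or str name) to an integer position."""
    if isinstance(key, int):
        return key
    try:
        return names.index(key)
    except ValueError:
        raise KeyError(f"no axis named {key!r}") from None


names = ["x", "y", "z"]

# Projection-style call: keep the axes given by position or by name.
keep = [axis_index(names, k) for k in ("z", 0)]
assert keep == [2, 0]

# Dict-style indexing: normalize name keys to integer keys.
index = {"x": slice(0, 5), 2: 3}
normalized = {axis_index(names, k): v for k, v in index.items()}
assert normalized == {0: slice(0, 5), 2: 3}
```

With a single normalization step like this, the integer-based code paths stay unchanged and names become an extra way of spelling the same positions.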
By the way, having a way to transmit other information besides the metadata through transforms would solve the biggest problem - and we could remove label/name from the PlottableHistogram - the official spec would lose the ability to attach labels to histograms in a uniform way, but it could be added optionally for some libraries/situations. I think it is not optimal, but I could live with it. I think name is a Pythonic addition that is not natural in C++ - in Python, you have inspection, and class Also, name should be a read-only property - allowing it to be changed after an axis is created is not ideal/more error prone. I'm not a fan of allowing arbitrary keywords in the constructor - misspelling keywords like
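A minimal sketch of the two preferences stated above: a name that is fixed at construction, and a constructor that rejects unknown keywords so that typos fail loudly. The `Axis` class and its signature are hypothetical, for illustration only.

```python
class Axis:
    """Hypothetical axis with explicit name/label keywords (illustration only)."""

    def __init__(self, bins, *, name="", label=""):
        self.bins = bins
        self._name = name
        self.label = label  # labels may still be edited later

    @property
    def name(self):
        # No setter is defined, so the name cannot be reassigned.
        return self._name


ax = Axis(10, name="x", label="x [cm]")
ax.label = "x [mm]"  # allowed

try:
    ax.name = "y"  # rejected: read-only property
except AttributeError:
    renamed = False
else:
    renamed = True
assert renamed is False

try:
    Axis(10, nmae="x")  # misspelled keyword
except TypeError:
    typo_rejected = True
else:
    typo_rejected = False
assert typo_rejected
```

With a catch-all `**kwargs` constructor, the misspelled keyword would instead be stored silently as metadata.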
Hist is not designed to be built upon; boost-histogram is. Hist is designed to be an end-user library, and hopefully one of several - I'd like to see if Physt could be built on boost-histogram too, for example. If data acquisition/online picks up our tools, they should use boost-histogram, not Hist. I think we should still make boost-histogram the best it can be, rather than intentionally complicating Hist and making boost-histogram a sub-optimal library that only Hist can use. Hist is supposed to help boost-histogram, not limit/damage it.
PS: I do understand that if someone comes to boost-histogram, sees names, then goes to Boost.Histogram, they may wonder why it doesn't have names. I really don't think that will be a large number of people, and I think the answer is easy: names and labels are part of the Histogram ecosystem in Python - there isn't one for C++, there isn't a Boost.Plot, and it's a language where looping yourself is encouraged. Two of the three uses (dict UHI and AxesTuple) don't exist in Boost.Histogram. I'm focused on making the best possible histogram library for Python - I think the number of people wanting a great histogram library for Python outnumbers the number of people wanting a perfect clone of a C++ library, and I think there will always be a difference between a compiled language and a very different interpreted language. But I do understand if you really don't want to have this difference.
By the way, "h.name" will not be 1:1 when converting to Hist - If you did want to allow just
No, boost-histogram's scope is to bring the power and flexibility of Boost.Histogram in C++ to Python with a minimal API.
Dynamically generating keywords in .fill and attributes on AxesTuple based on a name keyword to the axis is not Pythonic. Tuples are accessed by index and not keys, and vice versa for dicts. Your enhanced AxesTuple is neither a tuple nor a dict. It is not Pythonic. I will not accept that in boost-histogram.
I'd like to propose upstreaming @LovelyBuggies' fantastic title/name system from Hist to boost-histogram. There are several reasons for this:
mplhep.plothist(uproot.to_boost())
would show titles too, like Hist, and @jpivarski probably could add the same API directly to TH* objects in Uproot4)
Quick example of usage:
The proposed design, open for discussion:
name parameter.
label parameter.
@HDembinski, what do you think?