-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global tables should be initialised data, not populated at run time #126
Comments
This would also fix #105 if I am not mistaken? |
Today @shapr and I spent a little bit of pairing time thinking about this. We agree "in general" that static data is better for this case than runtime-generated tables -- 17 years ago the answer may indeed have been the implemented solution. It's not immediately obvious to me that there is "a table" (versus one for every Also, the whole concept of a static table smells like premature optimization to us -- do we need tables at all here? Can we just "do the math directly" (and let the optimizer do its thing)? Is there enough data here to blow out L1 or even L2 caches, such that it's actually counter-productive? (From some quick searching it seems that Haswell has 32KiB L1 caches, although it's not immediately obvious if that means "per die" or "per core". |
The tables that depend on n and k will continue to be built when fec is called. This ticket pertains to the tables that currently get set up by fec_init which takes no parameters. I agree that flooding the cache just for the sake of multiplication is probably bad for performance. But is it just a simple multiplication? I haven't looked in detail, but there must be some kind of modulo arithmetic going on. I'll look more closely and comment again. |
It's definitely not a simple multiplication table. For instance, the 3s row goes 0,3,6,5,12,15,10,9... It's symmetrical about the diagonal so we could in principle halve the size of it, but it would be a pain. BTW, there are three smaller tables as well as the 256*256 byte "multiplication" table. So shall I proceed? |
Yeah, it's not simple and we didn't spend enough to time understand precisely what it's even doing ... however, without performance tests proving that this is better, my instinct is still to "let the optimizer optimize". Of course, you're not really proposing that here (right?) just to change when/how the table is populated. I've not really delved into this library much before, so I'm not sure how much pain this relieves or doesn't ;) so probably not the best person (at least without spending more time) to answer what the best use of time here is |
This isn't proof one way or another but the current implementation closely follows https://dl.acm.org/doi/10.1145/263876.263881 which has performance and efficiency considerations. The paper is quite old so it's entirely possible that the tradeoffs it considers are not appropriate to hardware currently in use. However, another thing pointed out there is that the matrix is indeed not "simple multiplication". I very much doubt that zfec would perform acceptable without it since every single byte of input is transformed by some element of this matrix and and that transformation would involve many times more operations without the pre-computed matrix. The question of when exactly the matrix is built seems worth considering but it seems to me that it does need to be built somewhere prior to an actual encode / decode operation. |
My motivation isn't primarily performance. It's to alleviate the need to call As JP says, there has to be a table, and it has to be in RAM when it's in use. After the change I'm proposing, it will always be on disk so it only needs to be loaded on demand and can be discarded if it hasn't been used for a while. But if it's been calculated at run time, it has to be preserved in RAM all the time. That's one performance implication. The other is that calculating it at run time wastes time. I can't see a performance disadvantage. BTW, it took me only an hour to implement this on the train this morning. |
I certainly have no objections to making the table static data vs. runtime generated. |
"Initialised data" is what you get in C if you write "int a=1;" at global scope. It's as much a part of the executable or library image as the machine code and there's no need to populate it at run time.
fec contains some global constant tables with values that are identical for all users. These are populated by "initialize". If these were initialised data instead, there'd be no "initialize" function and no need to check whether or not it had been called.
This would also provide the overwhelming advantage that the whole zfec library would be pure (i.e. not need IO in Haskell). An initialize function (even if called internally) forces the whole interface to return IO to tell Haskell to call it. It's clear that people want this because the current code achieves it with unsafePerformIO, but this causes a bug at present where Haskell doesn't know that the other functions only work after calling initialize. #84 knew this and advocated IO on the interface, as does my present draft MR, but I think the approach I describe here would be best.
The tables are about 65k, which is not large, but even if they were larger, it would be hard to argue that RAM is at less of a premium than disk, and having it on disk allows the RAM to load it on demand and discard it when pressure on RAM is high.
This would be accomplished by running the logic of "initialize" in isolation, printing the results in C syntax, and using that as a source file. For bonus points, this procedure could be run as a pre-build step, but personally I'd be just as happy to generate this source file once and check it into source control (also retaining the code that generated it).
The text was updated successfully, but these errors were encountered: