Global tables should be initialised data, not populated at run time #126

adrianmay · 2025-03-10T10:46:38Z

"Initialised data" is what you get in C if you write "int a=1;" at global scope. It's as much a part of the executable or library image as the machine code and there's no need to populate it at run time.

fec contains some global constant tables with values that are identical for all users. These are populated by "initialize". If these were initialised data instead, there'd be no "initialize" function and no need to check whether or not it had been called.

This would also provide the overwhelming advantage that the whole zfec library would be pure (i.e. not need IO in Haskell). An initialize function (even if called internally) forces the whole interface to return IO to tell Haskell to call it. It's clear that people want this because the current code achieves it with unsafePerformIO, but this causes a bug at present where Haskell doesn't know that the other functions only work after calling initialize. #84 knew this and advocated IO on the interface, as does my present draft MR, but I think the approach I describe here would be best.

The tables are about 65k, which is not large, but even if they were larger, it would be hard to argue that RAM is at less of a premium than disk, and having it on disk allows the RAM to load it on demand and discard it when pressure on RAM is high.

This would be accomplished by running the logic of "initialize" in isolation, printing the results in C syntax, and using that as a source file. For bonus points, this procedure could be run as a pre-build step, but personally I'd be just as happy to generate this source file once and check it into source control (also retaining the code that generated it).

hacklschorsch · 2025-03-11T10:43:15Z

This would also fix #105 if I am not mistaken?

meejah · 2025-03-13T01:07:19Z

Today @shapr and I spent a little bit of pairing time thinking about this. We agree "in general" that static data is better for this case than runtime-generated tables -- 17 years ago the answer may indeed have been the implemented solution.

It's not immediately obvious to me that there is "a table" (versus one for every n, k combination). Is this indeed just a single table, or does it depend on the N and K values?

Also, the whole concept of a static table smells like premature optimization to us -- do we need tables at all here? Can we just "do the math directly" (and let the optimizer do its thing)? Is there enough data here to blow out L1 or even L2 caches, such that it's actually counter-productive? (From some quick searching it seems that Haswell has 32KiB L1 caches, although it's not immediately obvious if that means "per die" or "per core".

adrianmay · 2025-03-13T08:01:36Z

The tables that depend on n and k will continue to be built when fec is called. This ticket pertains to the tables that currently get set up by fec_init which takes no parameters.

I agree that flooding the cache just for the sake of multiplication is probably bad for performance. But is it just a simple multiplication? I haven't looked in detail, but there must be some kind of modulo arithmetic going on. I'll look more closely and comment again.

adrianmay · 2025-03-13T08:57:04Z

It's definitely not a simple multiplication table. For instance, the 3s row goes 0,3,6,5,12,15,10,9...

It's symmetrical about the diagonal so we could in principle halve the size of it, but it would be a pain.

BTW, there are three smaller tables as well as the 256*256 byte "multiplication" table.

So shall I proceed?

meejah · 2025-03-13T15:03:56Z

Yeah, it's not simple and we didn't spend enough to time understand precisely what it's even doing ... however, without performance tests proving that this is better, my instinct is still to "let the optimizer optimize".

Of course, you're not really proposing that here (right?) just to change when/how the table is populated.

I've not really delved into this library much before, so I'm not sure how much pain this relieves or doesn't ;) so probably not the best person (at least without spending more time) to answer what the best use of time here is

exarkun · 2025-03-13T15:32:42Z

This isn't proof one way or another but the current implementation closely follows https://dl.acm.org/doi/10.1145/263876.263881 which has performance and efficiency considerations. The paper is quite old so it's entirely possible that the tradeoffs it considers are not appropriate to hardware currently in use.

However, another thing pointed out there is that the matrix is indeed not "simple multiplication". I very much doubt that zfec would perform acceptable without it since every single byte of input is transformed by some element of this matrix and and that transformation would involve many times more operations without the pre-computed matrix.

The question of when exactly the matrix is built seems worth considering but it seems to me that it does need to be built somewhere prior to an actual encode / decode operation.

adrianmay · 2025-03-13T16:11:36Z

My motivation isn't primarily performance. It's to alleviate the need to call initialize. It seems like tighter programming to eliminate the uninitialised state altogether. Haskell has is own reason to care about this because with initialise the whole interface has to be in IO to avoid laziness bugs, but without initialize it could all safely be pure.

As JP says, there has to be a table, and it has to be in RAM when it's in use. After the change I'm proposing, it will always be on disk so it only needs to be loaded on demand and can be discarded if it hasn't been used for a while. But if it's been calculated at run time, it has to be preserved in RAM all the time. That's one performance implication. The other is that calculating it at run time wastes time. I can't see a performance disadvantage.

BTW, it took me only an hour to implement this on the train this morning.

meejah · 2025-03-13T16:17:31Z

I certainly have no objections to making the table static data vs. runtime generated.

adrianmay mentioned this issue Mar 10, 2025

Declare exported functions as IO, call initialize internally instead of exporting it. #118

Closed

adrianmay mentioned this issue Mar 17, 2025

Initialise tables instead of calculating them at run time #128

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Global tables should be initialised data, not populated at run time #126

Global tables should be initialised data, not populated at run time #126

adrianmay commented Mar 10, 2025 •

edited

Loading

hacklschorsch commented Mar 11, 2025

meejah commented Mar 13, 2025

adrianmay commented Mar 13, 2025

adrianmay commented Mar 13, 2025

meejah commented Mar 13, 2025

exarkun commented Mar 13, 2025

adrianmay commented Mar 13, 2025

meejah commented Mar 13, 2025

Global tables should be initialised data, not populated at run time #126

Global tables should be initialised data, not populated at run time #126

Comments

adrianmay commented Mar 10, 2025 • edited Loading

hacklschorsch commented Mar 11, 2025

meejah commented Mar 13, 2025

adrianmay commented Mar 13, 2025

adrianmay commented Mar 13, 2025

meejah commented Mar 13, 2025

exarkun commented Mar 13, 2025

adrianmay commented Mar 13, 2025

meejah commented Mar 13, 2025

adrianmay commented Mar 10, 2025 •

edited

Loading