Experimental full buffer in SSD1309 #31

dbuezas · 2023-11-22T21:45:23Z

This draft PR is a working proof of concept that improves screen refresh rate.
The impact is most relevant when combined with the recent PR that adds SW I2C support #26433 to Marlin.

Working principle

Accumulates all pages in a new buffer (~~8kB~~ 1kB of RAM)
Sends all at once once the last page is received.
Keeps track of which didn't change and skips them
within the changed pages, skips unchanged columns

This is a hack

~~This is a draft proof of concept because the correct way would be to handle the full buffer directly:~~

~~Increasing the PAGE_HEIGHT~~
~~Figuring out the data layout inside the u8glib buffer~~
~~Finding how to setup the screen to take the buffer in the same layout~~

Update: Not a hack in the end, this is also the way u8glib handles double page buffers in e.g SSD1306

Video comparisons

Make sure to enable sound, the audio feedback I added on each line change makes the difference easier to appreciate.

Original

SW I2C with an SSD1309 display

PXL_20231122_211124172.TS.mp4

This PR:

Same but with the full buffer and sending only changed pages

PXL_20231122_211306775.TS.mp4

Addendum

Extra tweaks for max performance

Max clock speed

Original SW I2C but passing 1_000_000 as clock speed
SoftWire still adds delayMicroseconds(1)
Measured 138kHz clock

PXL_20231122_211537073.TS.mp4

No delays

Original but removing all delayMicroseconds calls completely from the SoftWire library
205kHz

PXL_20231122_211706529.TS.mp4

ALL TOGETHER

Full buffer sending only changed pages
Removed all delayMicroseconds calls
Use FASTIO instead of digitalWrite
Use platform specific port direction setup (instead of Arduino's pinMode)
Measured ~~240kHz~~ clock speed Update: now 260kHz

PXL_20231122_211922545.TS.mp4

Test setup

Ultimaker 2
SKR3 board
Running a custom branch that includes these two open PRs: 26441, 26433 and some non-relevant tweaks like adding sound on menu item changes.

dbuezas · 2023-11-23T14:07:23Z

This is the OPTIONAL modified SoftWire.cpp to achieve 260kHz clock with software i2c: https://gist.github.com/dbuezas/e4a7b8afddcee868a09bf014034fc37d

The patch in this experimental PR is not specific to SW i2c, but it is where it is most needed.

thinkyhead · 2023-11-25T03:57:19Z

Seems like a fine option when the board has enough SRAM. The fewer stripes we need to draw, the better. We already avoid drawing elements that only exist within a single stripe more than once, but overlapping items do get drawn two or more times, with clipping. The most important consideration is that each call to the display update handler should take as little time as possible before returning so that the main loop gets processed and the planner buffer remains as full as possible. To the extent that this shortens the time spent in the display update routine it's a winner.

dbuezas · 2023-11-25T08:13:57Z

If the screen was a single page, then even the double drawing could be avoided. I'll try this a bit more, but I think u8glib expects a page column to fit in one byte.
As you note, this as is should already save lots of cpu time waiting for the i2c bus.

By the way, if you merge this, the we should add a flag in Marlin to switch between normal and full page modes.

thinkyhead · 2023-11-25T19:03:46Z

This change will make a Melzi with SSD1309 a better choice, since some of those boards have 16K of SRAM, most of which ends up unused. (The "compact bootscreen" option may eat up 1K of that.)

Accumulates all pages in a new buffer (8kB of RAM)

Is this a B/W screen with 128x64? That should only be 128 x 64 ÷ 8 == 1KB. Or is this screen using one byte per pixel even though it's only displaying one color? If that is the case, I wonder if the display has some special one-bit color mode for situations like this?

dbuezas · 2023-11-25T19:50:27Z

Oh, you are right it is 1kB, I counted each bit as a byte.

Note that this doesn't only not send unchanged pages, but it also sends all changed pages at once. This makes the screen a lot faster even when the amount of data transferred is the same. My hypothesis is that this is because the screen doesn't re-paint after each page, which may also be making the mcu wait between paints.

dbuezas · 2023-11-25T21:58:09Z

We could also get silly and send only the changed columns within each page. I took a look at the ssd1309 datasheet, and started to keep track of the first and last changed bit of each page.

Sending this to the lcd works fine:

u8g_WriteByte(u8g, dev, 0x000 | (start & 0b1111)); /* col start low */
u8g_WriteByte(u8g, dev, 0x010 | (start >> 4)); /* col start high */
[...]
u8g_WriteSequence(u8g, dev, end-start, full_buffer[page]+start);

dbuezas · 2023-11-25T22:35:02Z

It's even faster. Should I update this PR? It uses another 17 bytes:

1 for the column with the first changes, times 8 (pages)
1 for the ends, times 8
1 for the first render (send full page on first render, ram of display is uninitialised)

thinkyhead · 2023-11-25T23:19:05Z

Should I update this PR?

Sure, go ahead. It seems like a pretty decent extra optimization for most cases. It could slow things down a little in cases where most of the screen is changed, but not enough to negate the other optimizations.

dbuezas · 2023-11-26T10:39:36Z

Here's a test visualisation. The colors are inverted in all reused parts of the screen.

PXL_20231126_103322524.TS.mp4

Note: The test is actually slow because I'm rendering the screen twice, it just shows proves everything works as intended.

dbuezas · 2023-11-26T10:44:06Z

To test try it, change add _F in marlinui.DOGM

// #define U8G_CLASS U8GLIB_SSD1309_128X64
   #define U8G_CLASS U8GLIB_SSD1309_128X64_F

dbuezas · 2023-11-26T10:48:42Z

Sorry I just accidentally forced push on top of your changes, I'll fix that

update: fixed that

thinkyhead · 2023-11-26T21:14:54Z

I've made the minor adjustment to cache the number of columns instead of the end column + 1.

So the only thing to do now is to run some print tests to make sure this can't lead to planner starvation if, for example, the G-code arriving on the USB/Serial port contains many small segments and the screen is being frequently updated by navigating menus, scrolling, and generally causing the contents of the LCD to change while the print is underway. A "rocket vase" sliced without G2/G3 and with small segments (e.g., 0.25mm) should provide the kind of conditions that contend with LCD updates.

dbuezas · 2023-11-26T22:53:44Z

Good idea storing width instead of end.

Can you do that test? With software i2c and my beefy skr3 I don't think my setup would mean much. Plus I'm fairly new to this so I'm not quite sure I can confidently say if there is starvation or not.

My bet is that this PR makes it harder to happen, since there's less total transmission, and no waiting for the screen to finish partial updates anymore.
Btw, my running assumption is that full buffer updates are faster (even when sending all pages and columns) because without it the lcd takes time to render each page independently and makes the mcu wait after each page, do you know if that's true?

dbuezas · 2023-12-01T23:50:44Z

Let me know if there's something I can help with

dbuezas · 2023-12-04T11:13:04Z

I lowered LCD_UPDATE_INTERVAL from 100 to 10 to make an experiment and OH BOY, it feels SO responsive!
This is 100fps, which is an overkill, but going from 10 to 30 sounds reasonable to me and it makes quite a difference in how the UI feels.

I also noted no issues during prints, I'll make a small PR to marlin to set LCD_UPDATE_INTERVAL in configuration_adv.h

This is software I2C with all the modifications mentioned in this PR.

Send buffer at once instead of page by page
Send only changed pages
Send only changed columns
Modified SoftWire library to remove all clock delays and use platform specific pin handling

PXL_20231204_110104920.TS.1.mp4

thinkyhead · 2023-12-05T00:29:39Z

Thanks for the extra feedback and trying out the different refresh rate. I've been busy chasing down other things and preparing patches for older Marlin branches to keep up with platform changes.

It's a good idea to get that balance of refresh and responsiveness. Although there is a single call to the LCD update routine at high frequency, it's mostly to check the encoder, and will only update the display if the flag has been set to indicate that there's something new to draw. So as long as we call the routine frequently enough to get reliable encoder feedback we're golden. Maybe in the future we'll further decouple the encoder from the display so we can check it at an independent frequency, and we'll see if it helps with responsiveness.

If this is reasonably stable by your estimation then I'm happy to go ahead and merge it forthwith.

dbuezas · 2023-12-05T08:11:04Z

TL; DR it is stable, let's merge!

Oh yes, I know, your work is very visible :)

The modifications in this PR have been rock solid for me. On top of that, it is only active if the _F variant is selected, so it shouldn't affect anyone yet.

Once merged and released I'll add a flag to marlin to select it (or we could make it default if we're confident it is always better than by page)

Regarding encoder updates, I saw that there is a call to ui.update_buttons in temperature.cpp, so it seems to me like encoders are read at a faster rate (500hz). I actually changed it to run at 1khz as I have a snappy board.

I do still get some encoder jumps every so often, I assume due missed steps since they aren't read by pin change interrupts. TBH, the encoder felt more precise in Marlin 1.x on an atmega, is there any specific reason for them to be read at an interval instead of using pin change interrupts?

Thank you and congrats for your work in Marlin, I get a bit lost with the macros, but it is an amazing code base.

dbuezas · 2023-12-26T01:35:50Z

@thinkyhead in case this was forgotten, it is ready to be merged

thinkyhead · 2023-12-26T01:37:10Z

Thanks for the heads-up! I'll get to that shortly.

thisiskeithb · 2023-12-27T14:55:39Z

This breaks compiling for several LCDs (issue with Mini 12864 was reported on Discord):

dbuezas marked this pull request as draft November 22, 2023 21:45

Experimental full buffer in SSD1309

5cf8e08

dbuezas force-pushed the dbuezas/experiment-full-buffer branch from 6f84341 to 5cf8e08 Compare November 22, 2023 21:48

dbuezas mentioned this pull request Nov 22, 2023

STM32 I2C two wire LCD - bug fix and soft I2C driver implementation MarlinFirmware/Marlin#26433

Open

Fix page change detection

91be62c

thinkyhead added the enhancement label Nov 25, 2023

thinkyhead marked this pull request as ready for review November 25, 2023 19:04

misc. style, extra page flags

e364de8

dbuezas force-pushed the dbuezas/experiment-full-buffer branch from e364de8 to 8632522 Compare November 26, 2023 10:47

Only send changed columns

37feb35

dbuezas force-pushed the dbuezas/experiment-full-buffer branch from 8632522 to 37feb35 Compare November 26, 2023 11:03

Only update changed columns in buffer too

d3414ef

dbuezas mentioned this pull request Nov 26, 2023

[BUG] DISPLAY_SLEEP_MINUTES makes the UI sluggish MarlinFirmware/Marlin#26467

Closed

1 task

thinkyhead added 2 commits November 26, 2023 15:06

store width instead of end

2d8caeb

rename

a29de5c

dbuezas mentioned this pull request Dec 4, 2023

Double lcd framerate option MarlinFirmware/Marlin#26500

Merged

thinkyhead merged commit c1ec1a4 into MarlinFirmware:master Dec 27, 2023

dbuezas mentioned this pull request Jan 20, 2024

Add USE_LCD_DOUBLE_BUFFERING option MarlinFirmware/Marlin#26713

Merged

dbuezas mentioned this pull request May 26, 2024

Fix menus responsiveness when HAS_MARLINUI_U8GLIB. MarlinFirmware/Marlin#26555

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experimental full buffer in SSD1309 #31

Experimental full buffer in SSD1309 #31

dbuezas commented Nov 22, 2023 •

edited

Loading

dbuezas commented Nov 23, 2023 •

edited

Loading

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 25, 2023

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 25, 2023

dbuezas commented Nov 25, 2023

dbuezas commented Nov 25, 2023

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 26, 2023 •

edited

Loading

dbuezas commented Nov 26, 2023

dbuezas commented Nov 26, 2023 •

edited

Loading

thinkyhead commented Nov 26, 2023

dbuezas commented Nov 26, 2023 •

edited

Loading

dbuezas commented Dec 1, 2023

dbuezas commented Dec 4, 2023 •

edited

Loading

thinkyhead commented Dec 5, 2023

dbuezas commented Dec 5, 2023 •

edited

Loading

dbuezas commented Dec 26, 2023

thinkyhead commented Dec 26, 2023

thisiskeithb commented Dec 27, 2023 •

edited

Loading

Experimental full buffer in SSD1309 #31

Experimental full buffer in SSD1309 #31

Conversation

dbuezas commented Nov 22, 2023 • edited Loading

Working principle

This is a hack

Video comparisons

Original

This PR:

Addendum

Max clock speed

No delays

ALL TOGETHER

Test setup

dbuezas commented Nov 23, 2023 • edited Loading

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 25, 2023

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 25, 2023

dbuezas commented Nov 25, 2023

dbuezas commented Nov 25, 2023

thinkyhead commented Nov 25, 2023

dbuezas commented Nov 26, 2023 • edited Loading

dbuezas commented Nov 26, 2023

dbuezas commented Nov 26, 2023 • edited Loading

thinkyhead commented Nov 26, 2023

dbuezas commented Nov 26, 2023 • edited Loading

dbuezas commented Dec 1, 2023

dbuezas commented Dec 4, 2023 • edited Loading

thinkyhead commented Dec 5, 2023

dbuezas commented Dec 5, 2023 • edited Loading

dbuezas commented Dec 26, 2023

thinkyhead commented Dec 26, 2023

thisiskeithb commented Dec 27, 2023 • edited Loading

dbuezas commented Nov 22, 2023 •

edited

Loading

dbuezas commented Nov 23, 2023 •

edited

Loading

dbuezas commented Nov 26, 2023 •

edited

Loading

dbuezas commented Nov 26, 2023 •

edited

Loading

dbuezas commented Nov 26, 2023 •

edited

Loading

dbuezas commented Dec 4, 2023 •

edited

Loading

dbuezas commented Dec 5, 2023 •

edited

Loading

thisiskeithb commented Dec 27, 2023 •

edited

Loading