Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental full buffer in SSD1309 #31

Merged

Conversation

dbuezas
Copy link
Contributor

@dbuezas dbuezas commented Nov 22, 2023

This draft PR is a working proof of concept that improves screen refresh rate.
The impact is most relevant when combined with the recent PR that adds SW I2C support #26433 to Marlin.

Working principle

  • Accumulates all pages in a new buffer (8kB 1kB of RAM)
  • Sends all at once once the last page is received.
  • Keeps track of which didn't change and skips them
  • within the changed pages, skips unchanged columns

This is a hack

This is a draft proof of concept because the correct way would be to handle the full buffer directly:

  • Increasing the PAGE_HEIGHT
  • Figuring out the data layout inside the u8glib buffer
  • Finding how to setup the screen to take the buffer in the same layout

Update: Not a hack in the end, this is also the way u8glib handles double page buffers in e.g SSD1306

Video comparisons

Make sure to enable sound, the audio feedback I added on each line change makes the difference easier to appreciate.

Original

  • SW I2C with an SSD1309 display
PXL_20231122_211124172.TS.mp4

This PR:

  • Same but with the full buffer and sending only changed pages
PXL_20231122_211306775.TS.mp4


Addendum

  • Extra tweaks for max performance

Max clock speed

  • Original SW I2C but passing 1_000_000 as clock speed
  • SoftWire still adds delayMicroseconds(1)
  • Measured 138kHz clock
PXL_20231122_211537073.TS.mp4

No delays

  • Original but removing all delayMicroseconds calls completely from the SoftWire library
  • 205kHz
PXL_20231122_211706529.TS.mp4

ALL TOGETHER

  • Full buffer sending only changed pages
  • Removed all delayMicroseconds calls
  • Use FASTIO instead of digitalWrite
  • Use platform specific port direction setup (instead of Arduino's pinMode)
  • Measured 240kHz clock speed Update: now 260kHz
PXL_20231122_211922545.TS.mp4

Test setup

  • Ultimaker 2
  • SKR3 board
  • Running a custom branch that includes these two open PRs: 26441, 26433 and some non-relevant tweaks like adding sound on menu item changes.

@dbuezas dbuezas marked this pull request as draft November 22, 2023 21:45
@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 23, 2023

This is the OPTIONAL modified SoftWire.cpp to achieve 260kHz clock with software i2c: https://gist.github.com/dbuezas/e4a7b8afddcee868a09bf014034fc37d

The patch in this experimental PR is not specific to SW i2c, but it is where it is most needed.

@thinkyhead
Copy link
Member

Seems like a fine option when the board has enough SRAM. The fewer stripes we need to draw, the better. We already avoid drawing elements that only exist within a single stripe more than once, but overlapping items do get drawn two or more times, with clipping. The most important consideration is that each call to the display update handler should take as little time as possible before returning so that the main loop gets processed and the planner buffer remains as full as possible. To the extent that this shortens the time spent in the display update routine it's a winner.

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 25, 2023

If the screen was a single page, then even the double drawing could be avoided. I'll try this a bit more, but I think u8glib expects a page column to fit in one byte.
As you note, this as is should already save lots of cpu time waiting for the i2c bus.

By the way, if you merge this, the we should add a flag in Marlin to switch between normal and full page modes.

@thinkyhead
Copy link
Member

This change will make a Melzi with SSD1309 a better choice, since some of those boards have 16K of SRAM, most of which ends up unused. (The "compact bootscreen" option may eat up 1K of that.)

Accumulates all pages in a new buffer (8kB of RAM)

Is this a B/W screen with 128x64? That should only be 128 x 64 ÷ 8 == 1KB. Or is this screen using one byte per pixel even though it's only displaying one color? If that is the case, I wonder if the display has some special one-bit color mode for situations like this?

@thinkyhead thinkyhead marked this pull request as ready for review November 25, 2023 19:04
@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 25, 2023

Oh, you are right it is 1kB, I counted each bit as a byte.

Note that this doesn't only not send unchanged pages, but it also sends all changed pages at once. This makes the screen a lot faster even when the amount of data transferred is the same. My hypothesis is that this is because the screen doesn't re-paint after each page, which may also be making the mcu wait between paints.

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 25, 2023

We could also get silly and send only the changed columns within each page. I took a look at the ssd1309 datasheet, and started to keep track of the first and last changed bit of each page.

Sending this to the lcd works fine:

u8g_WriteByte(u8g, dev, 0x000 | (start & 0b1111)); /* col start low */
u8g_WriteByte(u8g, dev, 0x010 | (start >> 4)); /* col start high */
[...]
u8g_WriteSequence(u8g, dev, end-start, full_buffer[page]+start);

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 25, 2023

It's even faster. Should I update this PR? It uses another 17 bytes:

  • 1 for the column with the first changes, times 8 (pages)
  • 1 for the ends, times 8
  • 1 for the first render (send full page on first render, ram of display is uninitialised)

@thinkyhead
Copy link
Member

Should I update this PR?

Sure, go ahead. It seems like a pretty decent extra optimization for most cases. It could slow things down a little in cases where most of the screen is changed, but not enough to negate the other optimizations.

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 26, 2023

Here's a test visualisation. The colors are inverted in all reused parts of the screen.

PXL_20231126_103322524.TS.mp4

Note: The test is actually slow because I'm rendering the screen twice, it just shows proves everything works as intended.

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 26, 2023

To test try it, change add _F in marlinui.DOGM

// #define U8G_CLASS U8GLIB_SSD1309_128X64
   #define U8G_CLASS U8GLIB_SSD1309_128X64_F

@dbuezas dbuezas force-pushed the dbuezas/experiment-full-buffer branch from e364de8 to 8632522 Compare November 26, 2023 10:47
@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 26, 2023

Sorry I just accidentally forced push on top of your changes, I'll fix that

update: fixed that

@dbuezas dbuezas force-pushed the dbuezas/experiment-full-buffer branch from 8632522 to 37feb35 Compare November 26, 2023 11:03
@thinkyhead
Copy link
Member

I've made the minor adjustment to cache the number of columns instead of the end column + 1.

So the only thing to do now is to run some print tests to make sure this can't lead to planner starvation if, for example, the G-code arriving on the USB/Serial port contains many small segments and the screen is being frequently updated by navigating menus, scrolling, and generally causing the contents of the LCD to change while the print is underway. A "rocket vase" sliced without G2/G3 and with small segments (e.g., 0.25mm) should provide the kind of conditions that contend with LCD updates.

@dbuezas
Copy link
Contributor Author

dbuezas commented Nov 26, 2023

Good idea storing width instead of end.

Can you do that test? With software i2c and my beefy skr3 I don't think my setup would mean much. Plus I'm fairly new to this so I'm not quite sure I can confidently say if there is starvation or not.

My bet is that this PR makes it harder to happen, since there's less total transmission, and no waiting for the screen to finish partial updates anymore.
Btw, my running assumption is that full buffer updates are faster (even when sending all pages and columns) because without it the lcd takes time to render each page independently and makes the mcu wait after each page, do you know if that's true?

@dbuezas
Copy link
Contributor Author

dbuezas commented Dec 1, 2023

Let me know if there's something I can help with

@dbuezas
Copy link
Contributor Author

dbuezas commented Dec 4, 2023

I lowered LCD_UPDATE_INTERVAL from 100 to 10 to make an experiment and OH BOY, it feels SO responsive!
This is 100fps, which is an overkill, but going from 10 to 30 sounds reasonable to me and it makes quite a difference in how the UI feels.

I also noted no issues during prints, I'll make a small PR to marlin to set LCD_UPDATE_INTERVAL in configuration_adv.h

This is software I2C with all the modifications mentioned in this PR.

  • Send buffer at once instead of page by page
  • Send only changed pages
  • Send only changed columns
  • Modified SoftWire library to remove all clock delays and use platform specific pin handling
PXL_20231204_110104920.TS.1.mp4

@thinkyhead
Copy link
Member

Thanks for the extra feedback and trying out the different refresh rate. I've been busy chasing down other things and preparing patches for older Marlin branches to keep up with platform changes.

It's a good idea to get that balance of refresh and responsiveness. Although there is a single call to the LCD update routine at high frequency, it's mostly to check the encoder, and will only update the display if the flag has been set to indicate that there's something new to draw. So as long as we call the routine frequently enough to get reliable encoder feedback we're golden. Maybe in the future we'll further decouple the encoder from the display so we can check it at an independent frequency, and we'll see if it helps with responsiveness.

If this is reasonably stable by your estimation then I'm happy to go ahead and merge it forthwith.

@dbuezas
Copy link
Contributor Author

dbuezas commented Dec 5, 2023

TL; DR it is stable, let's merge!

Oh yes, I know, your work is very visible :)

The modifications in this PR have been rock solid for me. On top of that, it is only active if the _F variant is selected, so it shouldn't affect anyone yet.

Once merged and released I'll add a flag to marlin to select it (or we could make it default if we're confident it is always better than by page)

Regarding encoder updates, I saw that there is a call to ui.update_buttons in temperature.cpp, so it seems to me like encoders are read at a faster rate (500hz). I actually changed it to run at 1khz as I have a snappy board.

I do still get some encoder jumps every so often, I assume due missed steps since they aren't read by pin change interrupts. TBH, the encoder felt more precise in Marlin 1.x on an atmega, is there any specific reason for them to be read at an interval instead of using pin change interrupts?

Thank you and congrats for your work in Marlin, I get a bit lost with the macros, but it is an amazing code base.

@dbuezas
Copy link
Contributor Author

dbuezas commented Dec 26, 2023

@thinkyhead in case this was forgotten, it is ready to be merged

@thinkyhead
Copy link
Member

Thanks for the heads-up! I'll get to that shortly.

@thinkyhead thinkyhead merged commit c1ec1a4 into MarlinFirmware:master Dec 27, 2023
@thisiskeithb
Copy link
Member

thisiskeithb commented Dec 27, 2023

This breaks compiling for several LCDs (issue with Mini 12864 was reported on Discord):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants