Screen corruption on Atom #6
Comments
Unfortunately, I have no time to tinker with the code at this point: I have some personal stuff to do. For these two weeks I have access to a machine with a Celeron N2840 (0x0f31). I'll try to stay in touch so I can compile and test forthcoming commits on demand.
Understood. Update whenever you get a chance to try the latest.
Re-enable debug logging: more than 10 s after attach, and before starting X, set dev.drm.drm_debug=-1.
Actually, the problem is pretty obvious: __get_user_pages isn't properly implemented. Thanks for the update. This is progress nonetheless.
Fatal trap 12: page fault while in kernel mode cpuid = 0
Why don't you have line numbers for modules?
I made a fairly dumb mistake in __get_user_pages_fast, the following change should fix it:
It crashes the instant I start X. The previous file without line numbers is a copy-paste from kgdb. Sometimes FreeBSD doesn't save a backtrace to a text file, so I have to run a dump within the debugger.
Thanks for the update.
That path has never been hit before ... Please try again.
After I installed the latest binary xf86-video-intel, the problem is gone, so I can't tell whether the commit fixed it or not. However, the result with the latest commit is the same as with the previous one: X starts, but a picture in Firefox becomes distorted and after some time the kernel crashes.
Another dumb bug :-/ Fixed by:
Please retry.
Hrrm. Distressing. Can you try sna? Just to see if it's any different?
Or if you're using sna try uxa.
It wasn't set. After I set uxa, the result is the same as above. After I set sna, X crashes the kernel the moment I run dmenu.
SNA seems to not have artifacts, but to be much more fragile. I'll fix this panic tomorrow.
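For anyone following along, here is a minimal sketch of how the acceleration method discussed above is usually selected for xf86-video-intel. The file path and the Identifier string are assumptions for illustration; only the Driver and AccelMethod lines matter, and the option takes either "sna" or "uxa".

```
# /usr/local/etc/X11/xorg.conf.d/20-intel.conf  (path is an assumption)
Section "Device"
    Identifier "Intel Graphics"        # arbitrary label, hypothetical
    Driver     "intel"
    Option     "AccelMethod" "sna"     # switch to "uxa" to compare
EndSection
```

Restart X after changing the option so the driver picks up the new setting.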
My last change, aa6c49d, should fix this panic. Could you please try again when you get the chance?
I suppose that those artifacts are the result of incorrect GEM allocation, caused by porting functions that use the Linux slab allocator to FreeBSD kernel memory management. Whereas incorrect mutex locks or unlocks cause a panic, these don't, so the problem is hard to debug; it's also hard for me because I don't know the detailed structure of the driver's functions. That's my back-seat driver opinion.
I've caught the bug. It doesn't crash the kernel, but it drops X to a console and hangs. There's a photo of the laptop screen, because it doesn't save any traceback afterwards. And there's some strange text about a relocation in two places in /var/log/messages.
I'm off the grid for at least a month.
I'm seeing the artifacts too on a Bay Trail-M (Celeron N2830) with SNA enabled. With the default Xorg configuration, a YouTube video is either black or only slowly shows some images. It was just a quick test.
I've figured this out. i915's "atomic" updates need to be actually atomic. Fix forthcoming today or tomorrow.
Try now. Pass --disable-gpu-compositing to Chrome and whatever the equivalent is on FF. That appears to crash Chrome right now. As a gross workaround:
On Bay Trail-M, the latest version, ecd31fb, regresses with respect to 490ab4e: the GPU hangs when the window manager starts and all graphics operations slow down greatly. dmesg says:
Going back to the old kernel restores graphics speed, but it is unusable because of the artifacts.
What window manager are you using?
cwm
Yes and no: I installed the packages you provide at http://www.bsddesktop.com/images/xserver-next-pkgs/ but did not compile them from source myself. I'll try it and report back.
Using the packages should be good enough. I'll try testing cwm here. twm of course works fine, but that's not much of a stretch.
Drop scan generation number and node table scan lock - the only place where ni_scangen is checked is in ieee80211_timeout_stations() (and it is used to prevent duplicate checking of the same node); node scan lock protects only this variable + node table scan generation number.

This will fix (at least) next LOR (hostap mode):

lock order reversal:
 1st 0xc175f84c urtwm0_scan_loc (urtwm0_scan_loc) @ /usr/src/sys/modules/wlan/../../net80211/ieee80211_node.c:2019
 2nd 0xc175e018 urtwm0_com_lock (urtwm0_com_lock) @ /usr/src/sys/modules/wlan/../../net80211/ieee80211_node.c:2693
stack backtrace:
#0 0xa070d1c5 at witness_debugger+0x75
#1 0xa070d0f6 at witness_checkorder+0xd46
#2 0xa0694cce at __mtx_lock_flags+0x9e
#3 0xb03ad9ef at ieee80211_node_leave+0x12f
#4 0xb03afd13 at ieee80211_timeout_stations+0x483
#5 0xb03aa1c2 at ieee80211_node_timeout+0x42
#6 0xa06c6fa1 at softclock_call_cc+0x1e1
#7 0xa06c7518 at softclock+0xc8
#8 0xa06789ae at intr_event_execute_handlers+0x8e
#9 0xa0678fa0 at ithread_loop+0x90
#10 0xa0675fbe at fork_exit+0x7e
#11 0xa08af910 at fork_trampoline+0x8

In addition to the above:
* switch to ieee80211_iterate_nodes();
* do not assert that node table lock is held, while calling node_age(); that's not really needed (there are no resources, which can be protected by this lock) + this fixes LOR/deadlock between ieee80211_timeout_stations() and ieee80211_set_tim() (easy to reproduce in HOSTAP mode while sending something to an STA with enabled power management).

Tested:
* (avos) urtwn0, hostap mode
* (adrian) AR9380, STA mode
* (adrian) AR9380, AR9331, AR9580, hostap mode

Notes:
* This changes the net80211 internals, so you have to recompile all of it and the wifi drivers.

Submitted by: avos
Approved by: re (delphij)
Differential Revision: https://reviews.freebsd.org/D6833
I'm not sure whether there was supposed to be progress on this issue, but with the image I detailed that in a post to
commit e2c8b8701e2d moved modeset locking inside resume/suspend functions, but missed a code path only executed on lid close/open on older hardware. The result was a deadlock when closing and opening the lid without suspending on such hardware:

=============================================
[ INFO: possible recursive locking detected ]
4.6.0-rc1 #385 Not tainted
---------------------------------------------
kworker/0:3/88 is trying to acquire lock:
 (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa063e6a4>] intel_display_resume+0x4a/0x12f [i915]

but task is already holding lock:
 (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa02d0d4f>] drm_modeset_lock_all+0x3e/0xa6 [drm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev->mode_config.mutex);
  lock(&dev->mode_config.mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by kworker/0:3/88:
 #0: ("kacpi_notify"){++++.+}, at: [<ffffffff81068dfc>] process_one_work+0x14a/0x50b
 #1: ((&dpc->work)#2){+.+.+.}, at: [<ffffffff81068dfc>] process_one_work+0x14a/0x50b
 #2: ((acpi_lid_notifier).rwsem){++++.+}, at: [<ffffffff8106f874>] __blocking_notifier_call_chain+0x34/0x65
 #3: (&dev_priv->modeset_restore_lock){+.+.+.}, at: [<ffffffffa0664cf6>] intel_lid_notify+0x3c/0xd9 [i915]
 #4: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa02d0d4f>] drm_modeset_lock_all+0x3e/0xa6 [drm]
 #5: (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa02d0d59>] drm_modeset_lock_all+0x48/0xa6 [drm]
 #6: (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa02d0b2a>] modeset_lock+0x13c/0x1cd [drm]

stack backtrace:
CPU: 0 PID: 88 Comm: kworker/0:3 Not tainted 4.6.0-rc1 #385
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
Workqueue: kacpi_notify acpi_os_execute_deferred
 0000000000000000 ffff88022fd5f990 ffffffff8124af06 ffffffff825b39c0
 ffffffff825b39c0 ffff88022fd5fa60 ffffffff8108f547 ffff88022fd5fa70
 000000008108e817 ffff880230236cc0 0000000000000000 ffffffff825b39c0
Call Trace:
 [<ffffffff8124af06>] dump_stack+0x67/0x90
 [<ffffffff8108f547>] __lock_acquire+0xdb5/0xf71
 [<ffffffff8108bd2c>] ? look_up_lock_class+0xbe/0x10a
 [<ffffffff8108fae2>] lock_acquire+0x137/0x1cb
 [<ffffffff8108fae2>] ? lock_acquire+0x137/0x1cb
 [<ffffffffa063e6a4>] ? intel_display_resume+0x4a/0x12f [i915]
 [<ffffffff8148202f>] mutex_lock_nested+0x7e/0x3a4
 [<ffffffffa063e6a4>] ? intel_display_resume+0x4a/0x12f [i915]
 [<ffffffffa063e6a4>] ? intel_display_resume+0x4a/0x12f [i915]
 [<ffffffffa02d0b2a>] ? modeset_lock+0x13c/0x1cd [drm]
 [<ffffffffa063e6a4>] intel_display_resume+0x4a/0x12f [i915]
 [<ffffffffa063e6a4>] ? intel_display_resume+0x4a/0x12f [i915]
 [<ffffffffa02d0b2a>] ? modeset_lock+0x13c/0x1cd [drm]
 [<ffffffffa02d0b2a>] ? modeset_lock+0x13c/0x1cd [drm]
 [<ffffffffa02d0bf7>] ? drm_modeset_lock+0x17/0x24 [drm]
 [<ffffffffa02d0c8b>] ? drm_modeset_lock_all_ctx+0x87/0xa1 [drm]
 [<ffffffffa0664d6a>] intel_lid_notify+0xb0/0xd9 [i915]
 [<ffffffff8106f4c6>] notifier_call_chain+0x4a/0x6c
 [<ffffffff8106f88d>] __blocking_notifier_call_chain+0x4d/0x65
 [<ffffffff8106f8b9>] blocking_notifier_call_chain+0x14/0x16
 [<ffffffffa0011215>] acpi_lid_send_state+0x83/0xad [button]
 [<ffffffffa00112a6>] acpi_button_notify+0x41/0x132 [button]
 [<ffffffff812b07df>] acpi_device_notify+0x19/0x1b
 [<ffffffff812c8570>] acpi_ev_notify_dispatch+0x49/0x64
 [<ffffffff812ab9fb>] acpi_os_execute_deferred+0x14/0x20
 [<ffffffff81068f17>] process_one_work+0x265/0x50b
 [<ffffffff810696f5>] worker_thread+0x1fc/0x2dd
 [<ffffffff810694f9>] ? rescuer_thread+0x309/0x309
 [<ffffffff810694f9>] ? rescuer_thread+0x309/0x309
 [<ffffffff8106e2d6>] kthread+0xe0/0xe8
 [<ffffffff8107bc47>] ? local_clock+0x19/0x22
 [<ffffffff81484f42>] ret_from_fork+0x22/0x40
 [<ffffffff8106e1f6>] ? kthread_create_on_node+0x1b5/0x1b5

Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459328913-13719-1-git-send-email-bjorn@mork.no
May I object to the closing of this issue? I still occasionally get stuff like this: The corruption frequency increases with the "age" of the X server, but just restarting X is enough to restart the clock. I work around it by zooming, which seems to make another "random draw" of whether I get a corrupted image or not. Here is an example of the same image without corruption: I couldn't reproduce this on the same hardware with a live Fedora. EDIT: forgot to add that it's with the latest commit of the drm-next-4.7 branch, from 2016-12-26, and official CURRENT binary packages from around the end of January (before the libz update made it incompatible with drm-next-4.7).
Retest with drm-next please (not drm-next-4.7).
I've just started the cross-build (for some reason a local build doesn't work, it complains about missing I was using drm-next-4.7 instead of drm-next because last I heard it was still the branch recommended for Intel users. Have things changed, or is it only to test the problem against the latest code?
Yes, drm-next is now more stable and fixes most of the known issues. You should use it with xorg-server 1.18+ and the modesetting driver.
So using drm-next from about three weeks ago, I haven't been able to reproduce the corruption with the default settings, which I gather means UXA. However, I did try SNA, and there is massive corruption to the point of being unusable, but it doesn't look like anything posted here. Should I open a new ticket for that? Or is SNA known to be broken? What can I do to help diagnose it?
It's a dmesg after kldloading the latest.
[drm] Initialized drm 1.1.0 20060810
bus_register unimplemented!!!
[drm:drm_pci_init]
[drm:0xffffffff82dc525as] ERROR FreeBSD needs DRIVER_MODESET
System doesn't hang now, but it's still scfb which runs video.
The DRIVER_MODESET flag is getting cleared. It's unclear why this is happening or how it's tied to its inability to find a PCH.