frequent OOM on 842N v2 #1197
Comments
The node (https://meshviewer.darmstadt.freifunk.net/#/en/map/10feed08eda6) is running with the latest changes from the master branch. |
I found several nodes with high CPU load (SYS load > 95%) if mesh-on-LAN is active. The nextnode page won't load and sometimes the node crashes. There is no ugly flag in "batctl tg". Gluon-Version: gluon-v2017.1.1+ |
@Sunz3r does this always occur when mesh on LAN is active, or only when there are also connections on the LAN interfaces (cable plugged in and/or other batman nodes to communicate with)? |
@azrdev: I see processes called "autoupdater" and "10stop-network" in the provided log. So it seems that it crashed while trying to update? Can you maybe reliably reproduce the crash when running /usr/sbin/autoupdater manually? |
Also, the 842nd v1/v2 seems to be one of those devices with 8MB of flash but still only 32MB of RAM, which could explain why this type of device is having issues while trying to update. Compare that to an 841nd, for instance, which has 32MB of RAM too but only needs to store a 4MB image when updating. |
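To make the RAM argument above concrete, here is a rough back-of-the-envelope sketch. It assumes the downloaded image is held entirely in RAM during the update and that the image is about as large as the flash chip; both are assumptions for illustration, not measurements from this node. The function name is hypothetical.

```python
def image_share_of_ram(image_mib: float, ram_mib: float = 32) -> float:
    """Return the fraction of total RAM taken by an image held in memory.

    Assumption: the whole image sits in RAM while updating.
    """
    if ram_mib <= 0:
        raise ValueError("ram_mib must be positive")
    return image_mib / ram_mib


# Device figures taken from the comment above (flash size used as image size).
for name, image_mib in (("841nd, 4MB image", 4), ("842nd, 8MB image", 8)):
    share = image_share_of_ram(image_mib)
    print(f"{name}: {share:.1%} of 32MB RAM")
```

Under these assumptions the image alone takes one eighth of the RAM on the 841nd but one quarter on the 842nd, before the kernel, batman-adv, fastd and the rest of userspace are counted.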
@Sunz3r: Seems like a different issue. Maybe create a new ticket in the issue tracker here on Github? |
@T-X might be. If so, how would that help us / what should I provide? |
@azrdev: One first, interesting thing to find out would be whether the crash happens during or after downloading the image. Can you add some "print/write" statements writing to /dev/kmsg in /usr/sbin/autoupdater to output some debug messages, so we know better at which point of the update process the node runs out of memory? If it were possible for you to reproduce the issue reliably, then I think it might make sense to add some patches to increase the verbosity of the Out-of-Memory trace, too. For instance, more detailed information about what is using how much memory, not just in userspace but also in kernel space, would be very interesting. Not sure, but maybe it'd be possible to compile an ar71xx image with CONFIG_KERNEL_SLABINFO=y and dump /proc/slabinfo from within the OOM panic handler, too. |
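If a /proc/slabinfo dump does become available, a small offline helper could rank the kernel caches by size. The following is only a sketch: the function name and the sample data are made up for illustration, and the estimate (number of objects times object size) ignores per-slab overhead.

```python
def top_slab_caches(slabinfo_text: str, limit: int = 5):
    """Return (cache name, estimated bytes) pairs, largest first.

    Expects the text layout of /proc/slabinfo version 2.x:
    name <active_objs> <num_objs> <objsize> ...
    """
    caches = []
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version line and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        caches.append((name, num_objs * objsize))
    caches.sort(key=lambda item: item[1], reverse=True)
    return caches[:limit]


# Made-up sample data, NOT taken from the node in this issue.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-128 900 1024 128 32 1 : tunables 0 0 0 : slabdata 32 32 0
dentry 3000 4200 136 30 1 : tunables 0 0 0 : slabdata 140 140 0
skbuff_head_cache 200 256 192 21 1 : tunables 0 0 0 : slabdata 13 13 0
"""

for name, size in top_slab_caches(SAMPLE):
    print(f"{name}: {size} bytes")
```

On a live system the input would come from reading /proc/slabinfo as root; for a dump captured in an OOM trace, the text can be pasted into a file and passed in the same way.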
PS: @azrdev or if you can reliably trigger it by executing /usr/sbin/autoupdater from the login shell via the serial console, then you might not need to write to /dev/kmsg. Then it should be sufficient to write to stdout or stderr. You could sprinkle some lines like this in /usr/sbin/autoupdater then:
|
@T-X first results: Without uplink and private wifi disabled ( |
seems like I can (currently) reproduce a crash while receiving the last ~third of the firmware image, i.e. in wget |
would be interesting to see how another device with the same specs (e.g. WR841N) performs in the exact same situation (same spot, same configuration). |
azrdev, this reproducible crash, is it with the private
wifi enabled or disabled now? And does this node itself have a
fastd VPN uplink via its WAN port?
|
For me this issue is more specific to a certain router model. Potentially I would see a similar issue on a TL-WA901V5 when running the V4 image... |
thanks for not reading this ticket before leaving a comment. |
i guess i can remember it from reading the last n times. i guess this was not meant as an ad hominem. my point was to disagree that it's something like #753, just by the fact that your previous question was not marked as "needs answer" and not being answered. in this case it's either your suggestion was to close this and to move in d). replacing the unit with an 841 would help probably the same way as a drop in replacement with an identical 842v2. |
simply wrong! |
sorry for delaying this, I'll do the test with the 841 |
@rotanid "@azrdev what about your check, may i close this issue in favor of #753 ?" But of course this might be a successful strategy to reduce the number of issues in case there is no feedback for individual ones which sounded "different" when they were opened. Anyhow, depending on the outcome here, i would consider opening a similar request for a 901v5 (frequent OOM reboots like https://paste.debian.net/989352/ , where a 841v11 in the same spot performs without problems. But since 1) i do not have a second 901v5 to test for an individual HW defect, nor a 901v4 to see if it's an issue with the profile, nor is this build a LEDE, but CC: I can not open a topic. i just like to hint that there might be similar situations on other routers too 'profile specific'). |
So, I had these running now for a month, logging uptime and load (manually, since our dashboard went down). both nodes were in the same location as previously, and both had fastd vpn uplink via ethernet (wan port). I did not capture serial logs this time, but IMHO the data suggests that the 841 also frequently hangs with private wifi enabled 841:842 |
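Logging uptime and load by hand, as described above, could also be scripted. The sketch below is hypothetical (the function names and the line format are assumptions, not what was actually used here); it parses the contents of /proc/uptime and /proc/loadavg and flags a reboot when the uptime drops between two samples.

```python
import time


def parse_uptime(text: str) -> float:
    """Return uptime in seconds from the contents of /proc/uptime."""
    return float(text.split()[0])


def parse_loadavg(text: str) -> tuple:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def sample_line(uptime_text: str, loadavg_text: str, now=None) -> str:
    """Format one log line from the two raw /proc file contents."""
    timestamp = int(time.time() if now is None else now)
    uptime_s = parse_uptime(uptime_text)
    l1, l5, l15 = parse_loadavg(loadavg_text)
    return f"{timestamp} uptime={uptime_s:.0f}s load={l1:.2f},{l5:.2f},{l15:.2f}"


def rebooted(previous_uptime: float, current_uptime: float) -> bool:
    """An uptime lower than the previous sample means the node restarted."""
    return current_uptime < previous_uptime


def read_sample() -> str:
    """Read the live values; only works where /proc is available (Linux)."""
    with open("/proc/uptime") as up, open("/proc/loadavg") as load:
        return sample_line(up.read(), load.read())
```

Calling `read_sample()` from a periodic job and appending the result to a file would give a log from which reboots can be counted afterwards with `rebooted()`.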
So what's your conclusion? To change it from (Sorry, this is not a serious suggestion, but what i may catch from your reply: "happens with 841 on same spot as well" correct?) |
yes, though I'm not sure if it's in the autoupdater (as with the 842) because I didn't capture a serial log |
@azrdev ok, let's continue this discussion over there then. |
My TP-Link TL-WR842N v2 with firmware from darmstadt.freifunk.net frequently reboots; it usually doesn't get more than 1 hour of uptime. Nothing useful in the dmesg logs (except maybe lots of
daemon.notice netifd: client (1352): cat: write error: Broken pipe
), but I got a serial log, to be found at https://git.darmstadt.ccc.de/snippets/9