Very high cpu usage with R2300 #63

Closed

Arthav24 opened this issue Jul 14, 2021 · 18 comments · Fixed by #65

Comments

@Arthav24

Arthav24 commented Jul 14, 2021

I am using an R2300 with an Intel i5-6500TE. In htop I can see the pf driver consuming almost 80-90% CPU. Is anyone else facing the same? I haven't modified the driver. Please guide me on how to debug the cause. Just launching r2300.launch causes this much CPU usage, without even launching the data processing node.
The high CPU usage is causing computational bottlenecks in my application, so I am looking to reduce it to a reasonable level.
System details: Ubuntu 18.04, ROS Melodic, driver v1.1.1

Also, since I am for now using only one of the four planes, is there any way to optimize the driver in terms of reducing load and data?

@ptruka
Contributor

ptruka commented Jul 19, 2021

Hello Arthav, I have an i5-8350U here. Even when I start the driver in a virtual machine, I don't get such high CPU usage.
Is RViz running? If so, how does the system behave when RViz is closed?

If you are only using one layer, you could disable the unused ones. This can be done via dynamic_reconfigure in ROS, or directly via HTTP, for example: http://<sensor-ip>/cmd/set_parameter?layer_enable=on,off,off,off
Further information can be found in section "2.6.7 Layer configuration (layer_enable)" of the R2300 Ethernet communication protocol.
Please let us know if this setting changes anything about your CPU load.

@Arthav24
Author

Hey,
No, I am not running RViz in normal operation; I only use it occasionally for visualisation.
RViz doesn't seem to affect the CPU usage.
Initially I did just that: I switched off the unused planes and also commented out the static tf publishers for the unused planes in the launch file, but that actually increased the usage by about 10%. Reverting brought it back down to the 80% range.

@hsd-dev
Contributor

hsd-dev commented Jul 20, 2021

@Arthav24 are you using the binaries or have you built the repo from source? If you are building from source, please build it in debug mode

catkin build -DCMAKE_BUILD_TYPE=Debug

and use valgrind to report if there are any memory leaks:

launch-prefix="valgrind"

So the node in the launch file would look something like

<node pkg="pf_driver" type="ros_main" name="r2300_node" output="screen" launch-prefix="valgrind --tool=memcheck --leak-check=yes  --log-file=memleak.log">

Please report the log output from valgrind here.

@Arthav24
Author

Arthav24 commented Jul 22, 2021

memleak_old_4tf.log
memleak_old_1tf.log
memleak_latest_4tf.log
memleak_latest_1tf.log

Please find the attached logs.
memleak_old_4tf.log - tag 1.1.1, with all the tfs in the launch file enabled (ring1/2/3/4)
memleak_old_1tf.log - tag 1.1.1, with only the one plane I use enabled.

memleak_latest_4tf.log - main branch, with all the tfs in the launch file enabled (ring1/2/3/4)
memleak_latest_1tf.log - main branch, with only the one plane I use enabled.

@hsd-dev
Contributor

hsd-dev commented Jul 22, 2021

The logs don't seem to be alarming. I will do a test with

launch-prefix="valgrind --tool=callgrind --callgrind-out-file=callgrind.log"

and then maybe we can compare the logs.

@Arthav24
Author

Hey @ipa-hsd, any update from your test?

@hsd-dev
Contributor

hsd-dev commented Jul 26, 2021

@Arthav24 when I compile the latest main branch with

catkin build -DCMAKE_BUILD_TYPE=Release

the CPU usage fluctuates between 30% and 60%. Not as high as you reported, but it can probably still be improved.

Compiling with the Debug option and using callgrind produced this log file: callgrind.log. I will need to go through it in detail, but at a quick glance most of the time-consuming calls seem to be ROS-specific.

How long had the driver been running when you noticed the high CPU usage? Or has it been like that from the beginning?

@hsd-dev
Contributor

hsd-dev commented Jul 26, 2021

I just realized you mentioned that you are working with version 1.1.1, but we fixed some issues with low frequency after that release: https://github.com/PepperlFuchs/pf_lidar_ros_driver/commits/main. Could you try pulling the latest main branch and compiling with the Release option?

@Arthav24
Author

Yeah, but for me the CPU usage was always >60%. When I was using the driver in my stack I had BUILD_TYPE=Release, but saw similar results.
I have used the driver for very long runtimes, but the usage is high from the beginning, as soon as the node starts.

Yes, I did check the latest commits on the main branch fixing the frequency and accumulation, and tried them, but saw similar results. That was also built with the Release option.

@hsd-dev
Contributor

hsd-dev commented Jul 27, 2021

Can you try with

launch-prefix="valgrind --tool=callgrind --callgrind-out-file=callgrind.log"

and share the log?

@hsd-dev
Contributor

hsd-dev commented Jul 27, 2021

I had another look at callgrind.log and it looks like this while loop is the culprit. It's a tight while loop without any sleep, and it tries to read data even when there isn't any. For data arriving at 25 Hz, that is a lot of wasted work.
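
For illustration, a minimal sketch of the pattern (the helper names are hypothetical, not the driver's actual code): a reader loop that polls without sleeping spins at full speed even when the 25 Hz sensor has nothing new to deliver, while a short sleep on an empty read keeps the CPU mostly idle.

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{ true };

// Hypothetical stand-ins for the driver's socket read and scan handling.
bool read_packet() { return false; }  // returns false when no data is available yet
void handle_packet() {}               // would parse the packet and publish the scan

void reader_loop()
{
  while (running)
  {
    if (!read_packet())
    {
      // Without this sleep the loop spins continuously even though scans
      // only arrive at 25 Hz, which is what drives the CPU usage up.
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      continue;
    }
    handle_packet();
  }
}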

@hsd-dev
Contributor

hsd-dev commented Jul 27, 2021

@Arthav24 could you please try #65? For me the consumption has come down from 30-60% to less than 10%. This is the log file after the fix: callgrind_fix.log. I have marked the PR with [WIP] because there might be some possibility to improve it slightly more.

@Arthav24
Author

Sorry for the delayed response. I see you have come up with a great solution, which I will definitely try and report back on.

I think it can definitely be improved further: I have also used an R2000 with a different driver, and there I never saw usage above 5%.
Thanks to @konu-droid for providing logs and checking things out.

@hsd-dev
Contributor

hsd-dev commented Jul 30, 2021

@Arthav24 any updates?

@Arthav24
Author

Again, sorry for the delayed response.
These are the observations from 2 days of testing:

  1. CPU usage dropped to 13% with all the layers enabled. 👍
  2. CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file.
  3. Reverting 2) brought it back to 13%. 👎
  4. The usage in 2) seems to creep up slowly: right after launching it was around 13-15%, but after some time it went up to that level. 👎
  5. The fix in the while loop seems to affect the data frequency: we noticed that in 50Hz mode the actual rate was 6Hz and in 100Hz mode it was 12Hz, just half of the expected value in both cases.
  6. While debugging we tried cutting the wait time you introduced in the fix in half, to 5 ms, which brought back the expected frequencies in both modes.

Once again thanks @konu-droid for running all the tests.

@hsd-dev
Contributor

hsd-dev commented Jul 30, 2021

Thanks for the detailed analysis!

While debugging we tried cutting the wait time you introduced in the fix in half, to 5 ms, which brought back the expected frequencies in both modes.

Do you mean this line?

std::this_thread::sleep_for(std::chrono::milliseconds(10));

Maybe the sleep was not required in that loop in the first place. I will revert it.

CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file.

That seems strange. But in any case, I should introduce a condition variable that wakes up the reader thread only when there is data available.
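
As a rough sketch of that idea (the names below are hypothetical, not the driver's actual API), the network thread would push incoming packets into a queue and notify, while the reader thread blocks on a condition variable instead of polling:

#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

std::mutex mtx;
std::condition_variable cv;
std::deque<std::string> packet_queue;

// Called by the network thread whenever a packet arrives.
void on_packet_received(std::string packet)
{
  {
    std::lock_guard<std::mutex> lock(mtx);
    packet_queue.push_back(std::move(packet));
  }
  cv.notify_one();  // wake the reader only when there is data
}

// Reader thread: sleeps on the condition variable while the queue is empty.
void reader_loop()
{
  while (true)
  {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return !packet_queue.empty(); });
    std::string packet = std::move(packet_queue.front());
    packet_queue.pop_front();
    lock.unlock();
    // ... parse the packet and publish the point cloud ...
  }
}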

The usage in 2) seems to creep up slowly: right after launching it was around 13-15%, but after some time it went up to that level.

That sounds more like a memory leak (?)

@Arthav24
Author

Exactly this.
std::this_thread::sleep_for(std::chrono::milliseconds(5));

I also thought the sleep might not be required, but cutting it in half brought the frequencies back to the expected values, so I didn't try removing it completely.
It would be perfect if threads were spawned only for the enabled layers.
Yes, it looks and sounds like a memory leak, like something piling up, but I couldn't debug it further, so right now I am using the driver with all layers enabled.

@hsd-dev
Contributor

hsd-dev commented Apr 1, 2022

  2. CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file.
  3. Reverting 2) brought it back to 13%. 👎

Below is the section that is most likely the culprit:

// A layer index lower than the previous one marks the start of a new scan:
// publish the accumulated cloud and start a fresh one.
if (layer_idx < layer_prev_)
{
  if (!cloud_->data.empty())
  {
    pcl_publisher_.publish(cloud_);
    cloud_.reset(new sensor_msgs::PointCloud2());
  }
  copy_pointcloud(*cloud_, c);
}
else
{
  // Still within the same scan: append this layer to the accumulated cloud.
  add_pointcloud(*cloud_, c);
}
layer_prev_ = layer_idx;

hsd-dev added a commit that referenced this issue Apr 4, 2022
@hsd-dev
Contributor

hsd-dev commented Apr 4, 2022

Pushed new changes to #65 which maintain the frequency as expected. Also, the consumption is around 10-15% with all layers enabled and 5-8% with only one layer enabled. The memory leak is now fixed by resetting the cloud message when the scan parameters change (e.g. if layers_enabled is changed before all layers of the current scan have been published).
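
A minimal sketch of that idea (the member names mirror the snippet quoted above, but the surrounding struct and the call site are assumptions, not necessarily what #65 does):

#include <sensor_msgs/PointCloud2.h>

struct ScanAccumulator
{
  sensor_msgs::PointCloud2Ptr cloud_{ new sensor_msgs::PointCloud2() };
  int layer_prev_ = -1;  // assumed type; tracks the last layer appended

  // Called when the scan parameters change (e.g. layers are enabled/disabled):
  // drop the partially accumulated cloud so stale layers cannot pile up.
  void reset_on_config_change()
  {
    cloud_.reset(new sensor_msgs::PointCloud2());
    layer_prev_ = -1;
  }
};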
