-
Notifications
You must be signed in to change notification settings - Fork 127
Backport crossdock fixes #7
Backport crossdock fixes #7
Conversation
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
} | ||
|
||
void getSamplingStrategy(SamplingStrategyResponse& result, | ||
const std::string& serviceName) override | ||
{ | ||
if (_serverAddr == net::IPAddress()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, not following. _serverAddr is set to net::IPAddress() if the client couldn't reach agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. net::IPAddress()
is the default value for an empty IP and the address is only set upon successful connection. The reason for this is to avoid contacting the server if we couldn't connect on construction. I guess the real solution is not to start the looping thread at all, but that is another class that ends up calling this periodically.
/build | ||
/crossdock/crossdock | ||
/crossdock/jaeger-docker-compose.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this a temporary thing? don't think we should be ignoring this completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crossdock/crossdock is the executable binary itself. Actually changed this in my recent crossdock development. The jaeger-docker-compose.yml
is a downloaded file to launch the other jaeger clients.
Signed-off-by: Isaac Hier <ihier@uber.com>
Signed-off-by: Isaac Hier <ihier@uber.com>
I'm closing this in favor of real crossdock pull request |
`Span::isFinished()` returns true iff the span's duration is zero. So, if `steady_clock::now()` returns the same value at span creation as at `Finish()` (unlikely, but possible, especially in virtualized environments), the following scenario can happen: - `Span::FinishWithOptions` sets `_duration` to zero - and then calls `reportSpan`, which calls `RemoteReporter::report` - which acquires the reporter mutex and - adds the span to the reporter's `_queue` Later: - `RemoteReporter::sweepQueue` acquires the reporter mutex - `RemoteReporter::sweepQueue` makes a copy of the span (https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/reporters/RemoteReporter.cpp#L95) - The span is serialized in `RemoteReporter::sendSpan` - The span (the copy made above) needs to be destroyed because the block in `sweepQueue` ends. - The span's destructor is called, which calls `FinishWithOptions` again - ... which sees that `_duration` is zero, so `isFinished()` returns false: https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/Span.cpp#L94 - so it tries to call `reportSpan` again - which tries to acquire the reporter's mutex - but we're holding the lock in `sweepQueue`, so - deadlock. Relevant stack trace: ``` (gdb) bt (gdb) bt #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 jaegertracing#1 0x00007fb09c722023 in __GI___pthread_mutex_lock (mutex=0x7fb088b1c1c0) at ../nptl/pthread_mutex_lock.c:78 jaegertracing#2 0x00007fb08860987b in jaegertracing::reporters::RemoteReporter::report(jaegertracing::Span const&) () from /usr/local/lib/libjaegertracing_plugin.so jaegertracing#3 0x00007fb0885b6506 in jaegertracing::Span::FinishWithOptions(opentracing::v2::FinishSpanOptions const&) [clone .localalias.361] () from /usr/local/lib/libjaegertracing_plugin.so jaegertracing#4 0x00007fb08860849a in jaegertracing::Span::~Span() [clone .constprop.253] () from /usr/local/lib/libjaegertracing_plugin.so jaegertracing#5 0x00007fb08860a8fa in jaegertracing::reporters::RemoteReporter::sweepQueue() () from /usr/local/lib/libjaegertracing_plugin.so jaegertracing#6 0x00007fb0886f2f5f in execute_native_thread_routine () from /usr/local/lib/libjaegertracing_plugin.so jaegertracing#7 0x00007fb09c71f6db in start_thread (arg=0x7fb087574700) at pthread_create.c:463 jaegertracing#8 0x00007fb0989ae88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 ``` Fixed by enforcing a minimum span duration of 1 `SteadyClock` tick. Signed-off-by: Tudor Bosman <tudor@rockset.com>
* Fix deadlock if steady_clock::now() returns the same value twice `Span::isFinished()` returns true iff the span's duration is zero. So, if `steady_clock::now()` returns the same value at span creation as at `Finish()` (unlikely, but possible, especially in virtualized environments), the following scenario can happen: - `Span::FinishWithOptions` sets `_duration` to zero - and then calls `reportSpan`, which calls `RemoteReporter::report` - which acquires the reporter mutex and - adds the span to the reporter's `_queue` Later: - `RemoteReporter::sweepQueue` acquires the reporter mutex - `RemoteReporter::sweepQueue` makes a copy of the span (https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/reporters/RemoteReporter.cpp#L95) - The span is serialized in `RemoteReporter::sendSpan` - The span (the copy made above) needs to be destroyed because the block in `sweepQueue` ends. - The span's destructor is called, which calls `FinishWithOptions` again - ... which sees that `_duration` is zero, so `isFinished()` returns false: https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/Span.cpp#L94 - so it tries to call `reportSpan` again - which tries to acquire the reporter's mutex - but we're holding the lock in `sweepQueue`, so - deadlock. Relevant stack trace: ``` (gdb) bt (gdb) bt #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fb09c722023 in __GI___pthread_mutex_lock (mutex=0x7fb088b1c1c0) at ../nptl/pthread_mutex_lock.c:78 #2 0x00007fb08860987b in jaegertracing::reporters::RemoteReporter::report(jaegertracing::Span const&) () from /usr/local/lib/libjaegertracing_plugin.so #3 0x00007fb0885b6506 in jaegertracing::Span::FinishWithOptions(opentracing::v2::FinishSpanOptions const&) [clone .localalias.361] () from /usr/local/lib/libjaegertracing_plugin.so #4 0x00007fb08860849a in jaegertracing::Span::~Span() [clone .constprop.253] () from /usr/local/lib/libjaegertracing_plugin.so #5 0x00007fb08860a8fa in jaegertracing::reporters::RemoteReporter::sweepQueue() () from /usr/local/lib/libjaegertracing_plugin.so #6 0x00007fb0886f2f5f in execute_native_thread_routine () from /usr/local/lib/libjaegertracing_plugin.so #7 0x00007fb09c71f6db in start_thread (arg=0x7fb087574700) at pthread_create.c:463 #8 0x00007fb0989ae88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 ``` Fixed by enforcing a minimum span duration of 1 `SteadyClock` tick. Signed-off-by: Tudor Bosman <tudor@rockset.com> * fix typo Signed-off-by: Tudor Bosman <tudor@rockset.com>
No description provided.