
OTLP exporters are executed serially #1202

Closed
jpkrohling opened this issue Jun 26, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@jpkrohling
Member

jpkrohling commented Jun 26, 2020

Describe the bug
The OTLP exporter has a number of workers, implying that they would run in parallel, consuming data from the pipeline. However, it appears that only one worker is active at any given time. After adding a few log statements to the code (branch: jpkrohling/opentelemetry-collector#jpkrohling/issue1202) and running the Jaeger trace generator to send spans, this is what the logs show:

Note that the linked branch adds a 1-second timeout to the context used by exportTrace. There is exactly 1s between the "exporting trace" and "returning exporter to the pool" messages, and nothing else runs in between, meaning that the failed exportTrace call has blocked the entire exporter and no other worker has performed any work (a sketch of this pattern follows the log excerpt).

2020-06-26T13:16:35.602+0200	INFO	loggingexporter/logging_exporter.go:102	TraceExporter	{"#spans": 8361}
2020-06-26T13:16:35.603+0200	DEBUG	otlpexporter/otlp.go:197	acquiring exporter	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:35.604+0200	DEBUG	otlpexporter/otlp.go:214	exporting trace	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:36.604+0200	DEBUG	otlpexporter/otlp.go:218	returning exporter to the pool	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:36.682+0200	INFO	loggingexporter/logging_exporter.go:102	TraceExporter	{"#spans": 8363}
2020-06-26T13:16:36.682+0200	DEBUG	otlpexporter/otlp.go:197	acquiring exporter	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:36.682+0200	DEBUG	otlpexporter/otlp.go:214	exporting trace	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:37.682+0200	DEBUG	otlpexporter/otlp.go:218	returning exporter to the pool	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:37.735+0200	INFO	loggingexporter/logging_exporter.go:102	TraceExporter	{"#spans": 8367}
2020-06-26T13:16:37.735+0200	DEBUG	otlpexporter/otlp.go:197	acquiring exporter	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:37.735+0200	DEBUG	otlpexporter/otlp.go:214	exporting trace	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:38.736+0200	DEBUG	otlpexporter/otlp.go:218	returning exporter to the pool	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:38.781+0200	INFO	loggingexporter/logging_exporter.go:102	TraceExporter	{"#spans": 8364}
2020-06-26T13:16:38.781+0200	DEBUG	otlpexporter/otlp.go:197	acquiring exporter	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:38.782+0200	DEBUG	otlpexporter/otlp.go:214	exporting trace	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2020-06-26T13:16:39.782+0200	DEBUG	otlpexporter/otlp.go:218	returning exporter to the pool	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
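
To make the pattern above concrete, here is a minimal, hypothetical Go sketch (not the Collector's actual code) of an acquire/export/return loop in which each send blocks for the full 1s context timeout; with a single caller, the next batch cannot start until the previous one returns, producing exactly the gaps seen in the logs.

// Hypothetical sketch of the acquire/export/return pattern seen in the logs:
// a pool of "exporters" where each send blocks until the context deadline,
// serializing work when a single producer feeds the pipeline.
package main

import (
	"context"
	"fmt"
	"time"
)

type exporter struct{ id int }

// exportTrace simulates a send that blocks until the 1s context deadline,
// as observed with the timeout added in the linked branch.
func exportTrace(ctx context.Context, e *exporter) error {
	<-ctx.Done() // a successful call would return earlier
	return ctx.Err()
}

func main() {
	// A pool with several exporters, mirroring num_workers > 1.
	pool := make(chan *exporter, 3)
	for i := 0; i < 3; i++ {
		pool <- &exporter{id: i}
	}

	for batch := 0; batch < 3; batch++ {
		e := <-pool // "acquiring exporter"
		fmt.Println(time.Now().Format("15:04:05.000"), "exporting trace", "exporter", e.id)

		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		_ = exportTrace(ctx, e)
		cancel()

		fmt.Println(time.Now().Format("15:04:05.000"), "returning exporter to the pool", "exporter", e.id)
		pool <- e
		// Because the caller sends synchronously, the next batch cannot start
		// until this one returns: exactly the 1s gaps visible in the logs.
	}
}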

Steps to reproduce

  1. Start the OpenTelemetry Collector from master (make run), or from the linked branch for the extra log statements
  2. Run a trace generator, like Jaeger's (go run ./cmd/tracegen -workers 1 -duration 1s)

Configuration to use:

receivers:
  otlp:
    endpoint: "localhost:55680"
  jaeger:
    protocols:
      thrift_compact:

processors:
  batch:
    timeout: 1s

exporters:
  otlp:
    endpoint: "localhost:55680"
  jaeger:
    endpoint: "localhost:14250"
    insecure: true
  logging:

service:
  pipelines:
    traces/custom-1:
      receivers: [jaeger]
      processors: [batch]
      exporters: [logging, otlp]
    traces/custom-2:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, jaeger]

What did you expect to see?
Workers sending trace data in parallel

What did you see instead?
Workers being called serially, one at a time

What version did you use?
Version: master as of now (4eca960a4eb02104694324cf161ad9ec944c44c9)

What config did you use?
Config: see above

Environment
OS: Fedora 32 with the latest updates
Compiler(if manually compiled): go1.14.3

Additional context
Related to #1192

@jpkrohling jpkrohling added the bug Something isn't working label Jun 26, 2020
@jpkrohling jpkrohling changed the title Serial execution of exporters OTLP exporters are executed serially Jun 26, 2020
@bogdandrutu
Member

I think the confusion comes from the name of the config parameter. The num_workers is actually the number of clients we create; we don't create a goroutine for every request in the exporter, so requests are executed synchronously (not necessarily serialized). If you have multiple producers, requests will be sent in parallel (see the sketch after this list). This is intentional because:

  1. We want to wait for the response and maybe retry if necessary.
  2. We let other components, like the queued retry, control the parallelism of the system. We try not to switch goroutines in every component (only receivers, queued retry, and batch do that) to limit the number of context switches and threads/goroutines.
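
As a rough illustration of point 2 (hypothetical code, not the Collector's implementation): the export call blocks in the caller's goroutine, so concurrency comes from the number of producers feeding the exporter, not from the exporter itself.

// Hypothetical sketch: a synchronous export blocks in the caller's goroutine,
// so several producers overlap their exports, while a single producer
// serializes them.
package main

import (
	"fmt"
	"sync"
	"time"
)

// export blocks in the caller's goroutine, like a synchronous gRPC call.
func export(producer int) {
	fmt.Printf("producer %d: exporting\n", producer)
	time.Sleep(time.Second) // simulated round trip
	fmt.Printf("producer %d: done\n", producer)
}

func main() {
	const producers = 4 // with 1 producer the calls would be serialized

	var wg sync.WaitGroup
	for p := 0; p < producers; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			export(p) // exports overlap because several callers exist
		}(p)
	}
	wg.Wait()
}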

@jpkrohling
Member Author

Sorry about the stupid question, but why are there multiple clients if only one of them can be active at any given time? Isn't gRPC already load balancing the connections anyway?

@bogdandrutu
Member

That is definitely not a stupid question; I was looking at the same issue right now, and I think gRPC can do a better job than we do in terms of sending concurrent requests. I need to test that, but I am confident that this is unnecessary.
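
For reference, a single *grpc.ClientConn is documented as safe for concurrent use, so multiple goroutines can issue RPCs over one connection (multiplexed HTTP/2 streams) without a pool of clients. A minimal sketch, using the standard gRPC health service as a stand-in RPC and an assumed localhost endpoint:

// Hypothetical sketch: concurrent RPCs over a single gRPC connection.
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Endpoint is an assumption for the example.
	conn, err := grpc.Dial("localhost:55680", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			defer cancel()
			// Concurrent RPCs over the same connection; gRPC multiplexes them.
			_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
			log.Printf("rpc %d: err=%v", i, err)
		}(i)
	}
	wg.Wait()
}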

@bogdandrutu
Member

@jpkrohling I am going to close this, since we removed the worker logic and now rely on gRPC load balancing: #1212
