Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default readiness on historicals breaks operator. #97

Closed
mindreader opened this issue Aug 21, 2023 · 3 comments · Fixed by #98
Closed

Default readiness on historicals breaks operator. #97

mindreader opened this issue Aug 21, 2023 · 3 comments · Fixed by #98
Labels
bug Something isn't working

Comments

@mindreader
Copy link

In pr #72 , defaults probes are set for various components.

This breaks the operator because while /druid/historical/v1/readiness is just fine to call /druid/historical/v1/loadstatus is a privileged call that requires authentication. Since it places this readinessProbe by default if you don't have one, the only way to disable it is to hard wire one yourself to another call.

curl -vvv http://127.0.0.1:4124/druid/historical/v1/readiness
HTTP/1.1 200 OK

curl -vvv http://127.0.0.1:4124/druid/historical/v1/loadstatus
HTTP/1.1 401 Unauthorized
@mindreader
Copy link
Author

This also caused a painful outage, as the operator seemed to be forcing a default startupProbe onto the coordinator/overlord. And the tasks that are spawned through kubernetes-overlord-extensions copied that pod spec for themselves, but they are hard coded in the druid source code to use 8100, which doesn't match the druid.port (8088) in my operator druid spec., so the startupProbes failed always, causing the pods to die after a couple minutes repeatedly.

I ended up having to disable the operator entirely to get things back in order.

@cyril-corbon
Copy link
Collaborator

cyril-corbon commented Aug 22, 2023

Hello @mindreader

Can you give us more details on your configuration and the druid version your are using ?

I cannot reproduce the 401 on druid version 0.23.0 to 25.0.0

as both path have the same goals I will change this probe.

@itamar-marom itamar-marom added the bug Something isn't working label Aug 22, 2023
@mindreader
Copy link
Author

mindreader commented Aug 22, 2023

I have a druid basic escalator which means intra-communication in my cluster requires credentials. I'm on druid version 25.

In any case, you don't have to believe me, the relevant code in druid is here: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/http/HistoricalResource.java

That StateResourceFilter class is what is responsible for requiring permission to view api routes that peer at druid's internal state. It is not clear to me how k8s could be supplied credentials capable of querying that route.

The second issue is more serious, by putting Probes in that I did not have in my spec, pods just get killed, especially in the case of the task pods, I could not stop them from inheriting a bad configuration, as the overlord definitely needs a readiness probe, and the tasks definitely need a very different one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants