Collecting node logs #517
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
Nov 02 08:37:10.952465 ip-10-0-146-242 hyperkube[1469]: E1102 08:37:10.952434 1469 file.go:108] Unable to process watch event: can't process config file "/etc/kubernetes/manifests/etcd-pod.yaml": /etc/kubernetes/manifests/etcd-pod.yaml: couldn't parse as pod(Object 'Kind' is missing in 'null'), please check config file | ||
Nov 02 08:37:10.957277 ip-10-0-146-242 hyperkube[1469]: E1102 08:37:10.957257 1469 pod_workers.go:191] Error syncing pod 7d824c6d-957c-49e0-a1a9-0603f7f103c8 ("etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd(7d824c6d-957c-49e0-a1a9-0603f7f103c8)"), skipping: pod "etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd" is pending termination | ||
Nov 02 08:37:22.684508 ip-10-0-146-242 hyperkube[1469]: E1102 08:37:22.684470 1469 pod_workers.go:191] Error syncing pod 7d824c6d-957c-49e0-a1a9-0603f7f103c8 ("etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd(7d824c6d-957c-49e0-a1a9-0603f7f103c8)"), skipping: pod "etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd" is pending termination | ||
Nov 02 08:37:26.241228 ip-10-0-146-242 hyperkube[1469]: E1102 08:37:26.241205 1469 remote_runtime.go:332] ContainerStatus "6f9594ac20b40c6b9c46f589e13189060aa7d0c4d14b3f7fb9daa408dba15f2e" from runtime service failed: rpc error: code = NotFound desc = could not find container "6f9594ac20b40c6b9c46f589e13189060aa7d0c4d14b3f7fb9daa408dba15f2e": container with ID starting with 6f9594ac20b40c6b9c46f589e13189060aa7d0c4d14b3f7fb9daa408dba15f2e not found: ID does not exist | ||
Nov 02 08:37:36.667353 ip-10-0-146-242 hyperkube[1469]: E1102 08:37:36.667319 1469 pod_workers.go:191] Error syncing pod 7d824c6d-957c-49e0-a1a9-0603f7f103c8 ("etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd(7d824c6d-957c-49e0-a1a9-0603f7f103c8)"), skipping: pod "etcd-ip-10-0-146-242.us-east-2.compute.internal_openshift-etcd" is pending termination | ||
Nov 02 08:40:48.815708 ip-10-0-146-242 hyperkube[1469]: E1102 08:40:48.815688 1469 remote_runtime.go:332] ContainerStatus "94291a4c5968794c67d091926be69736c494aee407a543a022c98281c9560bf8" from runtime service failed: rpc error: code = NotFound desc = could not find container "94291a4c5968794c67d091926be69736c494aee407a543a022c98281c9560bf8": container with ID starting with 94291a4c5968794c67d091926be69736c494aee407a543a022c98281c9560bf8 not found: ID does not exist | ||
Nov 02 08:40:48.815942 ip-10-0-146-242 hyperkube[1469]: E1102 08:40:48.815928 1469 remote_runtime.go:332] ContainerStatus "b1e4bd118537fd8ecece8a6cd48398218536aa0e8f790a6c0bf619fa534ecbba" from runtime service failed: rpc error: code = NotFound desc = could not find container "b1e4bd118537fd8ecece8a6cd48398218536aa0e8f790a6c0bf619fa534ecbba": container with ID starting with b1e4bd118537fd8ecece8a6cd48398218536aa0e8f790a6c0bf619fa534ecbba not found: ID does not exist | ||
Nov 02 08:41:11.665535 ip-10-0-146-242 hyperkube[1469]: E1102 08:41:11.665518 1469 pod_workers.go:191] Error syncing pod 7ea49e7c-7174-4dc9-a50c-f8590e72b089 ("kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-controller-manager(7ea49e7c-7174-4dc9-a50c-f8590e72b089)"), skipping: pod "kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-controller-manager" is pending termination | ||
Nov 02 08:41:27.111197 ip-10-0-146-242 hyperkube[1469]: E1102 08:41:27.111172 1469 kubelet_pods.go:403] hostname for pod:"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal" was longer than 63. Truncated hostname to :"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.inter" | ||
Nov 02 08:41:27.291685 ip-10-0-146-242 hyperkube[1469]: E1102 08:41:27.291665 1469 kubelet_pods.go:403] hostname for pod:"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal" was longer than 63. Truncated hostname to :"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.inter" | ||
Nov 02 08:41:27.506021 ip-10-0-146-242 hyperkube[1469]: E1102 08:41:27.505996 1469 kubelet_pods.go:403] hostname for pod:"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal" was longer than 63. Truncated hostname to :"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.inter" | ||
Nov 02 08:41:27.731323 ip-10-0-146-242 hyperkube[1469]: E1102 08:41:27.730935 1469 kubelet_pods.go:403] hostname for pod:"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.internal" was longer than 63. Truncated hostname to :"kube-controller-manager-ip-10-0-146-242.us-east-2.compute.inter" | ||
Nov 02 08:45:52.956295 ip-10-0-146-242 hyperkube[1469]: E1102 08:45:52.956261 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:46:06.667363 ip-10-0-146-242 hyperkube[1469]: E1102 08:46:06.667328 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:46:19.667384 ip-10-0-146-242 hyperkube[1469]: E1102 08:46:19.667338 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:46:30.667338 ip-10-0-146-242 hyperkube[1469]: E1102 08:46:30.667306 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:46:43.667345 ip-10-0-146-242 hyperkube[1469]: E1102 08:46:43.667306 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:46:57.667367 ip-10-0-146-242 hyperkube[1469]: E1102 08:46:57.667332 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:47:05.373715 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:05.373685 1469 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: <nil>, extraDiskErr: could not stat "/var/log/pods/openshift-kube-apiserver_kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_3091a174-6806-4b59-be71-0461dd6b4099/59950d3639402ea2f05db5d388e485226eb9d893f58c6fce16993d9e9cd4940a.log" to get inode usage: stat /var/log/pods/openshift-kube-apiserver_kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_3091a174-6806-4b59-be71-0461dd6b4099/59950d3639402ea2f05db5d388e485226eb9d893f58c6fce16993d9e9cd4940a.log: no such file or directory | ||
Nov 02 08:47:06.391308 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:06.391274 1469 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: <nil>, extraDiskErr: could not stat "/var/log/pods/openshift-kube-apiserver_kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_3091a174-6806-4b59-be71-0461dd6b4099/kube-apiserver/0.log" to get inode usage: stat /var/log/pods/openshift-kube-apiserver_kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_3091a174-6806-4b59-be71-0461dd6b4099/kube-apiserver/0.log: no such file or directory | ||
Nov 02 08:47:10.667328 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:10.667289 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:47:21.667334 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:21.667296 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:47:33.667358 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:33.667319 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:47:47.667381 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:47.667338 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:47:47.800414 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:47.800379 1469 kubelet_pods.go:403] hostname for pod:"openshift-kube-scheduler-ip-10-0-146-242.us-east-2.compute.internal" was longer than 63. Truncated hostname to :"openshift-kube-scheduler-ip-10-0-146-242.us-east-2.compute.inte" | ||
Nov 02 08:47:59.667358 ip-10-0-146-242 hyperkube[1469]: E1102 08:47:59.667321 1469 pod_workers.go:191] Error syncing pod 863ade85-43b8-4efe-9976-36aec4713c9c ("kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver(863ade85-43b8-4efe-9976-36aec4713c9c)"), skipping: pod "kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_openshift-kube-apiserver" is pending termination | ||
Nov 02 08:48:03.195865 ip-10-0-146-242 hyperkube[1469]: E1102 08:48:03.195828 1469 remote_runtime.go:574] ReopenContainerLog "d0de199160b678db820e57918feb4ac7c49c65d24f30ea4de5e9c57dae15c8dc" from runtime service failed: rpc error: code = Unknown desc = container is not created or running | ||
Nov 02 08:48:03.196198 ip-10-0-146-242 hyperkube[1469]: E1102 08:48:03.195865 1469 container_log_manager.go:243] Container "d0de199160b678db820e57918feb4ac7c49c65d24f30ea4de5e9c57dae15c8dc" log "/var/log/pods/openshift-kube-apiserver_kube-apiserver-ip-10-0-146-242.us-east-2.compute.internal_3091a174-6806-4b59-be71-0461dd6b4099/kube-apiserver/0.log" doesn't exist, reopen container log failed: rpc error: code = Unknown desc = container is not created or running |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
package clusterconfig | ||
|
||
import ( | ||
"bufio" | ||
"compress/gzip" | ||
"context" | ||
"fmt" | ||
"strconv" | ||
|
||
"github.com/openshift/insights-operator/pkg/gatherers/common" | ||
"github.com/openshift/insights-operator/pkg/record" | ||
"github.com/openshift/insights-operator/pkg/utils/marshal" | ||
|
||
corev1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/client-go/kubernetes" | ||
corev1client "k8s.io/client-go/kubernetes/typed/core/v1" | ||
"k8s.io/client-go/rest" | ||
"k8s.io/klog/v2" | ||
) | ||
|
||
// GatherNodeLogs fetches the node logs from journal unit | ||
// | ||
// Response see https://docs.openshift.com/container-platform/4.9/rest_api/node_apis/node-core-v1.html#apiv1nodesnameproxypath | ||
// | ||
// * Location in archive: config/nodes/logs/ | ||
// * See: docs/insights-archive-sample/config/nodes/logs | ||
// * Id in config: node_logs | ||
|
||
// * Since versions: | ||
// * 4.10+ | ||
func (g *Gatherer) GatherNodeLogs(ctx context.Context) ([]record.Record, []error) { | ||
clientSet, err := kubernetes.NewForConfig(g.gatherProtoKubeConfig) | ||
if err != nil { | ||
return nil, []error{err} | ||
} | ||
return gatherNodeLogs(ctx, clientSet.CoreV1()) | ||
} | ||
|
||
func gatherNodeLogs(ctx context.Context, client corev1client.CoreV1Interface) ([]record.Record, []error) { | ||
nodes, err := client.Nodes().List(ctx, metav1.ListOptions{LabelSelector: "node-role.kubernetes.io/master"}) | ||
if err != nil { | ||
return nil, []error{err} | ||
} | ||
return nodeLogRecords(ctx, client.RESTClient(), nodes) | ||
} | ||
|
||
// nodeLogRecords generates the list of records and errors | ||
func nodeLogRecords(ctx context.Context, restClient rest.Interface, nodes *corev1.NodeList) ([]record.Record, []error) { | ||
var errs []error | ||
records := make([]record.Record, 0) | ||
|
||
for i := range nodes.Items { | ||
name := nodes.Items[i].Name | ||
uri := nodeLogResourceURI(restClient, name) | ||
req := requestNodeLog(restClient, uri, logNodeMaxTailLines, logNodeUnit) | ||
|
||
logString, err := nodeLogString(ctx, req) | ||
if err != nil { | ||
klog.V(2).Infof("Error: %q", err) | ||
errs = append(errs, err) | ||
} | ||
|
||
records = append(records, record.Record{ | ||
Name: fmt.Sprintf("config/node/logs/%s.log", name), | ||
Item: marshal.Raw{Str: logString}, | ||
}) | ||
} | ||
|
||
return records, errs | ||
} | ||
|
||
// nodeLogResourceURI creates the resource path URI to be fetched | ||
func nodeLogResourceURI(client rest.Interface, name string) string { | ||
return client.Get(). | ||
Name(name). | ||
Resource("nodes").SubResource("proxy", "logs"). | ||
Suffix("journal").URL().Path | ||
} | ||
|
||
// requestNodeLog creates the request to the API to retrieve the resource stream | ||
func requestNodeLog(client rest.Interface, uri string, tail int, unit string) *rest.Request { | ||
return client.Get().RequestURI(uri). | ||
SetHeader("Accept", "text/plain, */*"). | ||
SetHeader("Accept-Encoding", "gzip"). | ||
Param("tail", strconv.Itoa(tail)). | ||
Param("unit", unit) | ||
} | ||
|
||
// nodeLogString retrieves the data from the stream, decompresses it (if necessary) and returns it as a string | ||
func nodeLogString(ctx context.Context, req *rest.Request) (string, error) { | ||
in, err := req.Stream(ctx) | ||
if err != nil { | ||
return "", err | ||
This can probably lead to an empty file/record, but it's probably not a big deal.
I guess that is possible. Don't you think it should be handled by the recorder? I mean, the recorder should not record empty files, right?
Probably...
|
||
} | ||
defer in.Close() | ||
|
||
r, err := gzip.NewReader(in) | ||
var scanner *bufio.Scanner | ||
if err != nil { | ||
scanner = bufio.NewScanner(in) | ||
} else { | ||
defer r.Close() | ||
scanner = bufio.NewScanner(r) | ||
} | ||
|
||
|
||
messagesToSearch := []string{ | ||
"E\\d{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}", // Errors from log | ||
} | ||
return common.FilterLogFromScanner(scanner, messagesToSearch, true, func(lines []string) []string { | ||
if len(lines) > logNodeMaxLines { | ||
return lines[len(lines)-logNodeMaxLines:] | ||
} | ||
return lines | ||
}) | ||
} |
I thought we had a convention where we have 3 groups of imports: from the standard library, external packages, operator's code
What do you mean by that?
I mean that I would group imports like this:
but maybe we don't have any convention for it
I think we don't... but it's true that it's grouped differently than here. I mean into 3 groups (as Serhii mentioned), where the last two somehow intersect. Isn't this part of go fmt or our linting?
Gotcha! I can check whether it's possible to add this validation to the linting. I think the pre-commit hook should format the code before pushing it. I need to check why that is not happening.
OK. So, a quick investigation shows that we have two different import formatting styles in the code: check here and here. If we want to use the goimports one, I'll check how to lint for it and add it to the pre-commit script later. For now, the better approach is to configure the IDE to do it.