CAPI can incorrectly report Diego Actual LRP state #4220

Open
Samze opened this issue Feb 14, 2025 · 0 comments
Samze commented Feb 14, 2025

In some circumstances, CAPI reports an app instance as running when it is actually down.

Reproduction

Steps to reproduce:

  1. Have a cf app with 4 running instances.
  2. On the Diego brain, cfdot shows 4 actual_lrps.
  3. Kill the Diego cell VM with bosh delete-vm.
  4. On the Diego brain, cfdot now shows 8 actual_lrp instances (4 running, 4 unclaimed). Each app instance has two entries, one running and one unclaimed.
  5. The duplicate entries persist until Diego is restored.

CAPI iterates over all actual_lrps returned from Diego (in this case 8) and uses the app index as the key. With duplicates, CAPI therefore overwrites each app instance's information once, and the state shown is determined by the order in which the actual_lrps are returned. See https://github.com/cloudfoundry/cloud_controller_ng/blob/main/lib/cloud_controller/diego/reporters/instances_stats_reporter.rb#L48-L56
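
The overwrite can be illustrated with a minimal Ruby sketch. This is not the code in instances_stats_reporter.rb; the ActualLRP struct and the report_by_index helper are hypothetical and only model what happens when a hash is keyed by app index alone.

```ruby
# Hypothetical, simplified model of an actual LRP; not the real Diego/CAPI type.
ActualLRP = Struct.new(:index, :state, :since, keyword_init: true)

# Hypothetical helper: builds a report keyed by instance index.
# A later entry with the same index silently replaces an earlier one.
def report_by_index(actual_lrps)
  actual_lrps.each_with_object({}) do |lrp, report|
    report[lrp.index] = lrp.state
  end
end

unclaimed = ActualLRP.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)
running   = ActualLRP.new(index: 3, state: 'RUNNING',   since: 1739222044495241579)

# The same two entries in a different order yield a different reported state.
report_a = report_by_index([unclaimed, running])
report_b = report_by_index([running, unclaimed])
puts report_a.inspect
puts report_b.inspect
```

In this model the last entry for a given index always wins, which is why the reported state depends on the order of the entries.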

Example of a duplicate entry from cfdot actual-lrps. Note the process_guid and index are the same.

{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "",
  "cell_id": "",
  "address": "",
  "ports": null,
  "preferred_address": "UNKNOWN",
  "crash_count": 0,
  "state": "UNCLAIMED",
  "placement_error": "unable to communicate to compatible cells",
  "since": 1739568280529021112,
  "modification_tag": {
    "epoch": "780635af-9208-4d5e-5a08-ea49ebcb3f95",
    "index": 5758
  },
  "presence": "ORDINARY",
  "OptionalRoutable": {
    "routable": false
  },
  "availability_zone": ""
}
{
  "process_guid": "57a8e43b-81f9-46e9-9f78-81e15bbfd231-de7f7844-156e-4fc7-9f21-db5d072fb0b7",
  "index": 3,
  "domain": "cf-apps",
  "instance_guid": "1f3ffac3-be77-45e0-5075-7357",
  "cell_id": "23b06662-20e7-42dd-9377-6d8f10190ec4",
  "address": "10.0.4.17",
  "ports": [
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61001,
      "host_tls_proxy_port": 61014
    },
    {
      "container_port": 8080,
      "host_port": 61012,
      "container_tls_proxy_port": 61443,
      "host_tls_proxy_port": 0
    },
    {
      "container_port": 2222,
      "host_port": 61013,
      "container_tls_proxy_port": 61002,
      "host_tls_proxy_port": 61015
    }
  ],
  "instance_address": "10.255.233.24",
  "preferred_address": "HOST",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1739222044495241579,
  "modification_tag": {
    "epoch": "4a424a13-b5ba-47b7-771a-1a61d99c2524",
    "index": 2
  },
  "presence": "SUSPECT",
  "metric_tags": {
    "app_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "app_name": "static",
    "instance_id": "3",
    "organization_id": "c877a084-d65b-4758-9908-90201c6df339",
    "organization_name": "org-1",
    "process_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "process_instance_id": "1f3ffac3-be77-45e0-5075-7357",
    "process_type": "web",
    "source_id": "57a8e43b-81f9-46e9-9f78-81e15bbfd231",
    "space_id": "b248d5ab-2948-468b-ad0f-7b1b90e923d1",
    "space_name": "space-1"
  },
  "OptionalRoutable": {
    "routable": true
  },
  "availability_zone": "us-central1-f"
}
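
A duplicate pair like this can be found by grouping entries on process_guid and index. The following is a minimal Ruby sketch, assuming the cfdot output has already been split into one JSON document per string; the duplicate_lrp_keys helper and the shortened guid values are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns the [process_guid, index] pairs that
# appear more than once in a list of parsed actual LRP hashes.
def duplicate_lrp_keys(lrps)
  lrps
    .group_by { |lrp| [lrp['process_guid'], lrp['index']] }
    .select { |_key, entries| entries.size > 1 }
    .keys
end

# Made-up, abbreviated entries standing in for real cfdot output.
lines = [
  '{"process_guid": "guid-a", "index": 3, "state": "UNCLAIMED"}',
  '{"process_guid": "guid-a", "index": 3, "state": "RUNNING"}',
  '{"process_guid": "guid-a", "index": 2, "state": "RUNNING"}'
]

lrps = lines.map { |line| JSON.parse(line) }
duplicates = duplicate_lrp_keys(lrps)
puts duplicates.inspect
```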

Fix

In the case of duplicates, CAPI should compare the since field of the actual_lrp entries for the same index and use the entry with the latest value.
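
One way this could look is sketched below. This is only an illustration under assumptions, not a proposed patch: the ActualLRP struct and the latest_by_index helper are hypothetical, and since is assumed to be a nanosecond timestamp as in the cfdot output above.

```ruby
# Hypothetical, simplified model of an actual LRP; not the real Diego/CAPI type.
ActualLRP = Struct.new(:index, :state, :since, keyword_init: true)

# Hypothetical helper: for each instance index, keep only the entry
# with the largest (most recent) since value.
def latest_by_index(actual_lrps)
  actual_lrps
    .group_by(&:index)
    .transform_values { |entries| entries.max_by(&:since) }
end

lrps = [
  ActualLRP.new(index: 3, state: 'RUNNING',   since: 1739222044495241579),
  ActualLRP.new(index: 3, state: 'UNCLAIMED', since: 1739568280529021112)
]

# The result no longer depends on the order of the input entries.
latest = latest_by_index(lrps)
puts latest[3].state
```

With the since values from the example entries, the UNCLAIMED entry is the more recent one, so it is selected regardless of the order in which Diego returns the two entries.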

@Samze Samze self-assigned this Feb 14, 2025