"EOF" error on kinesis PutRecord, only if request repeated every 6-7s #206

Closed

mateusz opened this issue Apr 27, 2015 · 18 comments

Comments
@mateusz
Contributor

mateusz commented Apr 27, 2015

I'm getting "EOF" errors (url.Error, Op=Post, URL=https://kinesis.ap-southeast-2.amazonaws.com/, Err="EOF") when doing repeated kinesis PutRecord requests. The data is not coming through.

This seems only to happen when I repeat the put requests every 6 or 7 seconds (with 7 seconds being more severe). I'm fairly certain it's not a throttling error (I'm the sole user of that stream, and I'm putting a single byte every 6-7 seconds), and the problem does not occur if I reduce the delay to 5s or less.

Also, when I started handling this error and retrying these EOF requests, I never had to retry more than once.

I have distilled the code on my side so that the problem can be reliably replicated from an EC2 instance in us-west-1, putting into a stream in ap-southeast-2 (I have not experimented with other regions). Authentication is done through ~/.aws/credentials, but I have also used STS and temporary creds and the behaviour does not change.

This error does not seem to happen when I make put-record requests using the aws cli tool from a bash script on the same EC2 box. I'm out of ideas :-) Help?

package main

import (
    "github.com/awslabs/aws-sdk-go/aws"
    "github.com/awslabs/aws-sdk-go/service/kinesis"
    "log"
    "time"
)

func main() {
    kinesisClient := kinesis.New(&aws.Config{
        Region: "ap-southeast-2",
    })

    for {
        _, err := kinesisClient.PutRecord(&kinesis.PutRecordInput{
            Data:         []byte("a"),
            PartitionKey: aws.String("1"),
            StreamName:   aws.String("some-stream"),
        })

        if err != nil {
            // Send has failed - drop the record and proceed.
            log.Printf("Sender failed to put record: %s, %v, %#v\n", err, err, err)
        } else {
            log.Printf("Sending successful.\n")
        }

        time.Sleep(6 * time.Second)
    }
}
@lsegal
Contributor

lsegal commented Apr 27, 2015

It's possible this is related to #196 and #204 where we aren't correctly seeking back to the beginning of a payload when retrying a request. If either of those patches work for you, that would narrow it down.

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Neither of these PRs affects it. On a hunch I tried putting the client creation into the for loop so that it's recreated every time, but the result is the same. Always something like this:

2015/04/28 02:49:32 Sending successful.
2015/04/28 02:49:38 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}
2015/04/28 02:49:45 Sending successful.
2015/04/28 02:49:51 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}
2015/04/28 02:49:58 Sending successful.
2015/04/28 02:50:04 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}

@lsegal
Contributor

lsegal commented Apr 28, 2015

@mateusz if you can provide detailed logging by setting the LogLevel config option to 1, that would help.

svc := kinesis.New(&aws.Config{LogLevel: 1})

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Here you go.

=> [2015-04-28 03:55:56.738735534 +0000 UTC] kinesis.PutRecord(&{Data:[97] ExplicitHashKey:<nil> PartitionKey:0xc2081c9520 SequenceNumberForOrdering:<nil> StreamName:0xc2081c9530 metadataPutRecordInput:{SDKShapeTraits:false}})
---[ REQUEST PRE-SIGN ]------------------------------
POST / HTTP/1.1
Host: kinesis.ap-southeast-2.amazonaws.com
Content-Type: application/x-amz-json-1.1
User-Agent: aws-sdk-go/0.5.0
X-Amz-Target: Kinesis_20131202.PutRecord

{"Data":"YQ==","PartitionKey":"1","StreamName":"playpen-metrics"}
-----------------------------------------------------
---[ CANONICAL STRING  ]-----------------------------
POST
/

host:kinesis.ap-southeast-2.amazonaws.com
x-amz-date:20150428T035556Z
x-amz-target:Kinesis_20131202.PutRecord

host;x-amz-date;x-amz-target
<redacted>
---[ STRING TO SIGN ]--------------------------------
AWS4-HMAC-SHA256
20150428T035556Z
20150428/ap-southeast-2/kinesis/aws4_request
<redacted>
---[ SIGNED URL ]--------------------------------
https://kinesis.ap-southeast-2.amazonaws.com/
-----------------------------------------------------
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: kinesis.ap-southeast-2.amazonaws.com
User-Agent: aws-sdk-go/0.5.0
Content-Length: 65
Authorization: AWS4-HMAC-SHA256 Credential=<redacted>/20150428/ap-southeast-2/kinesis/aws4_request, SignedHeaders=host;x-amz-date;x-amz-target, Signature=<redacted>
Content-Type: application/x-amz-json-1.1
X-Amz-Content-Sha256: <redacted>
X-Amz-Date: 20150428T035556Z
X-Amz-Target: Kinesis_20131202.PutRecord
Accept-Encoding: gzip

{"Data":"YQ==","PartitionKey":"1","StreamName":"<redacted>"}
-----------------------------------------------------
---[ RESPONSE ]--------------------------------------
Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF
-----------------------------------------------------
2015/04/28 03:55:56 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

I was thinking that perhaps it has something to do with the subsequent request coinciding with the keep-alive connection being closed by the server on the other side, with the timing being just right?

@lsegal
Contributor

lsegal commented Apr 28, 2015

That looks like an error coming from Go's socket library code. It could indeed be a connection issue with the service. Are you using a proxy, by any chance?

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

I've tested buffering requests and sending them no more frequently than every 10s, and that did not result in any errors after 24 hours of doing requests. I'm not using any cache; it's pretty much a basic Debian wheezy box (and the aws cli tool doesn't seem to have that problem, which would rule out a proxy problem).

I've just retested this bit of Go code on darwin/amd64 on my local machine to rule out EC2-related issues too - same problem.

Here is the bash script I used from that EC2 machine for doing the same thing with aws cli, which did not show this problem - tried both with sleep 6 and sleep 7:

#!/bin/bash

for i in {1..20}; do
    (aws kinesis put-record --stream-name playpen-metrics --data "a" --partition-key "1" --region ap-southeast-2) &
    sleep 6
done

wait

I guess we'd be looking for that magic timeout number of "about 5 or 6" somewhere in the aws sdk or go socket code? Do you know how to disable keep-alives on these aws sdk requests? Maybe force connection closure after every request just to see if that would fix it?

@lsegal
Contributor

lsegal commented Apr 28, 2015

Do you know how to disable keep-alives on these aws sdk requests? Maybe force connection closure after every request just to see if that would fix it?

You can pass in your own http.Client as Config.HTTPClient and disable it that way.
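For example, something along these lines (just a sketch; Region matches the config from your snippet, and HTTPClient is the Config field mentioned above):

client := &http.Client{
    Transport: &http.Transport{
        DisableKeepAlives: true, // close the connection after each request
    },
}

svc := kinesis.New(&aws.Config{
    Region:     "ap-southeast-2",
    HTTPClient: client,
})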

Just to clarify, are you using a proxy?

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Sorry, that was a typo: I wrote "cache" where I meant "proxy". No, I'm not using any proxy.

@lsegal
Contributor

lsegal commented Apr 30, 2015

@mateusz have you looked into disabling KeepAlive? Did this help at all?

@lsegal lsegal added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. review needed labels Apr 30, 2015
@mateusz
Contributor Author

mateusz commented May 1, 2015

Heh... disabling keep-alive seems to fix it - no failures after 30 sends. Here is the test code. Removing line 21 from this gist reintroduces the issue - it fails within the first few sends again.

I might be completely misled, but here is another guess: according to this it seems one needs to close the response Body when done with it. I have added some debug output to all occurrences of "Body.Close" in this SDK, and none seems to have been hit.

Do you think it could possibly be missing Body.Close?
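For reference, the pattern I'd expect somewhere in the response handling (purely illustrative; these names are made up, not the SDK's):

resp, err := httpClient.Do(req) // hypothetical send step
if err != nil {
    return err
}
defer resp.Body.Close() // lets the connection be returned to the pool for reuse
// ... decode resp.Body into the output struct ...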

@jasdel
Contributor

jasdel commented May 1, 2015

Taking a look at the code, it looks like Kinesis uses the jsonrpc Unmarshal to read the response body from the service and serialize it into the output structure, but does not close the response body reader. I don't think this is the cause of the problem you are seeing, though. Locally I added the Body.Close to the jsonrpc code used by Kinesis, but was still able to reproduce the issue with your sample code after the second request.

Changing the delay between sends to 3s or 9s, I no longer received any failures. We need to do more investigation, but I think you might be correct that this is a situation where the connection is being terminated by the server or client at exactly the wrong time.

@mateusz
Contributor Author

mateusz commented May 3, 2015

👍 Thanks for confirming the issue - happy to hear it's not my specific set-up going crazy.

@jasdel jasdel removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 5, 2015
@jasdel
Contributor

jasdel commented May 6, 2015

I verified this is a Go net/http Client or Transport issue. Gist: readTimeoutExample.go reproduces the issue outside of the SDK. The Client will not open a new connection if it doesn't know the server closed its pooled connection before Transport.RoundTrip() wrote the request to that pooled connection.
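For context, here is a rough standalone sketch of that kind of repro (this is not the gist itself; the address and timeouts are made up for illustration). The server's ReadTimeout closes keep-alive connections that stay idle, so the client's next POST may be written to a stale pooled connection and fail with EOF instead of transparently using a new connection.

package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "strings"
    "time"
)

func main() {
    // Server that drops keep-alive connections which stay idle past ReadTimeout.
    srv := &http.Server{
        Addr:        "127.0.0.1:8080",
        ReadTimeout: 3 * time.Second,
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            io.Copy(ioutil.Discard, r.Body)
            fmt.Fprintln(w, "ok")
        }),
    }
    go srv.ListenAndServe()
    time.Sleep(100 * time.Millisecond)

    client := &http.Client{}
    for i := 0; i < 5; i++ {
        // Sleeping longer than the server's ReadTimeout means the pooled
        // connection may already be closed when the next POST is written.
        resp, err := client.Post("http://127.0.0.1:8080/", "text/plain", strings.NewReader("a"))
        if err != nil {
            log.Printf("POST failed: %v", err) // expect EOF on stale connections
        } else {
            ioutil.ReadAll(resp.Body)
            resp.Body.Close()
            log.Printf("POST ok")
        }
        time.Sleep(6 * time.Second)
    }
}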

I'll submit this issue upstream to the Go repo. In the meantime I suggest disabling keep-alive, or alternatively adding retry logic for requests that fail with the EOF error.

// Minimal Client options to disable keep-alive.
client := &http.Client{
    Transport: &http.Transport{
        DisableKeepAlives: true,
    },
}
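Or, as a rough sketch of the retry option (illustrative only; a single retry was enough in @mateusz's testing above, and this is not the SDK's built-in retry logic):

// Illustrative one-shot retry on EOF (needs "strings" imported);
// just a stopgap at the call site until the upstream fix lands.
_, err := kinesisClient.PutRecord(input)
if err != nil && strings.Contains(err.Error(), "EOF") {
    _, err = kinesisClient.PutRecord(input) // retry once on a fresh attempt
}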

@jasdel jasdel added third-party This issue is related to third-party libraries or applications. and removed review needed labels May 6, 2015
@jasdel
Contributor

jasdel commented May 6, 2015

Cool, looks like the issue is already reported as golang/go#8946 and tagged with the Go1.5Maybe milestone, though the current proposed change is limited to HEAD/GET methods at the moment because they are idempotent.

@mateusz
Contributor Author

mateusz commented May 6, 2015

Thanks, we'll wait until it's fixed upstream then.

@danp
Contributor

danp commented Jun 3, 2015

Maybe fixed by #227?

@jasdel
Contributor

jasdel commented Jun 3, 2015

You're correct @dpiddy, that issue did allow retries to work correctly. Going to go ahead and close this issue. Thanks for the heads up.

@jasdel jasdel closed this as completed Jun 3, 2015
@diehlaws diehlaws added 3rd-party and removed third-party This issue is related to third-party libraries or applications. labels Jan 4, 2019