"EOF" error on kinesis PutRecord, only if request repeated every 6-7s #206

Closed

mateusz opened this issue Apr 27, 2015 · 18 comments

Comments
@mateusz
Contributor

mateusz commented Apr 27, 2015

I'm getting "EOF" errors (url.Error, Op=Post, URL=https://kinesis.ap-southeast-2.amazonaws.com/, Err="EOF") when doing repeated kinesis PutRecord requests. The data is not coming through.

This seems only to happen when I repeat the put requests every 6 or 7 seconds (with 7 seconds being more severe). I'm fairly certain it's not a throttling error (I'm the sole user of that stream, and I'm putting a single byte every 6-7 seconds), and the problem does not occur if I reduce the delay to 5s or less.

Also, when I started handling this error and retrying these EOF requests, I never had to retry more than once.

I have distilled the code on my side so that the problem can be reliably replicated from an EC2 instance in us-west-1, putting into a stream in ap-southeast-2 (I have not experimented with other regions). Authentication is done through ~/.aws/credentials, but I have also used STS and temporary creds and the behaviour does not change.

This error does not seem to happen when I make put-record requests using the aws cli tool from a bash script on the same EC2 box. I'm out of ideas :-) Help?

package main

import (
    "github.com/awslabs/aws-sdk-go/aws"
    "github.com/awslabs/aws-sdk-go/service/kinesis"
    "log"
    "time"
)

func main() {
    kinesisClient := kinesis.New(&aws.Config{
        Region: "ap-southeast-2",
    })

    for {
        _, err := kinesisClient.PutRecord(&kinesis.PutRecordInput{
            Data:         []byte("a"),
            PartitionKey: aws.String("1"),
            StreamName:   aws.String("some-stream"),
        })

        if err != nil {
            // Send has failed - drop the record and proceed.
            log.Printf("Sender failed to put record: %s, %v, %#v\n", err, err, err)
        } else {
            log.Printf("Sending successful.\n")
        }

        time.Sleep(6 * time.Second)
    }
}
@lsegal
Contributor

lsegal commented Apr 27, 2015

It's possible this is related to #196 and #204 where we aren't correctly seeking back to the beginning of a payload when retrying a request. If either of those patches work for you, that would narrow it down.

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Neither of these PRs affects it. On a hunch I tried putting the client creation into the for loop so that it's recreated every time, but the result is the same. Always something like this:

2015/04/28 02:49:32 Sending successful.
2015/04/28 02:49:38 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}
2015/04/28 02:49:45 Sending successful.
2015/04/28 02:49:51 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}
2015/04/28 02:49:58 Sending successful.
2015/04/28 02:50:04 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}

@lsegal
Contributor

lsegal commented Apr 28, 2015

@mateusz if you can provide detailed logging by setting the LogLevel config option to 1, that would help.

svc := kinesis.New(&aws.Config{LogLevel: 1})

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Here you go.

=> [2015-04-28 03:55:56.738735534 +0000 UTC] kinesis.PutRecord(&{Data:[97] ExplicitHashKey:<nil> PartitionKey:0xc2081c9520 SequenceNumberForOrdering:<nil> StreamName:0xc2081c9530 metadataPutRecordInput:{SDKShapeTraits:false}})
---[ REQUEST PRE-SIGN ]------------------------------
POST / HTTP/1.1
Host: kinesis.ap-southeast-2.amazonaws.com
Content-Type: application/x-amz-json-1.1
User-Agent: aws-sdk-go/0.5.0
X-Amz-Target: Kinesis_20131202.PutRecord

{"Data":"YQ==","PartitionKey":"1","StreamName":"playpen-metrics"}
-----------------------------------------------------
---[ CANONICAL STRING  ]-----------------------------
POST
/

host:kinesis.ap-southeast-2.amazonaws.com
x-amz-date:20150428T035556Z
x-amz-target:Kinesis_20131202.PutRecord

host;x-amz-date;x-amz-target
<redacted>
---[ STRING TO SIGN ]--------------------------------
AWS4-HMAC-SHA256
20150428T035556Z
20150428/ap-southeast-2/kinesis/aws4_request
<redacted>
---[ SIGNED URL ]--------------------------------
https://kinesis.ap-southeast-2.amazonaws.com/
-----------------------------------------------------
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: kinesis.ap-southeast-2.amazonaws.com
User-Agent: aws-sdk-go/0.5.0
Content-Length: 65
Authorization: AWS4-HMAC-SHA256 Credential=<redacted>/20150428/ap-southeast-2/kinesis/aws4_request, SignedHeaders=host;x-amz-date;x-amz-target, Signature=<redacted>
Content-Type: application/x-amz-json-1.1
X-Amz-Content-Sha256: <redacted>
X-Amz-Date: 20150428T035556Z
X-Amz-Target: Kinesis_20131202.PutRecord
Accept-Encoding: gzip

{"Data":"YQ==","PartitionKey":"1","StreamName":"<redacted>"}
-----------------------------------------------------
---[ RESPONSE ]--------------------------------------
Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF
-----------------------------------------------------
2015/04/28 03:55:56 Sender failed to put record: Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, Post https://kinesis.ap-southeast-2.amazonaws.com/: EOF, &url.Error{Op:"Post", URL:"https://kinesis.ap-southeast-2.amazonaws.com/", Err:(*errors.errorString)(0xc20800a0e0)}

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

I was thinking that perhaps it has something to do with the subsequent request coinciding with the keep-alive connection being closed by the server on the other side, with the timing being just right?

@lsegal
Contributor

lsegal commented Apr 28, 2015

That looks like an error coming from Go's socket library code. It could indeed be a connection issue with the service. Are you using a proxy, by any chance?

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

I've tested buffering requests and sending them no more frequently than every 10s, and that did not result in any errors after 24 hours of doing requests. I'm not using any cache; it's pretty much a basic Debian wheezy box (and the aws cli tool doesn't seem to have that problem, which would rule out a proxy problem).

I've just retested this bit of Go code on darwin/amd64 on my local machine to rule out EC2-related issues too - same problem.

Here is the bash script I used from that EC2 machine for doing the same thing with aws cli, which did not show this problem - tried both with sleep 6 and sleep 7:

#!/bin/bash

for i in {1..20}; do
    (aws kinesis put-record --stream-name playpen-metrics --data "a" --partition-key "1" --region ap-southeast-2) &
    sleep 6
done

wait

I guess we'd be looking for that magic timeout number of "about 5 or 6" somewhere in the aws sdk or go socket code? Do you know how to disable keep-alives on these aws sdk requests? Maybe force connection closure after every request just to see if that would fix it?

@lsegal
Contributor

lsegal commented Apr 28, 2015

Do you know how to disable keep-alives on these aws sdk requests? Maybe force connection closure after every request just to see if that would fix it?

You can pass in your own http.Client as Config.HTTPClient and disable it that way.
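For example, something along these lines (just a sketch; Region matches the config from your snippet, and HTTPClient is the Config field mentioned above):

client := &http.Client{
    Transport: &http.Transport{
        DisableKeepAlives: true, // close the connection after each request
    },
}

svc := kinesis.New(&aws.Config{
    Region:     "ap-southeast-2",
    HTTPClient: client,
})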

Just to clarify, are you using a proxy?

@mateusz
Contributor Author

mateusz commented Apr 28, 2015

Sorry, that was a typo: I wrote "cache" where I meant "proxy". No, I'm not using any proxy.

@lsegal
Contributor

lsegal commented Apr 30, 2015

@mateusz have you looked into disabling KeepAlive? Did this help at all?

@lsegal lsegal added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. review needed labels Apr 30, 2015
@mateusz
Contributor Author

mateusz commented May 1, 2015

Heh... disabling keep-alive seems to fix it - no failures after 30 sends. Here is the test code. Removing line 21 from this gist reintroduces the issue - it fails within the first few sends again.

I might be completely misled, but here is another guess: according to this it seems one needs to close the response Body when done with it. I have added some debug output to all occurrences of "Body.Close" in this SDK, and none seems to have been hit.

Do you think it could possibly be missing Body.Close?
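For reference, the pattern I'd expect somewhere in the response handling (purely illustrative; these names are made up, not the SDK's):

resp, err := httpClient.Do(req) // hypothetical send step
if err != nil {
    return err
}
defer resp.Body.Close() // lets the connection be returned to the pool for reuse
// ... decode resp.Body into the output struct ...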

@jasdel
Contributor

jasdel commented May 1, 2015

Taking a look at the code, it looks like Kinesis uses the jsonrpc Unmarshal to read the response body from the service and serialize it into the output structure, but does not close the response body reader. I don't think this is the cause of the problem you are seeing, though. Locally I added the Body.Close to the jsonrpc code used by Kinesis, but was still able to reproduce the issue with your sample code after the second request.

Changing the delay between sends to 3s or 9s, I no longer received any failures. We need to do more investigation, but I think you might be correct that this is a situation where the connection is being terminated by the server or client at exactly the wrong time.

@mateusz
Contributor Author

mateusz commented May 3, 2015

👍 Thanks for confirming the issue - happy to hear it's not my specific set-up going crazy.

@jasdel jasdel removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 5, 2015
@jasdel
Contributor

jasdel commented May 6, 2015

I verified this is a Go net/http Client or Transport issue. Gist: readTimeoutExample.go reproduces the issue outside of the SDK. The Client will not open a new connection if it doesn't know the server closed its pooled connection before Transport.RoundTrip() wrote the request to that pooled connection.
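For context, here is a rough standalone sketch of that kind of repro (this is not the gist itself; the address and timeouts are made up for illustration). The server's ReadTimeout closes keep-alive connections that stay idle, so the client's next POST may be written to a stale pooled connection and fail with EOF instead of transparently using a new connection.

package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "strings"
    "time"
)

func main() {
    // Server that drops keep-alive connections which stay idle past ReadTimeout.
    srv := &http.Server{
        Addr:        "127.0.0.1:8080",
        ReadTimeout: 3 * time.Second,
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            io.Copy(ioutil.Discard, r.Body)
            fmt.Fprintln(w, "ok")
        }),
    }
    go srv.ListenAndServe()
    time.Sleep(100 * time.Millisecond)

    client := &http.Client{}
    for i := 0; i < 5; i++ {
        // Sleeping longer than the server's ReadTimeout means the pooled
        // connection may already be closed when the next POST is written.
        resp, err := client.Post("http://127.0.0.1:8080/", "text/plain", strings.NewReader("a"))
        if err != nil {
            log.Printf("POST failed: %v", err) // expect EOF on stale connections
        } else {
            ioutil.ReadAll(resp.Body)
            resp.Body.Close()
            log.Printf("POST ok")
        }
        time.Sleep(6 * time.Second)
    }
}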

I'll submit this issue upstream to the Go repo. In the meantime I suggest disabling keep-alive, or alternatively adding retry logic for requests that fail with the EOF error.

// Minimal Client options to disable keep-alive.
client := &http.Client{
    Transport: &http.Transport{
        DisableKeepAlives: true,
    },
}
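Or, as a rough sketch of the retry option (illustrative only; a single retry was enough in @mateusz's testing above, and this is not the SDK's built-in retry logic):

// Illustrative one-shot retry on EOF (needs "strings" imported);
// just a stopgap at the call site until the upstream fix lands.
_, err := kinesisClient.PutRecord(input)
if err != nil && strings.Contains(err.Error(), "EOF") {
    _, err = kinesisClient.PutRecord(input) // retry once on a fresh attempt
}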

@jasdel jasdel added third-party This issue is related to third-party libraries or applications. and removed review needed labels May 6, 2015
@jasdel
Contributor

jasdel commented May 6, 2015

Cool, looks like the issue is already reported as golang/go#8946 and tagged with the Go1.5Maybe milestone, though the current proposed change is limited to HEAD/GET methods at the moment because they are idempotent.

@mateusz
Contributor Author

mateusz commented May 6, 2015

Thanks, we'll wait until it's fixed upstream then.

@danp
Contributor

danp commented Jun 3, 2015

Maybe fixed by #227?

@jasdel
Contributor

jasdel commented Jun 3, 2015

You're correct @dpiddy, that issue did allow retries to work correctly. Going to go ahead and close this issue. Thanks for the heads up.

@jasdel jasdel closed this as completed Jun 3, 2015
@diehlaws diehlaws added 3rd-party and removed third-party This issue is related to third-party libraries or applications. labels Jan 4, 2019