
paramiko.rsakey.RSAKey object ERROR on 200 node spot instance cluster creation #78

Closed
engrean opened this issue Jan 29, 2016 · 7 comments


@engrean
Contributor

engrean commented Jan 29, 2016

I checked out the latest code from last night (1/28/2016) and rebuilt everything.
I did the following:

python3 -m venv venv
source venv/bin/activate
flintrock configure
## Changed to spot instances and a 200-node cluster with availability zone defined
flintrock launch cluster-name

It looked like it was about done when I got the following:
[IP_ADDDRESS_REPLACED] Installing Spark...
('IP2_ADDDRESS_REPLACED', <paramiko.rsakey.RSAKey object at 0x14f7f1f60>, <paramiko.rsakey.RSAKey object at 0x14db1fd68>)
Do you want to terminate the 201 instances created by this operation? [Y/n]: Y
Terminating instances...
  File "/Users/user2/wrk/flintrock/flintrock/flintrock.py", line 766, in main
    cli(obj={})
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/flintrock.py", line 290, in launch
    instance_initiated_shutdown_behavior=ec2_instance_initiated_shutdown_behavior)
  File "/Users/user2/wrk/flintrock/flintrock/ec2.py", line 30, in wrapper
    res = func(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/ec2.py", line 517, in launch
    identity_file=identity_file)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 393, in provision_cluster
    _run_asynchronously(partial_func=partial_func, hosts=hosts)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 357, in _run_asynchronously
    loop.run_until_complete(asyncio.gather(*tasks))
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/asyncio/base_events.py", line 342, in run_until_complete
    return future.result()
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 446, in provision_node
    print_status=True)
  File "/Users/user2/wrk/flintrock/flintrock/ssh.py", line 63, in get_ssh_client
    timeout=3)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/paramiko/client.py", line 293, in connect
    raise BadHostKeyException(hostname, server_key, our_server_key)
('IP2_ADDDRESS_REPLACED', <paramiko.rsakey.RSAKey object at 0x14f7f1f60>, <paramiko.rsakey.RSAKey object at 0x14db1fd68>)

Responding Y to the terminate cluster prompt did end up terminating the cluster.
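
For context, BadHostKeyException is what paramiko raises when it already has a key on record for a host and the server presents a different one; judging by the raise in the last frame, the tuple in the output above is that exception's (hostname, key the server sent, key that was expected). Here is a minimal standalone sketch that exercises the same code path, with a placeholder hostname, user, and key path rather than anything from this cluster (this is not Flintrock's actual ssh.py code):

import paramiko

# BadHostKeyException comes out of SSHClient.connect() when a key is already
# recorded for the host (e.g. loaded from ~/.ssh/known_hosts) and the server
# presents a different one. Hostname, user, and key path are placeholders.
client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.RejectPolicy())
try:
    client.connect(
        hostname='203.0.113.10',
        username='ec2-user',
        key_filename='/path/to/key.pem',
        timeout=3,
    )
except paramiko.BadHostKeyException as exc:
    # exc carries the hostname plus the two key objects seen in the output above.
    print(exc)
finally:
    client.close()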

@nchammas
Owner

Some questions:

  • Can you successfully launch a cluster with the same settings but only 1 slave? How about with 1 slave that's on-demand (as opposed to spot)?
  • What AMI, region, and user are you using? Post your config.yaml or command-line options and I'll try to reproduce the issue you are seeing. I've been launching large clusters this week without any issues, but then again I may only be using a certain configuration which doesn't hit the problems you're seeing.
  • Are you running Flintrock from Linux or OS X?

@engrean
Contributor Author

engrean commented Feb 2, 2016

I am running OS X.
I will try to launch a cluster with a single slave later.

Here's my config:
services:
  spark:
    version: 1.6.0
  hdfs:
    version: 2.7.1

provider: ec2

providers:
  ec2:
    key-name: key-name
    identity-file: key-location
    instance-type: m3.2xlarge
    region: us-east-1
    availability-zone: us-east-1c
    ami: ami-60b6c60a # Amazon Linux, us-east-1
    user: ec2-user
    instance-profile-name: profile-name1
    spot-price: 0.20
    tenancy: default # default | dedicated
    ebs-optimized: no # yes | no
    instance-initiated-shutdown-behavior: terminate # terminate | stop

launch:
  num-slaves: 200
  install-hdfs: True
  install-spark: False

@nchammas
Owner

nchammas commented Feb 2, 2016

We have pretty much the same setup, minus the instance type. So I just launched a 2-node cluster with m3.2xlarge instances to be sure, and it ran without issue.

The error message you're seeing suggests something about SSH is borked. Are you able to just SSH into a new EC2 instance you launched, completely outside of Flintrock?

@engrean
Contributor Author

engrean commented Feb 2, 2016

I cannot reproduce that exact error.
I am now getting the following error:
[Errno 24] Too many open files: 'PATH/flintrock/flintrock/scripts/setup-ephemeral-storage.py'

On my Mac, I tried running sudo launchctl limit maxfiles 1000000 1000000 before entering the shell with "source venv/bin/activate".

I am going to try it again while I am inside the venv shell.
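
For what it's worth, here is a quick, standard-library-only way to check what limit the Python process actually sees from inside the venv (just a sanity-check sketch; 10000 below is an arbitrary example, not a number Flintrock requires):

import resource

# Soft and hard limits on open file descriptors (RLIMIT_NOFILE) as this Python
# process sees them. Run from inside the venv, right before "flintrock launch".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit  soft: {}  hard: {}'.format(soft, hard))

# A process can usually raise its own soft limit up to the hard limit; the
# system-wide ceiling is what the launchctl change adjusts.
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print('soft limit is now:', resource.getrlimit(resource.RLIMIT_NOFILE)[0])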

@nchammas
Owner

nchammas commented Feb 2, 2016

Yeah, if you installed Flintrock into a virtual environment, you need to always run it from within that virtual environment.

The "Too many open files" error smells like something specific to your system.

Let me know if you can pare these problems down to something small and specific, and I'll try again to reproduce it on my side.

@engrean
Contributor Author

engrean commented Feb 2, 2016

Okay, I got it working. I am running Yosemite, and it turns out the way you change the open file limits is different there. This page helped me: http://blog.mact.me/2014/10/22/yosemite-upgrade-changes-open-file-limit

I wonder if the original issue was a symptom of this problem.

@nchammas
Owner

nchammas commented Feb 2, 2016

Not sure if one issue was ultimately caused by the other, but good to see you cleared things up.

If you run into your original issue again, feel free to reopen with more detail to help me reproduce it on my side.
