
paramiko.rsakey.RSAKey object ERROR on 200 node spot instance cluster creation #78

Closed
engrean opened this issue Jan 29, 2016 · 7 comments


@engrean
Contributor

engrean commented Jan 29, 2016

I checked out the latest code from last night (1/28/2016) and rebuilt everything.
I did the following:

python3 -m venv venv
source venv/bin/activate
flintrock configure
## Changed to spot instances and a 200-node cluster with availability zone defined
flintrock launch cluster-name

It looked like it was about done when I got the following:
[IP_ADDDRESS_REPLACED] Installing Spark...
('IP2_ADDDRESS_REPLACED', <paramiko.rsakey.RSAKey object at 0x14f7f1f60>, <paramiko.rsakey.RSAKey object at 0x14db1fd68>)
Do you want to terminate the 201 instances created by this operation? [Y/n]: Y
Terminating instances...
  File "/Users/user2/wrk/flintrock/flintrock/flintrock.py", line 766, in main
    cli(obj={})
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/flintrock.py", line 290, in launch
    instance_initiated_shutdown_behavior=ec2_instance_initiated_shutdown_behavior)
  File "/Users/user2/wrk/flintrock/flintrock/ec2.py", line 30, in wrapper
    res = func(*args, **kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/ec2.py", line 517, in launch
    identity_file=identity_file)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 393, in provision_cluster
    _run_asynchronously(partial_func=partial_func, hosts=hosts)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 357, in _run_asynchronously
    loop.run_until_complete(asyncio.gather(*tasks))
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/asyncio/base_events.py", line 342, in run_until_complete
    return future.result()
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/Users/user2/.pyenv/versions/3.5.0/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/user2/wrk/flintrock/flintrock/core.py", line 446, in provision_node
    print_status=True)
  File "/Users/user2/wrk/flintrock/flintrock/ssh.py", line 63, in get_ssh_client
    timeout=3)
  File "/Users/user2/wrk/flintrock/venv/lib/python3.5/site-packages/paramiko/client.py", line 293, in connect
    raise BadHostKeyException(hostname, server_key, our_server_key)
('IP2_ADDDRESS_REPLACED', <paramiko.rsakey.RSAKey object at 0x14f7f1f60>, <paramiko.rsakey.RSAKey object at 0x14db1fd68>)

Responding Y to the terminate cluster prompt did end up terminating the cluster.
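
For context, BadHostKeyException is what paramiko raises when it already has a key on record for a host and the server presents a different one; judging by the raise in the last frame, the tuple in the output above is that exception's (hostname, key the server sent, key that was expected). Here is a minimal standalone sketch that exercises the same code path, with a placeholder hostname, user, and key path rather than anything from this cluster (this is not Flintrock's actual ssh.py code):

import paramiko

# BadHostKeyException comes out of SSHClient.connect() when a key is already
# recorded for the host (e.g. loaded from ~/.ssh/known_hosts) and the server
# presents a different one. Hostname, user, and key path are placeholders.
client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.RejectPolicy())
try:
    client.connect(
        hostname='203.0.113.10',
        username='ec2-user',
        key_filename='/path/to/key.pem',
        timeout=3,
    )
except paramiko.BadHostKeyException as exc:
    # exc carries the hostname plus the two key objects seen in the output above.
    print(exc)
finally:
    client.close()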

@nchammas
Owner

Some questions:

  • Can you successfully launch a cluster with the same settings but only 1 slave? How about with 1 slave that's on-demand (as opposed to spot)?
  • What AMI, region, and user are you using? Post your config.yaml or command-line options and I'll try to reproduce the issue you are seeing. I've been launching large clusters this week without any issues, but then again I may only be using a certain configuration which doesn't hit the problems you're seeing.
  • Are you running Flintrock from Linux or OS X?

@engrean
Contributor Author

engrean commented Feb 2, 2016

I am running OS X.
I will try to launch a cluster with a single slave later.

Here's my config:
services:
  spark:
    version: 1.6.0
  hdfs:
    version: 2.7.1

provider: ec2

providers:
  ec2:
    key-name: key-name
    identity-file: key-location
    instance-type: m3.2xlarge
    region: us-east-1
    availability-zone: us-east-1c
    ami: ami-60b6c60a # Amazon Linux, us-east-1
    user: ec2-user
    instance-profile-name: profile-name1
    spot-price: 0.20
    tenancy: default # default | dedicated
    ebs-optimized: no # yes | no
    instance-initiated-shutdown-behavior: terminate # terminate | stop

launch:
  num-slaves: 200
  install-hdfs: True
  install-spark: False

@nchammas
Owner

nchammas commented Feb 2, 2016

We have pretty much the same setup, minus the instance type. So I just launched a 2-node cluster with m3.2xlarge instances to be sure, and it ran without issue.

The error message you're seeing suggests something about SSH is borked. Are you able to just SSH into a new EC2 instance you launched, completely outside of Flintrock?

@engrean
Contributor Author

engrean commented Feb 2, 2016

I cannot reproduce that exact error.
I am now getting the following error:
[Errno 24] Too many open files: 'PATH/flintrock/flintrock/scripts/setup-ephemeral-storage.py'

On my Mac, I tried running sudo launchctl limit maxfiles 1000000 1000000 before entering the shell with "source venv/bin/activate".

I am going to try it again while I am inside the venv shell.
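
For what it's worth, here is a quick, standard-library-only way to check what limit the Python process actually sees from inside the venv (just a sanity-check sketch; 10000 below is an arbitrary example, not a number Flintrock requires):

import resource

# Soft and hard limits on open file descriptors (RLIMIT_NOFILE) as this Python
# process sees them. Run from inside the venv, right before "flintrock launch".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit  soft: {}  hard: {}'.format(soft, hard))

# A process can usually raise its own soft limit up to the hard limit; the
# system-wide ceiling is what the launchctl change adjusts.
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print('soft limit is now:', resource.getrlimit(resource.RLIMIT_NOFILE)[0])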

@nchammas
Owner

nchammas commented Feb 2, 2016

Yeah, if you installed Flintrock into a virtual environment, you need to always run it from within that virtual environment.

The "Too many open files" error smells like something specific to your system.

Let me know if you can pare these problems down to something small and specific, and I'll try again to reproduce it on my side.

@engrean
Contributor Author

engrean commented Feb 2, 2016

Okay, I got it working. I am running Yosemite, and it turns out the way you change the open file limits is different there. This page helped me: http://blog.mact.me/2014/10/22/yosemite-upgrade-changes-open-file-limit

I wonder if the original issue was a symptom of this problem.

@nchammas
Owner

nchammas commented Feb 2, 2016

Not sure if one issue was ultimately caused by the other, but good to see you cleared things up.

If you run into your original issue again, feel free to reopen with more detail to help me reproduce it on my side.
