I will have one more try at using PySpark with YARN.
spark-submit --master yarn test.py
vs
spark-submit --master local test.py
If I submit with yarn but specify local within the code, it works fine.
If I submit with master=local and specify local in the code, it works fine.
If I submit with master=yarn and yarn in the code, it fails, this time with an out-of-space error:
diagnostics: Application application_1510537474959_0005 failed 2 times due to AM Container for appattempt_1510537474959_0005_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2017-11-24 20:02:13.899]No space left on device
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.hadoop.fs.FileUtil.unZip(FileUtil.java:608)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:279)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerL
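The "No space left on device" happens while the NodeManager unpacks the localized Spark archive, so the disk to check is the one backing yarn.nodemanager.local-dirs (by default under ${hadoop.tmp.dir}/nm-local-dir, typically in /tmp; the exact paths below are assumptions for a stock single-node install):

```shell
# Free space on the filesystem backing YARN's container-localizer dirs
# (default yarn.nodemanager.local-dirs lives under /tmp on a stock install)
df -h /tmp

# See what is actually consuming it; the path is an assumption for a
# default hadoop.tmp.dir=/tmp/hadoop-${USER} layout
du -sh /tmp/hadoop-*/nm-local-dir 2>/dev/null || true
```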
Now if I run with local as master and yarn in the code, I also get the following error from the NameNode:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /user/steve/.sparkStaging/application_1510537474959_0006. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:localhost
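Both failures point at the same full disk: the NameNode drops into safe mode when the volume holding its storage runs low. A sketch of checking and recovering (these need a running HDFS, run as the HDFS superuser):

```shell
# Is the NameNode still in safe mode?
hdfs dfsadmin -safemode get

# How much DFS space is actually left, per datanode?
hdfs dfsadmin -report

# Free space on the volume first, then leave safe mode manually --
# otherwise, as the message warns, it re-enters safe mode immediately
hdfs dfsadmin -safemode leave
```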
But local/local still works…
Now I run with yarn in the submit command (local still in the code), but this time specify the executor memory and number of executors, and it works.
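The resource flags in question are spark-submit's --num-executors and --executor-memory; a sketch with placeholder sizes, not the values actually used:

```shell
# Placeholders: size these to fit the NodeManager's available memory
# (yarn.nodemanager.resource.memory-mb), or the containers never start
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-memory 512m \
  test.py
```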
