No Space Left On Device Error

I submitted an Apache Spark application to an Amazon EMR cluster. The application fails with a 'no space left on device' stage failure like this:

java.io.IOException: No space left on device

Short Description

Spark uses local disks on the core and task nodes to store intermediate data. If the disks run out of space, the job fails with a 'no space left on device' error. Use one of the following methods to resolve this error:

  • Add more Amazon Elastic Block Store (Amazon EBS) capacity.
  • Add more Spark partitions.
  • Use a bootstrap action to dynamically scale up storage on the core and task nodes. For more information and a recommended bootstrap action script, see Dynamically scale up storage on Amazon EMR clusters.
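Before adding capacity, it can help to confirm which local disks are actually full. A minimal check, assuming you have SSH access to a core or task node (EMR local disks are typically mounted at /mnt, /mnt1, /mnt2, and so on):

```shell
# Show free space on each mounted filesystem
df -h

# Show which directories under /mnt consume the most space;
# the yarn and hdfs subdirectories are the usual suspects
du -sh /mnt/* 2>/dev/null | sort -rh | head
```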

Resolution

Add more EBS capacity

For new clusters: use larger EBS volumes

Launch an Amazon EMR cluster and choose an Amazon Elastic Compute Cloud (Amazon EC2) instance type with larger EBS volumes. For more information about the amount of storage and number of volumes allocated for each instance type, see Default EBS Storage for Instances.

For running clusters: add more EBS volumes

1. If larger EBS volumes don't resolve the problem, attach more EBS volumes to the core and task nodes.
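The attachment itself can be done from the Amazon EC2 console or the AWS CLI. A sketch using the CLI, where the volume size, Availability Zone, volume ID, instance ID, and device name are all placeholders that you must replace with your own values:

```shell
# Create a new 100 GiB gp3 volume in the same Availability Zone as the node
aws ec2 create-volume --size 100 --volume-type gp3 --availability-zone us-east-1a

# Attach the volume to the core or task node
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
```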

2. Format and mount the attached volumes. Be sure to use the correct disk number (for example, /mnt1 or /mnt2 instead of /data).
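As a sketch, assuming the new volume appears as /dev/xvdf (the device name varies by instance type and attachment order, so verify it with lsblk first), formatting and mounting might look like:

```shell
# Create an ext4 filesystem on the new volume (this destroys any existing data)
sudo mkfs -t ext4 /dev/xvdf

# Create a mount point that follows the EMR naming convention
sudo mkdir /mnt2

# Mount the volume
sudo mount /dev/xvdf /mnt2
```

If the mount must survive a reboot, also add an entry for it to /etc/fstab.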

3. Connect to the node using SSH.

4. Create a /mnt2/yarn directory, and then set ownership of the directory to the YARN user:
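Assuming the new volume is mounted at /mnt2, the directory and ownership commands are typically:

```shell
# Create the local directory for YARN and hand it to the yarn user
sudo mkdir -p /mnt2/yarn
sudo chown yarn:yarn /mnt2/yarn
```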

5. Add the /mnt2/yarn directory inside the yarn.nodemanager.local-dirs property of /etc/hadoop/conf/yarn-site.xml. Example:
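A sketch of the property after the edit; the existing directory list on your cluster may differ, so append /mnt2/yarn to whatever values are already present rather than copying this verbatim:

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/yarn,/mnt1/yarn,/mnt2/yarn</value>
</property>
```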

6. Restart the NodeManager service:
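How the restart is issued depends on the EMR release: recent releases manage services with systemd, while older ones use upstart. Assuming the standard EMR service name, the commands are along these lines:

```shell
# Amazon EMR 5.30.0 and later (systemd)
sudo systemctl restart hadoop-yarn-nodemanager

# Earlier Amazon EMR releases (upstart)
# sudo stop hadoop-yarn-nodemanager && sudo start hadoop-yarn-nodemanager
```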

Add more Spark partitions

Depending on how many core and task nodes are in the cluster, consider increasing the number of Spark partitions. Use the following Scala code to add more Spark partitions:
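The code block was lost here; a minimal sketch in Scala, where the DataFrame name `df` and the partition count 500 are illustrative placeholders to tune for your cluster (a common starting point is two to three partitions per executor vCPU):

```scala
// Assume df is an existing DataFrame; 500 is an illustrative partition count
val repartitionedDF = df.repartition(500)

// Alternatively, raise the default shuffle parallelism for the session
spark.conf.set("spark.sql.shuffle.partitions", "500")
```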

Related Information

How can I troubleshoot stage failures in Spark jobs on Amazon EMR?
