The journey to take serverless to the extreme and whole-heartedly build a CMS without any EC2 or RDS instances or even a humble VPC has been interesting to say the least. But today, I discovered something which I thought would be gone forever, in the form of a nice little message:
no space left on device
Now, I thought I’d be free of this kind of error. I’d also be free of the following, because AWS manages it all for me, right!??
- Patching, especially OS patches
- Disk space
- Scale and instance management
- Load balancing
- … a bunch of other serverless niceties
Also worthy of note is that I know an AWS Lambda instance has limits, but surely I’m not hitting those just yet.
The function is trying to write out some static assets for one of my larger sites, after generating all of the HTML files from Markdown. Which is fair enough, since it’s all part of the package.
I also find this:
child process exited with code 255
Non-zero return codes from a process. This needs more attention.
Error: ENOSPC: no space left on device, mkdir '/tmp/6NPGD7TPTpvy2tZx'
The CMS does quite a lot of different things, but the only part that uses disk space is the publish process, which is essentially a wrapper around Hugo. The process is:
- Files are downloaded from S3 to the Publish Lambda function’s local /tmp/ drive, which I’m told has about 512MB of space
- Hugo is executed
- Files are generated in a “public” folder, thanks to Hugo
- Files are then uploaded to S3
- Any temporary file is removed
Actually, there is a step 0, which ensures the Hugo binary is available to the function, which means writing it to the local disk. To save a bit of time, I check if the binary is already there and if not, go and fetch it. I could include the binary inside the function as part of the upload to AWS, but I kind of like having it slightly decoupled.
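That check-then-fetch step can be sketched roughly as below. This is an illustration in Python, not the CMS's actual code: the `/tmp/bin/hugo` location, the `ensure_hugo` name, and the `fetch` callback (standing in for whatever downloads the binary) are all assumptions.

```python
import os
import stat
from pathlib import Path
from typing import Callable

# Hypothetical location for the cached binary; not taken from the post.
HUGO_PATH = Path("/tmp/bin/hugo")


def ensure_hugo(fetch: Callable[[Path], None], target: Path = HUGO_PATH) -> Path:
    """Return the path to the Hugo binary, fetching it only if it is missing.

    `fetch` is a placeholder for whatever downloads the binary (for example
    from S3) and writes it to the path it is given.
    """
    if target.is_file() and os.access(target, os.X_OK):
        return target  # warm instance: reuse the earlier download

    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    fetch(partial)  # download to a side file first
    partial.chmod(partial.stat().st_mode | stat.S_IXUSR)  # make it executable
    os.replace(partial, target)  # rename into place only once the download finished
    return target
```

Downloading to a side file and renaming it afterwards means a half-finished download is never mistaken for a usable binary on the next invocation. A failed download can still leave the `.part` file behind, so it counts against the same disk budget as everything else in /tmp/.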
Anyways, disk space is limited, but that’s not too bad. None of my sites have hit 512MB of content. That might be a problem when they do. Problem for future Koi, shall we say.
How does it run out of space?
At the moment, if any major problem happens, the cleanup routine might not execute, and whatever was written to /tmp/ stays there. This is not ideal, but can be fixed at some stage. We aren’t really in control of when instances start or get reused: AWS might spin up new Lambda instances or just re-use existing ones if the function hasn’t been modified.
Older instances do go stale and need a bit of warm-up time, which you can feel when logging into the system on a cold morning. It’s not terrible, but definitely noticeable. Given how many times websites are published per day, I’d say that my functions often take the hit of having to warm up.
This function re-use has one benefit for me: a Hugo binary that has been previously downloaded and stored locally can be used again, which saves a small amount of publishing time.
The downside is that if a few failures occur in a row, the instance will have its ephemeral storage (in /tmp/) filled up, with no automatic way to release it. A simple re-deploy of the function will clear this out good and proper, but that’s not ideal.
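One common way to make cleanup survive a failed publish is to wrap the steps in a try/finally. The sketch below is in Python and is only an illustration: `publish`, `download`, `build`, and `upload` are hypothetical names standing in for the S3 download, the Hugo run, and the S3 upload.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def publish(
    download: Callable[[Path], None],
    build: Callable[[Path], Path],
    upload: Callable[[Path], None],
    tmp_root: str = "/tmp",
) -> None:
    """Run the publish steps in a scratch directory that is always removed.

    `download`, `build`, and `upload` are placeholders for the S3 download,
    the Hugo run, and the S3 upload; their names and signatures are assumed.
    """
    workdir = Path(tempfile.mkdtemp(dir=tmp_root))
    try:
        download(workdir)        # site source from S3 into the scratch dir
        public = build(workdir)  # Hugo writes the generated "public" folder
        upload(public)           # generated files back to S3
    finally:
        # Runs on success and on any exception raised above.
        shutil.rmtree(workdir, ignore_errors=True)
```

This covers exceptions thrown by any of the steps. It would not help if the process is killed outright, for example when an invocation runs into its timeout, so it reduces the leftover files rather than ruling them out.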
So, back to monitoring disk space?
Perhaps, but it would have to be baked into the function itself. In some ways I’d like a TTL on files/folders so that they will automatically be cleaned up after some period of time, but that might be too much to ask for. It’s not like I can just ssh into the Lambda instance and clean things up periodically.
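Both ideas, a free-space check and a poor man's TTL, can be baked into the function in a few lines. The following is a sketch in Python under stated assumptions: the one-hour TTL, the `bin` folder holding the Hugo binary, and the function names are all made up for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import Iterable, List, Optional


def free_bytes(root: str = "/tmp") -> int:
    """Free space on the filesystem that holds `root`."""
    return shutil.disk_usage(root).free


def sweep_stale(
    root: str = "/tmp",
    max_age_seconds: float = 3600,
    keep: Iterable[str] = ("bin",),
    now: Optional[float] = None,
) -> List[str]:
    """Delete top-level entries in `root` older than `max_age_seconds`.

    Entries named in `keep` (for example wherever the Hugo binary lives) are
    left alone. Returns the names that were removed.
    """
    now = time.time() if now is None else now
    keep_names = set(keep)
    removed = []
    for entry in Path(root).iterdir():
        if entry.name in keep_names:
            continue
        if now - entry.lstat().st_mtime < max_age_seconds:
            continue  # recent enough that it may belong to the current run
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)
        removed.append(entry.name)
    return removed
```

Calling `sweep_stale()` at the top of the handler and logging `free_bytes()` before Hugo runs would give both the cleanup and the monitoring. Age is judged by each top-level entry's modification time, which for a directory only changes when something directly inside it is added or removed, so the TTL should sit comfortably above the longest expected publish.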
At the end of the day, I’ll have to manage the use of temporary disk storage myself. In any case, adoption of serverless doesn’t mean you can ignore all the resources of the past, maybe just some of them.