I see a lot of notes about EFS's performance in the comments. I figured it's at least worth noting, for anyone considering using ECS with EFS, that just last week EFS had its read IOPS on the General Purpose tier increased by 400%.
That probably won't solve all EFS performance issues, but it's a pretty big boost and a nice announcement to come alongside ECS support.
Yes, these containers are supposed to be stateless, but I was tasked with converting an app at my previous job over to using ECS on Fargate and we hit so many issues because of the limits on storage per container instance. We ended up having to tweak the heck out of nginx caching configurations and other processes that would generate any "on disk" files to get around the issues. Having EFS available would have made solving some of those problems so much easier.
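The kind of tweak I mean is roughly this: cap nginx's on-disk proxy cache so it fits within the task's ephemeral storage (paths, sizes, and the upstream name are just illustrative):

    # keep the proxy cache small enough to fit the task's ephemeral storage
    proxy_cache_path /tmp/nginx-cache levels=1:2 keys_zone=app_cache:10m
                     max_size=500m inactive=10m use_temp_path=off;

    server {
        location / {
            proxy_cache app_cache;
            proxy_pass  http://app_upstream;  # placeholder upstream
        }
    }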
I've also been wanting to use ECS on Fargate for running scheduled tasks with large files (50 GB+), but it wasn't really possible given the previous 4 GB limit on storage.
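With the new support, that kind of task can mount EFS straight from the task definition (on Fargate this needs platform version 1.4, if I recall correctly). A minimal sketch, where the file system ID, image, and paths are placeholders:

    {
      "family": "big-file-batch",
      "volumes": [
        {
          "name": "shared-data",
          "efsVolumeConfiguration": {
            "fileSystemId": "fs-12345678",
            "rootDirectory": "/"
          }
        }
      ],
      "containerDefinitions": [
        {
          "name": "job",
          "image": "example/batch-job:latest",
          "mountPoints": [
            { "sourceVolume": "shared-data", "containerPath": "/data" }
          ]
        }
      ]
    }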
> Yes, these containers are supposed to be stateless,
You got it backwards. NFS-type services help containers be stateless because the file system is a separate service accessed through an interface, where all the state is handled by a third party.
Thus by using an NFS-type service to store your local files, you are free to kill and respawn containers at will because their data is persisted elsewhere.
Containers shouldn't necessarily be stateless; most existing code doesn't know how, or doesn't want, to talk to services via RPC interfaces. In some sense, a mounted remote filesystem is just a standard API the OS provides for accessing state in a convenient way that happens to be high performance, indexed, etc.
Oh man, awesome. We had a rather janky workload where ECS would spin up an EC2 that would then mount an EFS volume and then write a file over to S3. This is going to make that so much easier and cleaner.
If you're wondering why you'd ever have to do something like that, the answer is SAP.
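For the curious, the moving parts were roughly this (file system ID, paths, and bucket are placeholders; it assumes amazon-efs-utils is installed and the instance role can write to the bucket):

    # on the EC2 instance ECS spun up
    sudo mkdir -p /mnt/efs
    sudo mount -t efs fs-12345678:/ /mnt/efs
    aws s3 cp /mnt/efs/export/output.dat s3://example-bucket/output.dat
    sudo umount /mnt/efs

With native EFS volumes, a task could presumably mount the file system itself and do the copy, no dedicated EC2 needed.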
(One of the product managers on the Amazon EFS team here). We have many customers that use EFS for a wide variety of use cases, including hosting Postfix. As with all applications, performance needs are relative. Use EFS if your application requires consistently low single-digit-ms latencies, a shared POSIX file system, and a pay-as-you-go elastic usage model. As with all AWS services, EFS is continually launching greater performance capabilities, including higher IOPS, higher throughput, and lower latencies, to meet the needs of our customers. As an example, on 4/1, EFS launched a 400% improvement in read IOPS for its General Purpose performance mode, from 7,000 to 35,000. Given the type of file system operations that Postfix performs, it should benefit nicely from this improvement.
How's the performance on EFS? Has anyone used it in production that is willing to share their experience?
We evaluated it for a relatively simple use case, and the performance seemed abysmal, so we didn't select it. I'm hoping that we made a mistake in our evaluation protocol, which would give me an excuse to give it another try.
It's terrible. Very slow when we tried to use it. There are ways to work around this, and ways to tune the performance, but honestly it was not worth it for our use case and instead we found a way to make EBS work.
EFS is a great way to get a lot of iowait on your cpu graphs. Would not recommend it for anything that had to be fast.
AWS just last week upped the read IOPS on EFS pretty significantly (4x). That probably won't solve all of your speed problems, and it's still not as fast as EBS, but it might be worth giving it another try if your workload isn't write-heavy.
I have to agree with this: I found it challenging to tune EFS to get the claimed performance. The most important details of any system like this are to provision a very large filesystem, use large files, and use lots of concurrent access (threads or machines).
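As a rough illustration of what "lots of concurrent access" means in practice, a benchmark along these lines (mount path and sizes are illustrative) tends to show EFS in a much better light than a single-threaded copy:

    # 16 concurrent readers streaming large files in 1 MB blocks
    fio --name=efs-read --directory=/mnt/efs/bench \
        --rw=read --bs=1M --size=2G \
        --numjobs=16 --ioengine=libaio --iodepth=16 \
        --direct=1 --group_reporting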
There is a whole market of small companies that make high performance filers that do what many people want but they also have limits (high cost/byte).
Can you say more about what you did with EBS? It seems like it would be necessary to make some compromises in availability and disaster recovery because any given EBS volume is restricted to the availability zone where it was created.
We were hosting third-party software in an EKS cluster and needed a way to share state between components of the system. We tried EFS initially, but it actually killed the EKS cluster with iowait under load. We found a way to divert most of the system's requirements to local emptyDir volumes, leaving only infrequently accessed media files on EFS.
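The split looked roughly like this (names are illustrative; the media claim assumes an EFS-backed PersistentVolume, e.g. via the EFS CSI driver):

    apiVersion: v1
    kind: Pod
    metadata:
      name: app
    spec:
      containers:
      - name: app
        image: example/app:latest        # placeholder image
        volumeMounts:
        - name: scratch
          mountPath: /var/cache/app      # hot, frequently written data stays node-local
        - name: media
          mountPath: /srv/media          # infrequently accessed media files on EFS
      volumes:
      - name: scratch
        emptyDir: {}
      - name: media
        persistentVolumeClaim:
          claimName: efs-media           # PVC bound to an EFS PersistentVolume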
(One of the product managers on the Amazon EFS team here). Drupal is a common application that customers use with EFS, both in combination with a CDN and without. The following are two important considerations when running Drupal on an NFS service like EFS:
i) You should configure the OPcache so that it does not revalidate its cache on every request. Cache validation uses stat() in a serial loop on potentially hundreds of files, where each stat() would add O(ms) to the request.
ii) We recommend you store log files locally. NFS does not define an atomic O_APPEND operation, so appends require a file lock to prevent interleaving with appends from other clients. I've seen PHP applications do hundreds of file locks on a log file per request, each adding O(ms) to the total request latency. This is what you'd like to avoid.
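On point (i), a minimal php.ini sketch (illustrative values; with validate_timestamps=0 you have to reset OPcache or reload PHP-FPM on deploys):

    opcache.enable=1
    ; skip the per-request stat() loop entirely
    opcache.validate_timestamps=0
    ; or, less aggressively, keep validation but rate-limit it:
    ; opcache.validate_timestamps=1
    ; opcache.revalidate_freq=60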
It is highly dependent on your needs. It's NFS, and performs accordingly (though EFS has been rock solid in a few different scenarios, both in availability and in baseline performance, assuming you use provisioned throughput).
Should you run a database on EFS? No. Can you use it to back media files for a web application that are cached using a CDN, or for data files used for processing or temporary storage? Yes, it shines in those use cases... and it's cheaper than dedicating the time required to maintain your own NFS cluster.
Even Gluster or Ceph is, IMO, not worth the effort unless you (a) know how to run and maintain it, and (b) absolutely need the potential speed up that you can get, assuming a well-configured and well-maintained system.
It feels like the performance and cost are really built around a very specific use case that basically boils down to "write logs and only read a tiny fraction of those logs".
And then, I've seen way too many people treat it like a traditional file system, and stick things on it that don't expect to find themselves on NFS, and wonder why they get corrupted files.
And, really, I tend to avoid the AWS services with "Burst Balances". It's painful to get a system running smoothly only to have it grind to a halt when you use it under load because some burst balance somewhere went to zero. Your mileage may vary, of course.
Trust me, I know. We have alarming on all of our SSD burst balances after a few painful lessons.
At least those are mostly OK: in our case, the really EBS-hungry clients now have volumes of 1024 GB or more, so the burst balance issues don't apply.
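For anyone who hasn't set that alarming up, a sketch of a per-volume alarm on the EBS BurstBalance metric (volume ID and SNS topic are placeholders):

    aws cloudwatch put-metric-alarm \
      --alarm-name ebs-burst-balance-low \
      --namespace AWS/EBS \
      --metric-name BurstBalance \
      --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
      --statistic Average --period 300 --evaluation-periods 3 \
      --threshold 20 --comparison-operator LessThanThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts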
For a high performance shared file system on AWS, an alternative is ObjectiveFS[0]. It uses memory caching to achieve performance closer to local disk.
Technically ECS supported EFS before, but you had to configure everything manually (or with your own automation). Having it native is a lot nicer, and brings provisioning of NFS-style volumes up to par with the current Kubernetes experience.
https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-el...