I can, however, automate the creation of an SVM, so currently my only good option seems to be to provide one SVM per customer volume, limit the size of the volumes the SVM can create to whatever I want to sell them, and set max-volumes to 1 (or two, if the root volume also counts).
Yeah, one could do that (one vserver per customer volume) but... it feels a bit... ridiculous..? Something better is needed, I do agree with that. Hope there will be something in the not too distant future. Can't remember now how many vservers are possible in a large ONTAP cluster, is it 1000? There's some limit. It depends on how many nodes or some other factors, I think.
https://library.netapp.com/ecmdocs/ECMLP2429205/html/GUID-425E17E6-B342-4AFC...
Actually, this number is very small, so this "solution" is not going to scale. But I could start with that and re-host the volumes later on once that's fixed (assuming that this product will not sell like hot cakes in the first few months).
But this is a nice workaround -- I have to admit I still don't fully understand how it works, because I'm not too familiar with the 'security login role' stuff... (need to read up on it!)

::> vserver modify -vserver vs1 -max-volumes 50
::> security login role create -vserver <name> -role restricted -cmddirname "volume" -access all -query "-size <=50G"
Well, actually, when you leave out the query, it's quite easy. You specify the command name and the access permission for it (none, readonly, all). So you could create a role which allows "net int show", but does not allow "network port show". If you use "all" access on "network interface", that will also allow the creation and modification of LIFs; if you just assign "readonly" to "network interface", you will be able to read all the config, but not modify it.
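As a concrete sketch (vs1 and the role name "restricted" are just placeholder names here): the following should let a user create and modify volumes, but only view LIFs:

::> security login role create -vserver vs1 -role restricted -cmddirname "volume" -access all
::> security login role create -vserver vs1 -role restricted -cmddirname "network interface" -access readonly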
The query then allows you to add some more granularity, like f.ex. you could limit write access to just one LIF and leave read-only access on all the others:
security login role create -vserver <name> -role restricted -cmddirname "network interface" -access all -query "-lif <lifname>"
security login role create -vserver <name> -role restricted -cmddirname "network interface" -access readonly
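And just to complete the picture (the account name "customer1" is made up): the role only takes effect once you assign it to a login, e.g.:

security login create -vserver <name> -user-or-group-name customer1 -application ssh -authentication-method password -role restricted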
You can also use the query to allow volume creation on just a single aggregate for one user, f.ex. And what I found out while testing is that the query also supports things like "-size <=50G", which was helpful in this situation.
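Combining the two should also work, i.e. something like this (an untested sketch, "aggr1" being a placeholder aggregate name) to pin a user's volume creation to one aggregate and cap the size at the same time:

security login role create -vserver <name> -role restricted -cmddirname "volume create" -access all -query "-aggregate aggr1 -size <=50G"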
So with the above set, and assuming no K8s "customer" can override it in any way, you're good, right? It will limit things for sure. Or did I miss something? Some other disadvantage or side effect of doing that 'role' command?
Yes, with that, I'm "good". There's no real downside of the security roles I can think of. Of course, you open up management commands to your customers, but well, that doesn't seem to be avoidable nowadays.
Some thoughts. The next step down this path is when the customer has MANY K8s clusters, i.e. they are an internal "ISP" of sorts. Then they want one vserver per K8s cluster, and they want to create and remove them themselves, together with the K8s clusters. I.e.: be vserver admin and control it via API from their "portal" with their own automation.
But then you can no longer do anything like this; it's not in your control anymore to limit things in a vserver by force:
vserver modify -vserver vs1 -max-volumes 10
Well, you would not allow a customer to create an SVM on your ONTAP cluster anyway, at least not using the ONTAP API, because you would have to give them access to the cluster admin API, and that's probably not the best idea. What we're doing here is to allow the customer access to our customer portal's API, and in there we have a module for automating NetApp stuff, f.ex. to create a new SVM. In there it's easy to let the user pick "SVM small, large and x-large" (f.ex.) for a K8s workload and assign whatever policies I want to it.
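Behind the portal, that module then just issues the cluster-admin commands itself, roughly along these lines (a sketch with made-up names, not our actual automation):

::> vserver create -vserver svm_small -rootvolume svm_small_root -aggregate aggr1 -rootvolume-security-style unix
::> vserver modify -vserver svm_small -max-volumes 10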
What to do then? If you relinquish vserver creation control, then the K8s cluster admin ppl can do anything they like, and the only option that remains is to control things at the Aggr level. As best one can... If they run out of Aggr space then... *boom*. Their problem. Still, there will be some sort of disruption and some sort of incident mgmt there, one would think.
Yah, assigning a separate aggregate per cluster doesn't really scale here, unless there's some sort of "virtual" aggregate I'm not aware of. Also, I think aggregate quotas per "customer" (or SVM in this case) would work well here, but I'm not aware of anything like that either.
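The closest thing I can think of is pinning an SVM to specific aggregates, which at least controls placement, even if it doesn't cap capacity (aggr1/aggr2 being placeholder names):

::> vserver modify -vserver vs1 -aggr-list aggr1,aggr2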
So finally we have the last step: the K8s cluster team purchases their own ONTAP cluster(s), all the NetApp HW on their CAPEX budget, and they own the whole thing; all the HW = all the OPEX created by the depreciation. The cost reclaim model is their problem, not mine. The only thing a Storage Ops Team does in that scenario (e.g. where I work internally at Ericsson in our R&D) is set up the baseline ONTAP cluster as it should be in the internal network and manage/support the HW (replace broken things etc.), up to creating Aggrs probably, because K8s ppl don't want to do that.
There's a big problem with doing things like that. Nowadays, people do not want to sign 3yr contracts, and they also do not know how much resources they need. Back in the old times, people thought before deploying; nowadays, things are different 😊 With that comes the need to automatically scale in all directions, and if the project doesn't work out, it's torn down immediately. Also, K8s f.ex., or any other rapid deployment scenario, is sometimes also used for quickly cloning an environment to do some tests (load tests, release tests, whatever), and that also requires temporary space, so the only valid option is flexible sizing on the SP end.
Then, from the vserver level of abstraction and up, they do whatever they want. 100% automated, under their control. A-hm. Did I just make myself (almost) obsolete..? ;-)
Duties are shifting. You will be orchestrating lots of wild SVMs on your clusters (like an animal tamer) and have no idea what they're doing, what project they belong to, or whether they'll bite you if you try to pet them. Sometimes, you will have to react to "your product sucks" tickets just because those so-called DevOps people do not even know what an IOP is or how to measure it, but hey, that's the future.
Yes, that's what Trident is for, really, from a K8s PoV, is it not? Making the storage Operations Team nearly obsolete. Infrastructure as Code, etc. For this to work out properly, the financial control (CAPEX, budgeting, OPEX, cost reclaim) has to be put in the hands of the K8s admin ppl. Each K8s cluster created automatically creates a vserver for it in some or other of the ONTAP clusters set up and available for this purpose. For this sole purpose. All the volume and performance mgmt (Demand & Capacity Mgmt) has to be done by the K8s admin ppl.
That's not going to happen. They want to click a button (or fire an API call) that creates a 150TB volume with 200k IOPS. How you, as a provider, make sure this gets delivered is none of their business. If you cannot deliver that in a few seconds, they'll go and look somewhere else. Trying to talk to them to make them aware of the fact that you will never be able to reach 10Gbit/s of bandwidth on a volume with just 1000 IOPS set as a limit, because that's just technically not possible, will not work out either - BTDT.
Right. I, the Storage Architect, go join that team (the K8s guys) instead, to plan and handle the ONTAP storage for PVCs for all the umpteen K8s clusters and server "POD"s [a future vision for my working life..?]
Not sure if that's your future - you would have to be involved in the application design there, and that's a whole different business.
Best, Alex