HugePages documentation (#5419)

derekwaynecarr · steveperry-53 · commit c3304714835b · 2017-09-13T09:52:20.000-07:00
diff --git a/_data/tasks.yml b/_data/tasks.yml
@@ -187,6 +187,10 @@ toc:
   section:
   - docs/tasks/manage-gpus/scheduling-gpus.md
 
+- title: Manage HugePages
+  section:
+  - docs/tasks/manage-hugepages/scheduling-hugepages.md
+
 - title: Extend kubectl with plugins
   section:
   - docs/tasks/extend-kubectl/kubectl-plugins.md
diff --git a/docs/tasks/index.md b/docs/tasks/index.md
@@ -62,6 +62,10 @@ Perform common tasks for managing a DaemonSet, such as performing a rolling upda
 
 Configure and schedule NVIDIA GPUs for use as a resource by nodes in a cluster.
 
+#### Managing HugePages
+
+Configure and schedule huge pages as a schedulable resource in a cluster.
+
 ### What's next
 
 If you would like to write a task page, see
diff --git a/docs/tasks/manage-hugepages/scheduling-hugepages.md b/docs/tasks/manage-hugepages/scheduling-hugepages.md
@@ -0,0 +1,81 @@
+---
+approvers:
+- derekwaynecarr
+title: Manage HugePages
+---
+
+{% capture overview %}
+{% include feature-state-alpha.md %}
+
+Kubernetes supports the allocation and consumption of pre-allocated huge pages
+by applications in a Pod as an **alpha** feature.  This page describes how users
+can consume huge pages and the current limitations.
+
+{% endcapture %}
+
+{% capture prerequisites %}
+
+1. Kubernetes nodes must pre-allocate huge pages in order for the node to report
+   its huge page capacity.  A node may only pre-allocate huge pages for a single
+   size.
+1. A special **alpha** feature gate `HugePages` has to be set to true across the
+   system: `--feature-gates="HugePages=true"`.
+
+The nodes will automatically discover and report all huge page resources as
+schedulable resources.
+
+{% endcapture %}
+
+{% capture steps %}
+
+## API
+
+Huge pages can be consumed via container level resource requirements using the
+resource name `hugepages-<size>`, where size is the most compact binary notation
+using integer values supported on a particular node.  For example, if a node
+supports 2048KiB page sizes, it will expose a schedulable resource
+`hugepages-2Mi`.  Unlike CPU or memory, huge pages do not support overcommit.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  generateName: hugepages-volume-
+spec:
+  containers:
+  - image: fedora:latest
+    command:
+    - sleep
+    - inf
+    name: example
+    volumeMounts:
+    - mountPath: /hugepages
+      name: hugepage
+    resources:
+      limits:
+        hugepages-2Mi: 100Mi
+  volumes:
+  - name: hugepage
+    emptyDir:
+      medium: HugePages
+```
+
+- Huge page requests must equal the limits.  This is the default if limits are
+  specified, but requests are not.
+- Huge pages are isolated at pod scope.  Container-level isolation is planned
+  for a future iteration.
+- EmptyDir volumes backed by huge pages may not consume more huge page memory
+  than the pod request.
+- Applications that consume huge pages via `shmget()` with `SHM_HUGETLB` must
+  run with a supplemental group that matches `/proc/sys/vm/hugetlb_shm_group`.
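+
+As a minimal sketch of the `shmget()` case, the Pod below sets a supplemental
+group at the pod level and writes the huge page request out explicitly so that
+it equals the limit.  The group ID `1234` and the `generateName` prefix are
+placeholders chosen for illustration, not values from this page; the real GID
+is whatever `/proc/sys/vm/hugetlb_shm_group` contains on the target node.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  generateName: hugepages-shm-    # hypothetical name prefix
+spec:
+  securityContext:
+    # Assumption: 1234 stands in for the GID found in
+    # /proc/sys/vm/hugetlb_shm_group on the node.
+    supplementalGroups:
+    - 1234
+  containers:
+  - image: fedora:latest
+    command:
+    - sleep
+    - inf
+    name: example
+    resources:
+      # Requests are stated explicitly here and must equal the limits.
+      requests:
+        hugepages-2Mi: 100Mi
+      limits:
+        hugepages-2Mi: 100Mi
+```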
+
+## Future
+
+- Support container isolation of huge pages in addition to pod isolation.
+- NUMA locality guarantees as a feature of quality of service.
+- ResourceQuota support.
+- LimitRange support.
+
+{% endcapture %}
+
+{% include templates/task.md %}