A mechanism to isolate CPU topology information in the Linux kernel -- CPU Namespace
Yuma Theatre | Fri 14 Jan 2:40 p.m.–3:10 p.m.
Presented by
-
Pratik Rajesh Sampat
@pratikrsampat
https://pratiksampat.github.io/
Pratik is a Linux kernel hacker working with IBM, primarily with the CPU team. Pratik also works on container primitive semantics, evaluating and improving performance based on the cgroup and namespace semantics in the Linux kernel.
Apart from systems development, Pratik's interests also lie in the development of mixed reality and haptic feedback systems.
Prior to joining IBM, Pratik graduated with a Bachelor of Technology in Computer Science from PES University, Bangalore, in 2019.
-
Gautham R. Shenoy
@gautshen
https://gautshen.wordpress.com/
Gautham is a Linux kernel programmer who has been working on the Linux kernel since 2006. He has contributed to the CPU hotplug, process scheduler, RCU, lockdep and cpuidle subsystems in the Linux kernel.
Abstract
The CPU namespace aims to extend the current pool of namespaces in the kernel to isolate the system topology view from applications. The CPU namespace virtualizes CPU information by maintaining an internal translation from each namespace CPU to the corresponding logical CPU in the kernel. The CPU namespace will also enable existing interfaces such as sysfs/procfs, cgroupfs and the sched_setaffinity/sched_getaffinity syscalls to be context aware, divulging topology information based on the CPU namespace context of the task that requests it.
The aim of this talk is to propose a mechanism to isolate CPU topology information from applications that are running in a containerized environment.
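As a rough sketch of the interfaces involved (on a mainline kernel, i.e. without the proposed CPU namespace), the snippet below queries the affinity mask via sched_getaffinity(2) and the online CPU count via sysconf(). Run inside a cgroup-limited container today, both typically report the host's CPUs rather than anything reflecting the container's restrictions.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);

        /* Affinity mask of the calling task. Inside a container on a
         * current kernel this is expressed in host logical CPUs; under
         * the proposed CPU namespace it would be translated to
         * namespace CPUs. */
        if (sched_getaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_getaffinity");
            return 1;
        }
        printf("CPUs in affinity mask: %d\n", CPU_COUNT(&set));

        /* Online CPU count, which glibc derives from /sys or /proc and
         * is likewise host-wide today. */
        printf("Online CPUs (sysconf): %ld\n",
               sysconf(_SC_NPROCESSORS_ONLN));

        return 0;
    }

With a CPU namespace in place, the same calls would be resolved through the namespace-CPU to logical-CPU translation described above and would report only the CPUs visible to that namespace.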
The potential benefits of the proposed CPU isolation are as follows:
1. An interface for coherent information:
a. Today, most applications that run in containers have their CPU limits enforced through the cgroup interface. Cgroups, however, is a control interface rather than an information interface; hence applications do not have a coherent view of the system and the restrictions imposed on them.
b. The problem extends beyond coherency of information. Cloud runtime environments can request CPU runtime in millicores, which translates into CFS period and quota limits in cgroups. However, applications generally operate in terms of threads, with little to no cognizance of the millicore limit or its connotation.
This can lead to unexpected runtime behaviour as well as a high impact on performance. Hence, a coherent interface that divulges information based on the constraints set by the different subsystems is important (see the cpu.max sketch following this list).
2. Potential security and fair use implications on multi-tenant systems:
a. An actor that is cognizant of the CPU and node topology can schedule workloads and select CPUs such that the bus is flooded, causing a denial-of-service attack.
b. Knowledge of the CPU system topology can help identify cores that are close to buses and peripherals such as GPUs, gaining an undue latency advantage over the rest of the workloads (see the sysfs topology sketch following this list).
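To make item 1 concrete, the sketch below recovers the "effective CPUs" a workload is entitled to from its CFS bandwidth limit. It assumes the cgroup v2 unified hierarchy is mounted at /sys/fs/cgroup and that the container's own cpu.max is visible at that path; a 2500m millicore request, for instance, typically becomes a quota of 250000 against a period of 100000, i.e. 2.5 CPUs, a figure no standard topology interface reports back to the application.

    /* Recover the "effective CPUs" implied by the CFS bandwidth limit.
     * Assumes the cgroup v2 unified hierarchy is mounted at
     * /sys/fs/cgroup and that this task's cpu.max is visible there. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char quota[32];
        long period;
        FILE *f = fopen("/sys/fs/cgroup/cpu.max", "r");

        if (!f) {
            perror("cpu.max");
            return 1;
        }
        if (fscanf(f, "%31s %ld", quota, &period) != 2) {
            fprintf(stderr, "unexpected cpu.max format\n");
            fclose(f);
            return 1;
        }
        fclose(f);

        if (strcmp(quota, "max") == 0)
            printf("No CFS bandwidth limit set\n");
        else
            /* e.g. quota=250000, period=100000 -> 2.50 effective CPUs,
             * regardless of what nproc or the affinity mask report. */
            printf("Effective CPUs: %.2f\n",
                   (double)atol(quota) / period);

        return 0;
    }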
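The premise behind item 2 is that the topology files under sysfs are not namespaced and remain readable from inside a container. The minimal sketch below dumps a few of them; the exact set of files available can vary with architecture and kernel configuration.

    /* The per-CPU topology files in sysfs are not namespaced today, so a
     * containerized task can map out packages, core siblings and NUMA
     * nodes. Which files exist can vary with architecture and kernel
     * configuration. */
    #include <stdio.h>

    static void show(const char *path)
    {
        char buf[256];
        FILE *f = fopen(path, "r");

        if (!f)
            return;
        if (fgets(buf, sizeof(buf), f))
            printf("%-58s %s", path, buf);
        fclose(f);
    }

    int main(void)
    {
        show("/sys/devices/system/cpu/online");
        show("/sys/devices/system/cpu/cpu0/topology/physical_package_id");
        show("/sys/devices/system/cpu/cpu0/topology/core_siblings_list");
        show("/sys/devices/system/node/online");
        return 0;
    }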
Currently, all of the problems mentioned above can be mitigated with the use of lightweight VMs such as Kata Containers. However, with a CPU namespace, the isolation advantages provided by a Kata Container can be achieved without the overhead of a virtual machine.
A survey RFD has been posted, highlighting the problem, its impact, and the current solutions that exist in userspace as well as the kernel: https://lore.kernel.org/lkml/fe947175-62f5-c3fa-158c-7be2dd886c0e@linux.ibm.com/T/