
Commit 8ee4191

hansendc authored and rtg-canonical committed
UBUNTU: SAUCE: x86, sched: Add new topology for multi-NUMA-node CPUs
BugLink: http://bugs.launchpad.net/bugs/1338919

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1338919/comments/11

I'm getting the spew below when booting with Haswell (Xeon E5-2699 v3)
CPUs and the "Cluster-on-Die" (CoD) feature enabled in the BIOS.  It
seems similar to the issue that some folks from AMD ran in to on their
systems and addressed in this commit:

  161270f ("x86/smp: Fix topology checks on AMD MCM CPUs")

Both these Intel and AMD systems break an assumption which is being
enforced by topology_sane(): a socket may not contain more than one
NUMA node.

AMD special-cased their system by looking for a cpuid flag.  The Intel
mode is dependent on BIOS options and I do not know of a way which it
is enumerated other than the tables being parsed during the CPU bringup
process.  In other words, we have to trust the ACPI tables <shudder>.

This detects the situation where a NUMA node occurs at a place in the
middle of the "CPU" sched domains.  It replaces the default topology
with one that relies on the NUMA information from the firmware (SRAT
table) for all levels of sched domains above the hyperthreads.

This also fixes a sysfs bug.  We used to freak out when we saw the "mc"
group cross a node boundary, so we stopped building the MC group.  MC
gets exported as the 'core_siblings_list' in
/sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with the
same 'physical_package_id' to not be listed together in
'core_siblings_list'.

This violates a statement from
Documentation/ABI/testing/sysfs-devices-system-cpu:

	core_siblings: internal kernel map of cpu#'s hardware threads
	within the same physical_package_id.

	core_siblings_list: human-readable list of the logical CPU
	numbers within the same physical_package_id as cpu#.

The sysfs effects here cause an issue with the hwloc tool where it gets
confused and thinks there are more sockets than are physically present.

Before this patch, there are two packages:

	18 0
	18 1

But 4 _sets_ of core siblings:

	 9 0-8
	 9 18-26
	 9 27-35
	 9 9-17

After this set, there are only 2 sets of core siblings, which is what
we expect for a 2-socket system.

	18 0
	18 1

	18 0-17
	18 18-35

Example spew:

	...
	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
	 #2 #3 #4 #5 #6 #7 #8
	.... node #1, CPUs: #9
	------------[ cut here ]------------
	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
	Modules linked in:
	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
	 0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
	 ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
	 ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
	Call Trace:
	 [<ffffffff8172e485>] dump_stack+0x45/0x56
	 [<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
	 [<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
	 [<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
	 [<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
	 [<ffffffff8107568d>] start_secondary+0x1ad/0x240
	---[ end trace 3fe5f587a9fcde61 ]---
	 #10 #11 #12 #13 #14 #15 #16 #17
	.... node #2, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26
	.... node #3, CPUs: #27 #28 #29 #30 #31 #32 #33 #34 #35

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
[ Added LLC domain and s/match_mc/match_die/ ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brice.goglin@gmail.com
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 80e0038 commit 8ee4191

1 file changed: +46 −9

arch/x86/kernel/smpboot.c

@@ -299,12 +299,20 @@ void smp_store_cpu_info(int id)
 		identify_secondary_cpu(c);
 }
 
+static bool
+topology_same_node(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+	return (cpu_to_node(cpu1) == cpu_to_node(cpu2));
+}
+
 static bool
 topology_sane(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o, const char *name)
 {
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
-	return !WARN_ONCE(cpu_to_node(cpu1) != cpu_to_node(cpu2),
+	return !WARN_ONCE(!topology_same_node(c, o),
 		"sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
 		"[node: %d != %d]. Ignoring dependency.\n",
 		cpu1, name, cpu2, cpu_to_node(cpu1), cpu_to_node(cpu2));
@@ -345,17 +353,44 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-static bool match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+/*
+ * Unlike the other levels, we do not enforce keeping a
+ * multicore group inside a NUMA node.  If this happens, we will
+ * discard the MC level of the topology later.
+ */
+static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id) {
-		if (cpu_has(c, X86_FEATURE_AMD_DCM))
-			return true;
-
-		return topology_sane(c, o, "mc");
-	}
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
 	return false;
 }
 
+static struct sched_domain_topology_level numa_inside_package_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ NULL, },
+};
+/*
+ * set_sched_topology() sets the topology internal to a CPU.  The
+ * NUMA topologies are layered on top of it to build the full
+ * system topology.
+ *
+ * If NUMA nodes are observed to occur within a CPU package, this
+ * function should be called.  It forces the sched domain code to
+ * only use the SMT level for the CPU portion of the topology.
+ * This essentially falls back to relying on NUMA information
+ * from the SRAT table to describe the entire system topology
+ * (except for hyperthreads).
+ */
+static void primarily_use_numa_for_topology(void)
+{
+	set_sched_topology(numa_inside_package_topology);
+}
+
 void set_cpu_sibling_map(int cpu)
 {
 	bool has_smt = smp_num_siblings > 1;
@@ -392,7 +427,7 @@ void set_cpu_sibling_map(int cpu)
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		o = &cpu_data(i);
 
-		if ((i == cpu) || (has_mp && match_mc(c, o))) {
+		if ((i == cpu) || (has_mp && match_die(c, o))) {
 			link_mask(core, cpu, i);
 
 			/*
@@ -414,6 +449,8 @@ void set_cpu_sibling_map(int cpu)
 		} else if (i != cpu && !c->booted_cores)
 			c->booted_cores = cpu_data(i).booted_cores;
 		}
+		if (match_die(c, o) == !topology_same_node(c, o))
+			primarily_use_numa_for_topology();
 	}
 }
