Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.
Digging through git history, it pointed to
95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.
Save us a couple of MSR reads while at it.
Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
/* Software defined banks */
#define MCE_EXTENDED_BANK 128
#define MCE_THERMAL_BANK (MCE_EXTENDED_BANK + 0)
-#define K8_MCE_THRESHOLD_BASE (MCE_EXTENDED_BANK + 1)
#define MCE_LOG_LEN 32
#define MCE_LOG_SIGNATURE "MACHINECHECK"
log:
mce_setup(&m);
- rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
- rdmsrl(address, m.misc);
rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
- m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
+ m.misc = ((u64)high << 32) | low;
+ m.bank = bank;
mce_log(&m);
wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);