RAM module failure
This article is a possible resolution for the incident "Spontaneous OS reboot".
Environment
Diagnostics
- Check the file /var/log/kern.log for hardware error messages.
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.988789] mce: [Hardware Error]: Machine check events logged
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.988896] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.988930] {2}[Hardware Error]: event severity: recoverable
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.989483] {2}[Hardware Error]: Error 0, type: recoverable
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.990053] {2}[Hardware Error]: fru_text: Card01, ChnG, DIMM0
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.990606] {2}[Hardware Error]: section_type: memory error
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.991146] {2}[Hardware Error]: error_status: 0x0000000000000400
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.991680] {2}[Hardware Error]: physical_address: 0x000000011cc95300
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.992208] {2}[Hardware Error]: node: 0 card: 6 module: 0 rank: 0 bank: 4 device: 0 row: 2508 column: 680
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.992739] {2}[Hardware Error]: error_type: 4, single-symbol chipkill ECC
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.993266] {2}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.993826] Memory failure: 0x11cc95: unhandlable page.
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.994238] Memory failure: 0x11cc95: recovery action for unknown page: Ignored
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995136] EDAC skx MC3: HANDLING MCE MEMORY ERROR
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995138] EDAC skx MC3: CPU 0: Machine Check Event: 0x0 Bank 25: 0xac00008200a00090
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995139] EDAC skx MC3: TSC 0x416c93fc1de265
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995140] EDAC skx MC3: ADDR 0x11cc95300
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995141] EDAC skx MC3: MISC 0x80001004e615486
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995142] EDAC skx MC3: PROCESSOR 0:0x606a6 TIME 1756300083 SOCKET 0 APIC 0x0
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360894.995147] EDAC MC3: 1 UE memory read error on CPU_SrcID#0_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x11cc95 offset:0x300 grain:32 - err_code:0x00a0:0x0090 SystemAddress:0x11cc95300 ProcessorSocketId:0x0 MemoryControllerId:0x3 ChannelAddress:0x27325500 ChannelId:0x0 RankAddress:0x13993500 PhysicalRankId:0x0 DimmSlotId:0x0 Row:0x9cc Column:0x2a8 Bank:0x0 BankGroup:0x1 ChipSelect:0x0 ChipId:0x0)
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.003497] Memory failure: 0x11cc95: already hardware poisoned
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.003902] mce: [Hardware Error]: Machine check events logged
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004298] EDAC skx MC3: HANDLING MCE MEMORY ERROR
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004300] EDAC skx MC3: CPU 0: Machine Check Event: 0x0 Bank 255: 0xbc0000000000009f
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004302] EDAC skx MC3: TSC 0x0
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004303] EDAC skx MC3: ADDR 0x11cc95300
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004304] EDAC skx MC3: MISC 0x8c
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004304] EDAC skx MC3: PROCESSOR 0:0x606a6 TIME 1756300083 SOCKET 0 APIC 0x0
Aug 27 16:08:03 p0rubk-ru2311lp kernel: [6360895.004307] EDAC MC3: 1 UE memory read error on CPU_SrcID#0_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x11cc95 offset:0x300 grain:32 - err_code:0x0000:0x009f SystemAddress:0x11cc95300 ProcessorSocketId:0x0 MemoryControllerId:0x3 ChannelAddress:0x27325500 ChannelId:0x0 RankAddress:0x13993500 PhysicalRankId:0x0 DimmSlotId:0x0 Row:0x9cc Column:0x2a8 Bank:0x0 BankGroup:0x1 ChipSelect:0x0 ChipId:0x0)
In this example, the uncorrected (UE) memory read errors point to physical degradation of the memory module in slot DIMM0 on channel 0 of memory controller 3, CPU0.
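When the log is long, it can help to tally the EDAC error lines per DIMM instead of reading them by eye. The following is a minimal Python sketch, not part of the original article. It assumes the EDAC lines follow the same `CPU_SrcID#…_MC#…_Chan#…_DIMM#…` label format as in the excerpt above (other EDAC drivers may print different labels), and the names `summarize_edac_errors` and `summarize_kern_log` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumption: the EDAC driver labels the DIMM as in the excerpt above, e.g.
# "EDAC MC3: 1 UE memory read error on CPU_SrcID#0_MC#3_Chan#0_DIMM#0 (...)".
# CE = corrected error, UE = uncorrected error.
EDAC_RE = re.compile(
    r"EDAC MC\d+: (?P<count>\d+) (?P<kind>CE|UE) .*? on "
    r"CPU_SrcID#(?P<socket>\d+)_MC#(?P<mc>\d+)"
    r"_Chan#(?P<chan>\d+)_DIMM#(?P<dimm>\d+)"
)


def summarize_edac_errors(lines):
    """Sum reported CE/UE events per (kind, socket, MC, channel, DIMM)."""
    totals = Counter()
    for line in lines:
        m = EDAC_RE.search(line)
        if m is None:
            continue  # not an EDAC summary line in the assumed format
        key = (
            m["kind"],
            int(m["socket"]),
            int(m["mc"]),
            int(m["chan"]),
            int(m["dimm"]),
        )
        totals[key] += int(m["count"])
    return totals


def summarize_kern_log(path="/var/log/kern.log"):
    """Hypothetical convenience wrapper: run the summary over a log file."""
    with open(path, errors="replace") as fh:
        return summarize_edac_errors(fh)
```

Calling `summarize_kern_log()` on the affected host (reading the file usually requires root or membership in a log-reading group) returns a counter keyed by error kind and DIMM position. A DIMM that accumulates UE events, or a steadily growing number of CE events, is a candidate for replacement.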
Solution
The answer is available with an active "Technical Support" service subscription.
Attention! Use your Personal Account credentials to log in.
If you do not have an account for the new version of the Personal Account, please write to lk@astralinux.ru