Homelab machine started acting up

My homelab machine started acting up yesterday, and I'm mostly guessing at what could be wrong. The symptoms are IO_PAGE_FAULT messages, kernel panics, and network cards that stop working.
The machine has run without problems for 3 years, but recently there have been a couple of issues with USB devices. For example, the whole system once started freezing when I plugged in an external hard drive.

Setup:
Ryzen 4350G Pro
Kingston 2 x 16 GB DDR4 ECC UDIMM
ASUS ROG Strix X570-F
I350-T2 dual-port gigabit NIC
LSI HBA SAS9207-8i
Corsair RM650
OS: Proxmox 8.1 with three virtual machines (TrueNAS/Ubuntu/HAOS)

The first symptoms appeared earlier last week, with the same IO_PAGE_FAULT errors. A reboot helped then, but it no longer does. Now the errors appear consistently as soon as the OS brings the network interface up, and they stop if I disable the interface with ifdown.

The odd part was that when the PCIe network card (I350-T2, 2 x gigabit) was installed, its interfaces logged the same IO_PAGE_FAULT errors too. Removing the card did not fix anything, though: the integrated network card still reports the errors.

The problems also show up on Windows in the same way as on Linux, that is, as a non-working network card. Strangely, Prime95 ran cleanly for hours, yet while writing this I started it up once more to grab screenshots and got a "page fault in non-paged area" BSOD :) ... and now it seems to be crashing constantly ;*)

Tried so far:
- Updated the system with apt-get
- Ran Memtest for 4 hours with no errors
- Tested with single memory modules (each one separately)
- Tried different kernel versions: 6.5 / 6.2 / 5.15
- Tried adding the kernel parameters iommu=soft and iommu=pt to the GRUB config (pt sometimes causes a kernel panic, and as far as I remember always does when trying to reboot)
- Fresh OS install on a new SSD (Proxmox)
- Windows 11 install on a new SSD (5 BSODs during installation). The system nevertheless stays up normally (almost always), but the network card cuts out on this side too.
- Ran Prime95 for 4 hours with no errors
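
For anyone repeating the iommu=soft / iommu=pt test, it can be worth confirming that the parameter actually reached the running kernel, since an edit to the GRUB config only takes effect after the boot configuration is regenerated and the machine is rebooted. Below is a minimal sketch that reads /proc/cmdline and lists the IOMMU-related options. The function names are made up for illustration, and amd_iommu is included only as a related kernel parameter; it is not something the post says was tried.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Map each kernel parameter name to the list of values given for it.

    Flags without '=' (e.g. 'quiet') get the value None. Quoted values that
    contain spaces are not handled by this simple whitespace split.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


def iommu_settings(cmdline: str) -> dict:
    """Return only the IOMMU-related parameters from a kernel command line."""
    wanted = ("iommu", "amd_iommu")  # amd_iommu is an assumption, not from the post
    return {k: v for k, v in parse_cmdline(cmdline).items() if k in wanted}


if __name__ == "__main__":
    try:
        running = Path("/proc/cmdline").read_text()
    except OSError:
        running = ""  # not on Linux, or /proc is unavailable
    print(iommu_settings(running) or "no iommu parameters on the running kernel")
```

If the output is empty after a reboot, the parameter probably never made it into the boot entry that was actually used.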

Excerpt from dmesg:

[ 12.799091] vmbr0: port 1(enp3s0) entered blocking state
[ 12.799097] vmbr0: port 1(enp3s0) entered disabled state
[ 12.799121] igb 0000:03:00.0 enp3s0: entered allmulticast mode
[ 12.799180] igb 0000:03:00.0 enp3s0: entered promiscuous mode
[ 12.827012] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadfd000 flags=0x0000]
[ 12.924852] RPC: Registered named UNIX socket transport module.
[ 12.924857] RPC: Registered udp transport module.
[ 12.924859] RPC: Registered tcp transport module.
[ 12.924860] RPC: Registered tcp-with-tls transport module.
[ 12.924861] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 15.163447] bpfilter: Loaded bpfilter_umh pid 1159
[ 15.163717] Started bpfilter
[ 15.819220] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 15.926924] vmbr0: port 1(enp3s0) entered blocking state
[ 15.926932] vmbr0: port 1(enp3s0) entered forwarding state
[ 19.882865] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 25.297656] kvm[1210]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 41.866869] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 59.882877] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 81.866869] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 113.866866] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 129.866864] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 153.866866] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 185.866865] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 217.866877] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 247.466732] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadfe240 flags=0x0020]
[ 249.866836] igb 0000:03:00.0: Detected Tx Unit Hang
Tx Queue <1>
TDH <27>
TDT <27>
next_to_use <27>
next_to_clean <24>
buffer_info[next_to_clean]
time_stamp <ffffcc98>
next_to_watch <00000000f7414160>
jiffies <ffffcef0>
desc.status <a8000>
[ 251.882856] igb 0000:03:00.0: Detected Tx Unit Hang
Tx Queue <1>
TDH <27>
TDT <27>
next_to_use <27>
next_to_clean <24>
buffer_info[next_to_clean]
time_stamp <ffffcc98>
next_to_watch <00000000f7414160>
jiffies <ffffd0e8>
desc.status <a8000>
[ 253.866839] igb 0000:03:00.0: Detected Tx Unit Hang
Tx Queue <1>
TDH <27>
TDT <27>
next_to_use <27>
next_to_clean <24>
buffer_info[next_to_clean]
time_stamp <ffffcc98>
next_to_watch <00000000f7414160>
jiffies <ffffd2d8>
desc.status <a8000>
[ 255.882845] igb 0000:03:00.0: Detected Tx Unit Hang
Tx Queue <1>
TDH <27>
TDT <27>
next_to_use <27>
next_to_clean <24>
buffer_info[next_to_clean]
time_stamp <ffffcc98>
next_to_watch <00000000f7414160>
jiffies <ffffd4d0>
desc.status <a8000>
[ 257.098641] ------------[ cut here ]------------
[ 257.098653] NETDEV WATCHDOG: enp3s0 (igb): transmit queue 1 timed out 7584 ms
[ 257.098692] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x260/0x270
[ 257.098704] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables sunrpc bonding tls softdog nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd amdgpu kvm_amd snd_hda_codec_hdmi kvm amdxcp iommu_v2 snd_hda_intel drm_buddy snd_intel_dspcfg gpu_sched snd_intel_sdw_acpi drm_suballoc_helper irqbypass drm_ttm_helper snd_hda_codec crct10dif_pclmul polyval_clmulni ttm polyval_generic snd_hda_core ghash_clmulni_intel drm_display_helper aesni_intel asus_ec_sensors snd_hwdep eeepc_wmi cec crypto_simd asus_wmi snd_pcm rc_core cryptd ledtrig_audio rapl snd_timer sparse_keymap drm_kms_helper pcspkr platform_profile snd wmi_bmof mxm_wmi soundcore k10temp ccp joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor hid_generic usbkbd uas usbhid usb_storage hid raid6_pq libcrc32c xhci_pci xhci_pci_renesas crc32_pclmul
[ 257.098955] igb i2c_piix4 xhci_hcd ahci i2c_algo_bit dca libahci video wmi
[ 257.098982] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P O 6.5.11-4-pve #1
[ 257.098988] Hardware name: System manufacturer System Product Name/ROG STRIX X570-F GAMING, BIOS 3604 04/14/2021
[ 257.098992] RIP: 0010:dev_watchdog+0x260/0x270
[ 257.098997] Code: ff ff 48 89 df c6 05 77 3b 78 01 01 e8 b9 80 f9 ff 44 8b 45 cc 44 89 f9 48 89 de 48 89 c2 48 c7 c7 b0 9e 83 a3 e8 70 ce 33 ff <0f> 0b e9 1d ff ff ff 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[ 257.099002] RSP: 0018:ffffb819c039ce40 EFLAGS: 00010246
[ 257.099007] RAX: 0000000000000000 RBX: ffff88cc9188c000 RCX: 0000000000000000
[ 257.099011] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 257.099014] RBP: ffffb819c039ce78 R08: 0000000000000000 R09: 0000000000000000
[ 257.099017] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88cc9188c4c8
[ 257.099020] R13: ffff88cc9188c41c R14: 0000000000000000 R15: 0000000000000001
[ 257.099024] FS: 0000000000000000(0000) GS:ffff88cf9e580000(0000) knlGS:0000000000000000
[ 257.099027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 257.099031] CR2: 00007f808a32aa1c CR3: 00000001178b0000 CR4: 0000000000350ee0
[ 257.099034] Call Trace:
[ 257.099038] <IRQ>
[ 257.099045] ? show_regs+0x6d/0x80
[ 257.099054] ? __warn+0x89/0x160
[ 257.099061] ? dev_watchdog+0x260/0x270
[ 257.099066] ? report_bug+0x17e/0x1b0
[ 257.099073] ? irq_work_queue+0x2f/0x70
[ 257.099081] ? handle_bug+0x46/0x90
[ 257.099089] ? exc_invalid_op+0x18/0x80
[ 257.099093] ? asm_exc_invalid_op+0x1b/0x20
[ 257.099104] ? dev_watchdog+0x260/0x270
[ 257.099109] ? __pfx_dev_watchdog+0x10/0x10
[ 257.099113] call_timer_fn+0x2c/0x160
[ 257.099120] ? __pfx_dev_watchdog+0x10/0x10
[ 257.099124] __run_timers+0x259/0x310
[ 257.099133] run_timer_softirq+0x1d/0x40
[ 257.099138] __do_softirq+0xd4/0x303
[ 257.099146] __irq_exit_rcu+0x75/0xa0
[ 257.099151] irq_exit_rcu+0xe/0x20
[ 257.099155] sysvec_apic_timer_interrupt+0x92/0xd0
[ 257.099161] </IRQ>
[ 257.099164] <TASK>
[ 257.099167] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 257.099171] RIP: 0010:pv_native_safe_halt+0xb/0x10
[ 257.099176] Code: 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 09 ff 3a 00 fb f4 <e9> 30 a5 01 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55
[ 257.099180] RSP: 0018:ffffb819c0197e00 EFLAGS: 00000246
[ 257.099185] RAX: 0000000000004000 RBX: ffff88cc811d9064 RCX: ffff88cc83af8800
[ 257.099188] RDX: ffff88cf9e580000 RSI: ffff88cc811d9000 RDI: 0000000000000001
[ 257.099191] RBP: ffffb819c0197e08 R08: 0000000000000000 R09: 0000000000000000
[ 257.099194] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88cc811d9064
[ 257.099197] R13: ffffffffa4277c60 R14: ffffffffa4277ce0 R15: 0000000000000001
[ 257.099204] ? acpi_safe_halt+0x19/0x60
[ 257.099209] acpi_idle_do_entry+0x40/0x80
[ 257.099213] acpi_idle_enter+0x8b/0x100
[ 257.099218] cpuidle_enter_state+0x85/0x470
[ 257.099224] cpuidle_enter+0x2e/0x50
[ 257.099232] call_cpuidle+0x23/0x60
[ 257.099238] do_idle+0x202/0x260
[ 257.099244] cpu_startup_entry+0x2a/0x30
[ 257.099248] start_secondary+0x119/0x140
[ 257.099255] secondary_startup_64_no_verify+0x17e/0x18b
[ 257.099266] </TASK>
[ 257.099269] ---[ end trace 0000000000000000 ]---
[ 257.099302] igb 0000:03:00.0 enp3s0: Reset adapter
[ 257.227455] vmbr0: port 1(enp3s0) entered disabled state
[ 260.811231] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 260.918935] vmbr0: port 1(enp3s0) entered blocking state
[ 260.918941] vmbr0: port 1(enp3s0) entered forwarding state
[ 261.435349] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 261.582847] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfac0b0c0 flags=0x0020]
[ 262.008399] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfac90100 flags=0x0020]
[ 262.596829] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfac1c0c0 flags=0x0020]
[ 263.674324] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfada40c0 flags=0x0020]
[ 266.922872] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 268.748950] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadc6100 flags=0x0020]
[ 268.889759] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadd10c0 flags=0x0020]
[ 269.129207] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 269.687827] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadfd600 flags=0x0020]
[ 270.129542] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadef1c0 flags=0x0020]
[ 273.438227] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadfd900 flags=0x0000]
[ 306.922876] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 338.922867] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
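
A log like this is easier to compare between boots (for example with and without the I350-T2 installed) when the fault lines are tallied per device, address and flags. The following is a rough sketch that assumes the exact line format shown in the excerpt above; other kernel versions may print IO_PAGE_FAULT events differently and would not match. The names FAULT_RE, parse_faults, summarize and SAMPLE are illustrative only, and SAMPLE is copied from the excerpt.

```python
import re
from collections import Counter

# Matches lines in the format seen in the excerpt, e.g.
# [ 19.882865] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=... address=... flags=...]
FAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+AMD-Vi: Event logged \[IO_PAGE_FAULT"
    r"\s+domain=(?P<domain>0x[0-9a-f]+)"
    r"\s+address=(?P<address>0x[0-9a-f]+)"
    r"\s+flags=(?P<flags>0x[0-9a-f]+)\]"
)

SAMPLE = """\
[ 12.827012] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfadfd000 flags=0x0000]
[ 15.819220] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 19.882865] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
[ 41.866869] igb 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfae00000 flags=0x0020]
"""


def parse_faults(text: str) -> list:
    """Extract every IO_PAGE_FAULT event from dmesg text as a dict."""
    faults = []
    for line in text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            event = match.groupdict()
            event["ts"] = float(event["ts"])  # seconds since boot
            faults.append(event)
    return faults


def summarize(faults: list) -> dict:
    """Count faults per PCI device, per faulting address and per flags value."""
    return {
        "by_device": Counter(f["bdf"] for f in faults),
        "by_address": Counter(f["address"] for f in faults),
        "by_flags": Counter(f["flags"] for f in faults),
    }


if __name__ == "__main__":
    for key, counts in summarize(parse_faults(SAMPLE)).items():
        print(key, dict(counts))
```

To run it on a real capture, the output of dmesg can be saved to a file and its contents passed to parse_faults in place of SAMPLE.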




My own wild guess is the motherboard, but I'm not sure. If anyone in the Espoo area has a spare power supply and/or an AM4 motherboard I could borrow for a while, please send me a PM.
 

Attachments

  • eventlog.png (1.1 MB)
