SATA errors on a NAS machine

My NAS has started producing errors like these:

[388686.586908] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
[388686.586950] ata5: SError: { PHYRdyChg CommWake }
[388686.586968] ata5.00: failed command: FLUSH CACHE EXT
[388686.586980] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 25
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[388686.587014] ata5.00: status: { DRDY }
[388686.587032] ata5: hard resetting link
[388687.054957] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[388687.059448] ata5.00: configured for UDMA/133
[388687.059460] ata5.00: retrying FLUSH 0xea Emask 0x4
[388687.059838] ata5: EH complete
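
As a reading aid for the log above: the `SErr 0x50000` value is a bitmask of the SATA SError register, and the kernel prints the decoded names on the following line. Below is a minimal Python sketch of that decoding. The bit table is an assumption based on the SATA SError layout and the short names libata prints, and `decode_serror` is a hypothetical helper made up for illustration.

```python
# Bit positions of the SATA SError register with the short names that the
# Linux libata driver prints. This table is an assumption taken from the
# SATA SError layout; decode_serror is a hypothetical helper.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list[str]:
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


print(decode_serror(0x50000))  # ['PHYRdyChg', 'CommWake']
```

Both bits set here describe the state of the link itself (PHY ready change, wake signal), which matches the `SError: { PHYRdyChg CommWake }` line in the log.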

The board is an ASRock J5040-ITX, in service since 04/2021. The PSU is a Streacom 160W NanoPSU (11/2019).
The drive cage comes from an old Dell T300 server. The motherboard has 4 SATA ports (2 from the CPU and 2 via an ASM1061), and there is a second 2-port ASM1062 as an add-in card.
I also wrote a post about the machine here back then: DIY - Putki- nas

The machine has been running 24/7 since that build (December 2018), apart from updates and dust cleaning.
A few drives have been swapped along the way; currently it holds four 4 TB Seagate IronWolf drives as two mirror pairs:
3: ST4000VN008-2DR166, 37226h
4: ST4000VN008-2DR166, 37213h
5: ST4000VN006-3CW104, 6842h
6: ST4000VN006-3CW104, 1391h

The latter two have fewer hours because they were replaced at different times.
The first errors appeared even before those last two were replaced: at first on only one drive, which I replaced to be safe, and the other one I replaced later.
Drives 5 and 6 are on the motherboard's ASM1061 ports, so one might assume that controller is dying?
Or can SATA cables go bad over time?
I still have enough cables in the drawer to build a new harness to replace the current one, but it is a fair amount of work... If the PSU were causing the problem, it would presumably show up on all drives.
I could also test with another motherboard.
The only card available right away is a 6-port ASM1166, but it won't work because it needs a wider slot than the motherboard has...
Tomorrow I'll test by swapping the positions of one drive from each mirror; if the error follows the drive, the drive itself could be the cause, though I don't really believe that.
The next test could be to move the data drives to motherboard ports 1 and 2 and the root drive to port 3 or 4 behind the ASM1061.
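
For the swap tests it helps to confirm which `ataN` port each `/dev/sdX` sits on before and after moving cables. Below is a minimal Python sketch, assuming a Linux system where the resolved sysfs path of a SATA disk contains an `ataN` component. `ata_port_of` and `map_ports` are hypothetical helper names, not part of any existing tool.

```python
import glob
import os
import re

# Matches the "ataN" component of a resolved sysfs device path.
# Assumption: libata-attached disks resolve to a path containing /ataN/.
ATA_PORT_RE = re.compile(r"/ata(\d+)/")


def ata_port_of(sysfs_path: str):
    """Return the ata port number found in a resolved sysfs path, or None."""
    match = ATA_PORT_RE.search(sysfs_path)
    return int(match.group(1)) if match else None


def map_ports() -> dict:
    """Map block device names (sda, sdb, ...) to ata port numbers."""
    mapping = {}
    for link in sorted(glob.glob("/sys/block/sd*")):
        port = ata_port_of(os.path.realpath(link))
        if port is not None:
            mapping[os.path.basename(link)] = port
    return mapping


if __name__ == "__main__":
    for name, port in map_ports().items():
        print(f"ata{port} -> /dev/{name}")
```

Running it once before and once after a swap shows whether the kernel errors stay with the port or move with the drive.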

I last updated the machine on 24 Feb, and the error appeared on the 26th on both drives, a bit over an hour apart (17:49, 19:13). And again on 1 Mar (11:26, 12:42).
It is odd that the time gap is that large, since the drives are in the same mirror.
But because the error does not show up right away, each test takes calendar time.
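
To compare when each port hits the error, the exception lines can be pulled out of the kernel log and grouped by port. This is a rough Python sketch that assumes the default `dmesg` line format shown in the log above (seconds since boot in square brackets). `exceptions_by_port` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [388686.586908] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
EXCEPTION_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+)\.\d+: exception\b")


def exceptions_by_port(log_text: str) -> dict:
    """Collect kernel timestamps (seconds since boot) of exceptions per ata port."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = EXCEPTION_RE.match(line.strip())
        if match:
            events[match.group(2)].append(float(match.group(1)))
    return dict(events)


# Sample input copied from the log in this post.
sample = """\
[388686.586908] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
[388686.586950] ata5: SError: { PHYRdyChg CommWake }
[388686.586968] ata5.00: failed command: FLUSH CACHE EXT
"""
print(exceptions_by_port(sample))
```

Fed with the full `dmesg` output, this gives one list of timestamps per port, which makes it easier to see whether ata5 and ata6 always fail in pairs.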

Any other ideas?
 
That smells more like a SATA link problem than a misbehaving drive. Post the SMART data for all the drives.

For example:
smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-101-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: HGST Ultrastar He6
Device Model: HGST HUS726060ALA640
Serial Number:
LU WWN Device Id: 5 000cca 232c3f87e
Firmware Version: AHGNT1EN
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 2 16:25:35 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   57) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 882) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       87
  3 Spin_Up_Time            0x0007   194   194   024    Pre-fail  Always       -       535 (Average 495)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       102
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail  Offline      -       12
  9 Power_On_Hours          0x0012   088   088   000    Old_age   Always       -       86687
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       3883
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       3883
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       20 (Min/Max 12/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     19554         -
 
The drives producing the errors are the newer ones, and I replaced the old ones precisely because they had started producing the same kind of error. Hopefully it wasn't just bad luck with both new ones being faulty...
So of these, only the last 2 (ata5 sdd, ata6 sde) produce errors. I still have the old drives as well, and could of course test with them too.
It's good to have someone else look and double-check; it's easy to miss something in these.
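
Since the dumps below are long, here is a rough Python sketch for pulling a handful of attributes out of `smartctl -a` text so they can be compared across drives. The choice of attribute IDs to watch is an assumption about what is useful for telling a link problem from a disk problem, and `watched_attributes` is a hypothetical helper name.

```python
import re

# Attribute IDs picked for comparison across drives. The selection is an
# assumption, not a rule from smartmontools.
WATCHED_IDS = {5, 187, 188, 197, 198, 199}

# One row of the "Vendor Specific SMART Attributes" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(.+)$"
)


def watched_attributes(smartctl_text: str) -> dict:
    """Return {attribute_name: raw_value_string} for the watched attribute IDs."""
    result = {}
    for line in smartctl_text.splitlines():
        match = ROW_RE.match(line)
        if match and int(match.group(1)) in WATCHED_IDS:
            result[match.group(2)] = match.group(3).strip()
    return result


# Sample rows copied from the sdd output in this thread.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 24 0 0 0)
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
print(watched_attributes(sample))
```

The raw values are kept as strings because some attributes carry extra text after the number, as the temperature row in the sample does.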

Boot drive (sda):
Code:
smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.18.5+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 500GB
Serial Number:    S62BNF0R106379R
LU WWN Device Id: 5 002538 f4111dcda
Firmware Version: SVT02B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 16:42:13 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       11362
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1038
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       13
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   073   057   000    Old_age   Always       -       27
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       32
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       17707385981
252 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       445

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  256        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ata3 (sdb)
Code:
smartctl -a /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.18.5+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDHALL9S
LU WWN Device Id: 5 000c50 0e3203e79
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 16:44:24 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  591) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 636) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   077   064   044    Pre-fail  Always       -       48997256
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   045    Pre-fail  Always       -       628151968
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       37244 (248 21 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       20
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   064   040    Old_age   Always       -       27 (Min/Max 26/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       90
193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       13919
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       37157h+58m+16.584s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       8921699800
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       424832578037

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     35170         -
# 2  Extended offline    Completed without error       00%     30496         -
# 3  Short offline       Completed without error       00%     30487         -
# 4  Short offline       Completed without error       00%     30463         -
# 5  Short offline       Completed without error       00%     30439         -
# 6  Short offline       Completed without error       00%     30415         -
# 7  Short offline       Completed without error       00%     30391         -
# 8  Short offline       Completed without error       00%     30367         -
# 9  Short offline       Completed without error       00%     30343         -
#10  Extended offline    Completed without error       00%     30328         -
#11  Short offline       Completed without error       00%     30319         -
#12  Short offline       Completed without error       00%     30295         -
#13  Short offline       Completed without error       00%     30271         -
#14  Short offline       Completed without error       00%     30247         -
#15  Short offline       Completed without error       00%     30223         -
#16  Short offline       Completed without error       00%     30199         -
#17  Short offline       Completed without error       00%     30175         -
#18  Extended offline    Completed without error       00%     30160         -
#19  Short offline       Completed without error       00%     30151         -
#20  Short offline       Completed without error       00%     30127         -
#21  Short offline       Completed without error       00%     30103         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ata4 (sdc)
Code:
smartctl -a /dev/sdc
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.18.5+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDHALK59
LU WWN Device Id: 5 000c50 0e3255595
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 16:45:04 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  581) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 603) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   077   064   044    Pre-fail  Always       -       48719111
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   045    Pre-fail  Always       -       605660452
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       37231 (232 193 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       20
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   072   063   040    Old_age   Always       -       28 (Min/Max 27/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       85
193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       13925
194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       37147h+09m+54.027s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       12708898912
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       409761921321

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     35156         -
# 2  Extended offline    Completed without error       00%     30482         -
# 3  Short offline       Completed without error       00%     30473         -
# 4  Short offline       Completed without error       00%     30449         -
# 5  Short offline       Completed without error       00%     30425         -
# 6  Short offline       Completed without error       00%     30401         -
# 7  Short offline       Completed without error       00%     30377         -
# 8  Short offline       Completed without error       00%     30353         -
# 9  Short offline       Completed without error       00%     30329         -
#10  Extended offline    Completed without error       00%     30314         -
#11  Short offline       Completed without error       00%     30305         -
#12  Short offline       Completed without error       00%     30281         -
#13  Short offline       Completed without error       00%     30257         -
#14  Short offline       Completed without error       00%     30233         -
#15  Short offline       Completed without error       00%     30209         -
#16  Short offline       Completed without error       00%     30185         -
#17  Short offline       Completed without error       00%     30161         -
#18  Extended offline    Completed without error       00%     30146         -
#19  Short offline       Completed without error       00%     30137         -
#20  Short offline       Completed without error       00%     30113         -
#21  Short offline       Completed without error       00%     30089         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ata5 (sdd)
Code:
smartctl -a /dev/sdd
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.18.5+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN006-3CW104
Serial Number:    WW65X0ND
LU WWN Device Id: 5 000c50 0fa98b591
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 16:45:36 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 467) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   006    Pre-fail  Always       -       955402
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       148
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       111493709
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6860
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       148
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   062   040    Old_age   Always       -       33 (Min/Max 31/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       309
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       426
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 24 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       955402
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       6824 (46 23 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       17926609120
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       89809133037

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4788         -
# 2  Conveyance offline  Completed without error       00%      4779         -
# 3  Extended offline    Completed without error       00%       114         -
# 4  Short offline       Completed without error       00%       103         -
# 5  Short offline       Completed without error       00%        79         -
# 6  Short offline       Completed without error       00%        55         -
# 7  Short offline       Completed without error       00%        31         -
# 8  Short offline       Completed without error       00%         7         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ata6 (sde)
Code:
smartctl -a /dev/sde
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.18.5+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN006-3CW104
Serial Number:    WW65X0F4
LU WWN Device Id: 5 000c50 0fa98b7a8
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 16:46:19 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 478) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       167691416
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       28244130
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1408
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   065   040    Old_age   Always       -       33 (Min/Max 31/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       45
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       71
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 26 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       167691416
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1399 (44 193 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       8208489098
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       27581059112

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1221         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
I'm leaning more and more toward the view that there is nothing wrong with the disks and that the fault is in the disk controller.

What make and model is the disk controller? It should show up in the "lspci" listing.

On all the disks, the most important attributes seem to show zero as far as problems go:
  • Reallocated_Sector_Ct = 0
  • Current_Pending_Sector = 0
  • Offline_Uncorrectable = 0
  • Reported_Uncorrect = 0
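A minimal sketch of how that check could be automated for each disk. The function name `check_smart_counters` is made up for this example, and it assumes the attribute table layout shown in the smartctl output in this thread (attribute name in column 2, raw value in column 10):

```shell
# Print any of the four key SMART counters whose raw value is non-zero.
# Reads `smartctl -A` (or `smartctl -a`) output on stdin.
# check_smart_counters is a hypothetical helper name, not part of smartmontools.
check_smart_counters() {
    awk '
        # Attribute rows: column 2 is the name, column 10 the raw value.
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ {
            seen++
            if ($10 + 0 != 0) { printf "WARN %s = %s\n", $2, $10; bad++ }
        }
        END {
            if (seen == 0) { print "no matching attributes found"; exit 2 }
            if (bad > 0) exit 1
            print "OK: all key counters are zero"
        }
    '
}
```

Usage would be something like `smartctl -A /dev/sde | check_smart_counters`, run as root. It only looks at these four counters, so a clean result says nothing about link or controller problems.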
The problem is fairly "familiar" to me, in that my server runs its OS on 2 x 240GB SSDs as a RAID0 array. Similar SATA "hard resetting link" messages have also been showing up in its syslog for quite a while.
After puzzling over the problem for some time, I started finding hints that this SATA controller has apparently shown similar problems with some SSDs. Who knows which ones work well and which ones poorly,
but the disk system has stayed intact regardless. Apparently the combination of that controller and Kingston "budget" SSDs just isn't the best possible.
 
Of course, those messages could also be related to power-saving settings.

Try changing the boot parameters in /etc/default/grub like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ahci.mobile_lpm_policy=1 libata.force=noncq"

Then just run update-grub and reboot the machine, hoping the problem is gone. :-)
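After the reboot it is worth confirming that the parameters actually reached the kernel. As far as I know, `ahci.mobile_lpm_policy=1` requests the max_performance link power policy and `libata.force=noncq` turns NCQ off. A small sketch for checking `/proc/cmdline`; the helper names `has_kernel_param` and `report_params` are made up for this example:

```shell
# Succeed if the kernel command line in file $2 (default: /proc/cmdline)
# contains the exact parameter $1 as a whitespace-separated word.
# Both function names are hypothetical helpers for this sketch.
has_kernel_param() {
    tr -s '[:space:]' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Report the status of both parameters suggested above.
report_params() {
    for p in ahci.mobile_lpm_policy=1 libata.force=noncq; do
        if has_kernel_param "$p" "${1:-/proc/cmdline}"; then
            echo "$p: active"
        else
            echo "$p: missing"
        fi
    done
}
```

Calling `report_params` with no arguments checks the running kernel. Testing the two parameters one at a time would show which of them actually makes the difference.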
 
The machine is this one: https://www.asrock.com/mb/Intel/J5040-ITX/#Specification
So the controller the errors come from is the ASM1061 on the motherboard.
The add-in card is a 2-port ASM1062, which has not produced any errors. According to lspci, both use the same driver.
I also checked that ASPM is set to auto in the BIOS, and lspci shows for both: LnkCap: ASPM not supported, LnkCtl: ASPM disabled.
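For anyone wanting to repeat that comparison, a rough sketch that trims `lspci -vv` output down to the device header, the driver line and the PCIe link lines. `summarize_lspci` is a made-up helper name, and the vendor ID 1b21 used in the usage note is my assumption for ASMedia, so check it against `lspci -nn` first:

```shell
# From `lspci -vv` text on stdin, keep only the device header lines, the
# driver line and the PCIe link capability/control lines.
# summarize_lspci is a hypothetical helper name for this sketch.
summarize_lspci() {
    grep -E '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] |Kernel driver in use:|LnkCap:|LnkCtl:'
}
```

Usage would be along the lines of `lspci -vv -d 1b21: | summarize_lspci`, run as root, since the link capability lines are normally hidden from ordinary users.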

The disks have always been in the same places: one array on the add-in card, the other on the motherboard's 1061 controller.

As a first test, I now have the boot SSD on the motherboard's 1061 port and the data disks on the "native/whatever" ports. The other array is of course still on the add-in card as before. I'll run like this for a few days and see whether any errors show up.

It is quite possible that power-saving or similar settings changed with some update, and that is why this is happening.

There are 6 ports, as there should be; of those, host0, 1, 4 and 5 are on the motherboard.
Code:
ll /sys/class/scsi_host/

lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host0 -> ../../devices/pci0000:00/0000:00:12.0/ata1/host0/scsi_host/host0
lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host1 -> ../../devices/pci0000:00/0000:00:12.0/ata2/host1/scsi_host/host1
lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host2 -> ../../devices/pci0000:00/0000:00:13.0/0000:01:00.0/ata3/host2/scsi_host/host2
lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host3 -> ../../devices/pci0000:00/0000:00:13.0/0000:01:00.0/ata4/host3/scsi_host/host3
lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host4 -> ../../devices/pci0000:00/0000:00:13.3/0000:04:00.0/ata5/host4/scsi_host/host4
lrwxrwxrwx 1 root root 0 2026-03-02 18:58 host5 -> ../../devices/pci0000:00/0000:00:13.3/0000:04:00.0/ata6/host5/scsi_host/host5

Code:
01:00.0 SATA controller: ASMedia Technology Inc. ASM1061/ASM1062 Serial ATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: ASMedia Technology Inc. Device 1060
04:00.0 SATA controller: ASMedia Technology Inc. ASM1061/ASM1062 Serial ATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: ASRock Incorporation Motherboard
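The host-to-controller mapping can be read straight out of the symlink targets listed above. A minimal sketch, assuming the sysfs path layout shown in that listing (PCI address directly before the `ataN` component); `map_hosts_to_pci` is a made-up name:

```shell
# For every SCSI host under $1 (default: /sys/class/scsi_host), print
# "<host> <ata port> <PCI address of the controller>".
# map_hosts_to_pci is a hypothetical helper name for this sketch.
map_hosts_to_pci() (
    dir=${1:-/sys/class/scsi_host}
    for link in "$dir"/host*; do
        [ -L "$link" ] || continue
        # The path component right before /ataN/ is the controller's PCI address.
        readlink "$link" |
            sed -n "s|.*/\([0-9a-f:.]*\)/\(ata[0-9]*\)/.*|${link##*/} \2 \1|p"
    done
)
```

With the listing above, host4 and host5 (ata5 and ata6) would come out under 0000:04:00.0, which is the controller lspci reports with the ASRock subsystem, i.e. the one on the motherboard.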

Only the ASM1061 on the motherboard even appears to support link power management, so that may well be the cause, since its setting differs from all the others:
Code:
/sys/class/scsi_host/host0/link_power_management_supported = 0
/sys/class/scsi_host/host1/link_power_management_supported = 0
/sys/class/scsi_host/host2/link_power_management_supported = 0
/sys/class/scsi_host/host3/link_power_management_supported = 0
/sys/class/scsi_host/host4/link_power_management_supported = 1
/sys/class/scsi_host/host5/link_power_management_supported = 1

/sys/class/scsi_host/host0/link_power_management_policy = max_performance
/sys/class/scsi_host/host1/link_power_management_policy = max_performance
/sys/class/scsi_host/host2/link_power_management_policy = max_performance
/sys/class/scsi_host/host3/link_power_management_policy = max_performance
/sys/class/scsi_host/host4/link_power_management_policy = min_power_with_partial
/sys/class/scsi_host/host5/link_power_management_policy = min_power_with_partial
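A listing in that format can be produced with a short loop over sysfs. This is only a sketch: `show_lpm` is a made-up name, and it relies on the two attribute files named in the listing above being present on the running kernel:

```shell
# Print "<file> = <value>" for every host's LPM attributes under $1
# (default: /sys/class/scsi_host), in the same format as the listing above.
# show_lpm is a hypothetical helper name for this sketch.
show_lpm() (
    dir=${1:-/sys/class/scsi_host}
    for attr in link_power_management_supported link_power_management_policy; do
        for f in "$dir"/host*/"$attr"; do
            # Skip hosts (or kernels) that do not expose this attribute.
            if [ -r "$f" ]; then
                echo "$f = $(cat "$f")"
            fi
        done
    done
)
```

Running `show_lpm` before and after a change makes it easy to see whether a new policy really took effect on host4 and host5.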

The SSD probably prefers to have NCQ enabled, so the next test could be to set those two hosts to max_performance as well.
Or just ahci.mobile_lpm_policy=1 without disabling NCQ.
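The max_performance test can be tried at runtime without touching GRUB, by writing to the policy files shown above. A hedged sketch; `set_lpm_policy` is a made-up helper, and the change does not survive a reboot:

```shell
# Write policy $1 to each listed host's link_power_management_policy file.
# $2 is the sysfs directory; the remaining arguments are host names.
# set_lpm_policy is a hypothetical helper name for this sketch.
set_lpm_policy() (
    policy=$1; dir=$2; shift 2
    for host in "$@"; do
        f=$dir/$host/link_power_management_policy
        printf '%s\n' "$policy" > "$f" || exit 1
        # Read the value back so a rejected write is visible.
        echo "$host: $(cat "$f")"
    done
)
```

As root this would be `set_lpm_policy max_performance /sys/class/scsi_host host4 host5`. If the resets stop with that alone, NCQ can stay on and only the LPM parameter needs to be made permanent.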
 
I started digging through my own server's dmesg log, and the hard reset complaints have indeed stopped since the last boot, when I enabled one or the other of those GRUB settings.
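To compare boots with less guesswork, the resets can be counted per port. A small sketch that assumes the message format seen in the first post ("ataN: hard resetting link"); `count_link_resets` is a made-up name:

```shell
# Count "hard resetting link" events per ATA port in kernel log text on stdin.
# count_link_resets is a hypothetical helper name for this sketch.
count_link_resets() {
    sed -n 's/.*\(ata[0-9][0-9]*\): hard resetting link.*/\1/p' |
        sort | uniq -c |
        awk '{ print $2 ": " $1 }'
}
```

For the current boot that would be `dmesg | count_link_resets`, and for the previous one `journalctl -k -b -1 | count_link_resets`, provided the journal is kept across reboots.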

Well, I have 2 x 2TB NVMe drives ready and waiting for the hardware swap as soon as Ubuntu 26.04 is released.
 
