neufuse (Veteran), posted May 13, 2023
OK, this one was odd... relatively new system.
Specs:
ASUS ROG Strix Z790-A motherboard
G.Skill Ripjaws S5 DDR5-6000, 4x16GB (64GB total)
ASUS GeForce RTX 3060
Corsair HX750 PSU
Corsair H115i CPU cooler
4x NVMe SSDs, all Samsung 980 Pro
This system has had zero issues for a couple of months, but today I set it to do a GPU-intensive task (CUDA only). It ran for hours, then it just blipped off. No power. I pressed the power button: nothing. I turned the PSU power switch off, let it sit a minute, turned it back on and pressed power: nothing. Unplugged it and replugged it: nothing. It wouldn't turn on, so I pulled the ATX connector and jumpered the power-on pin to ground, and the PSU came on. I checked all the voltages (+12 V, -12 V, +3.3 V, +5 V) and they all came back correct. I plugged it back into the motherboard and it came back on.
Any idea what would cause this?
Strangely, the LED on the power button is now off all the time; it was on before this happened. I haven't tested the LED on its own yet, but it's odd that it's off while the system is on. It's plugged into the correct header on the motherboard with the correct polarity.
neufuse (Veteran, author), posted May 13, 2023
The power LED header on the motherboard reads 0 volts while the system is on.
Mindovermaster (Global Moderator), posted May 13, 2023
Sounds like something shorted. Check your motherboard for any bloated or broken caps. It could also be a memory error. The board specs list these memory speeds:
7800+(OC)/7600(OC)/7400(OC)/7200(OC)/7000(OC)/6800(OC)/6600(OC)/6400(OC)/6200(OC)/6000(OC)/5800(OC)/5600/5400/5200/5000/4800
So at 6000, you're still running an overclock. I suggest running it at a lower speed and seeing how that works.
neufuse (Veteran, author), posted May 13, 2023 (edited)
On 13/05/2023 at 13:51, Mindovermaster said:
"Sounds like something shorted. Check your motherboard for any bloated or broken caps. It could also be a memory error. The board specs list these memory speeds: 7800+(OC)/7600(OC)/7400(OC)/7200(OC)/7000(OC)/6800(OC)/6600(OC)/6400(OC)/6200(OC)/6000(OC)/5800(OC)/5600/5400/5200/5000/4800. So at 6000, you're still running an overclock. I suggest running it at a lower speed and seeing how that works."
The weird thing is that XMP has had it at 6000 for months with no issues. And it's been working fine since it strangely came back on after I tested the voltages...
Does anyone know if this mobo has resettable fuses? (Fuses that don't physically blow, but "trip" under fault conditions and then reset on their own once they cool down.) That's kind of what it felt like, because it was probably 10 minutes after it went off that it just turned on with no issues.
xMorpheousx416, posted May 13, 2023 (edited)
On 13/05/2023 at 11:49, neufuse said:
"This system has had zero issues for a couple of months, but today I set it to do a GPU-intensive task (CUDA only). It ran for hours, then it just blipped off."
My system did the exact same thing as yours only a couple of days ago. I was encoding a full UHD Blu-ray to HEVC (H.265) with the 3060 Ti's NVENC encoder through HandBrake. It completed the task, but just as I would normally have seen HandBrake hit 100% and switch to "Encode Finished", she green-screened on me for a second and shut off. It wouldn't fire back up until I unplugged the PSU for about five minutes... and apparently it shut down faster than it could write a kernel crash dump to disk.
My setup is similar to yours in a way... I use the XMP profile to run the RAM at 3200MHz (AM4 here, vs. your Intel Z790), and I'm guessing the video card may have been running near peak capacity. I had also just updated the drivers, as NVIDIA seems to have released quite a few recently.
I didn't really do any troubleshooting because, again, like yours, the system had been purring like a kitten since the first time it was turned on. My guess is that it could very well have been a heat issue. Voltages being what they are, perhaps it ran at peak load for longer than it ever had before?
I think AIDA still has a burn-in test... I almost never recommend it, because it's there to test the ultimate limits of your silicon, and replacing parts isn't cheap. If you can monitor voltages and temperatures, and you're willing to, set up the same scenario and see whether it completes the task without shutting down.
Edit: According to your mobo manual, it doesn't have any fuses.
neufuse (Veteran, author), posted May 13, 2023 (edited)
I've been running Prime95 in max-heat mode for a few hours now with no issues... Then I ran a very heavy computation on CUDA, and after 15 minutes the system started stuttering and then shut off again... Starting to think it's the PSU or the GPU? I'm currently running NVIDIA Studio driver 531.61.
neufuse (Veteran, author), posted May 13, 2023
Temps seem to be OK under sustained load; this is with Prime95 running in max-heat mode. No thermal throttling on any of the cores.
neufuse (Veteran, author), posted May 13, 2023
The GPU doesn't show anything unusual with CUDA usage at 100% for an extended period.
neufuse (Veteran, author), posted May 13, 2023
I'm seeing this error a lot in the System log now:

Log Name: System
Source: nvlddmkm
Date: 5/13/2023 5:18:28 PM
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: DT-23-CUDANODE1
Description:
The description for Event ID 0 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
\Device\Video3
Error occurred on GPUID: 100
The message resource is present but the message was not found in the message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="0">0</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2023-05-13T21:18:28.4952725Z" />
    <EventRecordID>7365</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="776" />
    <Channel>System</Channel>
    <Computer>DT-23-CUDANODE1</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Error occurred on GPUID: 100</Data>
    <Binary>00000000020030000000000000000000000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
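If these nvlddmkm errors need to be tallied or compared across machines, the exported Event XML is easy to read programmatically. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the event was exported as XML (for example, copied from Event Viewer's XML view). `summarize_event` is a hypothetical helper name, and the sample is a trimmed copy of the event posted above.

```python
import xml.etree.ElementTree as ET

# Windows event XML puts every element in this default namespace.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the nvlddmkm event from the post (raw string keeps the backslashes).
SAMPLE = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="0">0</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2023-05-13T21:18:28.4952725Z" />
    <Computer>DT-23-CUDANODE1</Computer>
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Error occurred on GPUID: 100</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text: str) -> dict:
    """Hypothetical helper: pull the triage-relevant fields out of one exported event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": int(system.find("e:Level", NS).text),  # 2 = Error in Windows event levels
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.find("e:Computer", NS).text,
        "data": [d.text for d in root.findall("e:EventData/e:Data", NS)],
    }


print(summarize_event(SAMPLE))
```

The `data` list is where nvlddmkm puts the useful part (the device path and the GPU ID), since the event description itself can't be resolved on this machine.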
neufuse (Veteran, author), posted May 13, 2023 (edited)
It looks like there were also about 40 of these warnings before this happened:

A corrected hardware error has occurred.
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Primary Bus:Device:Function: 0x0:0x1:0x0
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name: PCI\VEN_8086&DEV_A70D&SUBSYS_88821043&REV_01
Secondary Device Name:

VEN_8086 means it's from an Intel device, but I wasn't sure what device A70D was; nothing came back when I looked it up. The only reference I found at first was "Intel(R) PCIe RC 010 G5 - A70D", a Gen 5 PCIe bus controller?
Edit: found it. A70D is the PCI Express root port.
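A hardware ID like the Primary Device Name in that warning can be split into vendor, device, subsystem, and revision fields without looking anything up. This is a minimal Python sketch; the regex, `parse_hwid`, and the two-entry vendor table are illustrative assumptions that only cover the IDs in this thread (8086 is Intel's PCI vendor ID, 1043 is ASUSTeK's). In a SUBSYS value, the first four hex digits are the subsystem ID and the last four are the subsystem vendor.

```python
import re

# Matches IDs of the form PCI\VEN_vvvv&DEV_dddd&SUBSYS_ssssVVVV&REV_rr
# (SUBSYS and REV are optional in some hardware IDs).
HWID_RE = re.compile(
    r"PCI\\VEN_(?P<ven>[0-9A-F]{4})&DEV_(?P<dev>[0-9A-F]{4})"
    r"(?:&SUBSYS_(?P<subsys>[0-9A-F]{8}))?(?:&REV_(?P<rev>[0-9A-F]{2}))?",
    re.IGNORECASE,
)

# Hypothetical lookup table covering only the vendor IDs seen in this thread.
KNOWN_VENDORS = {"8086": "Intel", "1043": "ASUSTeK"}


def parse_hwid(hwid: str) -> dict:
    """Split a PCI hardware ID into its fields, naming vendors we recognize."""
    m = HWID_RE.match(hwid)
    if m is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    ven, dev = m["ven"].upper(), m["dev"].upper()
    subsys = (m["subsys"] or "").upper()
    sub_vendor = subsys[4:] if subsys else None
    return {
        "vendor": KNOWN_VENDORS.get(ven, ven),
        "device": dev,
        "subsystem_device": subsys[:4] if subsys else None,
        "subsystem_vendor": KNOWN_VENDORS.get(sub_vendor, sub_vendor),
        "revision": m["rev"],
    }


print(parse_hwid(r"PCI\VEN_8086&DEV_A70D&SUBSYS_88821043&REV_01"))
```

On the ID from the log, this reports an Intel device A70D on an ASUS-built board (subsystem vendor 1043), revision 01, which fits a chipset/CPU root port on the Z790 board rather than an add-in card.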
neufuse (Veteran, author), posted May 14, 2023
Not sure what's up with the forum, but every time I posted something it duplicated content instead of merging it (see the posts above).
Mindovermaster (Global Moderator), posted May 14, 2023
On 13/05/2023 at 21:07, neufuse said:
"Not sure what's up with the forum, but every time I posted something it duplicated content instead of merging it (see the posts above)."
Might have been in the latest updates, idk...
xMorpheousx416, posted May 14, 2023 (edited)
My gut is telling me the chipset driver doesn't get along with the graphics driver, or vice versa. If you have the latest chipset driver, give Game Ready driver 531.79 a run for its money.
Just throwing this out there, since you're one of Neowin's elite users and hardly need my advice, lol... but I'd recommend using DDU (Display Driver Uninstaller) to clean out the graphics driver, and staying offline during the process until you get the newest driver installed, so that Windows Update doesn't try to install the 400-series drivers in the background.
Hopefully it's one of the two. If not, I'd be leaning towards a bad chipset or GPU.
Edit: You mentioned you think it could be the PSU? Have you tried an online PSU calculator to compare what your system is drawing against the PSU's output?
neufuse (Veteran, author), posted May 14, 2023 (edited)
Odd update: out of nowhere, the power LED works today. It would not come on at all yesterday.
On 13/05/2023 at 23:22, xMorpheousx416 said:
"My gut is telling me the chipset driver doesn't get along with the graphics driver, or vice versa. If you have the latest chipset driver, give Game Ready driver 531.79 a run for its money. Just throwing this out there, since you're one of Neowin's elite users and hardly need my advice, lol... but I'd recommend using DDU (Display Driver Uninstaller) to clean out the graphics driver, and staying offline during the process until you get the newest driver installed, so that Windows Update doesn't try to install the 400-series drivers in the background. Hopefully it's one of the two. If not, I'd be leaning towards a bad chipset or GPU. Edit: You mentioned you think it could be the PSU? Have you tried an online PSU calculator to compare what your system is drawing against the PSU's output?"
But would a chipset driver prevent a system from even powering on? I'd say no, since drivers load well after POST. The motherboard was completely dead; not even the status or power LEDs on the board came on. That's what's confusing: there are 4 or 5 POST LEDs on this board and none of them came on until I disconnected the PSU and did a manual voltage test. That was fine, and then it just worked after that... just odd.
xMorpheousx416, posted May 15, 2023
On 14/05/2023 at 14:08, neufuse said:
"But would a chipset driver prevent a system from even powering on?"
Only if there were a physical problem with the chip(set). However, if we believe neither to be the case... the last finger still points to your PSU. Or unplugging the connector and plugging it back into the board may have given it the tight connection it was looking for. Got another PSU to test against, using the same scenario?
neufuse (Veteran, author), posted May 15, 2023
On 15/05/2023 at 01:26, xMorpheousx416 said:
"Only if there were a physical problem with the chip(set). However, if we believe neither to be the case... the last finger still points to your PSU. Or unplugging the connector and plugging it back into the board may have given it the tight connection it was looking for. Got another PSU to test against, using the same scenario?"
I've got plenty of PSUs; the problem is that this happened once and hasn't happened again... The odd thing, which just adds to the confusion, is that the power LEDs (the one on the motherboard and the case LED on the header) didn't come on for a day after the system came back, then randomly started working again... I'm just stumped.
xMorpheousx416, posted May 15, 2023
I've never believed in the "well, it's new, so it must work" idealism. But... if it's not broken, why fix it? Easy: it drives us nuts when we see a glitch in the matrix and can't figure out what the heck is happening.
It could just be a faulty sensor*, LED, BIOS glitch... or any of the hundreds of capacitors/resistors between the PSU and that LED. It could have been a loose connection at first that you fixed when you plugged things back in... but without a signal tracer, it's tough to track down what's not up to par on the circuit board.
Too many metaphors for the day, but let sleeping dogs lie... if you can't replicate the problem now, she's probably stable enough to keep right on running. Unless they're using those LEDs as part of the circuitry, and not just as indicator lights, it should be okay.
* I had a thermistor go bad in one power supply, and the fan wouldn't kick on under load. You could smell it getting hot, and that was the only indicator to let things cool down. I ended up putting a different fan in it, plugged that fan directly into a motherboard header, and let it run at full speed.
Kelxin, posted May 15, 2023
Just as a note, I've RMA'd at least 10x as much G.Skill memory as any other brand I've purchased. In the past I've had the RAM overheat and cause massive corruption and system shutdowns.
Mindovermaster (Global Moderator), posted May 15, 2023
On 15/05/2023 at 15:13, Kelxin said:
"Just as a note, I've RMA'd at least 10x as much G.Skill memory as any other brand I've purchased. In the past I've had the RAM overheat and cause massive corruption and system shutdowns."
Funny, I've only had to RMA one stick, and it was DOA. Since then I've had no problems. Since ~2005, I've always used their Ripjaws line...
neufuse (Veteran, author), posted May 16, 2023
On 15/05/2023 at 16:13, Kelxin said:
"Just as a note, I've RMA'd at least 10x as much G.Skill memory as any other brand I've purchased. In the past I've had the RAM overheat and cause massive corruption and system shutdowns."
I've never had to RMA RAM. Video cards, though? Yes, yes, and yes again, lol.
neufuse (Veteran, author), posted May 16, 2023
It did it again tonight, randomly... the power just blipped off and it would not turn on. Zero volts at the motherboard on +12 V, +5 V and +3.3 V... Once again I unplugged the PSU from the board and jumpered power-on to ground, and the PSU started right up with all the correct voltages... and the system worked again when reconnected to the PSU. I don't get it: is it the PSU, the motherboard...? It's an HX750 PSU, and the system maxes out at 375 watts under load, so it shouldn't be overloading... I'm going to have to put another PSU on tomorrow and monitor it.
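neufuse's "375 W on a 750 W supply" reasoning is a simple headroom calculation. As a sketch, assuming 375 W is the peak whole-system draw he observed and 750 W is the HX750's rated output (`load_fraction` is just an illustrative name), it works out like this. A reading like this only rules out a sustained overload; it says nothing about very short load spikes or a fault inside the PSU itself.

```python
def load_fraction(draw_watts: float, rated_watts: float) -> float:
    """Return a measured system draw as a fraction of the PSU's rated output."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return draw_watts / rated_watts


# Figures from the thread: peak draw under load vs. the HX750's rating.
peak_draw = 375
rating = 750
frac = load_fraction(peak_draw, rating)
print(f"Load: {frac:.0%} of rating, headroom: {rating - peak_draw} W")
# prints "Load: 50% of rating, headroom: 375 W"
```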
Mindovermaster (Global Moderator), posted May 16, 2023
Well, if your spare PSU does the same thing, you know it's your board...
xMorpheousx416, posted May 16, 2023 (edited)
On 15/05/2023 at 20:31, neufuse said:
"It's an HX750 PSU, and the system maxes out at 375 watts under load, so it shouldn't be overloading..."
If all its internal circuitry is working properly. Remember my thermistor story?
On 15/05/2023 at 20:59, Mindovermaster said:
"Well, if your spare PSU does the same thing, you know it's your board..."
Yup.
neufuse (Veteran, author), posted May 16, 2023 (edited)
On 16/05/2023 at 01:00, xMorpheousx416 said:
"If all its internal circuitry is working properly. Remember my thermistor story?"
Thermistor story? No, I don't remember hearing that one.
neufuse (Veteran, author), posted May 16, 2023
Well, it may be my graphics card... I moved it to another system that has a 1000-watt PSU, ran the same CUDA computations, and boom, that one went off too. Same issue: the computer would not turn back on. So I tried something different: I pulled the GPU while it wouldn't turn on, and boom, it came on... I let the GPU sit out for a while, then put it back in, and it worked again.