xMorpheousx416 Posted May 16, 2023 (edited)

On 15/05/2023 at 11:35, xMorpheousx416 said:
I had a thermistor go bad in one power supply, and the fan wouldn't kick on under load. You could smell it getting hot, and that was the only indicator to let things cool down. I ended up putting a different fan in it, plugged that fan directly into the motherboard headers, and let it run at full speed.

Power supplies are, or were, thermally controlled by a thermistor that's kinda glued to the largest heatsink, just like the thermistor in the vents of an AC unit that tells the machine when to shut off once the room cools down to a set temperature.

On 16/05/2023 at 07:59, neufuse said:
Well, it may be my graphics card... I moved it to another system that has a 1000 watt PSU, ran the same CUDA computations, and boom, it went off too. Same issue: the computer would not turn back on. Tried something different: pulled the GPU when it wouldn't turn on, and boom, it came on. Let the GPU sit out, then put it back in, and it worked again.

That's what's nice about having spare parts. You can do a quick swap, test, and see which component is causing the issue. I find it interesting that neither computer will power on once the card fails, as usually you'll see fans turn on and maybe hear that typical beep as it powers up (if you have that piezo speaker attached), but you won't see anything on the screen. It seems whatever is overheating in the card shorts to ground(?) but cools off quickly enough to power on after a few minutes.

I'm also questioning these CUDA calculations... what is it that you're running that's scaring the crap out of the GPU?

Back in the Vista days, HP released a series of laptops that, for some reason, were manufactured with a lead-based solder instead of silver, and the connection points between the GPU and the board would melt under high heat, causing a massive short. I spent a good amount of my time as their tech replacing said machines.
neufuse (Veteran, Author) Posted May 16, 2023

On 16/05/2023 at 15:05, xMorpheousx416 said:
Power supplies are, or were, thermally controlled by a thermistor that's kinda glued to the largest heatsink, just like the thermistor in the vents of an AC unit that tells the machine when to shut off once the room cools down to a set temperature. That's what's nice about having spare parts. You can do a quick swap, test, and see which component is causing the issue. I find it interesting that neither computer will power on once the card fails, as usually you'll see fans turn on and maybe hear that typical beep as it powers up (if you have that piezo speaker attached), but you won't see anything on the screen. It seems whatever is overheating in the card shorts to ground(?) but cools off quickly enough to power on after a few minutes. I'm also questioning these CUDA calculations... what is it that you're running that's scaring the crap out of the GPU? Back in the Vista days, HP released a series of laptops that, for some reason, were manufactured with a lead-based solder instead of silver, and the connection points between the GPU and the board would melt under high heat, causing a massive short. I spent a good amount of my time as their tech replacing said machines.

They are physics calculations. It basically maxes out the CUDA cores and VRAM on this card to do physics simulations. I've contemplated moving to a 4090 (since it has about 4x as many cores), but haven't done it yet.
xMorpheousx416 Posted May 16, 2023

Well, at least we were able to help narrow down which component was having the issue.
farmeunit Posted May 16, 2023

I try to always keep a spare PSU around. That's the most common issue I have had personally. Glad you got it figured out, though. If you can get a new card, I would definitely do that. There might be a way to fix the issue, but ultimately, if you can use the performance uplift, it's worth it in the end.
neufuse (Veteran, Author) Posted June 23, 2023

welp... this again... Upgraded the PSU on this system to an HX1200, put in an MSI 4090, and used Corsair's 12VHPWR cable, not the one included with the card, and once again... under high load the system just shuts off randomly and won't turn on until you disconnect the PSU from the motherboard and let it sit. Starting to think I have a bad motherboard... ugh
Mindovermaster (Global Moderator) Posted June 23, 2023

On 23/06/2023 at 12:56, neufuse said:
welp... this again... Upgraded the PSU on this system to an HX1200, put in an MSI 4090, and used Corsair's 12VHPWR cable, not the one included with the card, and once again... under high load the system just shuts off randomly and won't turn on until you disconnect the PSU from the motherboard and let it sit. Starting to think I have a bad motherboard... ugh

Unfortunately, that's what it sounds like.
xMorpheousx416 Posted June 23, 2023 (edited)

Let's try something. Keep the case open. Have a fan blowing directly on the motherboard, focusing on the power supply circuitry around the CPU. Those circuits are thermally protected, and if they get past their specs, they will shut down the computer. If you can run under load while keeping that area as cool as possible, you'll have your answer. But I think you've nailed it already.

EDIT: If you do have to replace the motherboard, I'd strongly recommend not getting the exact same model. You could just run right into the same issue(s) again.
neufuse (Veteran, Author) Posted June 23, 2023

On 23/06/2023 at 14:24, xMorpheousx416 said:
Let's try something. Keep the case open. Have a fan blowing directly on the motherboard, focusing on the power supply circuitry around the CPU. Those circuits are thermally protected, and if they get past their specs, they will shut down the computer. If you can run under load while keeping that area as cool as possible, you'll have your answer. But I think you've nailed it already. EDIT: If you do have to replace the motherboard, I'd strongly recommend not getting the exact same model. You could just run right into the same issue(s) again.

The case was never closed. I've been doing all my work with the sides off and the case lying flat so heat goes straight up. Temps, going by my monitoring, never got higher than they should either, and there are 3 front fans running at 100% over the motherboard, with the back exhaust fan also running at 100%.
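One way to double-check the "temps never got higher than they should" part is to log GPU readings to disk right up to the moment the system dies, since an on-screen monitor loses its last readings on a hard shutdown. The following is only a minimal sketch, assuming an NVIDIA card with `nvidia-smi` on the PATH; the file name `gpu_log.csv`, the one-second interval, and the helper names are made up for illustration and are not from anyone's setup in this thread.

```python
import csv
import io
import os
import subprocess
import time

# Fields requested from nvidia-smi via --query-gpu.
QUERY = "timestamp,temperature.gpu,power.draw,utilization.gpu"


def _to_float(text):
    """Return a float, or None when the driver reports something like [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'csv,noheader,nounits' row from nvidia-smi into a dict."""
    timestamp, temp_c, power_w, util_pct = next(
        csv.reader(io.StringIO(line), skipinitialspace=True)
    )
    return {
        "timestamp": timestamp,
        "temp_c": _to_float(temp_c),
        "power_w": _to_float(power_w),
        "util_pct": _to_float(util_pct),
    }


def read_sample():
    """Query nvidia-smi once and return the reading for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one reading per interval, forcing each row to disk immediately."""
    with open(path, "a") as f:
        while True:
            sample = read_sample()
            f.write(",".join(str(v) for v in sample.values()) + "\n")
            f.flush()
            os.fsync(f.fileno())  # so the last row survives a sudden power-off
            time.sleep(interval_s)
```

Calling `log_forever()` in a second terminal before starting the CUDA job leaves the last few rows before the shutdown in the file. Note that this only covers what the GPU itself reports; it says nothing about motherboard VRMs or USB ports.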
xMorpheousx416 Posted June 23, 2023

Okay... but temp controllers and software don't monitor the power chokes that keep the electricity stable for CPU consumption. I had a 462-based motherboard literally go up in smoke because those circuits got too hot, and even a microscopic gap in the soldering can cause serious heat issues under load. Air cooling isn't going to stop it if it's not blowing directly on it. I don't recommend anyone doing this, but my dumb self would actually touch the aluminum heatsink right as the comp turns off... if I roast the tip of my finger off, there's your culprit. Don't do that! But that's my final guess... you've eliminated every other component.
neufuse (Veteran, Author) Posted June 23, 2023

On 23/06/2023 at 14:37, xMorpheousx416 said:
Okay... but temp controllers and software don't monitor the power chokes that keep the electricity stable for CPU consumption. I had a 462-based motherboard literally go up in smoke because those circuits got too hot, and even a microscopic gap in the soldering can cause serious heat issues under load. Air cooling isn't going to stop it if it's not blowing directly on it. I don't recommend anyone doing this, but my dumb self would actually touch the aluminum heatsink right as the comp turns off... if I roast the tip of my finger off, there's your culprit. Don't do that! But that's my final guess... you've eliminated every other component.

I know it's not any specific component on the board overheating. I have a FLIR camera and have checked the board over when it went off in the past; there were no excessive hot points on the board when this happens. I used to do checks like this with FLIR on chips to figure out which one was bad on "vintage" boards.
farmeunit Posted June 23, 2023 (edited)

Did you have a Corsair before? The reason I'm asking is that I had a friend who had a CM model Corsair, and after a minute or two of gaming, the system would hang or reboot (can't remember). He went and bought another new PSU (Corsair RM series) and had the same issue. Switched to a Thermaltake and had ZERO issues after that. I had never seen that, but both the Corsair units were brand new, specifically for that build. I was a big Corsair PSU fan until that, and then I read about a lot of issues with the 2020 RM series, I think it was. I understand the HX is their highest end, just thought I would mention it. I have switched to Seasonic, mostly, and have had really good luck with them. I believe it was ONLY gaming, and he did have a Vega 56, so that very well could have been the issue as well. Just wanted to mention it.

Edited June 23, 2023 by farmeunit

EDIT: I forgot the whole thread and you're probably right on the motherboard. That sounds logical.
neufuse (Veteran, Author) Posted June 24, 2023 (edited)

On 23/06/2023 at 16:36, farmeunit said:
Did you have a Corsair before? The reason I'm asking is that I had a friend who had a CM model Corsair, and after a minute or two of gaming, the system would hang or reboot (can't remember). He went and bought another new PSU (Corsair RM series) and had the same issue. Switched to a Thermaltake and had ZERO issues after that. I had never seen that, but both the Corsair units were brand new, specifically for that build. I was a big Corsair PSU fan until that, and then I read about a lot of issues with the 2020 RM series, I think it was. I understand the HX is their highest end, just thought I would mention it. I have switched to Seasonic, mostly, and have had really good luck with them. I believe it was ONLY gaming, and he did have a Vega 56, so that very well could have been the issue as well. Just wanted to mention it.

I've always used Corsair PSUs, always the HX line though, and never had issues until I rebuilt this system. The same original PSU (HX750) was used for 5 years prior with no issues, while the rest of the system was 100% new. Now it's completely new, since I have an HX1200 as well.
hellowalkman (Reporter) Posted June 24, 2023

Is it only happening for GPU (CUDA) intense tasks?
neufuse (Veteran, Author) Posted June 24, 2023

On 24/06/2023 at 04:52, hellowalkman said:
Is it only happening for GPU (CUDA) intense tasks?

Strangely, yes.
xMorpheousx416 Posted June 24, 2023

So... we've tried a new PSU and a new GPU, but not a new mobo or CPU? And... where can I get a Forward Looking Infrared camera? I thought those were mostly used mounted on aircraft...

I know the CPU may not have as much to do with the tasks, but those are the only two left in the box. Bad memory chips would usually just lock up the system or cause a reboot.

Last question I have, as another comparison: do you have another computer that you can run these CUDA calculations on? I'm almost curious enough to try out what you're doing to see if it shuts down my computer.
+Matthew S. (Subscriber²) Posted June 24, 2023

On 24/06/2023 at 13:28, xMorpheousx416 said:
So... we've tried a new PSU and a new GPU, but not a new mobo or CPU? And... where can I get a Forward Looking Infrared camera? I thought those were mostly used mounted on aircraft... I know the CPU may not have as much to do with the tasks, but those are the only two left in the box. Bad memory chips would usually just lock up the system or cause a reboot. Last question I have, as another comparison: do you have another computer that you can run these CUDA calculations on? I'm almost curious enough to try out what you're doing to see if it shuts down my computer.

If I'm not mistaken, he did try another PC with the old gfx card.
xMorpheousx416 Posted June 25, 2023

In that case, and going back to see the error with the PCI-E bus... and there's no BIOS to update to or go back to... looks like it needs to be 86'd.
neufuse (Veteran, Author) Posted July 29, 2023

So I never could figure out why this was happening... replaced graphics cards, the PSU, even swapped the motherboard now... and it still happened. Then... a "USB-C Port overpower limit" message on boot with a new firmware. Never saw that before. Then the system did it again, which is odd because I have no USB-C devices plugged in. So I disconnected the case USB headers... and it hasn't happened since.
+Matthew S. (Subscriber²) Posted July 29, 2023

Sometimes it's the simple things.
d0x360 Posted July 31, 2023 (edited)

*edit* Should have read page 2 lol. I do have a similar issue with one USB port, but usually whatever is plugged in just doesn't work until I reboot, so that's likely the problem on my end too. Unfortunately it's a rear port, and it began one day when I was plugging in a hub that had just arrived from Amazon. I think it had a small short because there were some weird tiny bugs in the hub. Nothing permanent but it fits... Asus! *End edit*

I've had a similar issue on my AM4 machine: Asus ROG Crosshair VIII Formula, 5800X, 32 GB DDR4 with a manual OC from 2133 to 4000 MHz CL14-14-14-34 1T, and a 4090, all powered by an EVGA SuperNOVA 1000 W PSU. I also have a 10% overclock on the CPU. The system will either crash, or I'll have shut it down and it just won't boot, but my motherboard's screen and some of the LEDs are lit and there's no error code. It's done it since day 1, which was launch. I've changed the PSU since, and I'm also on a different kit of RAM; I spent 2 weeks changing nearly everything memory related and pushing voltage to 1.46 V, and while it was frustrating, it also gave me a nice performance bump. I've never seen an error that explains it, but it's odd that they are both Asus boards. I personally stopped worrying about it because it seems like a random fluke that happens once every couple of months.

I have an Asus ROG X670E Extreme board, a 7950X3D, 5 M.2 drives (all 2 TB 990 Pros), and a Lian Li O11 Dynamic EVO case I got NIB for $120 with the vertical GPU mount and front I/O add-on. Found it on Amazon the week before Christmas, so now I have an extra O11 Dynamic (non-EVO) case and a Corsair 5000D or 7000D... can't remember, but I need to sell them since I don't need them. Anyway, my current board is fine, no issues with caps or VRMs. The chipset, CPU and memory are liquid cooled, so it's not temps.

It's a minor annoyance, but if it happens on the new board... which has a 14 layer PCB and is so heavy it feels like you could kill things with it. More layers means better memory stability, so I'll be pushing whatever kit I bought as far as I can. I ran memtest86 for 24 hours, then a full system stress test for 8 hours with my loop's pump at 3/4 speed, and I knocked 600 rpm off the rad fans to ensure I thermal throttled, but still no errors or issues other than the CPU not running above base clocks. I did lower my memory to 3600 MHz and set timings to 20-20-20-42, but it made no difference aside from a loss in performance. Weird... probably Asus randomly spiking something with too much power, but not enough to damage anything.

Edited July 31, 2023 by d0x360