
I'm having a very weird and annoying problem with my network all of a sudden. This is my network setup:

 

  • pfSense running in Hyper-V
  • 2 dedicated NIC's - one for WAN and one for LAN
  • 4 Wireless AP's - all configured with static addresses and have DHCP disabled so that all wireless clients get their addresses from pfSense instead of the AP's

 

So the problem is that many of my wireless devices, particularly phones, start reporting no internet access at least once a day. Other wireless devices like laptops and Fire TV Sticks, and all wired devices, work perfectly fine. The phones stay connected to WiFi, but are actually using the LTE connection. Checking the IP address confirms this. However, if I turn off LTE, the phone again has internet from my home connection, although Android continues to report no internet access! Rebooting the phone doesn't help. Rebooting pfSense makes the problem go away immediately!

 

Static IP's for all devices are configured in pfSense itself, except for the Wireless AP's. Everything was working perfectly fine for a few months; this problem only started about a week back. pfSense is on the latest version with the recent security updates applied via Shell.

You understand that pfSense doesn't know the difference between a wireless client and a wired client, right? So if you're saying wired clients have no problems, why would you think it has anything to do with pfSense?

 

How exactly are you checking the IP on the device? If you were on wifi, check the wifi connection: do you have an IP? Can you ping the pfSense IP (your gateway)? Can you do a DNS query? Going to need a bit more to go on than the device saying it has no internet.

Yes, true. I know to pfSense they are all the same. Reason why I thought it had to do something with pfSense is because as soon as I reboot, everything goes back to normal with all clients reporting a successful connection.

 

I check the internal IP on Android by going to Settings/WiFi and finally the Network Details section for the wireless network I'm connected to. It reports all the right details - the static IP I have configured for the particular device, the Gateway points to pfSense and DNS points to my Pi-hole. And I check the external IP by going to www.whatsmyip.org

 

Yes, I am able to ping everything, from and to. Including pfSense. But I can only do this if I turn off LTE (obviously). That is what is driving me crazy, my phone and some other wireless devices suddenly report no internet access. And I know it is actually not working because I try pinging and accessing my home LAN when it shows this, but nothing works. When I disable LTE, everything works fine, although Android still reports no internet. And it stays that way until I reboot pfSense. As soon as pfSense has started up fully after the reboot, all devices immediately report a full and successful connection.

It cannot be something wrong with any of the AP's as other wireless devices like laptops continue functioning without issues.

How do you have this all connected?

 

When you say static - you mean dhcp reservation.. So it can get a new IP, ie renew the lease?

 

With wireless you could be running into state problem?  Phones like to sometimes switch back and forth between LTE and Wifi and don't always renew states..

 

So do you have your wireless segmented differently from your LAN? Pinging LAN stuff would have nothing to do with pfSense if you're pinging by IP and not name, etc.

 

I would suggest using say the HE tools on your phone to troubleshoot pinging and dns problems.. They make the app for both iphone and android.

https://networktools.he.net/

 

Do a packet capture on pfSense when you do these tests, check its state table, etc. Many of these devices validate internet connectivity via DNS queries. Maybe something is failing there?
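One way to run the DNS part of that test without relying on the phone's status icon is to send a query straight to a chosen server and see who answers. The following is a minimal Python sketch, not something from pfSense or the HE tools: `build_dns_query` and `query_server` are hypothetical helper names, and the packet layout follows the RFC 1035 wire format.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def query_server(server_ip: str, hostname: str, timeout: float = 2.0):
    """Send the query over UDP port 53.

    Returns (reply_source_ip, rcode), or None if nothing usable came back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (server_ip, 53))
        try:
            data, (source_ip, _port) = sock.recvfrom(512)
        except socket.timeout:
            return None
    if len(data) < 4:
        return None
    rcode = data[3] & 0x0F  # low 4 bits of the second flags byte
    return source_ip, rcode
```

Calling `query_server` with the address of each DNS server in turn (the addresses and the hostname you pass are yours to choose) shows whether a reply arrives at all, which IP it came from, and whether the server reported an error (rcode 0 means no error).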

 

Are you running vlans on your hyper-v setup - I have heard rumors that vlans out of the blue fail on hyper-v unless running 2016... But have not had any issues with 2012 or win10 hyper-v in my testing of vlan support... which you have to do via powershell, etc.

  1. Network connections go like this: pfSense - Managed Switch - Wireless AP's.
  2. Yes, DHCP reservations for all devices in pfSense. Except for the AP's. I've tried restarting the DHCP server and also done Renew DHCP. Although renewing won't make a difference, right, as they are all assigned via static?
  3. No, 3 of the AP's are for my primary network, the same one on which my wired devices are. Only 1 AP is serving 2 additional networks (for IoT and Guests) that are VLAN tagged.
  4. I did try renewing states, didn't work. I can try again when some of the devices go down.

 

 

It was all working perfectly for the last few months, this problem has started about a week back. And I didn't make any changes in pfSense other than a couple of port forwards which obviously won't cause this issue. No changes to the AP's as well.

 

Should I try deleting the static address for a device the next time it goes down and let it get one dynamically? That should confirm whether or not it is a states issue right? Although I have tried rebooting the phones, no difference.

But does your lease renew? Do you see the new lease time? This would validate connectivity is working.

 

You say you can not talk to other devices on another segment... I would suggest you packet capture on pfsense and validate pfsense actually sees the traffic, and does it send it on to the device, etc.

 

Your pfsense is hyper-v maybe something is going wrong with that... How exactly do you have pfsense vnics connected to your physical network?  Did you create different vswitches for each vlan that are tied to different physical nics.. Or did you do vlan in hyper-v, etc. etc..

 

Have you made any updates to windows that this hyper-v is running on... They made 1809 available awhile back, and then a new build even 386 or something... Vs the initial 316 build.

Sure! Will try out the HE tools on phone as soon as it goes down.

 

Yes, I am using VLAN's, but in a non traditional way. Meaning I didn't do anything in Powershell, I just have additional virtual adapters in the pfSense VM with the tags I need. And those adapters are added in pfSense under Interfaces. No VLAN settings in pfSense in this setup.

 

I didn't make note of the new lease time, will check this. My primary network can communicate with the other networks, but not the other way around. I have the default allow all rule on the primary LAN, and only internet access on the other networks.

 

I have 2 physical NIC's on this machine. One is dedicated to WAN and the other is LAN. And all  other virtual adapters in the pfSense VM are tied to the physical LAN NIC. VLAN tags are specified directly in the virtual NIC properties under the pfSense VM.

 

[Image: pfsense-vm.jpg]


Windows is fully updated on the latest publicly available build. Not part of the Insider Programme. pfSense also has the recently released security updates that are to be installed via Shell.

So yes, a sniff would be most informative as to whether pfSense is seeing the traffic or not.

 

So you created an interface in pfSense that is tied to your LAN vswitch, which is an access vswitch out of the box. Are you sure it's not stripping tags?

 

When you say all gets fixed when you reboot pfsense, are you just rebooting the VM or the whole hyper-v box?

Ok sure, will do a sniff once it's down to check!

 

Regarding VLAN's, my primary network is on the default VLAN of the Managed Switch, so I haven't specified any tag in the virtual adapter for it. The other VLAN's like IoT for example have a tag. All devices are getting IP's and working on the network I specify. And none of them are able to communicate with my primary network. Have verified this with pings and device discovery. No devices other than the ones I have set for a particular network show up or respond to. So VLAN's are no issue at all, working beautifully.

 

I just reboot the VM. Only reboot Windows for applying updates.

Update:

 

Ok, so I just did all the tests on phone using HE Tools and pfSense:

 

  1. Deleted static mapping for phone and restarted WiFi. Phone got a new dynamic lease immediately from pfSense. Confirmed time of lease to be today morning itself.
  2. Tested Ping from phone to pfSense, Pi-hole, AP's and a few external sites. All worked fine.
  3. Tested DNS from phone to pfSense, Pi-hole and a few external sites. All fine as well. However DNS queries to AP's didn't work. Is this normal?
  4. Did packet capture on pfSense with the following settings - LAN Interface, Promiscuous mode, IPv4 only, Any Protocol, Host address set to IP of phone, other settings on default. All working fine - pfSense capture results matched up with whatever I was doing on phone.

 

So everything seems to be working just fine, except that the phone continues to report no internet connectivity! :dizzy::dizzy: The only test that didn't work is the DNS query to the AP's in step 3. I don't know whether it is supposed to be this way or not. The AP's are set to AP only mode, and I've manually specified the IP settings. So for example, IP is 192.168.1.3, Subnet Mask is 255.255.255.0, Gateway is 192.168.1.1 and DNS is 192.168.1.2 (Pi-hole).

Sorry! Here are the results of the packet capture.

Tests performed - Ping and DNS to pfSense, Pi-hole, 1 AP and 1 Desktop.

Settings used - LAN Interface, Promiscuous mode, IPv4 only, Any Protocol, Host address set to IP of phone, other settings on default.

 

09:44:40.434341 IP 192.168.1.207 > 192.168.1.1: ICMP echo request, id 657, seq 1, length 64
09:44:40.434525 IP 192.168.1.1 > 192.168.1.207: ICMP echo reply, id 657, seq 1, length 64
09:44:40.450003 IP 192.168.1.207 > 192.168.1.1: ICMP echo request, id 658, seq 1, length 64
09:44:40.450209 IP 192.168.1.1 > 192.168.1.207: ICMP echo reply, id 658, seq 1, length 64
09:44:40.461569 IP 192.168.1.207 > 192.168.1.1: ICMP echo request, id 659, seq 1, length 64
09:44:40.461758 IP 192.168.1.1 > 192.168.1.207: ICMP echo reply, id 659, seq 1, length 64
09:45:13.020804 IP 192.168.1.207.37685 > 216.58.196.163.80: tcp 0
09:45:13.062081 IP 216.58.196.163.80 > 192.168.1.207.37685: tcp 0
09:45:16.044485 IP 192.168.1.207.37597 > 172.217.26.202.80: tcp 0
09:45:16.084751 IP 172.217.26.202.80 > 192.168.1.207.37597: tcp 0
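A capture like this can also be checked mechanically. The sketch below is a rough Python example that assumes the text format shown above; `unanswered_pings` is a made-up helper name and the sample is an excerpt of the capture lines. It pairs each ICMP echo request with its reply by id and sequence number and lists any request that never got one.

```python
import re

# Excerpt of the capture above, used as sample input.
CAPTURE = """\
09:44:40.434341 IP 192.168.1.207 > 192.168.1.1: ICMP echo request, id 657, seq 1, length 64
09:44:40.434525 IP 192.168.1.1 > 192.168.1.207: ICMP echo reply, id 657, seq 1, length 64
09:45:13.020804 IP 192.168.1.207.37685 > 216.58.196.163.80: tcp 0
09:45:13.062081 IP 216.58.196.163.80 > 192.168.1.207.37685: tcp 0
"""

ICMP_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_pings(capture: str) -> list:
    """Return the capture lines of echo requests that have no matching reply."""
    pending = {}
    for line in capture.splitlines():
        match = ICMP_RE.search(line)
        if not match:
            continue  # not an ICMP echo line (e.g. the TCP lines)
        if match["kind"] == "request":
            key = (match["src"], match["dst"], match["id"], match["seq"])
            pending[key] = line
        else:
            # A reply travels in the opposite direction, so swap src and dst.
            key = (match["dst"], match["src"], match["id"], match["seq"])
            pending.pop(key, None)
    return list(pending.values())


print(unanswered_pings(CAPTURE))
```

An empty list means every ping the firewall saw from the phone was answered, which is consistent with the "all working fine" reading of the capture.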

On 3/4/2019 at 9:59 PM, The Dark Knight said:

However DNS queries to AP's didn't work. Is this normal?

Why would an AP answer a DNS query? It's an AP. No, APs do not answer DNS queries. Never have, never would. I think you're thinking of a wifi router. So while you might be using your old wifi router as an AP, and it has a name server running on it that forwards, normally these forward out their WAN connection to whatever they got from DHCP on their WAN, or to what you set up in them to forward to.

 

But if you're using an old wifi router as an AP, then no, DNS to them wouldn't work, since they don't have a WAN IP to use.

 

Only thing you should be using for dns is pfsense in your network - be it wired or wireless clients.

 

If you're handing your clients more than just your pfSense IP for DNS, i.e. you're also including your AP IP there, you have NO idea which DNS a client would ask. So yeah, if they ask your AP IP then DNS would fail, and then internet would fail.

 

If you're using Pi-hole, pretty much all things should point to your Pi-hole, and it should forward to your pfSense resolver/forwarder (which are you using?). But Pi-hole could block some DNS queries that devices use to see if they are online. One of the whitelist entries you normally add for Pi-hole is www.msftncsi.com

 

That is normally used by windows machines to validate internet access - they try and query that... If they don't get an answer then internet is broke ;)  Not sure what other devices check with.  But if your pihole blocked that for example - then internet could be broken for that device.. Or they ask your AP and it doesn't respond, etc.

 

A google shows that android might check for this

connectivitycheck.android.com

 

But not 100% on that - my son has android phone, but have never needed to look into that much.. Could also check these for example

http://clients1.google.com/
http://clients3.google.com/
http://connectivitycheck.gstatic.com
http://connectivitycheck.android.com

 

Guess they check that for a 204 response code. But if DNS fails, they could never even ask. If it resolves and can not get a 204 response, then it should pop up a captive portal login page, etc. But if it does not even resolve, not sure. Again, I have not had to look into any of the ways that android devices check for internet access. But pointing to your AP for DNS is going to be hit or miss for DNS failures - don't do that! ;)
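To see that 204 check from a laptop on the same wifi, something like the following Python sketch could be used. It assumes the probe path is `/generate_204` on the connectivitycheck hosts, which is my understanding of Android's default and is not stated anywhere in this thread; `classify_probe` and `probe` are made-up helper names.

```python
import urllib.error
import urllib.request


def classify_probe(status: int) -> str:
    """Interpret the HTTP status roughly the way a connectivity check would."""
    if status == 204:
        return "validated"
    if 200 <= status < 400:
        # Content or a redirect where an empty 204 was expected.
        return "captive portal or intercepted"
    return "unexpected response"


def probe(url: str = "http://connectivitycheck.gstatic.com/generate_204",
          timeout: float = 5.0) -> str:
    """Fetch the probe URL (assumed path) and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_probe(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies raise HTTPError; the status code is still available.
        return classify_probe(err.code)
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, or timeout: the request never completed.
        return f"no response ({err})"
```

Running `probe()` when the phone is showing "no internet" would separate the two cases described above: a DNS or connection failure ends up in the "no response" branch, while a reachable server that answers with something other than 204 is classified by its status code.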
 

 

 

 

Thanks for the detailed reply BudMan!

 

Yes, all 4 of my AP's are actually WiFi routers. But DNS is disabled on them as selecting AP mode in the router config does all this. All 4 AP's are pointing to Pi-hole for DNS, and Pi-hole has pfSense IP set in Upstream DNS providers. In pfSense I am using DNS Resolver. And upstream DNS server in pfSense under General Setup is Cloudflare.

 

I checked all the URL's you mentioned in Pi-hole, they are all either already in the Whitelist, or don't exist in the Blacklist or any of the block lists. Windows and Linux machines on my network have no issues at all with connectivity, it seems to be Android only. Although Fire TV is Android based, and that has no issues either. Then again, it is a heavily modified version of Android so probably that's why. I also queried the Android specific URL's you mentioned on my phone, they just give a 404 error.

 

But that's just it, the whole thing is very weird. Android reports no connectivity but everything works (as long as I disable LTE). And if I reboot pfSense, all is fine again....for like half a day. :dizzy:

On 3/6/2019 at 8:14 AM, The Dark Knight said:

All 4 AP's are pointing to Pi-hole for DNS

Do NOT point your clients to your AP for dns... Why would your AP even need dns?

 

Your android should only point to your pihole or pfsense.. Are you blocking or redirecting dns in pfsense? 

 

I would sniff your wifi network when you connect this phone to it and see what it does to check for internet. Create an isolated wifi network with its own SSID for ease of sniffing.

 

I don't have any android to play with. My son's phone uses my wifi all the time when he is over here. Next time he is here I will try and take a closer look, and sniff his traffic to see what it's doing.

No no, my client devices are not pointing to AP's for DNS, only Pi-hole. Checking DNS address on any device confirms this, they all show Pi-hole. Also did nslookup. ;)

 

Yes, I am redirecting all DNS to Pi-hole using NAT.

 

But I actually removed the pfSense VM and setup from scratch a few hours back. Did this also to verify whether I had unknowingly messed up something somewhere which was causing this issue. Working fine till now. If by tomorrow morning it remains fine, then I'm fine! :laugh:

55 minutes ago, The Dark Knight said:

redirecting all DNS to Pi-hole using NAT.

That for sure could cause you issues. The client won't accept the answer if you're forwarding to your Pi when it is on the same network as your client, because the answer will come from the Pi's IP rather than from the address the client asked.

 

So example

pfsense 192.168.1.1

pihole 192.168.1.100

client 192.168.1.90

 

The client wants to ask 8.8.8.8, and you redirect it at pfSense to 192.168.1.100. The Pi will answer .90 directly, but the client will say WTF is that, I asked 8.8.8.8

 

If you redirect to loopback on pfSense, the answer will look like it comes back from 8.8.8.8. If you want to forward directly to your Pi, then you need the Pi-hole to be on a different network than the client network, say 192.168.2.0/24 vs your 192.168.1.0/24.
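The reasoning in that example can be written down as a tiny model. This is only an illustrative Python sketch with made-up function names, not anything pfSense runs: it decides which source address the client sees on the DNS reply, on the assumption that a reply only gets its source rewritten back if it passes through the firewall that did the redirect.

```python
import ipaddress


def reply_source_seen_by_client(client_ip: str, asked_ip: str,
                                redirect_target: str, lan_cidr: str) -> str:
    """Return the source IP the client sees on the reply after a port-53 redirect."""
    lan = ipaddress.ip_network(lan_cidr)
    client = ipaddress.ip_address(client_ip)
    target = ipaddress.ip_address(redirect_target)
    # Target on the client's own subnet: it answers the client directly,
    # the reply never goes back through the firewall, and nothing restores
    # the address the client originally asked.
    if client in lan and target in lan:
        return str(target)
    # Loopback on the firewall, or a server on another subnet: the reply
    # passes back through the firewall, whose NAT state restores the
    # original destination as the reply's source.
    return asked_ip


def client_accepts(asked_ip: str, reply_source: str) -> bool:
    """A resolver normally ignores a reply from an address it did not query."""
    return asked_ip == reply_source


for target in ("192.168.1.100", "127.0.0.1", "192.168.2.100"):
    seen = reply_source_seen_by_client("192.168.1.90", "8.8.8.8", target, "192.168.1.0/24")
    print(target, "->", seen, "accepted" if client_accepts("8.8.8.8", seen) else "dropped")
```

With the addresses from the example, only the redirect to 192.168.1.100 produces a reply the client drops; the loopback target and the hypothetical 192.168.2.100 server both come back looking like 8.8.8.8.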

I actually have the NAT rule setup like this:

 

[Image: dnsredirect.png]

 

I followed this guide from Netgate for this:

https://docs.netgate.com/pfsense/en/latest/dns/redirecting-all-dns-requests-to-pfsense.html

 

Was working perfectly fine all these months. But now I'm curious... how do I do it as a loopback? Someone on Reddit, in a discussion on bypassing hardcoded DNS, mentioned adding a masquerade rule so that devices will think they are using their own DNS, but I found no way to do it in pfSense.

Nope you didn't follow it ;)

 

[Image: redirect.png]

 

You probably thought it was working because it wasn't actually being used ;)

 

I can tell for sure that is not going to work, unless your client has no problem with getting an answer from 10.2. You need to use the loopback address for your resolver. But then you can not redirect to your Pi-hole. If you want to forward to your Pi-hole, then you need to put the Pi-hole on its own VLAN that is different from the one your clients are asking from.

 

127.0.0.1 is loopback..

 

I can show you a sniff why this doesn't work if you want.

 

edit: Here you go. So I set up a redirect to my Pi, 3.10, and from client 3.31 I ask 8.8.8.8 for DNS. It gets redirected to 3.10, and he says oh, you're 3.31 - yeah, let me answer you. But your client is going to say: what? I asked 8.8.8.8

[Image: redictsniff.png]

 


Ah crap! 😂

 

Cool, I'll put it on its own VLAN then. I should have just asked you in the beginning itself! 😆😎👍

If you need help just ask.

 

Problem is you create an asymmetrical routing issue when you forward to an IP on the same network as the asking client. When you forward to loopback and unbound answers, the client doesn't see the difference.

 

So for example, I set the forward to loopback, and then asked for something only local would know, not google ;) See how it looks like it came back from 8.8.8.8

 

[Image: loopredirect.png]

 
