Better Ad Blocking Through Pi-Hole and Local Caching
Bear Giles | August 26, 2018

Ethical consideration
In an ideal world we would not need to block ads. Ad blocking hurts small sites that depend on ad revenue to offset hosting and network expenses. These sites are giving me something of value (or else I wouldn’t visit them) and it’s not unreasonable for them to ask for a little in return. Many people rush into ad blocking without considering whether it harms others.
However, there are three serious issues with current ads.
They are often poorly written. Specifically, some ads leak massive amounts of memory – so much that I’ve had my browser crash several times per day. The crashes stop after I turn on both javascript blocking and ad blocking, so it’s not hard to conclude what was causing them. I’ve tried the less intrusive approach of disabling javascript alone, but it breaks some sites despite whitelisting (I’m looking at you, gotomeeting), and some ads with memory leaks still get through occasionally. Maybe browsers will someday let us cap the amount of memory a web page can use… but they don’t today, so crashes due to bad javascript remain a risk.
They may be malware vectors. It’s not common, but there have been several instances of ads carrying malware. Again, javascript blocking helps, but it only protects my desktop. My family’s phones, tablets, and other computers are unprotected, since I can’t expect others to deal with the hassle of knowing when to whitelist javascript on a site.
They’re ineffective. This refers to the harm to the advertiser, not the site owner. I can’t remember ever buying anything through an ad on a website. I can only remember clicking through an ad once in the last few years, and that was a ‘WTF?’ moment where I wanted to see if the site was actually advertising what I thought it was advertising. I don’t want to say I would never buy something through an online ad… just that I’m clearly not the target audience. Is it fair for the advertiser to pay for an ad that will always be ignored?
Many people also cite the proliferation of ads on some sites, ads that are increasingly aggressive at catching the reader’s attention. I don’t consider that an issue that justifies ad blocking since there’s an easier remedy – vote with your feet and don’t visit sites that are predominantly ads. To be fair, this is easy for me to say since I’m rarely interested in the content of those sites – I mostly come across them when following ‘WTF?’ comments from others.
Common solutions
There are two common solutions. The first is disabling javascript, e.g., with “safe script”. It allows ads to be displayed while defanging them. It’s good in theory, and whitelisting sites is normally just a moment’s work, but it has costs. Some sites don’t work unless javascript blocking is turned off entirely, e.g., gotomeeting, even when I ‘trust’ the domain. (I suspect the landing page redirects to a second page that pulls in javascript from sites I don’t know to whitelist.)
In this case it’s obvious that the site is failing, but in other cases the failure is more subtle and easily overlooked. E.g., some sites hide links and buttons by default and use javascript to make them visible, or require javascript to perform an action. Sometimes the breakage is obvious – you fill out a form and there’s either no submit button or it doesn’t do anything – but sometimes it’s subtle enough that you won’t notice the problem unless you happen to visit the site with javascript blocking turned off.
The second is ad blocker browser plugins like ‘Ad Block Plus’. It works, but I’ve noticed significant delays while the status bar says the browser is waiting for a response from the ad blocker’s site. This seems most common when recovering from a browser crash – I usually have a lot of tabs open and browsers throttle the number of concurrent connections to any particular site – but I’ve seen it at other times as well.
There has to be a better solution.
Pi-Hole (Ad Blocker)
The first half of the solution is Pi-hole. It is a small app designed to run on a Raspberry pi (hence the name), although you can also run it on a micro-instance at AWS or Digital Ocean. It works as a caching DNS server with a blacklist: sites on the blacklist resolve to the pi-hole server itself, which responds with a simple page.
You will get the best performance if it’s running on a Raspberry pi on your home network. (Be sure to configure it with a static IP address, not a DHCP address assigned by the router, in this case.) You will have the most flexibility if it’s running on a micro-instance at a cloud provider – you can access the server while away from home. There’s no reason why you can’t do both – list the Raspberry pi on your home network as your primary DNS server and the cloud-based server as your secondary DNS server.
A good static address for a Raspberry pi is 192.168.1.53 (or anything ending in .53) since the standard DNS port is 53. Make sure whatever you pick is outside of your router’s DHCP address range.
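On Raspbian, one common place to pin that static address is /etc/dhcpcd.conf. This is only a sketch – the interface name, gateway, and upstream resolver below are assumptions about your network:

# /etc/dhcpcd.conf (assumes a wired interface named eth0 and a router at 192.168.1.1)
interface eth0
static ip_address=192.168.1.53/24
static routers=192.168.1.1
static domain_name_servers=1.1.1.1

Reboot (or restart the dhcpcd service) afterwards so the new address takes effect.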
Installation
These instructions assume you’re running either Raspbian (on a Raspberry pi) or Ubuntu (on a cloud instance.)
- Install curl: sudo apt-get install curl
- Install pi-hole: curl -sSL https://install.pi-hole.net | bash
You’ll be prompted for a few configuration details and eventually end up with a dialog page that shows the ‘admin’ password. Write it down.
Normally I would stay far, far away from running something downloaded directly through bash. I made an exception in this case since it’s a dedicated system with limited resources. It’s not hard to download the script into a local file for review before running it.
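If you’d rather take that route, it’s only a couple of extra commands (the filename here is arbitrary):

# download the installer for inspection instead of piping it straight into bash
curl -sSL https://install.pi-hole.net -o basic-install.sh
less basic-install.sh
sudo bash basic-install.sh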
Assuming you’re using the static address mentioned above, your admin page is at http://192.168.1.53/admin. The dashboard shows basic statistics (number of sites looked up, number of sites blocked, how often it hit its cache, etc.). My server has only been up for a few hours so there have been relatively few queries.
Another page lists the actual DNS queries. You can use this to verify that you’re hitting this server for your DNS queries. Or to catch your kids going somewhere they shouldn’t go!
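If you have shell access to the server, the pihole command can also tail the query log live (assuming a standard install):

# follow the DNS query log in real time; Ctrl-C to stop
pihole -t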
If your router supports user-specified DNS servers you should add this address as your primary DNS server. IMPORTANT: keep your ISP’s DNS server (or an alternative DNS server) as the secondary or tertiary DNS server. This will prevent you from effectively losing internet access if something goes wrong with your pi-hole server.
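You can also point dig directly at the pi-hole before (or instead of) changing the router, to confirm that it resolves normal domains and intercepts blacklisted ones. The blocked domain below is only an assumption – use any domain that appears on your blocklists:

# a normal domain should return its real address
dig @192.168.1.53 example.com +short
# a blacklisted domain should return the pi-hole's own address instead
dig @192.168.1.53 doubleclick.net +short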
Your systems should automatically pick up the new DNS servers as their DHCP leases are renewed. This may take several days. On Ubuntu you can try to force the issue by manually releasing and renewing the DHCP lease
- $ sudo dhclient -r
- $ sudo dhclient
and flushing the dnsmasq cache by cycling the network manager
- $ sudo service network-manager restart
However it didn’t seem to have any effect when I did this.
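A couple of ways to see which DNS servers a client is actually using, assuming an Ubuntu desktop with NetworkManager and systemd-resolved:

# DNS servers handed out with the current DHCP lease
nmcli dev show | grep DNS
# what the local stub resolver is forwarding to
systemd-resolve --status | grep -A2 'DNS Servers'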
Local caching (dnsmasq)
Linux systems, or at least Ubuntu systems, use dnsmasq for their DNS lookup. It is configured via the DNS settings provided by the DHCP server on your router. It does not perform caching by default.
It is fairly straightforward to enable caching. This isn’t particularly important when you’re using pi-hole on a Raspberry pi since network connectivity should never be a problem but it can be helpful if you’re using a cloud provider.
Instructions
- Create the file /etc/dnsmasq.d/localhost. This tells dnsmasq to provide DNS service on the localhost address.
- listen-address=127.0.0.1
Note: dnsmasq is already listening on 127.0.0.53. This won’t change that. I think (but am not certain) that one interface will provide caching while the other will always hit the upstream DNS server.
- Edit /etc/dhcp/dhclient.conf. We want to prepend our local cache. We can also explicitly add our pi-hole server to ensure that we always hit the pi-hole regardless of the router settings.
- #prepend domain-name-servers 127.0.0.1;
- #require subnet-mask, domain-name-servers;
- prepend domain-name-servers 127.0.0.1,192.168.1.53;
- require subnet-mask, domain-name-servers;
- Restart the network manager.
- $ sudo service network-manager restart
You can now perform two queries to verify caching has been enabled.
In the first case there’s no entry in the local cache so we hit the upstream server. This has a slight delay since I hit my Digital Ocean instance, and it in turn has to hit its upstream provider.
bgiles@eris:/etc/dhcp$ dig google.com

; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9959
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             166     IN      A       216.58.218.238

;; Query time: 56 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Aug 26 09:21:47 MDT 2018
;; MSG SIZE  rcvd: 55
In the second case there’s an immediate response since the value is in the cache.
bgiles@eris:/etc/dhcp$ dig google.com

; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33917
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             164     IN      A       216.58.218.238

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Aug 26 09:21:48 MDT 2018
;; MSG SIZE  rcvd: 55
Configuring VPNs
Finally, we can explicitly add these DNS servers to our VPN settings. This covers us even if the VPN provides its own DNS settings when we connect to it.
If you’re running your own VPN server you can edit your /etc/openvpn/server.conf file to push your pi-hole and backup DNS servers. This means you’re covered even if you don’t modify your network manager settings. IMPORTANT: if you use this VPN while away from home you will want to point to a pi-hole running on a cloud provider instead of your home network.
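A minimal sketch of the relevant server.conf lines – the first address is a placeholder for a cloud-hosted pi-hole and the second is an ordinary public resolver, so substitute your own:

# /etc/openvpn/server.conf
push "dhcp-option DNS 203.0.113.53"   # pi-hole (placeholder address; use a cloud instance if clients roam)
push "dhcp-option DNS 1.1.1.1"        # backup resolver in case the pi-hole is unreachable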
If you do this, remember to reload the settings: sudo service openvpn reload.
A final comment on caching
Wrapping up, I want to add a final comment on caching. Caching DNS entries is dangerous if done improperly. IP addresses may not change often, but they do change, and it’s important to recognize that.
Fortunately there’s a solution to this. All DNS records have a time-to-live (TTL) value – it’s basically a guarantee that the entry won’t change within the next TTL seconds. There are benefits to both a brief TTL (e.g., 5 minutes) and a long TTL (e.g., a week). A cache does no harm as long as it uses the TTL to decide how long to keep a value. Some caching DNS servers will continue to provide stale information if the upstream DNS server(s) become non-responsive; others will immediately discard the value.
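You can see this in the dig output above: the TTL in the answer records dropped from 166 to 164 between the two queries because the cached entry counts down. To read just the answer line, and its TTL, without the rest of the noise (the values shown are taken from the queries above):

$ dig +noall +answer google.com
google.com.             164     IN      A       216.58.218.238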
This is why so many of the entries above were forwarded: the responses are cached but have a short TTL. Keep that in mind when looking at the dashboard and logs and evaluating whether this is worth the effort.