


High availability

High availability refers to systems that are designed to remain functional despite hardware or software failures and planned maintenance (e.g. upgrades). Actual measured availability (e.g. the percentage of time the system is up, or of requests that succeed) can vary.

In this howto, we describe a simple two-router setup in an active/backup configuration. The routers share a virtual IP address that hosts on the LAN use as their gateway to reach the internet. If the active router fails or is rebooted, the backup router takes over.

We will be using keepalived to implement healthchecking and ip failover, and conntrack-tools to implement firewall/nat syncing.

Most (but not all) of the required openwrt configuration can also be done from the luci web ui.

Preparation, assumptions, description of environment

  • You have 2 openwrt routers and a static WAN IP (this could also be a private IP behind a DMZ).

  • If you're not doing NAT or connection tracking based firewalling, skip the conntrackd/conntrack-tools sections.

  • DHCP dynamic WAN IP is possible with keepalived, but requires extra scripting and is not going to be described here.

  • VPNs and tunnel setups and failing those over is not covered.

  • Failing over a PPPoE WAN is not implemented. Best bet: let the modem do PPPoE and point its DMZ at your virtual WAN ip.

Individual Router Configuration

1. Configure 1st openwrt router

  • Internal LAN ip: (change so is available for initial configuration of 2nd router)

  • WAN IP, gateway: static gw metric 10 (using double nat / dmz on the isp provided router)

  • DHCP on defaults is fine, we'll configure it later.

2. Configure 2nd openwrt router

  • Interface LAN ip: (change so that when you connect the second router to the same network you can configure it)

  • WAN IP, gateway: static gw metric 10 (using double nat / dmz on the isp provided router)

  • DHCP on defaults is fine for now. If you have any static leases in dhcp, or fixed host entries, make sure they're the same as on the 1st router.

Verification and troubleshooting

  • Change a client to use the second router as its gateway and DNS server, and make sure the second router works as well.

  • Hosts whose IPs were issued by one dnsmasq might not be resolvable through the other dnsmasq; assigning static leases helps.

Configuration of both routers

3. Configure keepalived

keepalived is a linux daemon that uses VRRP (Virtual Router Redundancy Protocol) to healthcheck and elect a router on the network that will serve a particular IP. We'll be using a small subset of its features in our use case.

opkg update
opkg install keepalived

The following configuration in /etc/keepalived/keepalived.conf assumes the routers are symmetrical, i.e. they have the same priority, they start up in backup mode, and neither will preempt the other until it establishes that the other router is gone. You will need to adjust the interfaces to match your device.

! Configuration File for keepalived

! failover E1 and I1 at the same time
vrrp_sync_group G1 {
    group {
        E1
        I1
    }
}

! internal
vrrp_instance I1 {
    state BACKUP
    interface br-lan
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        ! put the virtual LAN address/prefix here
    }
    authentication {
        auth_type PASS
        auth_pass s3cret
    }
}

! external
vrrp_instance E1 {
    state BACKUP
    interface eth0.2
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        ! put the virtual WAN address/prefix here
    }
    virtual_routes {
        ! fill in the addresses after src, to and via
        src to via dev eth0.2 metric 5
    }
    authentication {
        auth_type PASS
        auth_pass s3cret
    }
}
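
Once keepalived is running on both routers, one way to see which of them currently holds the virtual ip is to look for it on the LAN bridge. The following is a minimal sketch and not part of the original setup: the helper name holds_vip and the address 192.168.1.1 are assumptions for illustration, so substitute your own virtual ip and interface.

```shell
#!/bin/sh
# holds_vip VIP: read `ip -o addr show` output on stdin and succeed
# if VIP is configured on one of the listed interfaces.
# (hypothetical helper, not part of keepalived)
holds_vip() {
    grep -qF " inet $1/"
}

# Example on a router (address and interface are assumptions):
#   ip -o addr show dev br-lan | holds_vip 192.168.1.1 && echo active || echo backup
```

Running the example line on both routers should report "active" on exactly one of them.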

4. Configure conntrackd

This step is optional: keepalived will fail over the ip address with or without conntrackd. However, NAT relies on tracking connection state in a (network address translation) table that links external ip:port with internal ip:port (per protocol, tcp or udp), so established connections might break on failover to the backup openwrt instance. This is because the backup instance has no entries for those connections and does not know which internal host to forward their packets to. New connections (such as application level reconnects) will work just fine.

Below is a simple config file for conntrackd. It is advisable to rename the original config in /etc/conntrackd/ and create a brand new "conntrackd.conf", so that the old file stays available for reference.

Sync {
    Mode FTFW {
        DisableExternalCache Off
        CommitTimeout 1800
        PurgeTimeout 5
    }
    UDP {
        IPv4_address "ip addr of host router"
        IPv4_Destination_Address "ip addr of partner router"
        Port 3780
        Interface eth*
        SndSocketBuffer 1249280
        RcvSocketBuffer 1249280
        Checksum on
    }
}
General {
    Nice -20
    HashSize 32768
    HashLimit 131072
    LogFile on
    Syslog on
    LockFile /var/lock/conntrack.lock
    UNIX {
        Path /var/run/conntrackd.ctl
        Backlog 20
    }
    NetlinkBufferSize 2097152
    NetlinkBufferSizeMaxGrowth 8388608
    Filter From Userspace {
        Protocol Accept {
            TCP
            UDP
            ICMP # This requires a Linux kernel >= 2.6.31
        }
        Address Ignore {
            IPv4_address 127.0.0.1 # loopback
        }
    }
}
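
With DisableExternalCache Off, states received from the peer sit in conntrackd's external cache and are only injected into the kernel when conntrackd is told to commit them, so conntrackd is usually driven from keepalived's notify hook. The sketch below is loosely modeled on the primary-backup.sh example shipped with conntrack-tools and is not part of the configuration above. It assumes you add a line such as `notify /etc/keepalived/conntrackd-notify.sh` (a hypothetical path) to the vrrp_sync_group, and that keepalived passes the new state as the third argument. The function name on_transition and the CONNTRACKD variable are likewise made up for this example.

```shell
#!/bin/sh
# Hypothetical notify script for keepalived, which calls it as:
#   <script> GROUP|INSTANCE <name> MASTER|BACKUP|FAULT <priority>
CONNTRACKD="${CONNTRACKD:-conntrackd}"   # overridable, e.g. for dry runs

on_transition() {
    case "$1" in
        MASTER)
            "$CONNTRACKD" -c   # commit the external cache into the kernel table
            "$CONNTRACKD" -f   # flush the internal and external caches
            "$CONNTRACKD" -R   # resync the internal cache with the kernel table
            "$CONNTRACKD" -B   # send a bulk update to the peer
            ;;
        BACKUP)
            "$CONNTRACKD" -t   # shorten kernel conntrack timers to drop stale entries
            "$CONNTRACKD" -n   # request a resync from the active router
            ;;
        FAULT)
            "$CONNTRACKD" -t
            ;;
        *)
            echo "unknown state: $1" >&2
            return 1
            ;;
    esac
}

# Only act when invoked with keepalived's usual arguments.
if [ "$#" -ge 3 ]; then
    on_transition "$3"
fi
```

Without something along these lines, the backup router may hold the synced states only in its external cache at the moment it takes over.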

Run simple commands to verify functionality

Show conntrackd statistics:

conntrackd -s

Request a resync from the other node:

conntrackd -n

5. Configure dhcp

You'll want DHCP (dnsmasq) to serve (vip address) to hosts on the lan, both as their gateway and DNS. Here's an excerpt from /etc/config/dhcp that instructs dnsmasq to do that.

config dhcp 'lan'
    option force '1'
    list dhcp_option '3,'
    list dhcp_option '6,'

option force '1' is needed for dnsmasq to not deactivate when it sees the other dhcp server. dhcp_option 3 is gateway, dhcp_option 6 is DNS.
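
The same options can also be set from the shell with uci instead of editing the file by hand. The sketch below only prints the commands so that they can be reviewed first. The function name dhcp_vip_commands and the example address 192.168.1.1 are assumptions, not values from this howto.

```shell
#!/bin/sh
# dhcp_vip_commands VIP: print uci commands that advertise VIP as
# gateway (DHCP option 3) and DNS server (DHCP option 6) on 'lan'.
# (hypothetical helper; review the output before running it)
dhcp_vip_commands() {
    vip="$1"
    cat <<EOF
uci set dhcp.lan.force='1'
uci add_list dhcp.lan.dhcp_option='3,$vip'
uci add_list dhcp.lan.dhcp_option='6,$vip'
uci commit dhcp
/etc/init.d/dnsmasq restart
EOF
}

# Example (address is an assumption):
#   dhcp_vip_commands 192.168.1.1        # review
#   dhcp_vip_commands 192.168.1.1 | sh   # apply on the router
```

Run it on both routers so that they hand out identical gateway and DNS settings.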

6. Add directories to the sysupgrade backup

Add the following directories to /etc/sysupgrade.conf. (can be done from luci as well).
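
Entries can be appended from the shell without creating duplicate lines. This is a minimal sketch: the helper name keep_on_upgrade is made up, and the two directories in the example are assumptions based on the config locations used earlier in this howto, so adjust them to what you actually want to keep.

```shell
#!/bin/sh
# keep_on_upgrade PATH [FILE]: append PATH to the sysupgrade keep list
# unless an identical line is already present. (hypothetical helper)
keep_on_upgrade() {
    conf="${2:-/etc/sysupgrade.conf}"
    grep -qxF -- "$1" "$conf" 2>/dev/null || printf '%s\n' "$1" >> "$conf"
}

# Example (directories are assumptions):
#   keep_on_upgrade /etc/keepalived/
#   keep_on_upgrade /etc/conntrackd/
```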


Testing and verification

TODO(risk): restarting keepalived with logread -f open, pulling cables with ssh / telnet / http sessions open, forcing dhcp renewal with tcpdump running, ensure


