Overview

The u32 filter allows you to match on any bit field within a packet, so
it is in some ways the most powerful filter provided by the Linux
traffic control engine. It is also the most complex, and by far the
hardest to use. To explain it I will start with a bit
of a tutorial.
Matching

The base operation of the u32 filter is actually very simple. 
It extracts a bit field from a 32 bit word in the packet, and if it is
equal to a value supplied by you it has a match. The 32 bit word must
lie at a 32 bit boundry. The syntax in tc is:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:1 \
    match u32 0xc0a80800 0xffffff00 at 12
The first line uses the same syntax shared by all filters, so I will
ignore it for now. The second line just says that if the filter matches
assign the packet to class 1:1. The third line is the interesting one;
this is what it means:
match u32   This keyword introduces a match condition.  The u32
            is the type of match.  It must be followed by a value
            and mask.  A u32 match extracts a 32 bit word out of
            the header, masks it and compares the result to the
            supplied value.  This is in fact the only type of
            match the kernel can do.  Tc "compiles" all other
            types of matches into this one.

0xc0a80800  This is the value to compare the masked 32 bit word
            to.  If it is equal to the masked word the match is
            successfull.

0xffffff00  This is the mask.  The word extracted from the
            packet is bit-wise and'ed with this mask before
            comparision.

at 12       This keyword tells the kernel where the 32 bit word
            lives in the packet.  It is an offset, in bytes,
            from the start of the packet.  So in this case
            we are loading the 32 bit word that is 12 bytes from
            the start of the packet.  The offset is optional.
            If not supplied it defaults to 0 which is generally
            not what you want.
Now if you look at rfc791 you will see that the source address is stored
at offset 12 in an IP packet. So the match condition could be read as:
"match if the packet was sent from the network 192.168.8.0/24". To use
the u32 filter you do have to be familiar
with the fields in IP and TCP, UDP and ICMP headers. But you don't have
to remember the offsets of the individual fields - tc has some syntatic
sugar for that. This command has does the same thing as the one above.
The syntax is different, but the filter submitted
to the kernel is identical:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:1 \
    match ip src 192.168.8.0/24
A u32 filter item can logically "and" several matches together,
succeeding if only if all matches succeed. This example will succeed
only if the packet was sent from network 192.168.8.0/24, and has a TOS
of 10 hex:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:1 \
    match ip src 192.168.8.0/24 \
    match ip tos 0x10 1e
You can have as many match conditions on the one line as you want. All must be successful for the filter item to score a match.
If you enter several tc filter commands the filters are tried in turn until one matches. For example:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:1 \
    match ip src 192.168.8.0/24 \
    match ip tos 0x10 1e
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:2 \
    match ip src 192.168.4.0/24 \
    match ip tos 0x08 1e
The first filter item checks if the packet is from network 192.16.8.0/24
and has a TOS of 10 hex. If so the packet is assigned to class 1:1. If
not the second filter item is tried. It checks if the packet is from
network 192.168.0.4/24 and has a TOS of 08 hex
and if so it will assign to packet to class 1:2. If not the next filier
item would be tried. But there is none, so u32 filter fails to classify
the packet.
Now it is time to discuss u32 handles. A u32 handle is actually 3
numbers, written like this: 800:0:3. They are all in hex. For now we are
only interested in the last one. This last number identifies the filter
items we have been adding. Because we did not
specify an number generated for the filter item the kernel allocated
one for us. In fact it allocated the handles 800:0:800 and 800:0:801.
The handle it generates is one bigger than the largest handle used so
far, with a minum value of 800 hex. Valid filter
item handles range from 1 to ffe hex. Like all filter handles, the
complete handle (as in 800:0:801) must be unique. We can force a
particular handle to be used for a filier item by using the "handle"
option of "tc filter", like this:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:1 \
    match ip src 192.168.8.0/24 \
    match ip tos 0x10 1e
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 handle ::1 u32 \
    classid 1:2 \
    match ip src 192.168.4.0/24 \
    match ip tos 0x08 1e
These tc commands are almost identical to the previous example. In fact
the tc command creating the first item is identical, so it will be
allocated the same handle as before, 800:0:800. The second command only
differs from the previous example in that it specifies
item handle 1 is to be used. (The rest of the numbers in the handle are
not specified, so the defaults are used.) The full handle created for
the second filter item will be 800:0:1. The kernel evaluates filter
items in handle order, with lower handle numbers
being checked first. So the impact of doing this will be to reverse the
order they two filter items are evaluated by the kernel, compared to
the previous example.
Linking

Before proceeding we need a new concept. In effect filter items that
share the same prefix in their handle (800:0 in the above examples) form
a numbered list. The number is the filter item number, ie the last
number is the handle. In the last example above
we had a two item list with these handles:
list 800:0:
  1   [src=192.168.4.0/24, tos=0x08] -> return classid 1:2
  800 [src=192.168.8.0/24, tos=0x10] -> return classid 1:1
I will call this a u32 filter list, or just a filter list for short. The
prefix (800:0 in this case) can be used as a handle to identify the
list. In the section above I described how the kernel "executes" such a
list. To recap it does this by running through
the list in filter item number order, checking each filter item in turn
to see if it matches. If a filter item matches it can classify the
packet, in which case the u32 filter stops and returns the classified
packet. But when a u32 filter item matches a packet
there is one other thing it can do besides classifing the packet. It
can "link" to another u32 filter list. For example:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    link 1:0: \
    match ip src 192.168.8.0/24
If this filter item matches it will "link" to filter list 1:0:, meaning
the kernel will now execute filter list 1:0:. If a filter item in that
list matches and classifies a packet then the u32 stops and returns
classified packet. If that does not happen, ie
if no filter item in the list classifies the packet then the kernel
resumes executing the original list. Execution continues at the next
filter item in the original list, ie the one after the filter item that
did the "link". A linked list can in turn link
to other lists. You can nest up to 7 link commands.
If you specify a "link" command for a filter item any attempt to
classify a packet in the same filter item will be ignored. Another way
of saying this is the "classid" option and its aliases won't work if you
put "link" on the command line.
This linking is not in itself very useful. It is usually faster to use
one big list, and it always easier to do it that way. But there are two
commands you can combine with the "link" command, and in fact neither
can be used without it.
Hashing

The filter lists we have been discussing are actually part of much
larger structures called hash tables. A hash table is just an an array
of things that I will call buckets. A bucket contains one thing: a
filter list. This will all become clear shortly, I hope.
We can now look at the meanings of the other two numbers in a u32 filter
handle. One handle in the examples above was 800:0:1. Well, the 800
identifies the hash table, and the 0 is the bucket within that hash
table. So 10:20:30 means: filter item 30, which
is located in bucket 20, which is located in hash table 10.
Hash table 800 is special. It is called the root. When you create a u32
filter the root hash table gets created for you automaically. It always
has exactly one bucket, numbered 0. This means the root hash table also
exactly one filter list associated with it.
When the u32 filter starts execution it always executes this filter
list. In other words a u32 filter does its thing by executing filter
list 800:0. If filter list 800:0 does not classify the packet (implying
that none of the lists it linked to clasified it
either) then the u32 filter returns the packet unclassified.
Not unsurprisingly you can't delete the root hash table. Actually you
can't delete any other hash table either (as of 2.4.9), but that is
because of a bug in the in the kernel u32 filter's reference counting
code. The only way to get rid of a hash table in
2.4.9 or earlier is to delete the entire u32 filter.
Hash tables other than the root must be created before you can add
filter items that link to them. Use this tc command to create hash
table:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 handle 1: u32 \
    divisor 256
This creates a hash table with 256 buckets. The buckets are numbered 0
through to 255. So we have effectively created 256 filter lists with
handles 1:0, 1:1, ... 1:255. A hash table can have 1, 2, 4, 8, 16, 32,
64, 128 or 256 buckets. Other values are possible
but can be very inefficient. The kernel has a bug that will allow you
to have 257 buckets, but doing that may cause an oops.
If you omit the "handle" option the kernel will allocate you a new
handle. Currently (2.4.9) the kernel has a bug - the handle allocation
routing will go infinite rather than return failure in the very unlikely
circumstance that all hash table handles are in
use.
The way the tc "link" option is written it might appear that you can
link to any bucket. You can't. The link option only allows you to
specify bucket 0 (implying that "link 1:1" is illegal). To select a
bucket other than 0 you must use the "hashkey" option:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    link 1: hashkey mask ffffff00 at 12 \
    match ip src 192.168.8.0/24
The hashkey option causes the kernel to calculate the bucket number of
the filter list to link to from data in the packet. You get to specify
what data. This operation is usually called a hashing. In this case the
hash it is a particularly fast but primitive
one. In the example above this is what happens, in detail, if the match
succeeds:
The kernel reads the 32 bit word at offset 12 of the packet being sent.
This word is masked with ffffff00
The word is left shifted by 8 bits, and them masked with 0xff. The
amount of left shift is calculated from the mask - it is the number of
bits the mask has to be shifted so the first 1 bit appears in the least
significant bit. The is the "hashing function".
It changed between 2.4 and 2.6. In 2.4 the 4 bytes in the word are
xor'ed together. From what I have seen, the 2.4 version did a better job
on real data.
The result of the hash, which is a number in the range 0..255, is then masked with (number of buckets - 1).
The result is a bucket number, which is then combined with the hash table in the link option to form a filter list handle.
That filter list is then executed.
If you look at rfc791 you will see the hash in the example is selecting
the senders network address. Tc offers no syntatic sugar to help you
this time, ie there is no "hashkey ip src 0.0.0.0/24" or similar. You to
do it the hard way and look up the rfc's.
Why would you hash on the source network rather than testing for it in a
match option? Its only useful if you want to classify a packet based on
a lot of different source networks. If there is only one or two source
networks you are better off using match as
doing a couple of matches is faster than doing a hash. But, the amount
of time required to test all the matches will grow as number of source
networks grows. Hashing on the other hand takes a fixed amount of time
regardless whether there is 1 or 100 source
networks, so if there are thousand's of source networks hashing is
going to be literally 100's of times faster than testing them one by one
using matching.
I mention this because there is an example from Alexey's
"README.iproute2+tc" that selected the TCP protocol (among others) using
hashing. As an example of how to use hashing it is good, but it has
been cut and pasted by every man + dog, altered to only select
the TCP protocol, and then quoted as the way to do it. Wrong. A simple
"link" without hashing would be better in that case.
We have dealt with one side of hashing - how the filter list to be
executed (hash table, bucket) is selected. There is a second side to it -
adding items to the selected filter list. The problem is really quite
simple - which one is it? You know the hash table
number, it is the bucket number that is the problem. You could use the
description of the hashing algroithm above and manually calculate the
bucket number. That is a bad option for two reasons. Firstly, its hard
work in the general case. Secondly its fragile,
because the hashing algroithm in the kernel can and has changed. Tc can
calculate the hash for you, and it is better and easier to let it do
so. Letting Tc do this does not effect time it takes the kernel to
execute the filter. Here is how you do it:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:2 \
    ht 1: sample ip src 192.168.8.0/8 \
    match ip src 192.168.8.0/8 \
    match ip tos 0x08 1e
Caveat: as of 07/Feb/2006, the hashing algorithm in tc is still at the
2.4 version(!). Ergo, for 2.6 tc ends up with the wrong answer, so this
example above won't work until this bug is fixed.
The line in question is the third one - the rest we have seen before.
The "ht 1:" says the filter item is to be inserted into hash table 1::.
The "sample ..." says what value we want to calculate has bucket number
for, ie we want that value to be hashed. Tc
will apply the same hashing algorithm used by the kernel to calculate
the bucket number. The "..." can be anything that could legally follow a
"match" option, so all the syntatic sugar for calculating IP offsets is
available to you.
There is, unfortunately, three bugs in the current version of "tc" (ie
tc up until cvs 2006-02-09), which render "sample" useless. Firstly,
"sample" assumes the target hash table has 256 buckets. If it doesn't,
you are out of luck - you must use the "ht" option
instead. Secondly, the "sample" option always uses the 2.4 kernel
hashing function. Ie, it doesn't work on 2.6 kernels. Finally, the
"sample" parsing code in "tc" has a bug (a missing memset()), which
causes tc to get segmentation violations. This last bug
renders it completely useless.
Now for some random points. First of all, why did I not use the "handle"
option of tc to specify the hash table, as is done everywhere else?
Answer: because you must give the "ht" option. You can also give the
"handle" option, but if you do the hash table number
in it must be blank, (as in ::1), or be equal to the hash table given
in the "ht" option. Is there a good reason why tc and the kernel work
like this? No, not that I can see.
Secondly, why is the fourth line required in the command? First of all,
perhaps isn't obvious why it may not be required. It may not be needed
because the "sample" option has already selected the filter list for
this source network. If no other source networks
hash this this same bucket there is indeed no reason to for the match
command. But if several source networks hash to the same bucket it is
required - the filter won't work without it. If you are hashing for a
good reason, ie to speed up the process of selecting
among many possibilities, and you are being conservative, ie you assume
you don't know the internals of the hashing algroithm, then you can
never be sure that each bucket will only have one filter item. So this
match line should always be present.
Thirdly, there are many examples on the net that hash on the IP
protocol, then selects protocol 6 directly using "ht 1:6:" rather than
using the "sample" option. Should I copy that? Answer: No. This example
should sound familiar. It is the same cut & paste
(aka hack, because they always try to improve the example) from
Alexey's "README.iproute2+tc" file I referred to earlier. In that
example Alexey assumed he knew how the hashing algroithm worked. It
probably sounded like a reasonable assumption to him - he
designed and coded the algorithm. But it is not a good assumption for
the rest of us. He did this because under the current hashing algorithm
the value is trival to calculate under some circumstances. If you are
selecting one byte from the packet on a byte
boundry, and use a hash table 256 elements long, then the byte always
hashes to itself. The IP protocol byte meets those conditions.
Fourthly, should I allocate my own handles to filter items in a hash
bucket? Answer: Avoid it if possible. You can manually allocate filter
item numbers using the handle option, as in "handle ::1". If you do so
be sure to allocate a unique filter item number
to each filter item in the hash table (as opposed to unique to just the
bucket the filter item lives it). You have to do this if you assume (as
you should) that you don't know what bucket the filter item is going to
hash to. But, as I said earlier, avoid it
if possible. You would not be hashing if there weren't a lot of filter
items to choose from. And if there are a lot of them doing your own
filter item numbering will be painful.
Header Offsets

The IP header (and other headers) are variable length. This creates a
problem if you are trying to use "match" to look at a value in a header
that follows - you don't know where it is. It is not an impossible
problem because every header in an IP packet contains
a length field. The "header offsets" feature of u32 allows you to
extract that length from the packet, and then add it to the offset
specified in the "match" option.
Here is how it works. Recall that the match option looks like this:
match u32 VALUE MASK at OFFSET
I said earlier that OFFSET tells the kernel which word in the packet to
compare to VALUE. That statement was a simplification. Two other values
can be added to OFFSET to determine which word to use. Both those values
start off as 0, but they can be modified
when a "link" option calls another filter list. Any modification made
only applies while called filter list is being executed as the old
values are restored if the called filter list fails to classify the
packet. Here are the two values and the names I call
them:
permoff     This value is unconditionally added to every OFFSET
            that is done in the destination link, ie that one
            that is called.  This includes calculations of new
            permoff's and tempoff's.  Permoff's are cumulative
            in that if the destination link calls another link
            and calculates a new permoff, the result is added to
            this one.

tempoff     A "match" option in the destintaion link can optionally
            add this value its OFFSET.  Tempoff's are temporary, in
            that it does not apply to any links the destination link
            calls.  It also does not effect the calculation of
            OFFSET's for new permoff's and tempoff's.
Time for an example. Consider this command:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    link 1: offset at 0 mask 0f00 shift 6 plus 0 eat \
    match ip protocol 6 ff
The match extression selects tcp packets (which is IP protocol 6). If we
have protocol 6 we execute filter 1:0. Now for the rest of it:
offset  This signals that we want to modify permoff or tempoff
        if the link is executed.  If this is not present,
        neither permoff nor tempoff are effected - in other
        words the target of the link inherits the current
        permoff and tempoff.

at 0    This says the 16 bit word that contains the value we
        are going to use to calculate permoff or tempoff lives
        offset 0 the IP packet - ie at the start of the packet.
        This offset must be even.  If not specified 0 is used.

mask 0f00   This mask (which is in hex) is bit-wise anded with the
            16 bit word extracted from the packet header.  It
            isolates the header length from the rest of the
            information in the word.  If not specified 0 is used
            for the extracted value.

shift 6     This says the word extracted is to be divided by 32
            after being masked.  If not present the value is not
            shifted.

plus 0      After extracting the word, masking it and dividing it by
            32, this value is now added to it.  If not present is
            assumed to be 0.

eat         If this is present we are calculating permoff, and the
            result of the calculation above is added to it.  Tempoff
            is set to 0 in this case.  If this is not present we are
            calculating tempoff, and the result of the calculation
            becomes tempoff's new value.  Permoff is not altered in
            this case.
If you don't understand this then accept at face value that it does
calculate the position of the second header in an IP packet. Copy &
paste it into your scripts. I am not going to try and explain it
further. You should of course dig out rfc791 and verify
it for yourself. That way you will be able to apply it to headers
beyond the second one.
Having calculated your offset you can now add entries to the destination
filter list that depend on it. Here is an example entry:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:4 \
    ht 1:0 \
    match u32 0x140000 ffff0000 at nexthdr+0
We have see almost all of this before. "ht 1:0" inserts this filter item
into hash table 1, bucket 0. "classid 1:4" classifies the packet if the
filter matches. The "match" selects protocol 14 hex (which is 20
decimal - ftp). The "at nexthdr+0" is the only
new bit, or at least the "nexthdr+" is new. The "0" sort of means the
same thing as it always did - that the 32 bit word that contains the TCP
port is at offset 0. But it is offset 0 from the TCP header, because
either permoff of tempoff has been set to point
to that header. As for "nexthdr+", recall that adding "tempoff" was
optional. If you add "nexthdr+" it gets added. If you don't it doesn't.
Tc does supply syntatic sugar for this as well. I could of written this way, and generated an identical filter item:
# tc filter add dev eth0 parent 999:0 protocol ip prio 99 u32 \
    classid 1:4 \
    ht 1:0 \
    match tcp protocol 6 ff
Recall that I said modification made to permoff and tempoff only applies
while called filter list is being executed as the old values are
restored if the called filter list fails to classify the packet. This
was a lie. Permoff is restored, but tempoff isn't.
This can make for subtle suprises in the way a U32 filter executes,
because you tend to assume that during the execution of a filter list
permoff and tempoff never change. But if you link to another list
tempoff may change. I recommend always using permoff's
(ie, always specify "eat", and never use "nexthdr+") to avoid this.
Reference

Handles

The u32 filter uses 3 numbers for its handle. These numbers are written:
H:B:I, eg 1:1:2. All are in hex. The first number, H, identifies a hash
table. The second number, B, identifies a bucket within the hash table,
and the third number, I, identifies the
filter item within the bucket. The combination must be unique.
Hash table numbers must lie between 001 and fff hex. The traffic control
engine will generate a hash table number for you if you don't supply
one. Generated numbers are 800 or above. The hash table number in the
handle is not used when creating or changing
a hash table item. Instead the hash table specified by the "ht" option
is used, and the hash table in the handle must be not specified, 0, or
equal to the hash table in the "ht" option.
A bucket number can range from 0 to 1 less than then number of buckets
in the parent hash table. If no bucket is specified (as in 1::2), then 0
is assumed.
Filter Item numbers must lie between 001 and fff hex. The traffic
control engine will generate a filter item number for you if you don't
supply one. The generated number is the larger of 800 and one bigger
than the current largest item number in the bucket.
Execution

Each "tc filter add ... u32" item adds either a hash table, or adds a
filter item to a bucket within a hash table. When a u32 filter is
created the root hash table, whose handle is 800::, is automatically
created. It has one bucket. The u32 starts by checking
each filter item in bucket 0 of the root hash table. Filter items
within a bucket are always checked in filter item number order. As soon
as a filter item classifies the packet the u32 filter stops execution.
Filter items may use the "link" option to execute
a filter item list held by a bucket in another hash table.
Options

classid :<classify-spec>: | flowid :<classify-spec>:
  If all the match options succeed then this will :classify:
  the packet and the u32 filter will stop execution.  Ignored
  if "link" option is given.

divisor <NUMBER>
  If supplied this parameter must appear on its own, without
  any other arguments.  It creates a new hash table.  <NUMBER>
  specifies the number of buckets in the hash table.  It can
  range from 1 to 256, and should be a power of 2.  The hash
  table number is taken from the handle supplied.  If no handle
  is supplied a new hash table number is generated.

hashkey mask <MASK> at <AT>
  If the link specified by the "link" option is taken then this
  option specifies the bucket within the hash table to use.  This
  is how the bucket number is calculated:
  1.  The 32 bit work at offset <AT> is read from the packet.
  2.  The 32 word is masked with <MASK>.  <MASK> is in hex.
  3.  The 4 bytes in the result are xor'ed together.
  4.  The result is bit-wise anded with the value (number of
      buckets in the hash tabled linked to - 1).

ht <HASHTABLE-HANDLE>
  This option specifies the handle of a filter item being added
  or changed.  The filter item in <HASHTABLE-HANDLE> must be
  unspecified or 0 - it can only be specified by the "handle"
  option.  The bucket specified may be overridden by the "sample"
  option.

link <HASHTABLE-HANDLE>
  If all match options succeed in this filter item the "link"
  option causes the filter items in another hash table's bucket to
  be checked.  If none of the filter items in the linked to bucket
  classify the item then u32 filter continues checking filter
  items in the current bucket.  The "link"ed to bucket may link to
  yet another bucket, to a maximum level of 7 such calls (in 2.4.9
  .. 2.6.15).  The <HASHTABLE-HANDLE> specifies the hash table to
  link to.  The bucket and filter item numbers in that handle must
  both be unspecified or blank.  Bucket 0 will be used unless
  overridden by the "hashkey" option.  The "offset" option can be
  used to alter packet offsets in the linked to bucket.

match <selector>
  This option checks if a field in the packet has a particular
  value.  A filter item may contain more than one "match" option.
  All match options must be satisified before the filter item
  considers it has a match.  What the filter item does when it
  has a match is specified by the "link", "classid"/"flowid", and
  "police" options.  If none are specified the filter item does
  nothing when it matches.  Selectors are described below.

offset mask <MASK> at <AT> shift <SHIFT> plus <PLUS> eat
  If the link specified by the "link" option is taken then the
  position of the values extracted from the packet by the hash
  table linked to will be offset by this specification.  This is
  how the offset is evaluated & implemented:
  1.  The 16 bit word at offset <AT> is read from the packet.  If
      <AT> is not present the 16 bit work is read from offset 0.
  2.  The 16 bit word is masked with <MASK>.  If no masked is
      specified 0 is assumed.
  3.  The masked 16 bit word is divided by (2**<SHIFT>).
  4.  The resulting value has <PLUS> added to it.  If not specified
      <PLUS> defaults to 0.
  5.  If none of <MASK>, <AT>, <SHIFT>, nor <PLUS> are specified
      then the current temporary offset is used.
  6.  If "eat" is specified the offset is permanent, and is added
      to the current permanant offset.  The permanent offset is
      unconditionally added to the <AT> value in "match", "offset"
      and "hashkey" options in the hash table linked to, and any
      nested links.  If "eat" is not specified the offset is
      temporary.  Temporary offsets any added to the "at
      nexthdr+<AT>" values in "match" options, but do not effect
      any other <AT> values.
  7.  If then specified does not classify the packet, and hence
      execution resumes at the next filter item, then the permanent
      offset calculated here is discarded.  The temporary offset,
      however, remains in effect.
  8.  When the u32 filter starts executing both the permanent and
      temporary offset are initialised to 0.

police <police-spec>
  If all the match options succeed then this will :police: the
  packet and the u32 filter will stop execution.  Ignored if "link"
  option is given.

sample <selector>
  This option computes the bucket for the filter item being or
  changed from the <selctor> passed.  The packet offset and
  mask parts of the selector are ignored if given.  When
  calculating the hash bucket, the divisor in the target hash
  bucket is assumed to be 256.  There is no way of altering
  this.  If the divisor isn't 256, use the "ht" option instead.
  Selectors are described below.
Selectors

Selectors are used by the match option to extract information from the
packet and compare it to a value. All selectors compile to the one
format which is accepted by the kernel. This format reads a 32 value
from the supplied offset within the packet. The offset
must be on a 32 bit boundary. The value read is bit wise anded with the
supplied mask. If match succeeds if the result is equal to the supplied
value. In C:
if ((*(u32*)((char*)packet+offset+permoff) & mask) == value)
   match();
The "permoff" variable in this statement is calculated by the "offset" option that executed this filter list.
Here are some conventions which won't be repeated below for brevity:
at nexthdr+<OFFSET>
  Except where noted this can be appended to all selectors to
  override the default position of the field in the packet.  The
  <OFFSET> is the offset within the packet where the field can
  be found.  If an 16 bit value is being compared the <OFFSET>
  should be on a 16 bit boundary, and if a 32 bit value is being
  compared if should be on a 32 bit boundary.  The <OFFSET> is
  given in decimal; prefix with 0x to enter it in hex.  If
  "nexthdr+" is present any temporary offset calculated by the
  "offset" option is added to <OFFSET>.  The current permanent
  offset calculated by the "offset" optional is unconditionally
  added to <OFFSET>.  It is unlikely you will want to specify
  the "at" option with anthing other than u32, u16 and u8
  selectors.

<IP6ADDR>/<CIDR>
  This specifies set of up to 4 32 bit masks and values that
  will match a 128 bit IPv6 address.  The combined values equal
  the IPv6 address supplied, which may be in any IPv6 address
  format.  The combined masks are derived from the <CIDR>
  portion - it is a 128 bit word with the upper <CIDR> bits
  set to 1's, the rest are 0's.  If the HOST is not given the
  host is all 1's.  The IP address must be numeric.

<IPADDR>/<CIDR>
  This specifies a mask and value.  The value is equal to the
  IPv4 address supplied.  The mask is derived from the
  <CIDR> portion - it is a 32 bit word with the upper
  <CIDR> bits set to 1's, the rest are 0's.  If <CIDR>
  is not given the mask is all 1's.  The IP address must be
  numeric.  For example, 192.168.10.0/24 would yield a value
  of c0a80a00 hex and a mask of ffffff00 hex.

<MASK>
  This specifies a mask value the field will be bit wise anded
  with before being compared to <VALUE>.  It is given in hex.

<VALUE>
  This specifies the value the field extracted from the packet
  must equal, after being anded with the <MASK>.  It is decimal,
  unless prefixed with 0x, in which case it is hex.  Ie 0x10 and
  16 both mean the same thing.
Here are the selectors that can follow a "match" or "sample" option:
icmp code <VALUE> <MASK>
  Match the 8 bit code field an the icmp packet.  This must
  be in a hash table that is "link"ed to by a filter item which
  contains an "offset" option that skips the IP header.

icmp type <VALUE> <MASK>
  Match the 8 bit type field an the icmp packet.  This must be
  in a hash table that is "link"ed to by a filter item which
  contains an "offset" option that skips the IP header.

ip df
  Matches if the IPv4 packet has the "don't fragment" bit set.
  May not be followed by an "at" option.

ip dport <VALUE> <MASK>
  Matches the 16 bit desination port in a tcp or udp IPv4 packet.
  This only works if the ip header contains no options.  Use the
  "link" and "match tcp dst" or "match udp dst" option if you can
  not be sure of that.

ip dst <IPADDR>/<CIDR>
  Matches the destination IP address of an IPv4 packet.

ip firstfrag
  Matches is this IPv4 packet is not fragmented, or it the first
  first fragment.

ip icmp_code <VALUE> <MASK>
  Matches the 8 bit code field in icmp IPv4 packet.  This only
  works if the ip header contains no options.  Use the "link"
  and "match icmp code" options if you can not be sure of that.

ip icmp_type <VALUE> <MASK>
  Matches the 8 bit type field in ICMP IPv4 packet.  This only
  works if the ip header contains no options.  Use the "link"
  and "match ip icmp" options if you can not be sure of that.

ip ihl <VALUE> <MASK>
  Matches the 8 bit ip version + header length byte in the IPv4
  header.

ip mf
  Matches if the IPv4 packet is there are more fragments from the
  same packet to follow this one.  May not be followed by an "at"
  option.

ip nofrag
  Matches if this is not a fragmented IPv4 packet.  May not be
  followed by an "at" option.

ip protocol <VALUE> <MASK>
  Matches the 8 bit protocol byte in the IPv4 header.  You can
  not use symbolic protocol names (eg "tcp" or "udp").

ip sport <VALUE> <MASK>
  Matches the 16 bit source port in a TCP or UDP IPv4 packet.
  This only works if the ip header contains no options.  Use the
  "link" and "match tcp src" or "match udp src" options if you
  can not be sure of that.

ip src <IPADDR>/<CIDR>
  Matches the source IP address of an IPv4 packet.

ip tos <VALUE> <MASK> | ip precedence <VALUE> <MASK>
  Matches the 8 bit TOS byte in the IPv4 header.

ip6 dport <VALUE> <MASK>
  Matches the 16 bit desination port in a TCP or UDP IPv6 packet.
  This only works if the ip header contains no options.  Use the
  "link" and "match ip tcp" or "match ip udp" options if you can
  not be sure of that.

ip6 dst <IP6ADDR>/<CIDR>
  Matches the destination IP address of an IPv6 packet.

ip6 icmp_code <VALUE> <MASK>
  Matches the 8 bit code field in ICMP IPv6 packet.  This only
  works if the ip header contains no options.  Use the "link" and
  "match icmp" options if you can not be sure of that.

ip6 icmp_type <VALUE> <MASK>
  Matches the 8 bit type field in an ICMP IPv4 packet.  This only
  works if the ip header contains no options.  Use the "link" and
  "match icmp" options if you can not be sure of that.

ip6 flowlabel <VALUE> <MASK>
  Matches the 32 bit flowlabel in the IPv6 header.

ip6 priority <VALUE> <MASK>
  Matches the 8 bit priority byte in the IPv6 header.

ip6 protocol <VALUE> <MASK>
  Matches the 8 bit protocol byte in the IPv6 header.  You can
  not use symbolic protocol names (eg "tcp" or "udp").

ip6 sport <VALUE> <MASK>
  Matches thw 16 bit source port in a TCP or UDP IPv6.  This only
  works if the ip header contains no options.  Use the "link" and
  "match tcp src" or "match udp src" options if you can not be sure
  of that.

ip6 src <IPADDR>/<CIDR>
  Matches the src IP address in an IPv6 packet.

tcp dst <VALUE> <MASK>
  Match the 16 bit destination port in the tcp packet.  This must
  be in a hash table is "link"ed to by a filter item which contains
  an "offset" option that skips the IP header.

tcp src <VALUE> <MASK>
  Match the 16 bit source port in the tcp packet.  This must be
  in a hash table is "link"ed to by a filter item which contains
  an "offset" option that skips the IP header.

u16 <VALUE> <MASK>
  Match a 16 bit value in the packet.  The offset defaults to 0
  which is usually not want you want, at append the "at" option
  to give the correct value.

u32 <VALUE> <MASK>
  Match a 32 bit value in the packet.  The offset defaults to 0
  which is usually not want you want, at append the "at" option
  to give the correct value.

u8 <VALUE> <MASK>
  Match a 8 bit value in the packet.  The offset defaults to 0
  which is usually not want you want, at append the "at" option
  to give the correct value.

udp dst <VALUE> <MASK>
  Match the 16 bit destination port in the udp packet.  This must
  be in a hash table is "link"ed to by a filter item which contains
  an "offset" option that skips the IP header.

udp src <VALUE> <MASK>
  Match the 16 bit source port in the udp packet.  This must be
  in a hash table is "link"ed to by a filter item which contains
  an "offset" option that skips the IP header.

TC Hash Filter的更多相关文章

  1. Linux TC基于CBQ队列的流量管理范例

    参考了TC的很多文档,自己也整理了一篇配置记录.在实际使用过程中效果还不错,在此分享给大家以备参考.环境:局域网规模不是很大40多台机器. NAT共享上网(内网:eth0 外网:eth2)CBQ是通过 ...

  2. js实现hash

    由于项目中用到了hash,自己实现了一个. Hash = function () { } Hash.prototype = { constructor: Hash, add: function (k, ...

  3. Linux下TC使用说明

    Linux下TC使用说明   一.TC原理介绍 Linux操作系统中的流量控制器TC(Traffic Control)用于Linux内核的流量控制,主要是通过在输出端口处建立一个队列来实现流量控制. ...

  4. Linux下TC使用说明 & 使用备注 ZZ

    一.TC原理介绍 Linux操作系统中的流量控制器TC(Traffic Control)用于Linux内核的流量控制,主要是通过在输出端口处建立一个队列来实现流量控制. Linux流量控制的基本原理如 ...

  5. iproute2和tc的高级路由用法

    #Linux advanced router ip link show #显示链路 ip addr show #显示地址(或ifconfig) ip route show #显示路由(route -n ...

  6. django 操作数据库--orm(object relation mapping)---models

    思想 django为使用一种新的方式,即:关系对象映射(Object Relational Mapping,简称ORM). PHP:activerecord Java:Hibernate C#:Ent ...

  7. 使用ingress qdisc和ifb进行qos

    ifb   The Intermediate Functional Block device is the successor to the IMQ iptables module that was ...

  8. OpenWrt sscanf问题之于MT7620N与AR9341

    在MT7620N平台做好了wifidog的相关调试工作,除了eth驱动.wireless性能问题,其余的都能够基本正常. 依据实际须要要对已完毕的工作在AR9341平台上实现. 事实上也简单.基本功能 ...

  9. 【EMV L2】EMV终端数据

    Account TypeAcquirer IdentifierAdditional Terminal CapabilitiesAmount, Authorised (Binary)Amount, Au ...

随机推荐

  1. centos chkconfig 服务设置

    chkconfig命令主要用来更新(启动或停止)和查询系统服务的运行级信息.谨记chkconfig不是立即自动禁止或激活一个服务,它只是简单的改变了符号连接. 使用语法:chkconfig [--ad ...

  2. Linux系统中的日常监控知识点

    1.命令熟悉之w [xiongchao@oc3006745124 Desktop]$ w :: up :, users, load average: 1.48, 1.19, 1.11 USER TTY ...

  3. MySQL数据库百万级高并发网站实战

    在一开始接触PHP接触MYSQL的时候就听不少人说:“MySQL就跑跑一天几十万IP的小站还可以,要是几百万IP就不行了”,原话不记得了,大体 就是这个意思.一直也没有好的机会去验证这个说法,一是从没 ...

  4. async = require('async')

    var mongoose = require('mongoose'), async = require('async'); mongoose.connect('localhost', 'learn-m ...

  5. 线程系列3---ThreadLocal类研究

    2013-12-23 17:44:44 Java为线程安全提供了一些工具类,如ThreadLocal类,它代表一个线程局部变量,通过把数据放在ThreadLocal中就可以让每个线程创建一个该变量的副 ...

  6. wp8.1 Study10:APP数据存储

    一.理论 1.App的各种数据在WP哪里的? 下图很好介绍了这个问题.有InstalltionFolder, knownFolder, SD Card... 2.一个App的数据存储概览 主要分两大部 ...

  7. JSON基础使用

    1)JSON概念 JSON 是纯文本 JSON 具有“自我描述性”(人类可读) JSON 具有层级结构(值中存在值) JSON 可通过 JavaScript 进行解析 JSON 数据可使用 AJAX ...

  8. Android-LogCat日志工具(一)

    LogCat : Android中一个命令行工具,可以用于得到程序的log信息. 就像你知道一个人的日志.航程,你可以无时无刻知道一个人在干什么. 而LogCat , 就是程序的日志.通过日志,你可以 ...

  9. iOS应用崩溃日志分析

    转自raywenderlich   作为一名应用开发者,你是否有过如下经历?   为确保你的应用正确无误,在将其提交到应用商店之前,你必定进行了大量的测试工作.它在你的设备上也运行得很好,但是,上了应 ...

  10. URAL 1671 Anansi's Cobweb (并查集)

    题意:给一个无向图.每次查询破坏一条边,每次输出查询后连通图的个数. 思路:并查集.逆向思维,删边变成加边. #include<cstdio> #include<cstring> ...