[Bug 714] Kernel panics in same_src()
bugzilla-daemon at netfilter.org
Mon Sep 9 04:48:19 CEST 2013
https://bugzilla.netfilter.org/show_bug.cgi?id=714
lizhao09 at huawei.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lizhao09 at huawei.com
--- Comment #15 from lizhao09 at huawei.com 2013-09-09 04:48:17 CEST ---
Here is another case related to this issue.
kernel version: 2.6.32.43-0.4-default
hardware: x86_64
[10542399.515396] BUG: unable to handle kernel NULL pointer dereference at
000000000000003e
[10542399.523469] IP: [<ffffffffa1491a4b>] find_appropriate_src+0xdb/0x1a0
[nf_nat]
[10542399.530843] PGD 17f55ec067 PUD 17fba37067 PMD 0
[10542399.535727] Oops: 0000 [#1] SMP
[10542399.539220] last sysfs file:
/sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[10542399.547355] CPU 8
[10542399.647544] Supported: Yes, External
[10542399.651361] Pid: 0, comm: swapper Tainted: P NX
2.6.32.43-0.4-default #1 Thurley
[10542399.659755] RIP: 0010:[<ffffffffa1491a4b>] [<ffffffffa1491a4b>]
find_appropriate_src+0xdb/0x1a0 [nf_nat]
[10542399.669552] RSP: 0018:ffff88002c3039f0 EFLAGS: 00010286
[10542399.675095] RAX: 0000000000000000 RBX: ffff8817814beb90 RCX:
0000000024852261
[10542399.682454] RDX: 0000000000000000 RSI: 00000000327c4d71 RDI:
ffffffff81cd4dc0
[10542399.689812] RBP: ffff88002c303ad0 R08: 0000000000000011 R09:
0000000000000002
[10542399.697170] R10: 0000000000004000 R11: ffffffffa14726e0 R12:
ffff88002c303aa0
[10542399.704529] R13: ffff88002c303b40 R14: ffff88002c303b4c R15:
ffff88002c303b4e
[10542399.711888] FS: 0000000000000000(0000) GS:ffff88002c300000(0000)
knlGS:0000000000000000
[10542399.720199] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[10542399.726175] CR2: 000000000000003e CR3: 00000017f67f1000 CR4:
00000000000006e0
[10542399.733534] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[10542399.740893] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[10542399.748254] Process swapper (pid: 0, threadinfo ffff881810db2000, task
ffff881810db0080)
[10542399.756560] Stack:
[10542399.758821] 00000000ffffffff ffff88002c303aa0 ffff88002c303ad0
ffff88002c303b40
[10542399.766301] <0> 0000000000000000 ffff8817f7d639e8 0000000000000100
ffffffffa1491beb
[10542399.774237] <0> ffff88002c303ad0 ffff8817f7d639e8 ffff88002c303b40
ffff88002c303aa0
[10542399.782365] Call Trace:
[10542399.785085] [<ffffffffa1491beb>] get_unique_tuple+0xdb/0x240 [nf_nat]
[10542399.791847] [<ffffffffa1491de9>] nf_nat_setup_info+0x99/0x350 [nf_nat]
[10542399.798697] [<ffffffffa149e162>] alloc_null_binding+0x52/0x90
[iptable_nat]
[10542399.805977] [<ffffffffa149e519>] nf_nat_fn+0x1e9/0x280 [iptable_nat]
[10542399.812654] [<ffffffff81318d18>] nf_iterate+0x68/0xa0
[10542399.818031] [<ffffffff81318db2>] nf_hook_slow+0x62/0xf0
[10542399.823582] [<ffffffff813214a1>] ip_local_deliver+0x51/0x80
[10542399.829477] [<ffffffff81320a59>] ip_rcv_finish+0x1b9/0x440
[10542399.835288] [<ffffffff812f5f89>] netif_receive_skb+0x599/0x6a0
[10542399.841454] [<ffffffffa0ea4837>] ixgbe_clean_rx_irq+0x3d7/0xe50 [ixgbe]
[10542399.848397] [<ffffffffa0ea53e4>] ixgbe_clean_rxtx_many+0x134/0x270
[ixgbe]
[10542399.855595] [<ffffffff812f6863>] net_rx_action+0xe3/0x1a0
[10542399.861318] [<ffffffff810533ef>] __do_softirq+0xbf/0x170
[10542399.866956] [<ffffffff810040bc>] call_softirq+0x1c/0x30
[10542399.872506] [<ffffffff81005cfd>] do_softirq+0x4d/0x80
[10542399.877883] [<ffffffff81053275>] irq_exit+0x85/0x90
[10542399.883087] [<ffffffff8100525e>] do_IRQ+0x6e/0xe0
[10542399.888120] [<ffffffff81003913>] ret_from_intr+0x0/0xa
[10542399.893582] [<ffffffff8100ae42>] mwait_idle+0x62/0x70
[10542399.898957] [<ffffffff8100204a>] cpu_idle+0x5a/0xb0
[10542399.904159] Code: 00 00 00 4d 8d 7d 0e 4d 8d 75 0c 48 89 c3 eb 14 48 8b
03 48 85 c0 0f 84 84 00 00 00 44 0f b6 45 26 48 89 c3 48 8b 53 20 48 8b 03 <44>
38 42 3e 0f 18 08 75 dc 8b 42 18 3b 45 00 75 d4 0f b7 42 28
From the vmcore, we found that:
1. The oops occurred at the statement 't->dst.protonum == tuple->dst.protonum'
in the inline function same_src().
2. The first parameter of same_src(), 'ct', is NULL; its value came from
'ct = nat->ct' in find_appropriate_src().
3. Reading the contents of 'nat', all of its members are zero. Has the 'nat'
extension already been freed?
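For reference, the read side in question looks roughly like this in 2.6.32
(net/ipv4/netfilter/nf_nat_core.c, quoted from memory, so details may differ
slightly). The faulting address 000000000000003e is consistent with reading
tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum through a NULL 'ct' on x86_64:

static inline int
same_src(const struct nf_conn *ct,
         const struct nf_conntrack_tuple *tuple)
{
        const struct nf_conntrack_tuple *t;

        t = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;      // ct == NULL here
        return (t->dst.protonum == tuple->dst.protonum &&  // <- faulting load, CR2 = 0x3e
                t->src.u3.ip == tuple->src.u3.ip &&
                t->src.u.all == tuple->src.u.all);
}

/* Only called for SRC manip */
static int
find_appropriate_src(struct net *net,
                     const struct nf_conntrack_tuple *tuple,
                     struct nf_conntrack_tuple *result,
                     const struct nf_nat_range *range)
{
        unsigned int h = hash_by_src(net, tuple);
        const struct nf_conn_nat *nat;
        const struct nf_conn *ct;
        const struct hlist_node *n;

        rcu_read_lock();
        hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
                ct = nat->ct;       // 'nat' already zeroed/freed -> ct read as NULL
                if (same_src(ct, tuple)) {
                        /* Copy source part from reply tuple. */
                        nf_ct_invert_tuplepr(result,
                                             &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
                        result->dst = tuple->dst;

                        if (in_range(result, range)) {
                                rcu_read_unlock();
                                return 1;
                        }
                }
        }
        rcu_read_unlock();
        return 0;
}

On the free side, the paths in question are: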
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
        struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

        if (nat == NULL || nat->ct == NULL)
                return;

        NF_CT_ASSERT(nat->ct->status & IPS_NAT_DONE_MASK);

        spin_lock_bh(&nf_nat_lock);
        hlist_del_rcu(&nat->bysource);
        spin_unlock_bh(&nf_nat_lock);
        // no synchronize_rcu here
}
void nf_conntrack_free(struct nf_conn *ct)
{
        struct net *net = nf_ct_net(ct);

        nf_ct_ext_destroy(ct);   // for NAT, this calls nf_nat_cleanup_conntrack
        atomic_dec(&net->ct.count);
        nf_ct_ext_free(ct);      // frees the NAT extension memory via kfree; is it
                                 // possible that the extension was still in use on
                                 // an RCU read side (same_src)?
        kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
}
Is it safe to call synchronize_rcu() at the end of nf_nat_cleanup_conntrack(),
or should rcu_read_lock() be replaced with spin_lock_bh(&nf_nat_lock) on the
RCU read side (same_src)?
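To make the first option concrete, a minimal sketch (untested, not a patch,
just to illustrate the question) would be:

static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
        struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

        if (nat == NULL || nat->ct == NULL)
                return;

        NF_CT_ASSERT(nat->ct->status & IPS_NAT_DONE_MASK);

        spin_lock_bh(&nf_nat_lock);
        hlist_del_rcu(&nat->bysource);
        spin_unlock_bh(&nf_nat_lock);

        // proposed: wait for RCU readers (find_appropriate_src) to finish
        // before nf_ct_ext_free() kfrees the extension
        synchronize_rcu();
}

Two caveats: this would run for every conntrack destruction, and
synchronize_rcu() sleeps, so it is not allowed if nf_nat_cleanup_conntrack()
can be reached from atomic/softirq context (e.g. when the last reference to a
conntrack is dropped on the packet path). The second option avoids that, but
it turns the by-source lookup into a spinlock acquisition on the packet path.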
The output of "iptables -t nat -nvL":
JINLUB017_01:~ # iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 22M packets, 2590M bytes)
 pkts bytes target prot opt in       out      source       destination
    0     0 DNAT   udp  --  pubeth9  *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth9  *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   udp  --  pubeth4  *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth4  *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   udp  --  pubeth3  *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth3  *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   udp  --  pubeth2  *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth2  *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   udp  --  pubeth10 *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth10 *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   udp  --  pubeth1  *        0.0.0.0/0    0.0.0.0/0    udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  pubeth1  *        0.0.0.0/0    0.0.0.0/0    tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT   tcp  --  *        *        0.0.0.0/0    172.18.53.1  tcp dpt:80 to:172.18.53.1:8080

Chain POSTROUTING (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target prot opt in       out      source       destination
    0     0 SNAT   tcp  --  *        priveth0 0.0.0.0/0    172.17.136.2 tcp dpt:4045 to:172.17.136.153
    0     0 SNAT   tcp  --  *        *        172.18.53.1  0.0.0.0/0    tcp spt:8080 to:172.18.53.1:80

Chain OUTPUT (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target prot opt in       out      source       destination