Fri, 01 May 2009

r8169 NETDEV WATCHDOG transmit timed out problem

I recently built a new home server box using an Intel Atom (a BOXD945GCLF2 with the dual-core Atom 330 at 1.6GHz, to be exact), and ran into a strange problem where the box would crash with an error like this:
[322865.976030] ------------[ cut here ]------------
[322865.976038] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xf6/0x18b()
[322865.976043] Hardware name:
[322865.976047] NETDEV WATCHDOG: eth0 (r8169): transmit timed out
[322865.976051] Modules linked in: ipt_MASQUERADE xt_limit xt_helper xt_multiport xt_DSCP xt_tcpudp xt_state ipt_LOG ipt_REJECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter iptable_mangle ip_tables x_tables ipv6 fuse loop hid_pl hid_cypress hid_zpff hid_gyration hid_sony hid_ntrig hid_samsung hid_microsoft hid_tmff hid_monterey hid_ezkey hid_apple hid_a4tech hid_logitech ff_memless hid_cherry hid_sunplus hid_petalynx hid_belkin hid_chicony usbhid hid ds2490 wire cn serio_raw 8139too i2c_i801 rng_core 8139cp parport_pc evdev i2c_core floppy parport ehci_hcd uhci_hcd button thermal processor iTCO_wdt thermal_sys usbcore
[322865.976152] Pid: 0, comm: swapper Not tainted 2.6.29.1 #1
[322865.976156] Call Trace:
[322865.976167]  [] warn_slowpath+0x80/0xb6
[322865.976176]  [] cpumask_next_and+0x23/0x33
[322865.976184]  [] find_busiest_group+0x2fa/0x7e2
[322865.976193]  [] sched_clock_cpu+0x136/0x147
[322865.976200]  [] dev_watchdog+0xf6/0x18b
[322865.976207]  [] hrtimer_forward+0x10c/0x124
[322865.976214]  [] scheduler_tick+0x9c/0x1a3
[322865.976220]  [] getnstimeofday+0x4c/0xcf
[322865.976227]  [] lapic_next_event+0x10/0x13
[322865.976233]  [] dev_watchdog+0x0/0x18b
[322865.976241]  [] run_timer_softirq+0x14a/0x1b4
[322865.976247]  [] dev_watchdog+0x0/0x18b
[322865.976254]  [] __do_softirq+0x8c/0x130
[322865.976260]  [] do_softirq+0x45/0x53
[322865.976266]  [] irq_exit+0x35/0x62
[322865.976272]  [] smp_apic_timer_interrupt+0x71/0x7b
[322865.976280]  [] apic_timer_interrupt+0x28/0x30
[322865.976287]  [] mwait_idle+0x4c/0x5a
[322865.976293]  [] cpu_idle+0x60/0x7a
[322865.976298] ---[ end trace f9e87d98b4ee5218 ]---
[322866.001730] r8169: eth0: link up
It always happened while transferring large amounts of data out from the server through the onboard gigabit Ethernet controller, listed in lspci as:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Sometimes it would freeze the machine for a minute or two; other times it crashed and rebooted. Tracking the problem down was quite a pain, since it only happened occasionally while transferring large amounts of data. Searching for a fix was hard too: I found many others with the same problem on this Realtek NIC, but no one had a solution. I eventually stumbled upon this post describing the same problem, where the last reply is someone saying they were going to try the pci=nomsi boot option. I guess it worked for him and so he never posted back. I tried it myself, and it seems to have fixed the problem.
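Since pci=nomsi is a kernel boot parameter, it only takes effect after it has been added to the bootloader's kernel line and the machine has been rebooted. A minimal Python sketch for confirming that the running kernel was actually booted with it follows. The function name pci_nomsi_enabled is my own invention for illustration; the only thing taken from the system is /proc/cmdline, which holds the parameters of the running kernel.

```python
from pathlib import Path


def pci_nomsi_enabled(cmdline: str) -> bool:
    """Return True if a kernel command line contains the nomsi PCI option.

    The pci= parameter can carry several comma-separated values
    (for example pci=noaer,nomsi), so each pci= token is split on commas.
    """
    for token in cmdline.split():
        if token.startswith("pci="):
            if "nomsi" in token[len("pci="):].split(","):
                return True
    return False


if __name__ == "__main__":
    # /proc/cmdline only exists on Linux, so skip quietly elsewhere.
    path = Path("/proc/cmdline")
    if path.exists():
        print("pci=nomsi active:", pci_nomsi_enabled(path.read_text()))
```

If this reports False after a reboot, the option most likely never made it into the bootloader configuration.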

The pci=nomsi option disables MSI (Message Signaled Interrupts) for the whole system, so devices fall back to legacy pin-based interrupts. MSI is an optional feature that was introduced in revision 2.2 of the PCI specification. It seems to cause trouble on some hardware, since turning it off shows up as the fix for a number of different problems with misbehaving PCI devices.
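One way to see which interrupt type the NIC ended up with is to look at /proc/interrupts, where the controller column typically reads something like PCI-MSI for message signaled interrupts and IO-APIC for legacy ones. Here is a rough Python sketch of that check. The function name uses_msi and the default interface name eth0 are assumptions for illustration, and the exact column layout varies between kernel versions, so treat this as a starting point rather than a reliable parser.

```python
from pathlib import Path
from typing import Optional


def uses_msi(interrupts_text: str, device: str = "eth0") -> Optional[bool]:
    """Report whether `device` is listed on a PCI-MSI interrupt line.

    Returns True if any line naming the device mentions PCI-MSI,
    False if the device is listed only on other interrupt types,
    and None if the device does not appear at all.
    """
    found = False
    for line in interrupts_text.splitlines():
        # Shared interrupts list several devices separated by ", ".
        names = [field.rstrip(",") for field in line.split()]
        if device in names:
            found = True
            if "PCI-MSI" in line:
                return True
    return False if found else None


if __name__ == "__main__":
    # /proc/interrupts only exists on Linux, so skip quietly elsewhere.
    path = Path("/proc/interrupts")
    if path.exists():
        print("eth0 uses MSI:", uses_msi(path.read_text(), "eth0"))
```

With pci=nomsi in effect, the expectation is that the r8169 interface no longer shows up on a PCI-MSI line.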

posted at: 01:37 | path: /debian | permanent link to this entry

