Vasily Averin [Fri, 28 Oct 2005 20:46:35 +0000 (16:46 -0400)]
sis900: come alive after temporary memory shortage
1) sis900_rx() forgot to increment its counter when it could not get
memory for an skb, which led to failure of the whole interface.
Problem is accompanied with messages:
eth0: Memory squeeze,deferring packet.
eth0: NULL pointer encountered in Rx ring, skipping
2) If counter cur_rx overflows during a temporary memory shortage,
the buffer can't be recreated later, when memory IS available.
3) Limit the work in the handler to prevent endless packet processing
if new packets are generated faster than they are handled.
Signed-off-by: Konstantin Khorenko <khorenko@sw.ru> Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
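The three fixes above can be illustrated with a minimal user-space sketch. It is not the sis900 code: the ring size, the struct layout, and the names rx_poll, rx_refill, alloc_ok and alloc_fail are assumptions made for illustration, and a boolean per slot stands in for the skb pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RX_DESC 16            /* hypothetical ring size (power of two) */

struct rx_ring {
    bool has_buf[NUM_RX_DESC];    /* does slot i currently own a buffer? */
    uint32_t cur_rx;              /* next slot to process, free-running */
    uint32_t dirty_rx;            /* oldest slot that may need a refill */
};

/* Hypothetical allocator hook: returns false during a memory squeeze. */
typedef bool (*alloc_fn)(void);

static bool alloc_ok(void)   { return true; }
static bool alloc_fail(void) { return false; }

/* Process at most `limit` of `pending` received slots; returns the count. */
static unsigned int rx_poll(struct rx_ring *r, unsigned int pending,
                            unsigned int limit, alloc_fn alloc_buf)
{
    unsigned int done = 0;

    assert(pending <= NUM_RX_DESC);

    while (pending > 0 && done < limit) {      /* fix 3: bounded work */
        uint32_t entry = r->cur_rx % NUM_RX_DESC;

        if (r->has_buf[entry]) {
            /* The old buffer goes up the stack; the replacement may fail. */
            r->has_buf[entry] = alloc_buf();
        }
        r->cur_rx++;                           /* fix 1: advance regardless */
        pending--;
        done++;
    }
    return done;
}

/* Refill empty slots between dirty_rx and cur_rx. Comparing for inequality
 * instead of "dirty_rx < cur_rx" keeps this correct after cur_rx wraps
 * around (fix 2). */
static void rx_refill(struct rx_ring *r, alloc_fn alloc_buf)
{
    while (r->dirty_rx != r->cur_rx) {
        uint32_t entry = r->dirty_rx % NUM_RX_DESC;

        if (!r->has_buf[entry]) {
            if (!alloc_buf())
                break;            /* still short on memory: retry later */
            r->has_buf[entry] = true;
        }
        r->dirty_rx++;
    }
}
```

Calling rx_poll() with alloc_fail and then rx_refill() with alloc_ok models a temporary shortage: the slots emptied during the squeeze are repopulated once memory is available again, even if cur_rx wrapped in between.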
Eugene Surovegin [Mon, 10 Oct 2005 23:58:14 +0000 (16:58 -0700)]
[PATCH] New PowerPC 4xx on-chip ethernet controller driver
This patch replaces the current PowerPC 4xx EMAC driver with a
new version rewritten from scratch. This patch is quite big
(~234K) because there is virtually 0% of common code between the old and
new versions.
The new driver uses NAPI, solves stability problems under heavy packet
load and low memory, corrects chip register access and fixes numerous
small bugs I don't even remember now.
This patch has been tested on all PPC 4xx boards supported in 2.6.
It's been used in production for almost a year now on custom
4xx hardware. PPC32 specific parts are already upstream.
Patch was acked by the current EMAC driver maintainer (Matt Porter). I
will be maintaining this new version.
Under heavy PCI bus load, ports of the DFE-580TX 4-ethernet port board stop
working, with currently no other cure than a powercycle. Here is a tested
fix. By the way, I also fixed some references and attribution.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Roger While [Fri, 28 Oct 2005 20:11:49 +0000 (16:11 -0400)]
[wireless prism54] Fix frame length
prism54 is leaking information when passing transmits to the firmware.
There is no requirement to adjust the length to >= ETH_ZLEN.
Just pass the skb length (after possible adjustment).
Signed-off-by: Roger While <simrw@sim-basis.de> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Santiago Leon [Wed, 26 Oct 2005 16:47:23 +0000 (10:47 -0600)]
[PATCH] ibmveth fix failed addbuf
This patch fixes a bug that happens when the hypervisor can't add a
buffer. The old code wrote IBM_VETH_INVALID_MAP into the free_map
array, so next time the index was used, an ibmveth_assert() caught it and
called BUG(). The patch writes the right value into the free_map array
so that the index can be reused.
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
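A minimal sketch of the idea behind this fix, assuming a simplified free-index ring. The struct, the pool size, the INVALID_MAP stand-in and the hv_ok/hv_fail callbacks are hypothetical and do not reproduce the ibmveth data structures or the hypervisor call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE   4
#define INVALID_MAP 0xFFFFu       /* stand-in for IBM_VETH_INVALID_MAP */

struct pool {
    uint16_t free_map[POOL_SIZE]; /* ring of free buffer indices */
    unsigned int consumer;        /* next free_map slot to take from */
};

static void pool_init(struct pool *p)
{
    for (unsigned int i = 0; i < POOL_SIZE; i++)
        p->free_map[i] = (uint16_t)i;
    p->consumer = 0;
}

/* Hypothetical models of the hypervisor accepting or rejecting a buffer. */
static bool hv_ok(uint16_t index)   { (void)index; return true; }
static bool hv_fail(uint16_t index) { (void)index; return false; }

/* Try to hand one buffer to the hypervisor. */
static bool pool_add_buffer(struct pool *p, bool (*hv_add)(uint16_t))
{
    unsigned int slot = p->consumer;
    uint16_t index = p->free_map[slot];

    /* Plays the role of the driver's assertion on the free map. */
    assert(index != INVALID_MAP);
    p->free_map[slot] = INVALID_MAP;      /* index is now in flight */

    if (!hv_add(index)) {
        /* The fix: restore the index so the slot can be used again. */
        p->free_map[slot] = index;
        return false;
    }
    p->consumer = (slot + 1) % POOL_SIZE;
    return true;
}
```

Without the restore step, a failed add would leave INVALID_MAP in the slot, and the next attempt on the same slot would trip the assertion.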
Santiago Leon [Wed, 26 Oct 2005 16:47:16 +0000 (10:47 -0600)]
[PATCH] ibmveth lockless TX
This patch adds the lockless TX feature to the ibmveth driver. The
hypervisor has its own locking so the only change that is necessary is
to protect the statistics counters.
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Santiago Leon [Wed, 26 Oct 2005 16:47:08 +0000 (10:47 -0600)]
[PATCH] ibmveth fix buffer replenishing
This patch moves the allocation of RX skb buffers out of a workqueue
so that it is done directly at RX processing time. This change was suggested
by Dave Miller when the driver was starving the RX buffers and
deadlocking under heavy traffic:
> Allocating RX SKBs via tasklet is, IMHO, the worst way to
> do it. It is no surprise that there are starvation cases.
>
> If tasklets or work queues get delayed in any way, you lose,
> and it's very easy for a card to catch up with the driver RX'ing
> packets very fast, no matter how aggressive you make the
> replenishing. By the time you detect that you need to be
> "more aggressive" it is already too late.
> The only pseudo-reliable way is to allocate at RX processing time.
>
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Santiago Leon [Wed, 26 Oct 2005 16:47:01 +0000 (10:47 -0600)]
[PATCH] ibmveth fix buffer pool management
This patch changes the way the ibmveth driver handles the receive
buffers. The old code mallocs and maps all the buffers in the pools
regardless of MTU size and it also limits the number of buffer pools to
three. This patch makes the driver malloc and map the buffers necessary
to support the current MTU. It also changes the hardcoded names of the
buffer pool number, size, and elements to arrays to make it easier to
change (with the hope of making them runtime parameters in the future).
Signed-off-by: Santiago Leon <santil@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
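The array-based pool description can be sketched roughly as follows. The number of pools, the buffer sizes and the selection rule (activate every pool up to and including the first one that fits a full frame) are assumptions for illustration, not values taken from the driver.

```c
#include <assert.h>

#define NUM_POOLS 5               /* hypothetical pool count */

/* Hypothetical per-pool buffer sizes, smallest first. */
static const unsigned int pool_buf_size[NUM_POOLS] = {
    512, 2048, 16384, 32768, 65536
};

/*
 * Mark the pools needed for frames of up to frame_len bytes: every pool
 * up to and including the first one whose buffers can hold a full frame.
 * Returns the number of active pools, or -1 if no pool is large enough.
 */
static int select_pools(unsigned int frame_len, int active[NUM_POOLS])
{
    int count = 0;
    int covered = 0;

    assert(frame_len > 0);

    for (int i = 0; i < NUM_POOLS; i++) {
        active[i] = !covered;
        if (active[i])
            count++;
        if (pool_buf_size[i] >= frame_len)
            covered = 1;
    }
    return covered ? count : -1;
}
```

Only the pools marked active would then be allocated and mapped, so a small MTU does not pay for the large-buffer pools.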
Michael Chan [Wed, 26 Oct 2005 22:48:35 +0000 (15:48 -0700)]
[PATCH] tg3: fix ASF heartbeat
Change the ASF heartbeat to 5 seconds for faster detection of system
crash. The driver sends the heartbeat every 2 seconds and the ASF
firmware will time out and reset the device if no heartbeat is received
after 5 seconds. The old scheme of 2 minutes is ineffective.
tg3_write_mem_fast() is added to speed up the IO to send the heartbeat.
When no workaround is needed, it will use direct MMIO to memory space
to write to memory.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Chan [Wed, 26 Oct 2005 22:46:52 +0000 (15:46 -0700)]
[PATCH] tg3: add 5714/5715 support
Add complete support for 5714/5715. These chips are very similar to
5780 so the changes are very trivial. A TG3_FLG2_5780_CLASS flag is
added to identify these chips.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Al Viro [Fri, 21 Oct 2005 07:20:48 +0000 (03:20 -0400)]
[PATCH] gfp_t: fs/*
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...
One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro [Fri, 21 Oct 2005 06:55:38 +0000 (02:55 -0400)]
[PATCH] gfp_t: infrastructure
Beginning of gfp_t annotations:
- -Wbitwise added to CHECKFLAGS
- old __bitwise renamed to __bitwise__
- __bitwise defined to either __bitwise__ or nothing, depending on
__CHECK_ENDIAN__ being defined
- gfp_t switched from __nocast to __bitwise__
- force cast to gfp_t added to __GFP_... constants
- new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
the result to int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
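The annotation scheme listed above can be sketched in user space as follows. The SK_ prefixed names and the flag values are hypothetical stand-ins (the kernel uses __bitwise__, __force, gfp_t and its own __GFP_... values); only the shape is meant to match. Under sparse, which defines __CHECKER__, the type becomes a distinct bitwise type; under a normal compiler the annotations expand to nothing.

```c
#include <assert.h>   /* only needed for the usage checks */

#ifdef __CHECKER__
#define SK_BITWISE __attribute__((bitwise))
#define SK_FORCE   __attribute__((force))
#else
#define SK_BITWISE
#define SK_FORCE
#endif

/* A flags type that sparse refuses to mix with plain integers. */
typedef unsigned int SK_BITWISE sk_gfp_t;

/* Constants need a forced cast, since integer literals are plain ints. */
#define SK_GFP_DMA      ((SK_FORCE sk_gfp_t)0x01u)
#define SK_GFP_HIGHMEM  ((SK_FORCE sk_gfp_t)0x02u)
#define SK_GFP_WAIT     ((SK_FORCE sk_gfp_t)0x10u)
#define SK_GFP_ZONEMASK ((SK_FORCE sk_gfp_t)0x03u)

/* Helper in the spirit of gfp_zone(): extract the zone bits as an int. */
static inline int sk_gfp_zone(sk_gfp_t flags)
{
    return (SK_FORCE int)(flags & SK_GFP_ZONEMASK);
}
```

For example, sk_gfp_zone(SK_GFP_WAIT | SK_GFP_HIGHMEM) yields the zone bits only, so callers that need an integer index never have to cast the flags themselves.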
Jens Axboe [Fri, 28 Oct 2005 06:30:39 +0000 (08:30 +0200)]
[BLOCK] elevator switch fixes/cleanup
- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Tejun Heo [Fri, 28 Oct 2005 06:29:39 +0000 (08:29 +0200)]
[BLOCK] Reimplement elevator switch
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which don't have the REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
their elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clear
QUEUE_FLAG_BYPASS. The new implementation is much simpler and the main
code paths are less cluttered, IMHO.
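The switch protocol described above can be sketched as a small state machine. This is an illustrative user-space model, not the block layer code: the struct fields and function names are hypothetical, locking is omitted, and an integer id stands in for the attached iosched.

```c
#include <assert.h>
#include <stdbool.h>

struct queue {
    bool bypass;            /* models QUEUE_FLAG_BYPASS */
    unsigned int elvpriv;   /* requests holding elevator private data */
    int iosched;            /* id of the attached scheduler (hypothetical) */
};

struct request {
    bool elvpriv;           /* models REQ_ELVPRIV */
};

/* New requests get elevator private data only while bypass is clear. */
static void get_request(struct queue *q, struct request *rq)
{
    rq->elvpriv = !q->bypass;
    if (rq->elvpriv)
        q->elvpriv++;
}

/* Private data is released only for requests that were given some. */
static void put_request(struct queue *q, struct request *rq)
{
    if (rq->elvpriv) {
        assert(q->elvpriv > 0);
        q->elvpriv--;
        rq->elvpriv = false;
    }
}

static void switch_begin(struct queue *q)
{
    q->bypass = true;
}

/* Returns false while old requests remain; the caller sleeps and retries. */
static bool switch_try_finish(struct queue *q, int new_iosched)
{
    if (q->elvpriv != 0)
        return false;
    q->iosched = new_iosched;
    q->bypass = false;
    return true;
}
```

Requests allocated during the switch carry no private data, so they never hold the count above zero, and the old iosched can be detached as soon as the requests it already knows about have completed.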
Tejun Heo [Thu, 20 Oct 2005 14:46:23 +0000 (16:46 +0200)]
[PATCH] 03/05 move last_merge handling into generic elevator code
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge. This
and the following patches move last_merge handling into
generic elevator code.
Jens Axboe [Thu, 20 Oct 2005 14:42:29 +0000 (16:42 +0200)]
[PATCH] 02/05: update ioscheds to use generic dispatch queue
This patch updates all four ioscheds to use generic dispatch
queue. There's one behavior change in as-iosched.
* In as-iosched, when force dispatching
(ELEVATOR_INSERT_BACK), batch_data_dir is reset to REQ_SYNC
and changed_batch and new_batch are cleared to zero. This
prevents AS from doing incorrect update_write_batch after
the forced dispatched requests are finished.
* In cfq-iosched, cfqd->rq_in_driver currently counts the
number of activated (removed) requests to determine
whether queue-kicking is needed and cfq_max_depth has been
reached. With generic dispatch queue, I think counting
the number of dispatched requests would be more appropriate.
* cfq_max_depth can be lowered to 1 again.
Original from Tejun Heo, modified version applied.
Tejun Heo [Thu, 20 Oct 2005 14:23:44 +0000 (16:23 +0200)]
[PATCH] 01/05 Implement generic dispatch queue
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Tejun Heo [Thu, 20 Oct 2005 08:56:41 +0000 (10:56 +0200)]
[PATCH] fix try_module_get race in elevator_find
This patch removes try_module_get race in elevator_find.
try_module_get should always be called while holding the spinlock that
protects what the module init/cleanup routines register/unregister. In
the case of elevators, we should be holding the lock protecting elv_list
to avoid the entry going away between spin_unlock_irq and try_module_get.
Chen, Kenneth W [Thu, 13 Oct 2005 19:49:29 +0000 (21:49 +0200)]
Following the same idea, it occurs to me that we should only update
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Jens Axboe <axboe@suse.de>
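A minimal sketch of the early return this message argues for, assuming a simplified statistics structure. The field names are loosely modeled on disk statistics but are assumptions here, and timestamp wraparound is ignored.

```c
#include <assert.h>   /* only needed for the usage checks */

struct disk {
    unsigned long stamp;          /* last time the stats were rounded */
    unsigned long time_in_queue;  /* in-flight requests times elapsed time */
    unsigned long io_ticks;       /* time during which I/O was in flight */
    unsigned int in_flight;       /* requests currently in flight */
};

/* Account the time elapsed since the last update, if any. */
static void disk_round_stats(struct disk *d, unsigned long now)
{
    if (now == d->stamp)
        return;                   /* nothing elapsed: skip adding zero */

    if (d->in_flight) {
        d->time_in_queue += d->in_flight * (now - d->stamp);
        d->io_ticks += now - d->stamp;
    }
    d->stamp = now;
}
```

Calling disk_round_stats() twice with the same timestamp leaves the counters untouched on the second call, which is the case the patch avoids paying for.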
Chen, Kenneth W [Thu, 13 Oct 2005 19:48:42 +0000 (21:48 +0200)]
[patch] remove gendisk->stamp_idle field
struct gendisk has these two fields: stamp, stamp_idle. Updates to
stamp_idle are always in sync with stamp, so the two are always the same.
Therefore, there is no value in having two fields tracking the
same timestamp. I suggest removing stamp_idle.
Also, we should only update gendisk stats with non-zero values.
The advantage is that we don't have to needlessly calculate a memory
address and then add zero to its content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Jens Axboe <axboe@suse.de>
Trond Myklebust [Fri, 28 Oct 2005 02:12:41 +0000 (22:12 -0400)]
NFS: Add optional post-op getattr instruction to the NFSv4 file close.
"Optional" means that the close call will not fail if the getattr
at the end of the compound fails.
If it does succeed, try to refresh inode attributes.
Chuck Lever [Tue, 25 Oct 2005 15:48:36 +0000 (11:48 -0400)]
NFS: nfs_lookup doesn't need to revalidate the parent directory's inode
nfs_lookup() used to consult a lookup cache before trying an actual wire
lookup operation. The lookup cache would be invalid, of course, if the
parent directory's mtime had changed, so nfs_lookup performed an inode
revalidation on the parent.
Since nfs_lookup() doesn't use a cache anymore, the revalidation is no
longer necessary. There are cases where it will generate a lot of
unnecessary GETATTR traffic.
See http://bugzilla.linux-nfs.org/show_bug.cgi?id=9
Test-plan:
Use lndir and "rm -rf" and watch for excess GETATTR traffic or application
level errors.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 28 Oct 2005 02:12:39 +0000 (22:12 -0400)]
NFS: Don't let nfs_end_data_update() clobber attribute update information
Since we almost always call nfs_end_data_update() after we called
nfs_refresh_inode(), we now end up marking the inode metadata
as needing revalidation immediately after having updated it.
This patch rearranges things so that we mark the inode as needing
revalidation _before_ we call nfs_refresh_inode() on those operations
that need it.
Herbert Xu [Thu, 27 Oct 2005 08:47:46 +0000 (18:47 +1000)]
[TCP]: Clear stale pred_flags when snd_wnd changes
This bug is responsible for causing the infamous "Treason uncloaked"
messages that have been popping up everywhere since the printk was added.
It has usually been blamed on foreign operating systems. However,
some of those reports implicate Linux as both systems are running
Linux or the TCP connection is going across the loopback interface.
In fact, there really is a bug in the Linux TCP header prediction code
that's been there since at least 2.1.8. This bug was tracked down with
help from Dale Blount.
The effect of this bug ranges from harmless "Treason uncloaked"
messages to hung/aborted TCP connections. The details of the bug
and fix are as follows.
When snd_wnd is updated, we only update pred_flags if
tcp_fast_path_check succeeds. When it fails (for example,
when our rcvbuf is used up), we will leave pred_flags with
an out-of-date snd_wnd value.
When the out-of-date pred_flags happens to match the next incoming
packet we will again hit the fast path and use the current snd_wnd
which will be wrong.
In the case of the treason messages, it just happens that the snd_wnd
cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore
when a zero-window packet comes in we incorrectly conclude that the
window is non-zero.
In fact if the peer continues to send us zero-window pure ACKs we
will continue making the same mistake. It's only when the peer
transmits a zero-window packet with data attached that we get a
chance to snap out of it. This is what triggers the treason
message at the next retransmit timeout.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
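The stale-prediction scenario can be reproduced in a toy model. This is a sketch under strong simplifications, not the TCP code: the header word encoding, the struct, and the fast_path_ok flag (standing in for the conditions checked by tcp_fast_path_check) are assumptions, and the apply_fix parameter exists only to compare old and new behavior.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>
#include <stdint.h>

#define ACK_BIT 0x80000000u       /* hypothetical header word encoding */

struct conn {
    uint32_t snd_wnd;     /* window the peer last advertised */
    uint32_t pred_flags;  /* predicted header word; 0 disables fast path */
    bool fast_path_ok;    /* stand-in for the fast path preconditions */
};

static uint32_t header_word(uint16_t wnd)
{
    return ACK_BIT | wnd;
}

/* Slow path: record a changed window and refresh the prediction. */
static void update_window(struct conn *c, uint16_t new_wnd, bool apply_fix)
{
    if (c->snd_wnd == new_wnd)
        return;
    c->snd_wnd = new_wnd;
    if (apply_fix)
        c->pred_flags = 0;        /* drop the stale prediction first */
    if (c->fast_path_ok)
        c->pred_flags = header_word(new_wnd);
}

/* Returns true if the packet took the fast path (no window update). */
static bool rx(struct conn *c, uint16_t wnd, bool apply_fix)
{
    if (c->pred_flags != 0 && c->pred_flags == header_word(wnd))
        return true;
    update_window(c, wnd, apply_fix);
    return false;
}
```

Starting from a connection whose prediction encodes a zero window and whose fast path preconditions no longer hold, a packet advertising 1000 followed by a zero-window packet leaves snd_wnd at 1000 without the fix, because the second packet matches the stale prediction. With the fix, the second packet goes through the slow path and snd_wnd becomes 0.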