The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.
The first part fo the fix makes skb_to_sgvec() do __sg_mark_end().
After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.
I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().
Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.
Arguably sg_mark_end() is a poorly named function because it doesn't
just "mark", it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.
The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lezcano [Tue, 30 Oct 2007 22:39:33 +0000 (15:39 -0700)]
[IPV6]: remove duplicate call to proc_net_remove
The file /proc/net/if_inet6 is removed twice.
First time in:
inet6_exit
->addrconf_cleanup
And followed a few lines after by:
inet6_exit
-> if6_proc_exit
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lezcano [Tue, 30 Oct 2007 22:38:57 +0000 (15:38 -0700)]
[NETNS]: fix net released by rcu callback
When a network namespace reference is held by a network subsystem,
and when this reference is decremented in a rcu update callback, we
must ensure that there is no more outstanding rcu update before
trying to free the network namespace.
In the normal case, the rcu_barrier is called when the network namespace
is exiting in the cleanup_net function.
But when a network namespace creation fails, and the subsystems are
undone (like the cleanup), the rcu_barrier is missing.
This patch adds the missing rcu_barrier.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lezcano [Tue, 30 Oct 2007 22:38:18 +0000 (15:38 -0700)]
[NET]: Fix free_netdev on register_netdev failure.
Point 1:
The unregistering of a network device schedule a netdev_run_todo.
This function calls dev->destructor when it is set and the
destructor calls free_netdev.
Point 2:
In the case of an initialization of a network device the usual code
is:
* alloc_netdev
* register_netdev
-> if this one fails, call free_netdev and exit with error.
Point 3:
In the register_netdevice function at the later state, when the device
is at the registered state, a call to the netdevice_notifiers is made.
If one of the notification falls into an error, a rollback to the
registered state is done using unregister_netdevice.
Conclusion:
When a network device fails to register during initialization because
one network subsystem returned an error during a notification call
chain, the network device is freed twice because of fact 1 and fact 2.
The second free_netdev will be done with an invalid pointer.
Proposed solution:
The following patch move all the code of unregister_netdevice *except*
the call to net_set_todo, to a new function "rollback_registered".
The following functions are changed in this way:
* register_netdevice: calls rollback_registered when a notification fails
* unregister_netdevice: calls rollback_register + net_set_todo, the call
order to net_set_todo is changed because it is the
latest now. Since it justs add an element to a list
that should not break anything.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Tue, 30 Oct 2007 18:45:46 +0000 (11:45 -0700)]
dio: fix cache invalidation after sync writes
Commit commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate
clean pages before dio write") introduced a bug which stopped dio from
ever invalidating the page cache after writes. It still invalidated it
before writes so most users were fine.
Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
this bug when he had a buffered reader immediately reading file data
after an O_DIRECT wirter had written the data. The kernel issued
read-ahead beyond the position of the reader which overlapped with the
O_DIRECT writer. The failure to invalidate after writes caused the
reader to see stale data from the read-ahead.
The following patch is originally from Karl. The following commentary
is his:
The below 3rd try takes on your suggestion of just invalidating
no matter what the retval from the direct_IO call. I ran it
thru the test-case several times and it has worked every time.
The post-invalidate is probably still too early for async-directio,
but I don't have a testcase for that; just sync. And, this
won't be any worse in the async case.
I added a test to the aio-dio-regress repository which mimics Karl's IO
pattern. It verifed the bad behaviour and that the patch fixed it. I
agree with Karl, this still doesn't help the case where a buffered
reader follows an AIO O_DIRECT writer. That will require a bit more
work.
This gives up on the idea of returning EIO to indicate to userspace that
stale data remains if the invalidation failed.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Karl Schendel <kschendel@datallegro.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 30 Oct 2007 19:04:45 +0000 (12:04 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix incorrect return value from ia64_setup_msi_irq()
[IA64] arch/ia64/sn/kernel/mca.c: undo lock when sn_oemdata can't be extended
[IA64] update sn2 defconfig to 64kb pages
[IA64] fix typo in per_cpu_offset
[IA64] /proc/cpuinfo "physical id" field cleanups
[IA64] vDSO vs --build-id
[IA64] check-segrel.lds vs --build-id
[IA64] vmcore_find_descriptor_size should be in __init
[IA64] ia64/mm/init.c: fix section mismatches
Linus Torvalds [Tue, 30 Oct 2007 19:04:29 +0000 (12:04 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
ixgb: fix TX hangs under heavy load
e1000e: Fix typo ! &
ixgbe: minor sparse fixes
e1000: sparse warnings fixes
ixgb: fix sparse warnings
e1000e: fix sparse warnings
mv643xx_eth: Fix MV643XX_ETH offsets used by Pegasos 2
Blackfin EMAC driver: Fix Ethernet communication bug (dupliated and lost packets)
DM9601: Support for ADMtek ADM8515 NIC
Linus Torvalds [Tue, 30 Oct 2007 18:49:13 +0000 (11:49 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: implement and use ATA_QCFLAG_QUIET
libata: stop being overjealous about non-IO commands
libata: flush is an IO command
sata_promise: cleanups
sata_promise: ASIC PRD table bug workaround, take 2
Auke Kok [Tue, 30 Oct 2007 18:21:50 +0000 (11:21 -0700)]
ixgb: fix TX hangs under heavy load
A merge error occurred where we merged the wrong block here
in version 1.0.120. The right condition for frags is slightly
different then for the skb, so account for the difference properly
and trim the TSO based size right.
Originally part of a fix reported by IBM to fix TSO hangs on
pSeries hardware.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix sparse warnings in ixgb driver for net-2.6.24.
Added a sparse fix for invalid declaration using non-constant value
in ixgb_set_multi. Added a fix for the module param array index
and allows int params in the array. --Auke
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dale Farnsworth [Mon, 29 Oct 2007 22:39:01 +0000 (15:39 -0700)]
mv643xx_eth: Fix MV643XX_ETH offsets used by Pegasos 2
In the mv643xx_eth driver, we now use offsets from the ethernet
register block within the chip, but the pegasos 2 platform still
needs offsets from the full chip's register base address.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Blackfin EMAC driver: Fix Ethernet communication bug (dupliated and lost packets)
Fix Ethernet communication bug(dupliated and lost packets)
in RMII PHY mode- dont call mac_disable and mac_enable during
10/100 REFCLK changes - mac_enable screws up the DMA descriptor chain
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Kenji Kaneshige [Tue, 30 Oct 2007 07:01:49 +0000 (16:01 +0900)]
[IA64] Fix incorrect return value from ia64_setup_msi_irq()
Fix the problem that pci_enable_msi() fails on ia64 platform. The cause of
this problem is incorrect return value of ia64_setup_msi_irq(). It must
return 0 on success, instead of irq number.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
J. Bruce Fields [Tue, 30 Oct 2007 15:20:02 +0000 (11:20 -0400)]
locks: fix possible infinite loop in posix deadlock detection
It's currently possible to send posix_locks_deadlock() into an infinite
loop (under the BKL).
For now, fix this just by bailing out after a few iterations. We may
want to fix this in a way that better clarifies the semantics of
deadlock detection. But that will take more time, and this minimal fix
is probably adequate for any realistic scenario, and is simple enough to
be appropriate for applying to stable kernels now.
Thanks to George Davis for reporting the problem.
Cc: "George G. Davis" <gdavis@mvista.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 30 Oct 2007 15:39:20 +0000 (08:39 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: use a less common define name in BF549
Blackfin arch: Add missing definitions for BF561
Blackfin arch: reclaim a few bytes from the end of our init section
Blackfin arch: fix libata data struct member from irq_type to irq_flags
Blackfin arch: Do not pollute name space used in linux-2.6.x/sound
Blackfin arch: Fix bug set correct baud for spi mmc and enable SPI after DMA.
Blackfin arch: update board defconfig files according to latest information from ADI datasheet
Blackfin arch: ensure that speculative loads of bad pointers don't cause us to do bad things.
Blackfin arch: Add missing definitions of BF54x
Blackfin arch: Fix random crash issue found by Michael.
Blackfin arch: fix bug: tell users if the kernel is recovering from a fault condition
Blackfin arch: add support for checking/clearing overruns in generic purpose Timer API
Blackfin arch: cleanup arch/blackfin/kernel/traps.c handling code.
Blackfin arch: Apply Bluetchnix vendor patch provided by Harald Krapfenbauer
Blackfin arch: fix bug BlueTechnix CM-BF537 board config uses wrong IRQ for net2272 driver
Blackfin arch: fix bug: kernel prints out error message twice
Blackfin arch: add NFC driver support in BF527-EZKIT board
Blackfin arch: Added support for HV Sistemas H8606 board
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[TIPC]: Add tipc_config.h to include/linux/Kbuild.
[WAN]: lmc_ioctl: don't return with locks held
[SUNRPC]: fix rpc debugging
[TCP]: Saner thash_entries default with much memory.
[SUNRPC] rpc_rdma: we need to cast u64 to unsigned long long for printing
[IPv4] SNMP: Refer correct memory location to display ICMP out-going statistics
[NET]: Fix error reporting in sys_socketpair().
[NETFILTER]: nf_ct_alloc_hashtable(): use __GFP_NOWARN
[NET]: Fix race between poll_napi() and net_rx_action()
[TCP] MD5: Remove some more unnecessary casting.
[TCP] vegas: Fix a bug in disabling slow start by gamma parameter.
[IPVS]: use proper timeout instead of fixed value
[IPV6] NDISC: Fix setting base_reachable_time_ms variable.
Krzysztof Helt [Mon, 29 Oct 2007 21:37:24 +0000 (14:37 -0700)]
s3c-rtc: remove unused variable
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Acked-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Mon, 29 Oct 2007 21:37:23 +0000 (14:37 -0700)]
serial: fix serial_txx9 console initialization
Since commit 97d97224ff361e08777fb33e0fd193ca877dac28 ("[SERIAL] Fix
console initialisation ordering"), serial_core calls ->pm() on
initialization even if the port was used for console.
This behaviour breaks serial_txx9 console since The serial_txx9 driver
initialize its port entirely on its ->pm() method if new state was 0.
This patch adds checking for oldstate value to fix this probelm.
Takashi Iwai [Mon, 29 Oct 2007 21:37:22 +0000 (14:37 -0700)]
intel-iommu: Fix array overflow
Fix possible array overflow:
drivers/pci/intel-iommu.c: In function ¡dmar_get_fault_reason¢:
drivers/pci/intel-iommu.c:753: warning: array subscript is above array bounds
drivers/pci/intel-iommu.c: In function ¡iommu_page_fault¢:
drivers/pci/intel-iommu.c:753: warning: array subscript is above array bounds
Andrew Morton [Mon, 29 Oct 2007 21:37:21 +0000 (14:37 -0700)]
revert "ufs: Fix mount check in ufs_fill_super()"
Evgeniy said:
I wonder on what type of UFS do you test this patch? NetBSD and FreeBSD
do not use "fs_state", they use "fs_clean" flag, only Solaris does check
like this: fs_state + fs_time == FSOK.
That's why parentheses was like that.
At now with linux-2.6.24-rc1-git1, I get: fs need fsck, but NetBSD's fsck
says that's all ok.
Hugh Dickins [Mon, 29 Oct 2007 21:37:20 +0000 (14:37 -0700)]
fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE
It's possible to provoke unionfs (not yet in mainline, though in mm and
some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)). I expect
it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
2.6.24 ecryptfs no longer calls lower level's ->writepage).
This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
leak from tmpfs via write_cache_pages and unionfs to userspace. There's
already a fix (e423003028183df54f039dfda8b58c49e78c89d7 - writeback: don't
propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
far as it goes; but insufficient because it doesn't address the underlying
issue, that shmem_writepage expects to be called only by vmscan (relying on
backing_dev_info capabilities to prevent the normal writeback path from
ever approaching it).
That's an increasingly fragile assumption, and ramdisk_writepage (the other
source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
wbc->for_reclaim before returning it. Make the same check in
shmem_writepage, thereby sidestepping the page_mapped BUG also.
mm/sparse-vmemmap.c: make sure init_mm is included
mm/sparse-vmemmap.c uses init_mm in some places. However, it is not
present in any of the headers currently included in the file.
init_mm is defined as extern in sched.h, so we add it to the headers list
Up to now, this problem was masked by the fact that functions like
set_pte_at() and pmd_populate_kernel() are usually macros that expand to
simpler variants that does not use the first parameter at all.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andi Kleen [Mon, 29 Oct 2007 21:37:18 +0000 (14:37 -0700)]
Remove bogus default y for DMAR and NET_DMA
No reason I can think of of making them default y Most people don't have
the hardware and with default y they just pollute lots of configs during
make oldconfig.
Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Jeff Garzik <jeff@garzik.org> Acked-by: "Nelson, Shannon" <shannon.nelson@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Mon, 29 Oct 2007 21:37:18 +0000 (14:37 -0700)]
sunrpc: fix rpc debugging
Commit baa3a2a0d24ebcf1c451bec8e5bee3d3467f4cbb ("sysctl: remove broken
sunrpc debug binary sysctls"), by removing initialization of the
ctl_name field, broke this conditional, preventing the display of
rpc_tasks that you previously got when turning on rpc debugging.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Corey Minyard [Mon, 29 Oct 2007 21:37:13 +0000 (14:37 -0700)]
IPMI: fix comparison in demangle_device_id
Coverity spotted some incorrect code in a recent change to the IPMI driver;
this patch make sure the data is really long enough to pull the
manufacturer id and product id out of a get device id message.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Adrian Bunk <bunk@kernel.org> Cc: Stian Jordet <liste@jordet.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Freezer: do not allow freezing processes to clear TIF_SIGPENDING
Do not allow processes to clear their TIF_SIGPENDING if TIF_FREEZE is set,
so that they will not race with the freezer (like mysqld does, for example).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 26 Oct 2007 07:19:26 +0000 (16:19 +0900)]
libata: implement and use ATA_QCFLAG_QUIET
Implement ATA_QCFLAG_QUIET which indicates that there's no need to
report if the command fails with AC_ERR_DEV and set it for passthrough
commands.
Combined with previous changes, this now makes device errors for all
direct commands reported directly to the issuer without going through
EH actions and reporting.
Note that EH is still invoked after non-IO device errors to determine
the nature of the error and resume command execution (some controller
requires special care after error to continue). It just performs
default maintenance after error, examines what's going on, realizes
that it's none of its business and reports the command failure without
logging any error messages.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 26 Oct 2007 07:12:41 +0000 (16:12 +0900)]
libata: stop being overjealous about non-IO commands
libata EH always revalidated device and retried failed command after
error except for ATAPI CCs. This is unnecessary and hinders with
users issuing direct commands. This patch makes the following
changes.
* Make sata_sil24 not request ATA_EH_REVALIDATE on device errors.
sil24 is the only driver which does this. All others let libata EH
core code decide.
* Don't request revalidation after device error of non-IO command.
Revalidation doesn't really help anybody. As ATA_EH_REVALIDATE
isn't set by default, there's no reason to clear it after sense data
is read. Kill ATA_EH_REVALIDATE clearing code while at it.
* Don't retry non-IO command after device error. Device has rejected
the command. There's no point in retrying.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 26 Oct 2007 06:53:59 +0000 (15:53 +0900)]
libata: flush is an IO command
ATA_QCFLAG_IO is used to mark commands which are used to perform
regluar IO transfers via block layer. These commands are assumed to
be valid and taken more seriously during error handling. Cache flush
is used by regular IO path and necessary for data integrity. Mark it
with ATA_QCFLAG_IO.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Minor sata_promise cleanups:
- use C99 array initialisers in pdc_port_info[]
- add myself in the file head's Maintained by note,
since users don't always read the MAINTAINERS file
- SG/PRD bug workaround warrants driver version bump
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
--
drivers/ata/sata_promise.c | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-) Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_promise: ASIC PRD table bug workaround, take 2
Second-generation Promise SATA controllers have an ASIC bug
which can trigger if the last PRD entry is larger than 164 bytes,
resulting in intermittent errors and possible data corruption.
Work around this by replacing calls to ata_qc_prep() with a
private version that fills the PRD, checks the size of the
last entry, and if necessary splits it to avoid the bug.
Also reduce sg_tablesize by 1 to accommodate the new entry.
Tested on the second-generation SATA300 TX4 and SATA300 TX2plus,
and the first-generation PDC20378.
Thanks to Alexander Sabourenkov for verifying the bug by
studying the vendor driver, and for writing the initial patch
upon which this one is based.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
--
Changes since previous version:
* use new PDC_MAX_PRD constant to initialise sg_tablesize
Roel Kluin [Tue, 30 Oct 2007 08:11:46 +0000 (01:11 -0700)]
[WAN]: lmc_ioctl: don't return with locks held
(akpm: it's doing copy_to_user() inside spin_lock_irqsave(): this driver
appears to be beyond help).
Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
J. Bruce Fields [Tue, 30 Oct 2007 08:07:15 +0000 (01:07 -0700)]
[SUNRPC]: fix rpc debugging
Commit baa3a2a0d24ebcf1c451bec8e5bee3d3467f4cbb, by removing initialization
of the ctl_name field, broke this conditional, preventing the display of
rpc_tasks that you previously got when turning on rpc debugging.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Tue, 30 Oct 2007 07:59:25 +0000 (00:59 -0700)]
[TCP]: Saner thash_entries default with much memory.
On systems with a very large amount of memory, the heuristics in
alloc_large_system_hash() result in a very large TCP established hash
table: 16 millions of entries for a 128 GB ia64 system. This makes
reading from /proc/net/tcp pretty slow (well over a second) and as a
result netstat is slow on these machines. I know that /proc/net/tcp is
deprecated in favor of tcp_diag, however at the moment netstat only
knows of the former.
I am skeptical that such a large TCP established hash is often needed.
Just because a system has a lot of memory doesn't imply that it will
have several millions of concurrent TCP connections. Thus I believe
that we should put an arbitrary high limit to the size of the TCP
established hash by default. Users who really need a bigger hash can
always use the thash_entries boot parameter to get more.
I propose 2 millions of entries as the arbitrary high limit. This
makes /proc/net/tcp reasonably fast on the system in question (0.2 s)
while being still large enough for me to be confident that network
performance won't suffer.
This is just one way to limit the hash size, there are others; I am not
familiar enough with the TCP code to decide which is best. Thus, I
would welcome the proposals of alternatives.
[ 2 million is still too large, thus I've modified the limit in the
change to be '512 * 1024'. -DaveM ]
Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Tue, 30 Oct 2007 07:44:32 +0000 (00:44 -0700)]
[SUNRPC] rpc_rdma: we need to cast u64 to unsigned long long for printing
as some architectures have unsigned long for u64.
net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks':
net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks':
net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64
Noticed on PowerPC pseries_defconfig build.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
While displaying ICMP out-going statistics as Out<name> counters in
/proc/net/snmp, the memory location for ICMP in-coming statistics
was referred by mistake.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Oct 2007 04:28:47 +0000 (21:28 -0700)]
[NET]: Fix race between poll_napi() and net_rx_action()
netpoll_poll_lock() synchronizes the ->poll() invocation
code paths, but once we have the lock we have to make
sure that NAPI_STATE_SCHED is still set. Otherwise we
get:
while reviewing the tcp_md5-related code further i came across with
another two of these casts which you probably have missed. I don't
actually think that they impose a problem by now, but as you said we
should remove them.
Signed-off-by: Matthias M. Dellweg <2500@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
[TCP] vegas: Fix a bug in disabling slow start by gamma parameter.
TCP Vegas implementation has a bug in the process of disabling
slow-start with gamma parameter. The bug may lead to extreme
unfairness in the presence of early packet loss. See details in:
http://www.cs.caltech.edu/~weixl/technical/ns2linux/known_linux/index.html#vegas
Switch the order of "if (tp->snd_cwnd <= tp->snd_ssthresh)" statement
and "if (diff > gamma)" statement to eliminate the problem.
Signed-off-by: Xiaoliang (David) Wei <davidwei79@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Mon, 29 Oct 2007 11:35:45 +0000 (04:35 -0700)]
[IPVS]: use proper timeout instead of fixed value
Instead of using the default timeout of 3 minutes, this uses the timeout
specific to the protocol used for the connection. The 3 minute timeout
seems somewhat arbitrary (though I know it is used other places in the
ipvs code) and when failing over it would be much nicer to use one of
the configured timeout values.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 30 Oct 2007 04:46:09 +0000 (21:46 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (26 commits)
cpuidle: remove unused exports
acpi: remove double mention of Support for ACPI option
ACPI: use select POWER_SUPPLY for AC, BATTERY and SBS
ACPI: Battery: Allow extract string from integer
ACPI: battery: Support for non-spec name for LiIon technology
ACPI: battery: register power_supply subdevice only when battery is present
suspend: MAINTAINERS update
ACPI: update MAINTAINERS
fujitsu-laptop.c: remove dead code
cpuidle: unexport tick_nohz_get_sleep_length
ACPI: battery: Update battery information upon sysfs read.
fujitsu-laptop: make 2 functions static
ACPI: EC: fix use-after-free
ACPI: battery: remove dead code
ACPI: Fan: Drop force_power_state acpi_device option
ACPI: Fan: fan device does not need own structure
ACPI: power: don't cache power resource state
ACPI: EC: Output changes to operational mode
ACPI: EC: Add workaround for "optimized" controllers
ACPI: EC: Don't re-enable GPE for each transaction.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86 boot: document for 32 bit boot protocol
remove the dead X86_REMOTE_DEBUG option
x86: merge EARLY_PRINTK options
x86: mm/discontig_32.c: make code static
x86: kernel/setup_32.c: unexport machine_id
x86 gart: rename symbols only used for the GART implementation
x86 gart: make some variables and functions static
x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU
x86 gart: rename iommu.h to gart.h
x86: additional CPUID strings; fix strings for AMD-ecx
Sonic Zhang [Tue, 30 Oct 2007 03:48:42 +0000 (11:48 +0800)]
Blackfin arch: Fix bug set correct baud for spi mmc and enable SPI after DMA.
Changes:
1. The baud for spi mmc defined in board file is not used by the old
spi driver. A slower value from spi framework is used instead. In latest
bug fixing, the correct baud is use which is too high for spi MMC card.
2. SPI is enabled only after DMA is started.
3. MMC detection IRQ is set to 55.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Joerg Roedel [Wed, 24 Oct 2007 10:49:50 +0000 (12:49 +0200)]
x86 gart: rename symbols only used for the GART implementation
This patch renames the 4 symbols iommu_hole_init(), iommu_aperture,
iommu_aperture_allowed, iommu_aperture_disabled. All these symbols are only
used for the GART implementation of IOMMUs.
Joerg Roedel [Wed, 24 Oct 2007 10:49:47 +0000 (12:49 +0200)]
x86 gart: rename iommu.h to gart.h
This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
clear to which IOMMU implementation it belongs. The patch also adds "GART" to
the Kconfig line.
H. Peter Anvin [Fri, 26 Oct 2007 21:09:09 +0000 (14:09 -0700)]
x86: additional CPUID strings; fix strings for AMD-ecx
Additional CPUID strings (sse4_1, sse4_2, sse5, skinit, wdt); fix the
positioning of the AMD ecx strings (cr8_legacy was duplicated under
two different names, so the alignment of all the other strings were
off by one.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frans Pop [Mon, 29 Oct 2007 21:20:38 +0000 (17:20 -0400)]
acpi: remove double mention of Support for ACPI option
Current description for CONFIG_ACPI includes the word "Support" twice. One
effect of this is that in menuconfig the "--->" that indicates the presence
of sub-options will not show up unless you have a very wide console.
Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
ACPI: use select POWER_SUPPLY for AC, BATTERY and SBS
POWER_SUPPLY is needed for AC, battery, and SBS sysfs support. Use
'select' instead of 'depends on', as it is will not be selected by anything
else, leading to confusion.
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
sched: fix style in kernel/sched.c
sched: fix style of swap() macro in kernel/sched_fair.c
sched: report CPU usage in CFS cgroup directories
sched: move rcu_head to task_group struct
sched: fix incorrect assumption that cpu 0 exists
sched: keep utime/stime monotonic
sched: make kernel/sched.c:account_guest_time() static
First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes. So the
commit will likely be reverted in the 2.6.23 stable kernels.
Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
Reported-by: Dave Jones <davej@redhat.com> Tested-by: Martin Ebourne <fedora@ebourne.me.uk> Cc: Zou Nan hai <nanhai.zou@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some machines return integer instead of expected string.
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Tested-by: Andrey Borzenkov <arvidjaar@mail.ru> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Len Brown <len.brown@intel.com>
sched: clean up code under CONFIG_FAIR_GROUP_SCHED
Introduced an assumption of the existence of CPU0 via this line
cfs_rq = tg->cfs_rq[0];
If you have no CPU0, that will be NULL. The fix seems to be just to
take whatever cfs_rq queue comes out of the for_each_possible_cpu()
loop, since they're all equally good for the destruction operation.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 29 Oct 2007 20:18:11 +0000 (21:18 +0100)]
sched: keep utime/stime monotonic
keep utime/stime monotonic.
cpustats use utime/stime as a ratio against sum_exec_runtime, as a
consequence it can happen - when the ratio changes faster than time
accumulates - that either can be appear to go backwards.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ralf Baechle [Mon, 8 Oct 2007 15:38:37 +0000 (16:38 +0100)]
[MIPS] MT: Fix bug in multithreaded kernels.
When GDB writes a breakpoint into address area of inferior process the
kernel needs to invalidate the modified memory in the inferior which
is done by calling flush_cache_page which in turns calls
r4k_flush_cache_page and local_r4k_flush_cache_page for VSMP or SMTC
kernel via r4k_on_each_cpu().
As the VSMP and SMTC SMP kernels for 34K are running on a single shared
caches it is possible to get away without interprocessor function calls.
This optimization is implemented in r4k_on_each_cpu, so
local_r4k_flush_cache_page is only ever called on the local CPU.
This is where the following code in local_r4k_flush_cache_page() strikes:
/*
* If ownes no valid ASID yet, cannot possibly have gotten
* this page into the cache.
*/
if (cpu_context(smp_processor_id(), mm) == 0)
return;
On VSMP and SMTC had a function of cpu_context() for each CPU(TC).
So in case another CPU than the CPU executing local_r4k_cache_flush_page
has not accessed the mm but one of the other CPUs has there may be data
to be flushed in the cache yet local_r4k_cache_flush_page will falsely
return leaving the I-cache inconsistent for the breakpoint.
While the issue was discovered with GDB it also exists in
local_r4k_flush_cache_range() and local_r4k_flush_cache().
Fixed by introducing a new function has_valid_asid which on MT kernels
returns true if a mm is active on any processor in the system.
This is relativly expensive since for memory acccesses in that loop
cache misses have to be assumed but it seems the most viable solution
for 2.6.23 and older -stable kernels.