Mark Fasheh [Fri, 7 Sep 2007 21:46:51 +0000 (14:46 -0700)]
ocfs2: Write support for inline data
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
inode data.
For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.
Size reducing truncates, including UNRESVP can simply zero that portion of
the inode block being removed. Size increasing truncatesm, including RESVP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Fri, 7 Sep 2007 21:05:51 +0000 (14:05 -0700)]
ocfs2: Read support for inline data
This hooks up ocfs2_readpage() to populate a page with data from an inode
block. Direct IO reads from inline data are modified to fall back to
buffered I/O. Appropriate checks are also placed in the extent map code to
avoid reading an extent list when inline data might be stored.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Fri, 7 Sep 2007 20:58:15 +0000 (13:58 -0700)]
ocfs2: Structure updates for inline data
Add the disk, network and memory structures needed to support data in inode.
Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.
A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Thu, 13 Sep 2007 23:29:01 +0000 (16:29 -0700)]
ocfs2: Cleanup dirent size check
The check to see if a new dirent would fit in an old one is pretty ugly, and
it's done at least twice. Clean things up by putting this in it's own
easier-to-read function.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Wed, 12 Sep 2007 00:21:56 +0000 (17:21 -0700)]
ocfs2: Rename cleanups
ocfs2_rename() does direct manipulation of the dirent it's gotten back from
a directory search. Wrap this manipulation inside of a function so that we
can transparently change directory update behavior in the future. As an
added bonus, this gets rid of an ugly macro.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Tue, 11 Sep 2007 22:22:06 +0000 (15:22 -0700)]
ocfs2: Provide convenience function for ino lookup
A couple paths which needed to just match a parent dir + name pair to an
inode number were a bit messy because they had to deal with
ocfs2_find_files_on_disk() which returns a larger number of values. Provide
a convenience function, ocfs2_lookup_ino_from_name() which internalizes all
the extra accounting.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Wed, 12 Sep 2007 18:19:00 +0000 (11:19 -0700)]
ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
We can preserve the behavior of ocfs2_empty_dir(), while getting rid of the
open coded directory walk by just providing a smart filldir callback. This
also automatically gets to use the dir readahead code, though in this case
any advantage is minor at best.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Tue, 11 Sep 2007 00:30:26 +0000 (17:30 -0700)]
ocfs2: Pass raw u64 to filldir
filldir_t can take this, so don't turn de->inode into a 32 bit value. Right
now this doesn't make a difference since no ocfs2 inodes overflow that, but
it could be a nasty surprise later on if some kernel code is calling
ocfs2_dir_foreach_blk() and expecting real inode numbers back...
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Sat, 8 Sep 2007 01:21:26 +0000 (18:21 -0700)]
ocfs2: Move directory manipulation code into dir.c
The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.
None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Fri, 7 Sep 2007 21:20:45 +0000 (14:20 -0700)]
ocfs2: Small refactor of truncate zeroing code
We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a seperate patch helps to keep the upcoming
changes for write support uncluttered.
The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into it's own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.
We also turn part of ocfs2_free_write_ctxt() into a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Mark Fasheh [Wed, 29 Aug 2007 00:13:23 +0000 (17:13 -0700)]
ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
By doing this, we can remove any higher level logic which has to have
knowledge of btree functionality - any callers of ocfs2_write_begin() can
now expect it to do anything necessary to prepare the inode for new data.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
Tao Mao [Wed, 29 Aug 2007 00:22:33 +0000 (17:22 -0700)]
ocfs2: remove mostly unused field from insert structure
ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.
* Small warning fixup contributed by Andrew Mortom <akpm@linux-foundation.org>
Signed-off-by: Tao Mao <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
David S. Miller [Fri, 12 Oct 2007 05:15:08 +0000 (22:15 -0700)]
[ZLIB]: Fix external builds of zlib_inflate code.
Move zlib_inflate_blob() out into it's own source file,
infutil.c, so that things like the powerpc zImage builder
in arch/powerpc/boot/Makefile don't end up trying to
compile it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure and not dump reserved areas of device space.
Touching some of these causes machine check exceptions on boards
like D-Link DGE-550SX.
Coding note, used a complex switch statement rather than bitmap
because it is easier to relate the block values to the documentation
rather than looking at a encoded bitmask.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 12 Oct 2007 04:55:47 +0000 (21:55 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (408 commits)
[POWERPC] Add memchr() to the bootwrapper
[POWERPC] Implement logging of unhandled signals
[POWERPC] Add legacy serial support for OPB with flattened device tree
[POWERPC] Use 1TB segments
[POWERPC] XilinxFB: Allow fixed framebuffer base address
[POWERPC] XilinxFB: Add support for custom screen resolution
[POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters
[POWERPC] PCI: Add 64-bit physical address support to setup_indirect_pci
[POWERPC] 4xx: Kilauea defconfig file
[POWERPC] 4xx: Kilauea DTS
[POWERPC] 4xx: Add AMCC Kilauea eval board support to platforms/40x
[POWERPC] 4xx: Add AMCC 405EX support to cputable.c
[POWERPC] Adjust TASK_SIZE on ppc32 systems to 3GB that are capable
[POWERPC] Use PAGE_OFFSET to tell if an address is user/kernel in SW TLB handlers
[POWERPC] 85xx: Enable FP emulation in MPC8560 ADS defconfig
[POWERPC] 85xx: Killed <asm/mpc85xx.h>
[POWERPC] 85xx: Add cpm nodes for 8541/8555 CDS
[POWERPC] 85xx: Convert mpc8560ads to the new CPM binding.
[POWERPC] mpc8272ads: Remove muram from the CPM reg property.
[POWERPC] Make clockevents work on PPC601 processors
...
Fixed up conflict in Documentation/powerpc/booting-without-of.txt manually.
[POWERPC] Add legacy serial support for OPB with flattened device tree
Currently find_legacy_serial_ports() can find no serial ports on the
OPB with flattened device tree. Thus no legacy boot console can be
initialized. Just the early udbg console works, which is initialized
with udbg_init_44x_as1 on the UART's physical address specified in
kernel config. This happens because we look for ns16750 serial
devices only and expect opb node to have a device type property. This
patch makes it look for ns16550-compatible devices and use
of_device_is_compatible() for opb in case device type is not
specified.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Thu, 11 Oct 2007 10:37:10 +0000 (20:37 +1000)]
[POWERPC] Use 1TB segments
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Grant Likely [Wed, 10 Oct 2007 18:31:51 +0000 (04:31 +1000)]
[POWERPC] XilinxFB: Add support for custom screen resolution
Some custom implementations of the xilinx fb can use resolutions other
than 640x480. This patch allows the resolution to be specified in the
device tree or the xilinx_platform_data structure.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Wed, 10 Oct 2007 18:31:46 +0000 (04:31 +1000)]
[POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters
The call to xilinxfb_assign is getting unwieldy when adding features
to the Xilinx framebuffer driver. Change xilinxfb_assign() to accept
a pointer to a xilinxfb_platform_data structure to prepare for adding
additition configuration options.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Fri, 12 Oct 2007 02:43:13 +0000 (19:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
mlx4_core: Fix section mismatches
IPoIB: Allow setting policy to ignore multicast groups
IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
IB/ipath: Remove redundant link state checks
IB/ipath: Fix IB_EVENT_PORT_ERR event
IB/ipath: Better handling of unexpected GPIO interrupts
IB/ipath: Maintain active time on all chips
IB/ipath: Fix QHT7040 serial number check
IB/ipath: Indicate a couple of chip bugs to userspace
IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
IB/ipath: Remove duplicate copy of LMC
IB/ipath: Add ability to set the LMC via the sysfs debugging interface
IB/ipath: Optimize completion queue entry insertion and polling
IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
IB/ipath: Generate flush CQE when QP is in error state
IB/ipath: Remove redundant code
IB/ipath: Future proof eeprom checksum code (contents reading)
IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
...
Linus Torvalds [Fri, 12 Oct 2007 02:40:14 +0000 (19:40 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits)
[SKY2]: status polling loop (post merge)
[NET]: Fix NAPI completion handling in some drivers.
[TCP]: Limit processing lost_retrans loop to work-to-do cases
[TCP]: Fix lost_retrans loop vs fastpath problems
[TCP]: No need to re-count fackets_out/sacked_out at RTO
[TCP]: Extract tcp_match_queue_to_sack from sacktag code
[TCP]: Kill almost unused variable pcount from sacktag
[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
[TCP]: Add bytes_acked (ABC) clearing to FRTO too
[IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
[NETFILTER]: x_tables: add missing ip6t_modulename aliases
[NETFILTER]: nf_conntrack_tcp: fix connection reopening
[QETH]: fix qeth_main.c
[NETLINK]: fib_frontend build fixes
[IPv6]: Export userland ND options through netlink (RDNSS support)
[9P]: build fix with !CONFIG_SYSCTL
[NET]: Fix dev_put() and dev_hold() comments
[NET]: make netlink user -> kernel interface synchronious
[NET]: unify netlink kernel socket recognition
[NET]: cleanup 3rd argument in netlink_sendskb
...
Fix up conflicts manually in Documentation/feature-removal-schedule.txt
and my new least favourite crap, the "mod_devicetable" support in the
files include/linux/mod_devicetable.h and scripts/mod/file2alias.c.
(The latter files seem to be explicitly _designed_ to get conflicts when
different subsystems work with them - that have an absolutely horrid
lack of subsystem separation!)
Linus Torvalds [Fri, 12 Oct 2007 02:21:23 +0000 (19:21 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (80 commits)
[MIPS] tlbex.c: Cleanup __init usage.
[MIPS] WRPPMC serial support move to platform device
[MIPS] R1: Fix hazard barriers to make kernels work on R2 also.
[MIPS] VPE: reimplement ELF loader.
[MIPS] cleanup WRPPMC include files
[MIPS] Add BUG_ON assertion for attempt to run kernel on the wrong CPU type.
[MIPS] SMP: Use ISO C struct initializer for local structs.
[MIPS] SMP: Kill useless casts.
[MIPS] Kill num_online_cpus() loops.
[MIPS] SMP: Implement smp_call_function_mask().
[MIPS] Make facility to convert CPU types to strings generally available.
[MIPS] Convert list of CPU types from #define to enum.
[MIPS] Optimize get_unaligned / put_unaligned implementations.
[MIPS] checkfiles: Fix "need space after that ','" errors.
[MIPS] Fix "no space between function name and open parenthesis" warnings.
[MIPS] Allow hardwiring of the CPU type to a single type for optimization.
[MIPS] tlbex: Size optimize code by declaring a few functions inline.
[MIPS] pg-r4k.c: Dump the generated code
[MIPS] Cobalt: Remove cobalt_machine_power_off()
[MIPS] Cobalt: Move reset port definition to arch/mips/cobalt/reset.c
...
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (24 commits)
ide: use only ->set_pio_mode method for programming PIO modes (take 2)
sis5513: don't change UDMA settings when programming PIO
it8213/piix/slc90e66: don't change DMA settings when programming PIO
alim15x3: PIO mode setup fixes
siimage: fix ->set_pio_mode method to select PIO data transfer
cs5520: don't enable VDMA in ->speedproc
sc1200: remove redundant warning message from sc1200_tune_chipset()
ide-pmac: PIO mode setup fixes (take 3)
icside: fix ->speedproc to return on unsupported modes (take 5)
sgiioc4: use ide_tune_dma()
amd74xx/via82cxxx: use ide_tune_dma()
ide: add ide_set{_max}_pio() (take 4)
ide: Kconfig face-lift
ide: move ide_rate_filter() calls to the upper layer (take 2)
sis5513: add ->udma_filter method for chipset_family >= ATA_133
ide: mode limiting fixes for user requested speed changes
ide: add missing ide_rate_filter() calls to ->speedproc()-s
ide: call udma_filter() before resorting to the UltraDMA mask
ide: make jmicron match vendor and device class
pdc202xx_new: switch to using pci_get_slot() (take 2)
...
Linus Torvalds [Fri, 12 Oct 2007 02:20:16 +0000 (19:20 -0700)]
Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
leds: Update Cobalt Qube series front LED support
leds: Add Cobalt Raq series LEDs support
leds: Rename leds-cobalt driver
Linus Torvalds [Fri, 12 Oct 2007 02:19:50 +0000 (19:19 -0700)]
Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
backlight: Convert corgi backlight driver into a more generic driver
backlight: Add Samsung LTV350QV LCD driver
backlight: Fix cr_bllcd allocations and error paths
backlight/leds: Make two structs static
Linus Torvalds [Fri, 12 Oct 2007 02:14:22 +0000 (19:14 -0700)]
Merge branch 'block-2.6.24' of git://git.kernel.dk/data/git/linux-2.6-block
* 'block-2.6.24' of git://git.kernel.dk/data/git/linux-2.6-block: (37 commits)
[BLOCK] Fix failing compile with BLK_DEV_IO_TRACE=n
compat_ioctl: move floppy handlers to block/compat_ioctl.c
compat_ioctl: move cdrom handlers to block/compat_ioctl.c
compat_ioctl: move BLKPG handling to block/compat_ioctl.c
compat_ioctl: move hdio calls to block/compat_ioctl.c
compat_ioctl: handle blk_trace ioctls
compat_ioctl: add compat_blkdev_driver_ioctl()
compat_ioctl: move common block ioctls to compat_blkdev_ioctl
Sysace: Don't enable IRQ until after interrupt handler is registered
Sysace: sparse fixes
Sysace: Minor coding convention fixup
drivers/block/umem: use DRIVER_NAME where appropriate
drivers/block/umem: trim trailing whitespace
drivers/block/umem: minor cleanups
drivers/block/umem: use dev_printk()
drivers/block/umem: move private include away from include/linux
Sysace: Labels in C code should not be indented.
Sysace: Add of_platform_bus binding
Sysace: Move IRQ handler registration to occur after FSM is initialized
Sysace: minor rework and cleanup changes
...
Linus Torvalds [Fri, 12 Oct 2007 02:13:44 +0000 (19:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
[AVR32] Fix random segfault with preemption
[AVR32] Don't use __builtin_xchg()
[AVR32] ngw100 i2c-gpio tweaks
[AVR32] Ignore a few irrelevant syscalls
[AVR32] SMC configuration in clock cycles
[AVR32] Drop support for redundant "keepinitrd" boot-time parm.
[AVR32] Make dma_sync_*_for_cpu no-ops
[AVR32] Remove unneeded 8K alignment of .text section
[AVR32] Kill a few hardcoded constants in vmlinux.lds
[AVR32] rename vmlinux.lds
[AVR32] fix command line parsing in early_parse_fbmem
[AVR32] checkstack support
[AVR32] Wire up USBA device
[AVR32] add multidrive support for pio driver
[AVR32] /sys/kernel/debug/at32ap_clk
[AVR32] Move AT32_PM_BASE definition into pm.h
Linus Torvalds [Fri, 12 Oct 2007 02:11:51 +0000 (19:11 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (74 commits)
Blackfin serial driver: pending a unique anomaly id, tie the break flood issue to ANOMALY_05000230
blackfin enable arbitary speed serial setting
Blackfin arch: Remove cruft - CONFIG_DEBUG_SERIAL_EARLY_INIT and DEBUG_KERNEL_START
Blackfin arch: fix typo in register name
Blackfin arch: trim the Blackfin arch MAINTAINERS list
Blackfin arch: fix bug libstdc++ calling writev with an iovec containing { NULL, 0 } fails on Blackfin
Blackfin arch: Export strcpy - occasionally get module link failures otherwise
Blackfin arch: the load address is not safe to point to as a workaround for ANOMALY 05000281
Blackfin arch: show_mem can not be marked as init, since it is called during OOM condition
Blackfin arch: flush/inv the correct range when using write back cache and fix bugs find by dmacopy
Blackfin arch: update kgdb patch
Blackfin arch: Comply with revised Anomaly Workarounds for BF533 05000311 and BF561 05000323
Blackfin arch: Print out debug info, as early as possible
Blackfin arch: Enable earlyprintk earlier - so any error after our interrupt tables are set up will print out
Blackfin arch: fix endless loop bug when a double fault happens
Blackfin arch: Initial patch to add earlyprintk support
Blackfin arch: add TWIx_REGBASE and SPIx_REGBASE to specific CPU header files, use the new REGBASE for board platform resources
Blackfin arch: modify the insX/outsX and dma_insX/dma_outsX to be compatible with other archs
Blackfin arch: add more common defines for output sections
Blackfin arch: cleanup IO and DMA_IO API function definitions according to other arches
...
David S. Miller [Fri, 12 Oct 2007 01:08:29 +0000 (18:08 -0700)]
[NET]: Fix NAPI completion handling in some drivers.
In order for the list handling in net_rx_action() to be
correct, drivers must follow certain rules as stated by
this comment in net_rx_action():
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
A few drivers do not do this because they mix the budget checks
with reading hardware state, resulting in crashes like the one
reported by takano@axe-inc.co.jp.
BNX2 and TG3 are taken care of here, SKY2 fix is from Stephen
Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 12 Oct 2007 00:36:13 +0000 (17:36 -0700)]
[TCP]: Limit processing lost_retrans loop to work-to-do cases
This addition of lost_retrans_low to tcp_sock might be
unnecessary, it's not clear how often lost_retrans worker is
executed when there wasn't work to do.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 12 Oct 2007 00:35:41 +0000 (17:35 -0700)]
[TCP]: Fix lost_retrans loop vs fastpath problems
Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore decicion for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.
It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple work until all rexmitted
skbs heve been processed approach.
Maybe after(highest_sack_end_seq, tp->high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.
Noticed because of report against a related problem from TAKANO
Ryousei <takano@axe-inc.co.jp>. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:
...In short, TAKANO's problem is that end_seq the loop is using
not necessarily the largest SACK block's end_seq because the
current ACK may still have higher SACK blocks which are later
by the loop.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 12 Oct 2007 00:34:57 +0000 (17:34 -0700)]
[TCP]: No need to re-count fackets_out/sacked_out at RTO
Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 12 Oct 2007 00:33:11 +0000 (17:33 -0700)]
[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost, not marking makes in_flight
estimator to return too large values.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 12 Oct 2007 00:32:31 +0000 (17:32 -0700)]
[TCP]: Add bytes_acked (ABC) clearing to FRTO too
I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.
Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will be not controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
[MIPS] R1: Fix hazard barriers to make kernels work on R2 also.
Tested with Malta; inflates malta_defconfig by 3932 bytes. Ideally there
should be additional configuration to allow getting rid of this overhead
but that would be too much complexity at this stage of the release cycle.
Ralf Baechle [Wed, 10 Oct 2007 12:33:03 +0000 (13:33 +0100)]
[MIPS] VPE: reimplement ELF loader.
Loading ELF binaries based on the section table is totally wrong. This
still leaves the other fat bug of referencing symbols in an executable
unfixed, so people better don't run strip on their binaries ...
As added bonus the new loader is also 23 lines shorter.
On MP configurations it's highly dubious what this code will actually
affect since blasting away cachelines may or may not do the right
thing wrt. cache coherency.
[MIPS] Don't abort the build process if '-msym32' isn't supported
-msym32 and previously the strategy to tell the compiler to generate
64-bit code but the assembler to put it into 32-bit ELF was initially
a hack to get around the lack of proper 64-bit binutils support and
later turned into a neat optimization with significant code size
savings. But it's really just an optimization so there is nothing
wrong with just dropping the option (and whatever else goes along with
it, I forgot all the nasty details) on the floor if due to a vintage
compiler it can't be suported.
Ralf Baechle [Thu, 11 Oct 2007 22:46:09 +0000 (23:46 +0100)]
[MIPS] Dyntick support for SMTC:
The kernel currently only supports broadcasting of the timer interrupt
from a single timer, not multicasting into two multicast groups of
processors. So the implemented mechanism for SMTC works by broadcasting
the cp0 compare interrupt on VPE 0 and ignoring it on any additional VPEs.