It's possible for the following set of events to happen:
cifsd calls cifs_reconnect which reconnects the socket. A userspace
process then calls cifs_negotiate_protocol to handle the NEGOTIATE and
gets a reply. But, while processing the reply, cifsd calls
cifs_reconnect again. Eventually the GlobalMid_Lock is dropped and the
reply from the earlier NEGOTIATE completes and the tcpStatus is set to
CifsGood. cifs_reconnect then goes through and closes the socket and sets the
pointer to zero, but because the status is now CifsGood, the new socket
is not created and cifs_reconnect exits with the socket pointer set to
NULL.
Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is
CifsNeedNegotiate, and by making sure that generic_ip_connect is always
called at least once in cifs_reconnect.
Note that this is not a perfect fix for this issue. It's still possible
that the NEGOTIATE reply is handled after the socket has been closed and
reconnected. In that case, the socket state will look correct but it no
NEGOTIATE was performed on it be for the wrong socket. In that situation
though the server should just shut down the socket on the next attempted
send, rather than causing the oops that occurs today.
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
AppArmor may do a GFP_KERNEL memory allocation with task_lock(tsk->group_leader);
held when called from security_task_setrlimit. This will only occur when the
task's current policy has been replaced, and the task's creds have not been
updated before entering the LSM security_task_setrlimit() hook.
BUG: sleeping function called from invalid context at mm/slub.c:847
in_atomic(): 1, irqs_disabled(): 0, pid: 1583, name: cupsd
2 locks held by cupsd/1583:
#0: (tasklist_lock){.+.+.+}, at: [<ffffffff8104dafa>] do_prlimit+0x61/0x189
#1: (&(&p->alloc_lock)->rlock){+.+.+.}, at: [<ffffffff8104db2d>]
do_prlimit+0x94/0x189
Pid: 1583, comm: cupsd Not tainted 3.0.0-rc2-git1 #7
Call Trace:
[<ffffffff8102ebf2>] __might_sleep+0x10d/0x112
[<ffffffff810e6f46>] slab_pre_alloc_hook.isra.49+0x2d/0x33
[<ffffffff810e7bc4>] kmem_cache_alloc+0x22/0x132
[<ffffffff8105b6e6>] prepare_creds+0x35/0xe4
[<ffffffff811c0675>] aa_replace_current_profile+0x35/0xb2
[<ffffffff811c4d2d>] aa_current_profile+0x45/0x4c
[<ffffffff811c4d4d>] apparmor_task_setrlimit+0x19/0x3a
[<ffffffff811beaa5>] security_task_setrlimit+0x11/0x13
[<ffffffff8104db6b>] do_prlimit+0xd2/0x189
[<ffffffff8104dea9>] sys_setrlimit+0x3b/0x48
[<ffffffff814062bb>] system_call_fastpath+0x16/0x1b
Signed-off-by: John Johansen <john.johansen@canonical.com> Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Full-speed isoc endpoints specify interval in exponent based form in
frames, not microframes, so we need to adjust accordingly.
NEC xHCI host controllers will return an error code of 0x11 if a full
speed isochronous endpoint is added with the Interval field set to
something less than 3 (2^3 = 8 microframes, or one frame). It is
impossible for a full speed device to have an interval smaller than one
frame.
This was always an issue in the xHCI driver, but commit dfa49c4ad120a784ef1ff0717168aa79f55a483a "USB: xhci - fix math in
xhci_get_endpoint_interval()" removed the clamping of the minimum value
in the Interval field, which revealed this bug.
This needs to be backported to stable kernels back to 2.6.31.
Reported-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some Fresco Logic hosts, including those found in the AUAU N533V laptop,
advertise MSI, but fail to actually generate MSI interrupts. Add a new
xHCI quirk to skip MSI enabling for the Fresco Logic host controllers.
Fresco Logic confirms that all chips with PCI vendor ID 0x1b73 and device
ID 0x1000, regardless of PCI revision ID, do not support MSI.
This should be backported to stable kernels as far back as 2.6.36, which
was the first kernel to support MSI on xHCI hosts.
xHCI controllers respond to a Reset Device command when the Slot is in the
Enabled/Disabled state by returning an error. This is fine on other host
controllers, but the Etron xHCI host controller returns a vendor-specific
error code that the xHCI driver doesn't understand. The xHCI driver then
gives up on device enumeration.
Instead of issuing a command that will fail, just return. This fixes the
issue with the xhci driver not working on ASRock P67 Pro/Extreme boards.
This should be backported to stable kernels as far back as 2.6.34.
It breaks some people's machines, so this will all get worked out in the
3.0 kernel release, it's not quite ready for 2.6.39 just yet.
Thanks to Maarten Lankhorst <m.b.lankhorst@gmail.com> for reporting the
issue.
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com> Cc: Jim Bos <jim876@xs4all.nl> Cc: Matthew Garrett <mjg@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some USB mass-storage devices have bugs that cause them not to handle
the first READ(10) command they receive correctly. The Corsair
Padlock v2 returns completely bogus data for its first read (possibly
it returns the data in encrypted form even though the device is
supposed to be unlocked). The Feiya SD/SDHC card reader fails to
complete the first READ(10) command after it is plugged in or after a
new card is inserted, returning a status code that indicates it thinks
the command was invalid, which prevents the kernel from retrying the
read.
Since the first read of a new device or a new medium is for the
partition sector, the kernel is unable to retrieve the device's
partition table. Users have to manually issue an "hdparm -z" or
"blockdev --rereadpt" command before they can access the device.
This patch (as1470) works around the problem. It adds a new quirk
flag, US_FL_INVALID_READ10, indicating that the first READ(10) should
always be retried immediately, as should any failing READ(10) commands
(provided the preceding READ(10) command succeeded, to avoid getting
stuck in a loop). The patch also adds appropriate unusual_devs
entries containing the new flag.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Sven Geggus <sven-usbst@geggus.net> Tested-by: Paul Hartman <paul.hartman+linux@gmail.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Protocol stall should not be fatal while reading port or hub status as it is
transient state. Currently hub EP0 STALL during port status read results in
failed device enumeration. This has been observed with ST-Ericsson (formerly
Philips) USB 2.0 Hub (04cc:1521) after connecting keyboard.
The funtion option_send_status times out when sending USB messages
to the interfaces 0, 1, and 2 of this UMTS stick. This results in a
5s timeout in the function causing other tty operations to feel very
sluggish.
This patch adds a blacklist entry for these 3 interfaces on the ZTE
K3765-Z device.
I was also able to reproduce the problem with v2.6.38 and v2.6.39.
This modem really wants sendsetup blacklisted for interfaces 0 and 1,
otherwise the kernel hardlocks for about 10 seconds while waiting for
the modem's firmware to respond, which it of course doesn't do.
A slight complication here is that TCT (who owns the Alcatel brand) used
the same USB IDs for the X200 as the X060s despite the devices having
completely different firmware and AT command sets, so we end up adding
the X060s to the blacklist at the same time. PSA to OEMs: don't use the
same USB IDs for different devices. Really. It makes your kittens cry.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.
With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.
Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.
Many Linux distributions would enable vesafb in order to display
early stage boot splash. In this case, we will get garbled X
Window screen if running X fbdev on psbfb.
This is because fb0 is occupied by vesafb while psbfb is on fb1.
They tried to drive the same pieces of hardware at the same
time. With unmodified X start-up, it would try to use default
fb0 framebuffer device and unfortunately it is now broken
becaues fb1 supersedes it.
We should let psbfb takeover framebuffer control from vesafb
to get around this problem.
This adds the Nokia E7 and C7 to the list of devices in cdc-acm, allowing
the secondary ACM channel on the device to be exposed. Without this patch
the ACM driver won't claim this secondary channel as it's marked as
having a vendor-specific protocol.
Some PCIe cards ship with a PCI-PCIe bridge which is not
visible as a PCI device in Linux. But the device-id of the
bridge is present in the IOMMU tables which causes a boot
crash in the IOMMU driver.
This patch fixes by removing these cards from the IOMMU
handling. This is a pure -stable fix, a real fix to handle
this situation appriatly will follow for the next merge
window.
The driver contains several loops counting on an u16 value
where the exit-condition is checked against variables that
can have values up to 0xffff. In this case the loops will
never exit. This patch fixed 3 such loops.
Unfortunatly there are systems where the AMD IOMMU does not
cover all devices. This breaks with the current driver as it
initializes the global dma_ops variable. This patch limits
the AMD IOMMU to the devices listed in the IVRS table fixing
DMA for devices not covered by the IOMMU.
The find_next_zero_bit() is called with the from and to arguments in the
wrong order. This results in the function always returning 0, and all
media devices being registered with minor 0. Furthermore, mdev->minor is
then used before being assigned with the find_next_zero_bit() return
value. This really makes sure we'll always use minor 0.
Fix this and let the system support more than one media device.
When signing is enabled, the first session that's established on a
socket will cause a printk like this to pop:
CIFS VFS: Unexpected SMB signature
This is because the key exchange hasn't happened yet, so the signature
field is bogus. Don't try to check the signature on the socket until the
first session has been established. Also, eliminate the specific check
for SMB_COM_NEGOTIATE since this check covers that case too.
Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The genirq changes are initializing descriptors for sparse IRQs quite
differently from how non-sparse (stacked?) IRQs are initialized, with
the effect that on my platform all IRQs are default-disabled on sparse
IRQs and default-enabled if non-sparse IRQs are used, crashing some
GPIO driver.
Fix this by refactoring the non-sparse IRQs to use the same descriptor
init function as the sparse IRQs.
Since fb_info is now refcounted and thus may get freed at any time it
gets unregistered module unloading will try to unregister framebuffer
as stored in platform data on probe though this pointer may
be stale.
Cleanup platform data on framebuffer release.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On my x86_64 system with >4GB of ram and swiotlb instead of
a hardware iommu (because I have a VIA chipset), the call
to pci_set_dma_mask (see below) with 40bits returns an error.
But it seems that the radeon driver is designed to have
need_dma32 = true exactly if pci_set_dma_mask is called
with 32 bits and false if it is called with 40 bits.
I have read somewhere that the default are 32 bits. So if the
call fails I suppose that need_dma32 should be set to true.
And indeed the patch fixes the problem I have had before
and which I had described here:
http://choon.net/forum/read.php?21,106131,115940
Acked-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I found this while figuring out why gnome-shell would not run on my
Asus EeeBox PC EB1007. As a standalone "pc" this device cleary does not have
an internal panel, yet it claims it does. Add a quirk to fix this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Keith Packard <keithp@keithp.com> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The main lock_is_held() user is lockdep_assert_held(), avoid false
assertions in lockdep_off() sections by unconditionally reporting the
lock is taken.
[ the reason this is important is a lockdep_assert_held() in ttwu()
which triggers a warning under lockdep_off() as in printk() which
can trigger another wakeup and lock up due to spinlock
recursion, as reported and heroically debugged by Arne Jansen ]
cdc_ncm 2-1.4:1.6: no reset_resume for driver cdc_ncm?
cdc_ncm 2-1.4:1.7: no reset_resume for driver cdc_ncm?
cdc_ncm 2-1.4:1.6: usb0: unregister 'cdc_ncm' usb-0000:00:1d.0-1.4, CDC NCM
This is important for the Ericsson F5521gw GSM/UMTS modem.
Otherwise modemmanager looses the fact that the cdc_ncm and cdc_acm devices
belong together.
The cdc_ether module does the same.
Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In both trigger_scan and sched_scan operations, we were checking for
the SSID length before assigning the value correctly. Since the
memory was just kzalloc'ed, the check was always failing and SSID with
over 32 characters were allowed to go through.
This was causing a buffer overflow when copying the actual SSID to the
proper place.
This bug has been there since 2.6.29-rc4.
Signed-off-by: Luciano Coelho <coelho@ti.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 69e3cea8d5fd526 ("powerpc/smp: Make start_secondary_resume
available to all CPU variants") introduced start_secondary_resume to
misc_32.S, however it uses a 64-bit instruction which is not valid on
32-bit platforms. Use 'stw' instead.
Reported-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The following patch sets the MaxPayload setting to match the parent
reading when inserting a PCIE card into a hotplug slot. On our system,
the upstream bridge is set to 256, but when inserting a card, the card
setting defaults to 128. As soon as I/O is performed to the card it
starts receiving errors since the payload size is too small.
Now, uart_update_termios is empty, so it's time to remove it. We no
longer need a live tty in .dtr_rts. So this should prune all the bugs
where tty is zeroed in port->tty during tty_port_block_til_ready.
There is one thing to note. We don't set ASYNC_NORMAL_ACTIVE now. It's
because this is done already in tty_port_block_til_ready.
In .dtr_rts we do:
uart_set_mctrl(uport, TIOCM_DTR | TIOCM_RTS)
and call uart_update_termios. It does:
uart_set_mctrl(port, TIOCM_DTR | TIOCM_RTS)
once again. As the only callsite of uart_update_termios is .dtr_rts,
remove the uart_set_mctrl from uart_update_termios to not set it twice.
We should not fiddle with speed and cflags in .dtr_rts hook. Actually
we might not have tty at that moment already.
So move the console cflag copy and speed setup into uart_startup.
Actually the speed setup is already there, but we need to call it
unconditionally (uart_startup is called from uart_open with hw_init =
0).
This means we move uart_change_speed before dtr/rts setup in .dtr_rts.
But this should not matter as the setup should be called after
uart_change_speed anyway.
Before: After:
dtr/rts setup (dtr_rts) uart_change_speed (startup)
uart_change_speed (update_termios) dtr/rts setup (dtr_rts)
dtr/rts setup (update_termios) dtr/rts setup (update_termios)
The second setup will dismiss with the next patch.
Al Viro observes that in the hugetlb case, handle_mm_fault() may return
a value of the kind ENOSPC when its caller is expecting a value of the
kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
zd1211 devices register 'EP 4 OUT' endpoint as Interrupt type on USB 2.0:
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
However on USB 1.1 endpoint becomes Bulk:
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Problem here is that usb_bulk_msg() on interrupt endpoint selfcorrects the
call and changes requested pipe to interrupt type (see usb_bulk_msg).
However with usb_interrupt_msg() on bulk endpoint does not correct the
pipe type to bulk, but instead URB is submitted with interrupt type pipe.
So pre-2.6.39 used usb_bulk_msg() and therefore worked with both endpoint
types, however in 2.6.39 usb_interrupt_msg() with bulk endpoint causes
ohci_hcd to fail submitted URB instantly with -ENOSPC and preventing zd1211rw
from working with OHCI.
Fix this by detecting endpoint type and using correct endpoint/pipe types
for URB. Also fix asynchronous zd_usb_iowrite16v_async() to use right
URB type on 'EP 4 OUT'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In some cases we can read wrong temperature value. If after that
temperature value will not be updated to good one, we badly configure
tx power parameters and device is unable to send a data.
The current temperature range check of MSR_IA32_TEMPERATURE_TARGET
seems too strict to me, some TjMax values documented in
Documentation/hwmon/coretemp wouldn't pass. Relax the check so that
all the documented values pass.
Commit a321cedb12904114e2ba5041a3673ca24deb09c9 excludes CPU models 0xe, 0xf,
0x16, and 0x1a from TjMax temperature adjustment, even though several of those
CPUs are known to have TiMax other than 100 degrees C, and even though the code
in adjust_tjmax() explicitly handles those CPUs and points to a Web document
listing several of the affected CPU IDs.
Reinstate original TjMax adjustment if TjMax can not be determined using the
IA32_TEMPERATURE_TARGET register.
https://bugzilla.kernel.org/show_bug.cgi?id=32582
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Huaxu Wan <huaxu.wan@linux.intel.com> Cc: Carsten Emde <C.Emde@osadl.org> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Yong Wang <yong.y.wang@linux.intel.com> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Tested-by: Jean Delvare <khali@linux-fr.org> Acked-by: Jean Delvare <khali@linux-fr.org> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ath9k driver subtracts 3 dBm to the txpower as with two radios the
signal power is doubled.
The resulting value is assigned in an u16 which overflows and makes
the card work at full power.
in two more places. I grepped the ath tree and didn't find any others.
Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Whenever there is a channel width change from 40 Mhz to 20 Mhz,
the hardware is reconfigured to ht20. Meantime before doing
the rate control updation, the packets are being transmitted are
selected rate with IEEE80211_TX_RC_40_MHZ_WIDTH.
While transmitting ht40 rate packets in ht20 mode is causing
baseband panic with AR9003 based chips.
In certain circumstances, we can get an oops from a torn down device.
Most notably this is from CD roms trying to call scsi_ioctl. The root
cause of the problem is the fact that after scsi_remove_device() has
been called, the queue is fully torn down. This is actually wrong
since the queue can be used until the sdev release function is called.
Therefore, we add an extra reference to the queue which is released in
sdev->release, so the queue always exists.
The 'max_part' parameter controls the number of maximum partition
a nbd device can have. However if a user specifies very large
value it would exceed the limitation of device minor number and
can cause a kernel oops (or, at least, produce invalid device
nodes in some cases).
In addition, specifying large 'nbds_max' value causes same
problem for the same reason.
On my desktop, following command results to the kernel bug:
d4dc210f69 (block: don't block events on excl write for non-optical
devices) added dereferencing of bdev->bd_disk to test
GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE; however, bdev->bd_disk can be
%NULL if open failed which can lead to an oops.
Test the flag after testing open was successful, not before.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Miller <davem@davemloft.net> Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.
Sometimes VM asks the shrinker to return amount of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
UBIFS TNC code updates it. And I can observe the following warnings sometimes:
shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788
This patch makes sure UBIFS never returns negative count of objects.
commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
secondary_ti must be set to current thread info before callin kick_cpu or else
start_secondary_47x will jump into void when trying to return to c-code.
In the current setup secondary_ti is initialized before the CPU idle task is started
and only the boot core will start. I am not sure this is the correct solution, but it
makes SMP possible in my chip.
Note! The HOTPLUG support probably need some fixing to, There is no trampoline code
available in head_44x.S - start_secondary_resume?
The comment in domain_remove_one_dev_info() states "No need to compare
PCI domain; it has to be the same". But for the si_domain that isn't
going to be true, as it consists of all the PCI devices that are
identity mapped thus multiple PCI domains can be in si_domain. The
code needs to validate the PCI domain too.
Signed-off-by: Mike Habeck <habeck@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices
that do not use the IOMMU causes a kernel panic. Fix that by not
inserting those devices into the si_domain.
Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The __intel_map_single function is not honoring the passed in DMA mask.
This results in not using the coherent DMA mask when called from
intel_alloc_coherent().
Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Reviewed-by: Mike Habeck <habeck@sgi.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mike Travis and Mike Habeck reported an issue where iova allocation
would return a range that was larger than a device's dma mask.
https://lkml.org/lkml/2011/3/29/423
The dmar initialization code will reserve all PCI MMIO regions and copy
those reservations into a domain specific iova tree. It is possible for
one of those regions to be above the dma mask of a device. It is typical
to allocate iovas with a 32bit mask (despite device's dma mask possibly
being larger) and cache the result until it exhausts the lower 32bit
address space. Freeing the iova range that is >= the last iova in the
lower 32bit range when there is still an iova above the 32bit range will
corrupt the cached iova by pointing it to a region that is above 32bit.
If that region is also larger than the device's dma mask, a subsequent
allocation will return an unusable iova and cause dma failure.
Simply don't cache an iova that is above the 32bit caching boundary.
Reported-by: Mike Travis <travis@sgi.com> Reported-by: Mike Habeck <habeck@sgi.com> Acked-by: Mike Travis <travis@sgi.com> Tested-by: Mike Habeck <habeck@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When there are a large count of PCI devices, and the pass through
option for iommu is set, much time is spent in the identity_mapping
function hunting though the iommu domains to check if a specific
device is "identity mapped".
Speed up the function by checking the cached info to see if
it's mapped to the static identity domain.
Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The identity mapping code appears to make the assumption that if the
devices dma_mask is greater than 32bits the device can use identity
mapping. But that is not true: take the case where we have a 40bit
device in a 44bit architecture. The device can potentially receive a
physical address that it will truncate and cause incorrect addresses
to be used.
Instead check to see if the device's dma_mask is large enough
to address the system's dma_mask.
Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit a97590e5 added unlinking domains from iommus to reciprocate the
iommu from domains unlinking that was already done. We actually want
to only do this for device domains and never for the static
identity map domain or VM domains. The SI domain is special and
never freed, while VM domain->id lives in their own special address
space, separate from iommu->domain_ids.
In the current code, a VM can get domain->id zero, then mark that
domain unused when unbound from pci-stub. This leads to DMAR
write faults when the device is re-bound to the host driver.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We typically batch unmaps to be lazily flushed out at
regular intervals. When we destroy a domain, we need
to force a flush of these lazy unmaps to be sure none
reference the domain we're about to free.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=35062 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When invalid parameters are passed to apparmor_setprocattr a NULL deref
oops occurs when it tries to record an audit message. This is because
it is passing NULL for the profile parameter for aa_audit. But aa_audit
now requires that the profile passed is not NULL.
Fix this by passing the current profile on the task that is trying to
setprocattr.
Signed-off-by: Kees Cook <kees@ubuntu.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In order to make lazyinit eat approx. 10% of io bandwidth at max, we
are sleeping between zeroing each single inode table. For that purpose
we are using timer which wakes up thread when it expires. It is set
via add_timer() and this may cause troubles in the case that thread
has been woken up earlier and in next iteration we call add_timer() on
still running timer hence hitting BUG_ON in add_timer(). We could fix
that by using mod_timer() instead however we can use
schedule_timeout_interruptible() for waiting and hence simplifying
things a lot.
This commit exchange the old "waiting mechanism" with simple
schedule_timeout_interruptible(), setting the time to sleep. Hence we
do not longer need li_wait_daemon waiting queue and others, so get rid
of it.
We need to take reference to the s_li_request after we take a mutex,
because it might be freed since then, hence result in accessing old
already freed memory. Also we should protect the whole
ext4_remove_li_request() because ext4_li_info might be in the process of
being freed in ext4_lazyinit_thread().
Disk event code automatically blocks events on excl write. This is
primarily to avoid issuing polling commands while burning is in
progress. This behavior doesn't fit other types of devices with
removeable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.
This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.
There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc153971b70e64eccfbed2b232394f22f8
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:
Tested-by: Maoxiaoyun<tinnycloud@hotmail.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as an 1-1 identity
mapping, but instead use the same case as for E820_RAM.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
git commit 24bdb0b62cc82120924762ae6bc85afc8c3f2b26 (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that ifdef CONFIG_X86_32 instead of e820_end_of_low_ram_pfn()
find_low_pfn_range() is called (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as ifdef CONFIG_X86_64 then memory hotplug
support for Xen balloon driver (under development) is broken.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TI816X (common name for DM816x/C6A816x/AM389x family) devices configured
to boot as PCIe Endpoint have class code = 0. This makes kernel PCI bus
code to skip allocating BARs to these devices resulting into following
type of error when trying to enable them:
"Device 0000:01:00.0 not available because of resource collisions"
The device cannot be operated because of the above issue.
This patch adds a ID specific (TI VENDOR ID and 816X DEVICE ID based)
'early' fixup quirk to replace class code with
PCI_CLASS_MULTIMEDIA_VIDEO as class.
Currently, the call to nfs4_schedule_session_recovery() will actually just
result in a test of the lease when what we really want is to force a
session reset.
Currently, if the server returns NFS4ERR_EXPIRED in reply to a READ or
WRITE, but the RENEW test determines that the lease is still active, we
fail to recover and end up looping forever in a READ/WRITE + RENEW death
spiral.
The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.
None of the latest GPUs had this hooked up, this is necessary for
correct operation in a lot of cases, however we should test this on a few
GPUs in these families as we've had problems in this area before.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On g4x, user interrupt in BSD ring is missed.
This is because though g4x and ironlake share the same bsd_ring,
their interrupt control interfaces have _two_ differences.
1.different irq enable/disable functions:
On g4x are i915_enable_irq and i915_disable_irq.
On ironlake are ironlake_enable_irq and ironlake_disable_irq.
2.different irq flag:
On g4x user interrupt flag in BSD ring on is I915_BSD_USER_INTERRUPT.
On ironlake is GT_BSD_USER_INTERRUPT
Old bsd_ring_get/put_irq call ring_get_irq and ring_get_irq.
ring_get_irq and ring_put_irq only call ironlake_enable/disable_irq.
So comes the irq miss on g4x.
To fix this, as other rings' code do, conditionally call different
functions(i915_enable/disable_irq and ironlake_enable/disable_irq)
and use different interrupt flags in bsd_ring_get/put_irq.
When finding or allocating a ram disk device, brd_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example (I set CONFIG_BLK_DEV_RAM_COUNT=4
for simplicity) :
$ sudo modprobe brd max_part=15
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
$ sudo mknod /dev/ram4 b 1 64
$ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256
256+0 records in
256+0 records out 1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s
namhyung@leonhard:linux$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
brw-r--r-- 1 root root 1, 64 2011-05-25 15:45 /dev/ram4
brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64
After this patch, /dev/ram4 - instead of /dev/ram64 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that RAMDISK_MAJOR can address.
It does not need to be limited by partition numbers unless
'rd_nr' param was specified.
The 'max_part' parameter controls the number of maximum partition
a brd device can have. However if a user specifies very large
value it would exceed the limitation of device minor number and
can cause a kernel panic (or, at least, produce invalid device
nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe brd max_part=100000
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
PGD 7af1067 PUD 7b19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: brd(+)
It's currently exposed only through /proc which, besides requiring
screen-scraping, doesn't allow userspace to distinguish between two
identical ATM adapters with different ATM indexes. The ATM device index
is required when using PPPoATM on a system with multiple ATM adapters.
Signed-off-by: Dan Williams <dcbw@redhat.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
(interruptibly), repeatedly failing to locate the owner of a 0xff entry in
the swap_map.
Although shmem_writepage() does abandon when it sees incoming page index
is beyond eof, there was still a window in which shmem_truncate_range()
could come in between writepage's dropping lock and updating swap_map,
find the half-completed swap_map entry, and in trying to free it,
leave it in a state that swap_shmem_alloc() could not correct.
Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
of the different cases, but easiest to fix by moving swap_shmem_alloc()
under cover of the lock.
More interesting than the bug: it's been there since 2.6.33, why could
I not see it with earlier kernels? The mmotm of two weeks ago seems to
have some magic for generating races, this is just one of three I found.
With yesterday's git I first saw this in mainline, bisected in search of
that magic, but the easy reproducibility evaporated. Oh well, fix the bug.
The v6 and v7 implementations of flush_kern_dcache_area do not align
the passed MVA to the size of a cacheline in the data cache. If a
misaligned address is used, only a subset of the requested area will
be flushed. This has been observed to cause failures in SMP boot where
the secondary_data initialised by the primary CPU is not cacheline
aligned, causing the secondary CPUs to read incorrect values for their
pgd and stack pointers.
This patch ensures that the base address is cacheline aligned before
flushing the d-cache.
Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a check that a block device has a request function
defined before it is used. Otherwise, misconfiguration can cause an oops.
Because we are allowing devices with zero size e.g. an offline multipath
device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised. Some block devices, like a loop
device without a backing file, exist but have no request function.
Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)
Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.
I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos target value according to the list of qos
requests received. This look up currently needs the acquisition of a
lock to access the list of qos requests to find the qos target value,
slowing down the entrance into idle state due to contention by multiple
cpus to access this list. The contention is severe when there are a lot
of cpus waking and going into idle. For example, for a simple workload
that has 32 pair of processes ping ponging messages to each other, where
64 cpu cores are active in test system, I see the following profile with
37.82% of cpu cycles spent in contention of pm_qos_lock:
A better approach will be to cache the updated pm_qos target value so
reading it does not require lock acquisition as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: James Bottomley <James.Bottomley@suse.de> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.
Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i8k uses lahf to read the flag register in 64-bit code; early x86-64
CPUs, however, lack this instruction and we get an invalid opcode
exception at runtime.
Use pushf to load the flag register into the stack instead.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Reported-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Harry G McGavran Jr <w5pny@arrl.net> Cc: Massimo Dal Zotto <dz@debian.org> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since this cred was not created with copy_creds(), it needs to get
initialized. Otherwise use of syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
can lead to a NULL deref. Thanks to Robert for finding this.
But introduced by commit 47a150edc2a ("Cache user_ns in struct cred").
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reported-by: Robert Święcki <robert@swiecki.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>