  1. Jul 17, 2019
    • kernel/pid.c: convert struct pid count to refcount_t · f57e515a
      Joel Fernandes (Google) authored
      struct pid's count is an atomic_t field used as a refcount.  Use
      refcount_t for it which is basically atomic_t but does additional
      checking to prevent use-after-free bugs.
      
      For memory ordering, the only change is with the following:
      
       -	if ((atomic_read(&pid->count) == 1) ||
       -	     atomic_dec_and_test(&pid->count)) {
       +	if (refcount_dec_and_test(&pid->count)) {
       		kmem_cache_free(ns->pid_cachep, pid);
      
      Here the change is from: Fully ordered --> RELEASE + ACQUIRE (as per
      refcount-vs-atomic.rst).  This ACQUIRE should take care of making sure
      the free happens after the refcount_dec_and_test().
      
      The above hunk also removes atomic_read() since it is not needed for the
      code to work and it is unclear how beneficial it is.  The removal lets
      refcount_dec_and_test() check for cases where get_pid() happened before
      the object was freed.
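
A minimal userspace sketch of the RELEASE + ACQUIRE pattern described above, written with C11 atomics rather than the kernel's refcount_t. The names struct obj, obj_alloc(), obj_get(), obj_dec_and_test() and obj_put() are hypothetical and exist only for illustration. Unlike refcount_t, which saturates and warns, this sketch simply asserts when it sees a zero count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid; only the counter matters here. */
struct obj {
	atomic_uint count;
	int payload;
};

static struct obj *obj_alloc(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->count, 1);	/* the creator holds the first reference */
	o->payload = payload;
	return o;
}

static void obj_get(struct obj *o)
{
	/* No ordering needed: the caller already holds a reference. */
	unsigned int old = atomic_fetch_add_explicit(&o->count, 1,
						     memory_order_relaxed);

	assert(old != 0);	/* incrementing from zero means use-after-free */
}

/* Returns true when the caller just dropped the last reference. */
static bool obj_dec_and_test(struct obj *o)
{
	/* RELEASE: our earlier accesses to *o happen before the decrement. */
	unsigned int old = atomic_fetch_sub_explicit(&o->count, 1,
						     memory_order_release);

	assert(old != 0);	/* dropping a reference nobody holds */
	if (old != 1)
		return false;
	/* ACQUIRE on success: the free below cannot be reordered before it. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}

/* Drops one reference and frees the object when it was the last one. */
static bool obj_put(struct obj *o)
{
	if (!obj_dec_and_test(o))
		return false;
	free(o);
	return true;
}
```

With a single decrement-and-test there is no separate read of the counter, so every put goes through the same checked path.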
      
      Link: http://lkml.kernel.org/r/20190701183826.191936-1-joel@joelfernandes.org
      
      
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: simplify set_user_sigmask/restore_user_sigmask · b772434b
      Oleg Nesterov authored
      task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
      syscall paths.  This means that set_user_sigmask() can save ->blocked in
      ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
      was modified.
      
      This way the callers do not need 2 sigset_t's passed to set/restore, and
      restore_user_sigmask(), renamed to restore_saved_sigmask_unless(), turns
      into a trivial helper which just calls restore_saved_sigmask().
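
The save-and-flag scheme described above can be modelled in plain C. This is a toy sketch, not the kernel code: struct toy_task and the toy_* helpers are hypothetical, signal masks are reduced to an unsigned long, and the behaviour in the interrupted case (the saved mask is left in place for the signal-delivery path) is an assumption inferred from the helper's name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three task fields the commit message mentions. */
struct toy_task {
	unsigned long blocked;		/* currently blocked signals */
	unsigned long saved_sigmask;	/* mask to restore later */
	bool restore_sigmask;		/* set when ->blocked was modified */
};

/* Save the old mask in the task itself, so callers need no extra sigset. */
static void toy_set_user_sigmask(struct toy_task *t, unsigned long newmask)
{
	t->saved_sigmask = t->blocked;
	t->restore_sigmask = true;
	t->blocked = newmask;
}

/* Put the saved mask back, but only if one was actually saved. */
static void toy_restore_saved_sigmask(struct toy_task *t)
{
	if (t->restore_sigmask) {
		t->restore_sigmask = false;
		t->blocked = t->saved_sigmask;
	}
}

/* Assumed semantics: when interrupted, leave the flag set for later. */
static void toy_restore_saved_sigmask_unless(struct toy_task *t,
					     bool interrupted)
{
	if (!interrupted)
		toy_restore_saved_sigmask(t);
}

/* Usage example: one interrupted call followed by a normal one. */
static void toy_demo(void)
{
	struct toy_task t = { .blocked = 0x1 };

	toy_set_user_sigmask(&t, 0x6);
	toy_restore_saved_sigmask_unless(&t, true);
	assert(t.blocked == 0x6 && t.restore_sigmask);

	toy_restore_saved_sigmask_unless(&t, false);
	assert(t.blocked == 0x1 && !t.restore_sigmask);
}
```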
      
      Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
      
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Wong <e@80x24.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: add PTRACE_GET_SYSCALL_INFO request · 201766a2
      Elvira Khabirova authored
      PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets a ptracer
      obtain details of the syscall the tracee is blocked in.
      
      There are two reasons for a special syscall-related ptrace request.
      
      Firstly, with the current ptrace API there are cases in which a ptracer
      cannot retrieve necessary information about syscalls.  Some examples
      include:
      
       * The notorious int-0x80-from-64-bit-task issue. See [1] for details.
         In short, if a 64-bit task performs a syscall through int 0x80, its
         tracer has no reliable means to find out that the syscall was, in
         fact, a compat syscall, and misidentifies it.
      
       * Syscall-enter-stop and syscall-exit-stop look the same for the
         tracer. Common practice is to keep track of the sequence of
         ptrace-stops in order not to mix the two syscall-stops up. But it is
         not as simple as it looks; for example, strace had a (just recently
         fixed) long-standing bug where attaching strace to a tracee that is
         performing the execve system call led to the tracer identifying the
         following syscall-exit-stop as syscall-enter-stop, which messed up
         all the state tracking.
      
       * Since the introduction of commit 84d77d3f ("ptrace: Don't allow
         accessing an undumpable mm"), both PTRACE_PEEKDATA and
         process_vm_readv become unavailable when the process dumpable flag is
         cleared. On such architectures as ia64 this results in all syscall
         arguments being unavailable for the tracer.
      
      Secondly, ptracers also have to support a lot of arch-specific code for
      obtaining information about the tracee.  For some architectures, this
      requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
      argument and return value.
      
      ptrace(2) man page:
      
      long ptrace(enum __ptrace_request request, pid_t pid,
                  void *addr, void *data);
      ...
      PTRACE_GET_SYSCALL_INFO
             Retrieve information about the syscall that caused the stop.
             The information is placed into the buffer pointed to by the
             "data" argument, which should be a pointer to a buffer of type
             "struct ptrace_syscall_info".
             The "addr" argument contains the size of the buffer pointed to
             by the "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
             The return value contains the number of bytes available
             to be written by the kernel.
             If the size of the data to be written by the kernel exceeds the
             size specified by the "addr" argument, the output is truncated.
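
The size contract in the man page text above (return the number of bytes available, write no more than the caller's buffer holds) can be sketched as a small helper. This only models the contract and is not the kernel implementation; copy_truncated() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the contract: copy at most dst_size bytes,
 * but always report how many bytes were available in total.
 */
static size_t copy_truncated(void *dst, size_t dst_size,
			     const void *src, size_t src_size)
{
	size_t n = dst_size < src_size ? dst_size : src_size;

	assert(dst != NULL || dst_size == 0);
	assert(src != NULL || src_size == 0);

	if (n)
		memcpy(dst, src, n);	/* output is truncated to dst_size */
	return src_size;		/* bytes available, not bytes written */
}
```

Under this contract a caller can detect truncation by comparing the return value with the buffer size it passed in: a larger return value means some data did not fit.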
      
      [ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
        Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
      Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
      
      
      Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
      Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Eugene Syromyatnikov <esyr@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greentime Hu <greentime@andestech.com>
      Cc: Helge Deller <deller@gmx.de>	[parisc]
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel: fix typos and some coding style in comments · 65f50f25
      Weitao Hou authored
      fix lenght to length
      
      Link: http://lkml.kernel.org/r/20190521050937.4370-1-houweitaoo@gmail.com
      
      
      Signed-off-by: Weitao Hou <houweitaoo@gmail.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Jul 16, 2019
  3. Jul 15, 2019
  4. Jul 14, 2019
  5. Jul 13, 2019
    • locking/lockdep: Fix lock used or unused stats error · 68d41d8c
      Yuyang Du authored
      
      The stats variable nr_unused_locks is incremented every time a new lock
      class is registered and decremented when the lock is first used in
      __lock_acquire(). It is then shown and checked in lockdep_stats.
      
      However, under configurations in which either CONFIG_TRACE_IRQFLAGS or
      CONFIG_PROVE_LOCKING is not defined:
      
      The commit:
      
        09180651 ("locking/lockdep: Consolidate lock usage bit initialization")
      
      missed marking the LOCK_USED flag at IRQ usage initialization because
      mark_usage() is not called. And the commit:
      
        886532ae ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
      
      further left mark_lock() undefined, so that LOCK_USED cannot be
      marked at all when the lock is first acquired.
      
      As a result, we fix this by not showing and checking the stats under such
      configurations for lockdep_stats.
      
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Yuyang Du <duyuyang@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: arnd@arndb.de
      Cc: frederic@kernel.org
      Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Fix preempt warning in ttwu · e3d85487
      Peter Zijlstra authored
      
      John reported a DEBUG_PREEMPT warning caused by commit:
      
        aacedf26 ("sched/core: Optimize try_to_wake_up() for local wakeups")
      
      I overlooked that ttwu_stat() requires preemption disabled.
      
      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: aacedf26 ("sched/core: Optimize try_to_wake_up() for local wakeups")
      Link: https://lkml.kernel.org/r/20190710105736.GK3402@hirez.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix exclusive events' grouping · 8a58ddae
      Alexander Shishkin authored
      
      So far, we tried to disallow grouping exclusive events for the fear of
      complications they would cause with moving between contexts. Specifically,
      moving a software group to a hardware context would violate the exclusivity
      rules if both groups contain matching exclusive events.
      
      This attempt was, however, unsuccessful: the check that we have in the
      perf_event_open() syscall is both wrong (looks at wrong PMU) and
      insufficient (group leader may still be exclusive), as can be illustrated
      by running:
      
        $ perf record -e '{intel_pt//,cycles}' uname
        $ perf record -e '{cycles,intel_pt//}' uname
      
      ultimately successfully.
      
      Furthermore, we are completely free to trigger the exclusivity violation
      by:
      
         perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
      
      even though the helpful perf record will not allow that, the ABI will.
      
      The warning later in the perf_event_open() path will also not trigger, because
      it's also wrong.
      
      Fix all this by validating the original group before moving, getting rid
      of broken safeguards and placing a useful one to perf_install_in_context().
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: mathieu.poirier@linaro.org
      Cc: will.deacon@arm.com
      Fixes: bed5b25a ("perf: Add a pmu capability for "exclusive" events")
      Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix race between close() and fork() · 1cf8dfe8
      Peter Zijlstra authored
      
      Syzkaller reported the following Use-after-Free bug:
      
      	close()						clone()
      
      							  copy_process()
      							    perf_event_init_task()
      							      perf_event_init_context()
      							        mutex_lock(parent_ctx->mutex)
      								inherit_task_group()
      								  inherit_group()
      								    inherit_event()
      								      mutex_lock(event->child_mutex)
      								      // expose event on child list
      								      list_add_tail()
      								      mutex_unlock(event->child_mutex)
      							        mutex_unlock(parent_ctx->mutex)
      
      							    ...
      							    goto bad_fork_*
      
      							  bad_fork_cleanup_perf:
      							    perf_event_free_task()
      
      	  perf_release()
      	    perf_event_release_kernel()
      	      list_for_each_entry()
      		mutex_lock(ctx->mutex)
      		mutex_lock(event->child_mutex)
      		// event is from the failing inherit
      		// on the other CPU
      		perf_remove_from_context()
      		list_move()
      		mutex_unlock(event->child_mutex)
      		mutex_unlock(ctx->mutex)
      
      							      mutex_lock(ctx->mutex)
      							      list_for_each_entry_safe()
      							        // event already stolen
      							      mutex_unlock(ctx->mutex)
      
      							    delayed_free_task()
      							      free_task()
      
      	     list_for_each_entry_safe()
      	       list_del()
      	       free_event()
      	         _free_event()
      		   // and so event->hw.target
      		   // is the already freed failed clone()
      		   if (event->hw.target)
      		     put_task_struct(event->hw.target)
      		       // WHOOPSIE, already quite dead
      
      Which puts the lie to the comment on perf_event_free_task():
      'unexposed, unused context' not so much.
      
      Which is a 'fun' confluence of fail; copy_process() doing an
      unconditional free_task() and not respecting refcounts, and perf having
      creative locking. In particular:
      
        82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      
      seems to have overlooked this 'fun' parade.
      
      Solve it by using the fact that detached events still have a reference
      count on their (previous) context. With this perf_event_free_task()
      can detect when events have escaped and wait for their destruction.
      
      Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Reported-by: <syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. Jul 12, 2019
  7. Jul 10, 2019
  8. Jul 09, 2019
  9. Jul 08, 2019
  10. Jul 07, 2019
  11. Jul 06, 2019
  12. Jul 04, 2019
    • ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME · 6994eefb
      Jann Horn authored
      
      Fix two issues:
      
      When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
      reference to the parent's objective credentials, then give that pointer
      to get_cred().  However, the object lifetime rules for things like
      struct cred do not permit unconditionally turning an RCU reference into
      a stable reference.
      
      PTRACE_TRACEME records the parent's credentials as if the parent was
      acting as the subject, but that's not the case.  If a malicious
      unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
      at a later point, the parent process becomes attacker-controlled
      (because it drops privileges and calls execve()), the attacker ends up
      with control over two processes with a privileged ptrace relationship,
      which can be abused to ptrace a suid binary and obtain root privileges.
      
      Fix both of these by always recording the credentials of the process
      that is requesting the creation of the ptrace relationship:
      current_cred() can't change under us, and current is the proper subject
      for access control.
      
      This change is theoretically userspace-visible, but I am not aware of
      any code that it will actually break.
      
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. Jul 03, 2019