Getting Arbitrary Code Execution from fopen's 2nd Argument
Introduction
Recently I was in charge of setting problems for the CODE BLUE CTF 2019 Finals. One of my problems, Wire Hetimarl, was “weird” in the sense that the intended solution required paying attention to the 2nd argument of fopen (that is, a mode string like rb). How can that argument, which looks almost always useless for exploitation, become a trigger point? Let me show you an example.
First, let’s put the following two files under /home/user:
As you may notice, GCONV_PATH and ,ccs=payload are to blame for this incident. What are they in the first place? I guess most of you have never seen them before.
According to the man page, glibc’s fopen has several extended features:
Glibc notes
The GNU C library allows the following extensions for the string specified in mode:
c (since glibc 2.3.3) Do not make the open operation, or subsequent read and write operations, thread cancellation points. This flag is ignored for fdopen().
e (since glibc 2.7) Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen().
m (since glibc 2.3) Attempt to access the file using mmap(2), rather than I/O system calls (read(2), write(2)). Currently, use of mmap(2) is attempted only for a file opened for reading.
x Open the file exclusively (like the O_EXCL flag of open(2)). If the file already exists, fopen() fails, and sets errno to EEXIST. This flag is ignored for fdopen().
In addition to the above characters, fopen() and freopen() support the following syntax in mode:
,ccs=string
The given string is taken as the name of a coded character set and the stream is marked as wide-oriented. Thereafter, internal conversion functions convert I/O to and from the character set string. If the ,ccs=string syntax is not specified, then the wide-orientation of the stream is determined by the first file operation. If that operation is a wide-character operation, the stream is marked wide-oriented, and functions to convert to the coded character set are loaded.
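The orientation rule quoted above can be observed with a small sketch. It assumes a glibc system where /dev/null exists; the helper name stream_orientation is made up for this illustration, and fwide(fp, 0) only queries the orientation without changing it.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: open `path` with `mode` and report the stream's
   orientation right after fopen. Returns 1 for wide, -1 for byte,
   0 for "not decided yet", and -2 if fopen itself failed. */
int stream_orientation(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -2;

    int o = fwide(fp, 0);          /* mode 0: query only, do not set */
    fclose(fp);
    return (o > 0) - (o < 0);      /* normalise to -1, 0 or 1 */
}
```

On glibc I would expect stream_orientation("/dev/null", "r") to report an undecided stream and stream_orientation("/dev/null", "r,ccs=UTF-8") to report a wide one, since UTF-8 is one of the conversions built into glibc. Other C libraries may simply ignore the ,ccs= suffix.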
Uh-huh? So all I did with ,ccs=payload was specify the coded character set for the file. But how did that go so far as to pop a shell? This time I’m going to quote glibc’s source code:
    if (__wcsmbs_named_conv (&fcts, ccs[2] == '\0'
                             ? upstr (ccs, cs + 5) : ccs) != 0)
      {
        /* Something went wrong, we cannot load the conversion modules.
           This means we cannot proceed since the user explicitly asked
           for these.  */
        (void) _IO_file_close_it (fp);
        free (ccs);
        __set_errno (EINVAL);
        return NULL;
      }
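The error path in this snippet is visible from user code. The following is a minimal sketch, assuming glibc and an existing /dev/null; fopen_errno is a hypothetical helper and NO-SUCH-CHARSET-42 is just a name that no conversion module should provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if fopen succeeded, otherwise the errno
   value that fopen left behind. */
int fopen_errno(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    errno = 0;
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return errno;

    fclose(fp);
    return 0;
}
```

With glibc, fopen_errno("/dev/null", "r,ccs=NO-SUCH-CHARSET-42") should come back as EINVAL, because the conversion lookup fails and the code above closes the file again. The same call without the ,ccs= part should succeed.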
    if (__gconv_find_transform (to, from, &result, &nsteps, 0) != __GCONV_OK)
      /* Loading the conversion step is not possible.  */
      return NULL;

    /* Maybe it is someday necessary to allow more than one step.
       Currently this is not the case since the conversions handled here
       are from and to INTERNAL and there always is a converted for
       that.  It the directly following code is enabled the libio
       functions will have to allocate appropriate __gconv_step_data
       elements instead of only one.  */
    if (nsteps > 1)
      {
        /* We cannot handle this case.  */
        __gconv_close_transform (result, nsteps);
        result = NULL;
      }
    else
      *nstepsp = nsteps;
    int
    __gconv_find_transform (const char *toset, const char *fromset,
                            struct __gconv_step **handle, size_t *nsteps,
                            int flags)
    {
      const char *fromset_expand;
      const char *toset_expand;
      int result;

      /* Ensure that the configuration data is read.  */
      __gconv_load_conf ();

      ...

      /* See whether the names are aliases.  */
      fromset_expand = do_lookup_alias (fromset);
      toset_expand = do_lookup_alias (toset);

      ...

      result = find_derivation (toset, toset_expand, fromset, fromset_expand,
                                handle, nsteps);

      /* Release the lock.  */
      __libc_lock_unlock (__gconv_lock);

      /* The following code is necessary since `find_derivation' will return
         GCONV_OK even when no derivation was found but the same request
         was processed before.  I.e., negative results will also be cached.  */
      return (result == __GCONV_OK
              ? (*handle == NULL ? __GCONV_NOCONV : __GCONV_OK)
              : result);
    }
    /* The main function: find a possible derivation from the `fromset' (either
       the given name or the alias) to the `toset' (again with alias).  */
    static int
    find_derivation (const char *toset, const char *toset_expand,
                     const char *fromset, const char *fromset_expand,
                     struct __gconv_step **handle, size_t *nsteps)
    {
      struct derivation_step *first, *current, **lastp, *solution = NULL;
      int best_cost_hi = INT_MAX;
      int best_cost_lo = INT_MAX;
      int result;

      ...

      /* The task is to find a sequence of transformations, backed by the
         existing modules - whether builtin or dynamically loadable -,
         starting at `fromset' (or `fromset_expand') and ending at `toset'
         (or `toset_expand'), and with minimal cost.

         For computer scientists, this is a shortest path search in the
         graph where the nodes are all possible charsets and the edges are
         the transformations listed in __gconv_modules_db.

         For now we use a simple algorithm with quadratic runtime behaviour.
         A breadth-first search, starting at `fromset' and `fromset_expand'.
         The list starting at `first' contains all nodes that have been
         visited up to now, in the order in which they have been visited --
         excluding the goal nodes `toset' and `toset_expand' which get
         managed in the list starting at `solution'.
         `current' walks through the list starting at `first' and looks
         which nodes are reachable from the current node, adding them to
         the end of the list [`first' or `solution' respectively] (if
         they are visited the first time) or updating them in place (if
         they have have already been visited).
         In each node of either list, cost_lo and cost_hi contain the
         minimum cost over any paths found up to now, starting at `fromset'
         or `fromset_expand', ending at that node.  best_cost_lo and
         best_cost_hi represent the minimum over the elements of the
         `solution' list.  */
      ...
Did you grasp the situation? When we specify a coded character set, glibc tries to find a way to convert between the given set and the one it uses internally (it actually runs a breadth-first search over the available conversions, which is pretty interesting).
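To make the graph search concrete, here is a toy sketch of a breadth-first search over charsets. It is not glibc's code: the table toy_db and the function toy_find_derivation are invented for illustration, and the sketch counts conversion steps instead of tracking cost_hi and cost_lo the way find_derivation does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One edge of the toy graph: a module that converts `from` into `to`.
   This table is made up; it is not glibc's __gconv_modules_db. */
struct toy_module {
    const char *from;
    const char *to;
};

static const struct toy_module toy_db[] = {
    { "PAYLOAD",  "INTERNAL" },
    { "INTERNAL", "UTF-8"    },
    { "LATIN1",   "INTERNAL" },
    { "INTERNAL", "LATIN1"   },
};
#define TOY_DB_LEN (sizeof toy_db / sizeof toy_db[0])
/* Every node is either the start or an endpoint of some edge. */
#define MAX_NODES  (2 * TOY_DB_LEN + 1)

/* Breadth-first search: returns the smallest number of conversion steps
   from `fromset` to `toset`, or -1 if no chain of modules connects them. */
int toy_find_derivation(const char *fromset, const char *toset)
{
    const char *queue[MAX_NODES];
    int dist[MAX_NODES];
    size_t head = 0, tail = 0;

    assert(fromset != NULL && toset != NULL);
    queue[tail] = fromset;
    dist[tail++] = 0;

    while (head < tail) {
        const char *cur = queue[head];
        int d = dist[head++];

        if (strcmp(cur, toset) == 0)
            return d;

        for (size_t i = 0; i < TOY_DB_LEN; i++) {
            if (strcmp(toy_db[i].from, cur) != 0)
                continue;

            /* Enqueue each charset at most once. */
            int seen = 0;
            for (size_t j = 0; j < tail; j++)
                if (strcmp(queue[j], toy_db[i].to) == 0)
                    seen = 1;
            if (!seen) {
                queue[tail] = toy_db[i].to;
                dist[tail++] = d + 1;
            }
        }
    }
    return -1;
}
```

In this toy table, PAYLOAD reaches UTF-8 through INTERNAL, while nothing converts out of UTF-8, so the reverse lookup finds no path.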
In a nutshell, GCONV_PATH is an environment variable for changing the configuration of this translation mechanism:
    /* First see whether we should use the cache.  */
    if (__gconv_load_cache () == 0)
      {
        /* Yes, we are done.  */
        __set_errno (save_errno);
        return;
      }
    ...
iconv/gconv_cache.c
    int
    __gconv_load_cache (void)
    {
      int fd;
      struct stat64 st;
      struct gconvcache_header *header;

      /* We cannot use the cache if the GCONV_PATH environment variable is
         set.  */
      __gconv_path_envvar = getenv ("GCONV_PATH");
      if (__gconv_path_envvar != NULL)
        return -1;
      ...
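That means that if we can set GCONV_PATH to an arbitrary value, glibc skips the cache and reads its conversion configuration (the gconv-modules file) from the directories we name, so we can forge an arbitrary conversion path between coded character sets. But why does this matter? To answer this, we need to look deeper into find_derivation.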
    /* The main function: find a possible derivation from the `fromset' (either
       the given name or the alias) to the `toset' (again with alias).  */
    static int
    find_derivation (const char *toset, const char *toset_expand,
                     const char *fromset, const char *fromset_expand,
                     struct __gconv_step **handle, size_t *nsteps)
    {
      ...

      if (solution != NULL)
        {
          /* We really found a way to do the transformation.  */

          /* Choose the best solution.  This is easy because we know that
             the solution list has at most length 2 (one for every possible
             goal node).  */
          if (solution->next != NULL)
            {
              struct derivation_step *solution2 = solution->next;
              ...

          /* Now build a data structure describing the transformation steps.  */
          result = gen_steps (solution, toset_expand ?: toset,
                              fromset_expand ?: fromset, handle, nsteps);
        }
      ...
    static int
    gen_steps (struct derivation_step *best, const char *toset,
               const char *fromset, struct __gconv_step **handle,
               size_t *nsteps)
    {
      ...
    #ifndef STATIC_GCONV
              if (current->code->module_name[0] == '/')
                {
                  /* Load the module, return handle for it.  */
                  struct __gconv_loaded_object *shlib_handle =
                    __gconv_find_shlib (current->code->module_name);
    /* Open the gconv database if necessary.  A non-negative return value
       means success.  */
    struct __gconv_loaded_object *
    __gconv_find_shlib (const char *name)
    {
      ...
      /* Try to load the shared object if the usage count is 0.  This
         implies that if the shared object is not loadable, the handle is
         NULL and the usage count > 0.  */
      if (found != NULL)
        {
          if (found->counter < -TRIES_BEFORE_UNLOAD)
            {
              assert (found->handle == NULL);
              found->handle = __libc_dlopen (found->name);
              if (found->handle != NULL)
                {
                  found->fct = __libc_dlsym (found->handle, "gconv");
                  if (found->fct == NULL)
                    {
                      /* Argh, no conversion function.  There is something
                         wrong here.  */
                      __gconv_release_shlib (found);
                      found = NULL;
                    }
                  else
                    {
                      found->init_fct = __libc_dlsym (found->handle, "gconv_init");
                      found->end_fct = __libc_dlsym (found->handle, "gconv_end");
                      ...
Oh, there we can see __libc_dlopen and __libc_dlsym! So glibc relies on dynamically loaded libraries to implement encoding conversion, and my PoC took advantage of this mechanism: a module listed for the requested character set is loaded with __libc_dlopen, and its code runs inside the process that called fopen.
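For reference, the three symbol names that __gconv_find_shlib looks up suggest what a loadable module has to export. The following is a harmless skeleton, not the PoC from this post: it assumes glibc's <gconv.h> is installed, the counter and its getter toy_init_calls are hypothetical, and the gconv stub refuses to convert anything. As far as I can tell, glibc calls gconv_init once the module has been loaded, which is why any code placed there runs in the calling process.

```c
#include <assert.h>
#include <gconv.h>    /* glibc-specific: struct __gconv_step, __GCONV_* codes */
#include <stddef.h>

static int init_calls;   /* hypothetical counter, only to make the call visible */

/* Looked up by name as "gconv_init"; receives the conversion step. */
int gconv_init(struct __gconv_step *step)
{
    assert(step != NULL);
    init_calls++;
    return __GCONV_OK;
}

/* Looked up by name as "gconv"; without this symbol the module is rejected.
   A real module would convert bytes here, this stub only reports an error. */
int gconv(struct __gconv_step *step, struct __gconv_step_data *data,
          const unsigned char **inbuf, const unsigned char *inbufend,
          unsigned char **outbuf, size_t *irreversible,
          int do_flush, int consume_incomplete)
{
    (void)step; (void)data; (void)inbuf; (void)inbufend;
    (void)outbuf; (void)irreversible; (void)do_flush; (void)consume_incomplete;
    return __GCONV_INTERNAL_ERROR;
}

/* Looked up by name as "gconv_end"; nothing to clean up in this sketch. */
void gconv_end(struct __gconv_step *step)
{
    (void)step;
}

/* Hypothetical accessor so a caller can check whether gconv_init ran. */
int toy_init_calls(void)
{
    return init_calls;
}
```

Built as a shared object and referenced from a gconv-modules file, a skeleton like this would be picked up through the dlopen path shown above; here it is only meant to show which entry points exist.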
Is this dangerous?
Not at all, I guess. There are two reasons:
There is virtually no situation where an attacker controls the 2nd argument of fopen. It is almost always a constant.
GCONV_PATH is considered a “dangerous” environment variable, like LD_PRELOAD. In fact, glibc drops it for setuid binaries (see sysdeps/generic/unsecvars.h).
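For the rare program that does build a mode string from untrusted input, a simple guard is enough to keep ,ccs= away from fopen. This is only a sketch of one possible check, not something glibc provides: mode_is_plain is a hypothetical helper that accepts the standard r, w and a modes with optional b and + and rejects everything else.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical guard: true only for short mode strings such as "r", "wb"
   or "a+b". Anything longer, or containing other characters (including
   glibc extensions like ",ccs=..."), is rejected. */
bool mode_is_plain(const char *mode)
{
    assert(mode != NULL);

    if (mode[0] == '\0' || strchr("rwa", mode[0]) == NULL)
        return false;
    if (strlen(mode) > 3)
        return false;

    /* Everything after the first character must be 'b' or '+'. */
    return mode[1 + strspn(mode + 1, "b+")] == '\0';
}
```

A caller would check mode_is_plain(mode) before passing mode to fopen and fail the request otherwise.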
Nevertheless, it may be possible to abuse this mechanism in operations related to iconv rather than fopen. I don’t know. I set this problem just because it was interesting. Thanks.