PIPS
#include <stdio.h>
#include "genC.h"
#include "linear.h"
#include "ri.h"
#include "effects.h"
#include "ri-util.h"
#include "effects-util.h"
#include "misc.h"
#include "effects-generic.h"
#include "effects-simple.h"
#include "control.h"
#include "callgraph.h"
#include "pipsdbm.h"
#include "accel-util.h"
#include "resources.h"
#include "properties.h"
#include "prettyprint.h"
Functions
static const char * clean_prefix (const char *full_name, const char *bad_prefix)
Return a pointer to the first character after bad_prefix.
static const char * get_clean_mod_name (const char *mod_name)
Try to get only the original function name, without any prefix.
string build_outline_name (const char *base_prefix, const char *mod_name)
Build the outline function name.
static bool mark_loop_to_outline (const statement s)
static void gpu_ify_statement (statement s, int depth, const char *mod_name)
Transform a loop nest into a GPU or accelerator-like kernel.
bool gpu_ify (const string mod_name)
Transform all the parallel loop nests of a module into smaller independent functions suitable for GPU-style accelerators.
Variables
static list loop_nests_to_outline
A simple phase that outlines parallel loops onto a GPU.
static const char * kernel_prefix = 0
These are the possible prefixes for outlined functions; they are computed from a property and the current module name.
static const char * wrapper_prefix = 0
static const char * launcher_prefix = 0
static const char * fwrapper_prefix = 0
string build_outline_name(const char *base_prefix, const char *mod_name)
Build the outline function name.
Parameters:
base_prefix
mod_name
Definition at line 78 of file gpu-ify.c.
References build_new_top_level_module_name(), concatenate(), free(), get_bool_property(), prefix, and strdup().
Referenced by get_next_task_name(), and gpu_ify_statement().
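The exact naming rule is not spelled out here. The sketch below only illustrates the idea under loudly stated assumptions: build_outline_name_sketch, name_taken and the with_module_name flag are hypothetical, with_module_name stands in for the boolean property read through get_bool_property() (its name is not given on this page), and the "smallest free numeric suffix" rule is an assumed stand-in for build_new_top_level_module_name(). It does reproduce the p4a_kernel_launcher_0 style of names used in the gpu_ify_statement() example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "is this top-level module name already used?" */
static bool name_taken(const char *name, const char *const existing[], size_t n) {
  for (size_t k = 0; k < n; k++)
    if (strcmp(name, existing[k]) == 0)
      return true;
  return false;
}

/* Hypothetical sketch, NOT the PIPS implementation: build
   "<base_prefix>_<n>" or, when with_module_name is true,
   "<base_prefix>_<mod_name>_<n>", with the smallest n >= 0 not already
   taken. The caller must free() the result. */
static char *build_outline_name_sketch(const char *base_prefix, const char *mod_name,
                                       bool with_module_name,
                                       const char *const existing[], size_t n_existing) {
  char buf[256];
  for (unsigned n = 0;; n++) {
    if (with_module_name)
      snprintf(buf, sizeof buf, "%s_%s_%u", base_prefix, mod_name, n);
    else
      snprintf(buf, sizeof buf, "%s_%u", base_prefix, n);
    if (!name_taken(buf, existing, n_existing)) {
      char *result = malloc(strlen(buf) + 1);
      assert(result != NULL);
      strcpy(result, buf);
      return result;
    }
  }
}
```

With no existing names, build_outline_name_sketch("p4a_kernel_launcher", "main", false, ...) yields p4a_kernel_launcher_0; if that name is already taken, the next call yields p4a_kernel_launcher_1.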
static const char *clean_prefix(const char *full_name, const char *bad_prefix)
Return a pointer to the first character after bad_prefix.
Definition at line 45 of file gpu-ify.c.
References full_name.
Referenced by get_clean_mod_name().
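A minimal standalone sketch of what clean_prefix() is described as doing. The name clean_prefix_sketch is hypothetical, and returning full_name unchanged when it does not start with bad_prefix is an assumption, not something this page states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: return a pointer to the first character of
   full_name after bad_prefix. ASSUMPTION: if full_name does not start
   with bad_prefix, full_name itself is returned. */
static const char *clean_prefix_sketch(const char *full_name,
                                       const char *bad_prefix) {
  assert(full_name != NULL && bad_prefix != NULL);
  size_t len = strlen(bad_prefix);
  if (strncmp(full_name, bad_prefix, len) == 0)
    return full_name + len; /* points inside full_name, no copy is made */
  return full_name;
}
```

Because the result points into full_name, nothing has to be freed, but full_name must stay alive as long as the result is used.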
static const char *get_clean_mod_name(const char *mod_name)
Try to get only the original function name, without any prefix.
Definition at line 59 of file gpu-ify.c.
References clean_prefix(), fwrapper_prefix, get_string_property(), kernel_prefix, launcher_prefix, and wrapper_prefix.
Referenced by gpu_ify().
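get_clean_mod_name() checks the name against the four prefix variables. One subtlety worth illustrating: a kernel prefix such as "p4a_kernel_" is itself a prefix of "p4a_kernel_launcher_", so the order of the checks matters. The sketch below is hypothetical (get_clean_mod_name_sketch is not a PIPS function): it takes the prefixes as an explicit array instead of reading kernel_prefix, wrapper_prefix, launcher_prefix and fwrapper_prefix, and it assumes a "longest matching prefix wins" rule, which may differ from the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strip the longest of the given prefixes that
   mod_name starts with. ASSUMPTION: longest match wins, so a short
   prefix such as "p4a_kernel_" does not shadow "p4a_kernel_launcher_". */
static const char *get_clean_mod_name_sketch(const char *mod_name,
                                             const char *const prefixes[],
                                             size_t n_prefixes) {
  assert(mod_name != NULL);
  size_t best = 0; /* length of the longest matching prefix so far */
  for (size_t k = 0; k < n_prefixes; k++) {
    size_t len = strlen(prefixes[k]);
    if (len > best && strncmp(mod_name, prefixes[k], len) == 0)
      best = len;
  }
  return mod_name + best; /* mod_name unchanged when nothing matches */
}
```

With the hypothetical prefixes "p4a_kernel_", "p4a_kernel_wrapper_" and "p4a_kernel_launcher_", both p4a_kernel_launcher_main and p4a_kernel_main reduce to main.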
bool gpu_ify(const string mod_name)
Transform all the parallel loop nests of a module into smaller independent functions suitable for GPU-style accelerators.
gpu_ify_statement() details what can be generated. Whether each of the functions is generated depends on several properties.
Parameters:
mod_name: the name of the module to work on.
Outline the previously marked loop nests. First, put the statements to outline in the right order:
Strip the prefix from the module name.
Definition at line 347 of file gpu-ify.c.
References compute_callees(), db_get_memory_resource(), DB_PUT_MEMORY_RESOURCE, depth_of_parallel_perfect_loop_nest(), entity_name, FOREACH, gen_free_list(), gen_nreverse(), gen_null(), gen_recurse, get_clean_mod_name(), get_current_module_entity(), get_current_module_statement(), global_name_to_user_name(), gpu_ify_statement(), loop_nests_to_outline, mark_loop_to_outline(), module_statement, NIL, PIPS_PHASE_POSTLUDE, PIPS_PHASE_PRELUDE, reset_cumulated_rw_effects(), set_cumulated_rw_effects(), STATEMENT, and statement_domain.
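Both CONS (used by mark_loop_to_outline()) and gen_nreverse() (used by gpu_ify()) appear in the reference lists, which suggests the usual pattern: push each marked nest on the front of the list during the walk, then reverse the list once to get source order back. A minimal sketch of that pattern, assuming a hypothetical cell type in place of the Newgen list of statements; cons, nreverse and collect_in_source_order are illustrative names, not PIPS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Newgen list of statements:
   only the statement numbers are kept. */
typedef struct cell {
  int stmt_number;
  struct cell *next;
} cell;

/* Push on the front of the list, in the spirit of CONS(). */
static cell *cons(int stmt_number, cell *next) {
  cell *c = malloc(sizeof *c);
  assert(c != NULL);
  c->stmt_number = stmt_number;
  c->next = next;
  return c;
}

/* Reverse the list in place, the role gen_nreverse() plays. */
static cell *nreverse(cell *l) {
  cell *reversed = NULL;
  while (l != NULL) {
    cell *next = l->next;
    l->next = reversed;
    reversed = l;
    l = next;
  }
  return reversed;
}

static void free_cells(cell *l) {
  while (l != NULL) {
    cell *next = l->next;
    free(l);
    l = next;
  }
}

/* Visit statements in source order, push each marked one on the front
   (cheap), then reverse once so the result is in source order again. */
static cell *collect_in_source_order(const int numbers[], const int marked[], size_t n) {
  cell *l = NULL;
  for (size_t k = 0; k < n; k++)
    if (marked[k])
      l = cons(numbers[k], l);
  return nreverse(l);
}
```

A list keeps this order deterministic from run to run, which is the reason given for not using a set or hash_map in loop_nests_to_outline; free_cells() releases the list once it has been processed.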
static void gpu_ify_statement(statement s, int depth, const char *mod_name)
Transform a loop nest into a GPU or accelerator-like kernel.
Parameters:
s: the parallel loop-nest statement
depth: the number of loops in the loop nest to be taken out as the GPU iterators
Several properties change the behavior of this function, as explained in the pipsmake-rc documentation.
For example, if depth = 2 and s is:

    for(i = 1; i <= 499; i += 1)
       for(j = 1; j <= 499; j += 1)
          save[i][j] = 0.25*(space[i-1][j]+space[i+1][j]+space[i][j-1]+space[i][j+1]);

it generates something like: [...]

If the GPU_USE_LAUNCHER property is true, this kind of function is generated:

    void p4a_kernel_launcher_0(float_t save[501][501], float_t space[501][501])
    {
       int i;
       int j;
       for(i = 1; i <= 499; i += 1)
          for(j = 1; j <= 499; j += 1)
             p4a_kernel_wrapper_0(save, space, i, j);
    }

If the GPU_USE_WRAPPER property is true, this kind of function is generated:

    void p4a_kernel_wrapper_0(float_t save[501][501], float_t space[501][501], int i, int j)
    {
       // To be assigned to a call to P4A_vp_0: i
       // To be assigned to a call to P4A_vp_1: j
       p4a_kernel_0(save, space, i, j);
    }

If the GPU_USE_KERNEL property is true, this kind of function is generated:

    void p4a_kernel_0(float_t save[501][501], float_t space[501][501], int i, int j)
    {
       save[i][j] = 0.25*(space[i-1][j]+space[i+1][j]+space[i][j-1]+space[i][j+1]);
    }
Other properties also modify the behavior: GPU_USE_KERNEL_INDEPENDENT_COMPILATION_UNIT, GPU_USE_LAUNCHER_INDEPENDENT_COMPILATION_UNIT, GPU_USE_WRAPPER_INDEPENDENT_COMPILATION_UNIT, GPU_COORDINATE_INTRINSICS_FORMAT and GPU_USE_FORTRAN_WRAPPER. See the pipsmake-rc documentation.
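To make the launcher/wrapper/kernel split concrete, here is a CPU-only sketch that compiles the three generated functions from the example above together. Assumptions: float_t is taken from <math.h> (the generated code may use its own typedef), the macro N and the stencil_demo() driver are hypothetical additions, and the wrapper simply forwards i and j because no accelerator is involved; on a GPU those values would come from the P4A_vp_0 and P4A_vp_1 intrinsics.

```c
#include <assert.h>
#include <math.h> /* float_t (ASSUMPTION: stands in for the generated typedef) */

#define N 501 /* hypothetical name for the 501 array extent of the example */

/* Kernel: only the loop body (what GPU_USE_KERNEL outlines). */
static void p4a_kernel_0(float_t save[N][N], float_t space[N][N], int i, int j) {
  save[i][j] = 0.25*(space[i-1][j]+space[i+1][j]+space[i][j-1]+space[i][j+1]);
}

/* Wrapper: on an accelerator, i and j would be rebuilt from the hardware
   coordinates; in this CPU sketch they are simply forwarded. */
static void p4a_kernel_wrapper_0(float_t save[N][N], float_t space[N][N], int i, int j) {
  p4a_kernel_0(save, space, i, j);
}

/* Launcher: keeps the depth = 2 iteration space and calls the wrapper
   once per (i, j) pair. */
static void p4a_kernel_launcher_0(float_t save[N][N], float_t space[N][N]) {
  int i;
  int j;
  for (i = 1; i <= 499; i += 1)
    for (j = 1; j <= 499; j += 1)
      p4a_kernel_wrapper_0(save, space, i, j);
}

/* Hypothetical driver: with space[i][j] = i + j, each updated point of
   save gets the mean of its four neighbours, which is again i + j.
   Points outside 1..499 keep the marker value -1. Returns save[r][c]. */
static double stencil_demo(int r, int c) {
  static float_t save[N][N], space[N][N];
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      space[i][j] = (float_t)(i + j);
      save[i][j] = -1;
    }
  p4a_kernel_launcher_0(save, space);
  return save[r][c];
}
```

The launcher keeps the whole iteration space on the host side, so replacing its two loops by a single grid launch is the step an accelerator back end would take; the kernel itself never sees a loop.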
If we want to outline a kernel:
First outline the innermost code (the kernel itself), so that its memory effects are not spoiled as they would be if the outermost code were outlined first. The kernel name uses a prefix defined by the GPU_KERNEL_PREFIX property:
Do we need to insert a wrapper phase to reconstruct iteration coordinates from hardware intrinsics?
Add the index initializations from the GPU coordinates in reverse order, since insert_comments_to_statement() is used so that the first statement is not moved away from its original comment:
Add a comment recording what remains to be done later:
Map the inner loop index (numbered i) to the lower GPU coordinate (numbered depth - 1 - i). This way, if the code was cache-friendly, it should remain GPU-memory-friendly.
Build intrinsic names of the form: P4A_vp_<depth - 1 - i>
Add a comment of the form

    // To be replaced with a call to P4A_vp_1: j

that a post-processor may later replace with

    j = P4A_vp_1();

or whatever suits the target accelerator.
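The comment-building steps above can be sketched in standalone C. Everything here is hypothetical: prepend_line() mimics the described behavior of insert_comments_to_statement() (new text goes before existing comments), and build_coordinate_comments() is an illustrative name. The numbering is an assumption: the i of depth - 1 - i is taken to count loops from the innermost one (i = 0), because under that reading the formula reproduces the mapping in the wrapper example (i to P4A_vp_0, j to P4A_vp_1). Visiting loops innermost first and prepending each line yields the outermost index at the top, which is why the insertion runs in reverse order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prepend line plus '\n' to text (capacity cap bytes), the way new
   comments are described as going before the existing ones. */
static void prepend_line(char *text, size_t cap, const char *line) {
  size_t old_len = strlen(text);
  size_t add = strlen(line) + 1; /* line + '\n' */
  assert(old_len + add < cap);
  memmove(text + add, text, old_len + 1); /* shift, including the '\0' */
  memcpy(text, line, add - 1);
  text[add - 1] = '\n';
}

/* Build the coordinate comments for a depth-deep nest; index_names[0] is
   the outermost loop index. ASSUMPTION: i counts loops from the
   innermost one, so loop i sits at index_names[depth - 1 - i]. */
static char *build_coordinate_comments(char *text, size_t cap,
                                       const char *const index_names[],
                                       int depth) {
  char line[128];
  assert(cap > 0);
  text[0] = '\0';
  for (int i = 0; i < depth; i++) { /* innermost loop first */
    int coordinate = depth - 1 - i;
    snprintf(line, sizeof line, "// To be replaced with a call to P4A_vp_%d: %s",
             coordinate, index_names[depth - 1 - i]);
    prepend_line(text, cap, line);
  }
  return text;
}
```

For depth = 2 and indices i, j this yields the P4A_vp_0 line for i followed by the P4A_vp_1 line for j, the same order as in the wrapper example.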
Then outline the innermost code again (the kernel wrapper), which now contains the kernel call. The kernel wrapper name uses a prefix defined by the GPU_WRAPPER_PREFIX property:
Check whether a kernel was outlined previously. If so, and if the wrapper is not generated in a new compilation unit, the wrapper must be added to the same compilation unit as the kernel. It will not be declared in that compilation unit, though: when the kernel has been generated in a new compilation unit, no PARSED_CODE resource is available, so AddEntityToCompilationUnit() cannot be used.
Outline the kernel launcher with a prefix defined in the GPU_LAUNCHER_PREFIX property:
Definition at line 199 of file gpu-ify.c.
References asprintf, build_new_top_level_module_name(), build_outline_name(), c_module_p(), comment(), concatenate(), CONS, db_get_memory_resource(), DB_PUT_FILE_RESOURCE, depth, entity_user_name(), free(), fwrapper_prefix, get_bool_property(), get_current_module_entity(), get_string_property(), gpu_loop_nest_annotate_on_statement(), ifdebug, insert_comments_to_statement(), kernel_prefix, launcher_prefix, NIL, outliner(), perfectly_nested_loop_index_at_depth(), perfectly_nested_loop_to_body_at_depth(), pips_debug, print_statement(), set_bool_property(), STATEMENT, strdup(), string_undefined, string_undefined_p, and wrapper_prefix.
Referenced by gpu_ify().
static bool mark_loop_to_outline(const statement s)
An interesting loop must be parallel first...
We recurse on statements rather than on loops in order to pick up information on the statement itself, such as pragmas.
Since only outermost loop nests are outlined, stop digging further into this statement:
Definition at line 120 of file gpu-ify.c.
References CONS, depth_of_parallel_perfect_loop_nest(), ifdebug, loop_nests_to_outline, pips_debug, print_statement(), STATEMENT, statement_loop_p(), and statement_number.
Referenced by gpu_ify().
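The marking logic can be sketched on a toy statement tree. Assumptions: the stmt struct, parallel_nest_depth(), mark_loop(), walk() and mark_demo() are all hypothetical; parallel_nest_depth() only approximates depth_of_parallel_perfect_loop_nest() by counting parallel loops whose body is a single statement; and walk() mimics a gen_recurse() filter, where returning false stops the descent into a statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy statement: a block or a loop with children. */
typedef struct stmt {
  int number;               /* statement number, for reporting */
  bool is_loop;             /* is this a loop statement? */
  bool parallel;            /* is the loop marked parallel? */
  struct stmt *children[4]; /* body statements */
  int n_children;
} stmt;

/* Depth of the perfect nest of parallel loops starting at s
   (0 when s is not a parallel loop). */
static int parallel_nest_depth(const stmt *s) {
  int depth = 0;
  while (s != NULL && s->is_loop && s->parallel) {
    depth++;
    s = (s->n_children == 1) ? s->children[0] : NULL; /* perfect: single body */
  }
  return depth;
}

typedef struct { int numbers[16]; int count; } marks;

/* Filter: record an interesting nest and return false to stop digging,
   since only outermost nests are outlined; otherwise keep descending. */
static bool mark_loop(const stmt *s, marks *m) {
  if (parallel_nest_depth(s) > 0) {
    assert(m->count < 16);
    m->numbers[m->count++] = s->number;
    return false;
  }
  return true;
}

static void walk(const stmt *s, marks *m) {
  if (mark_loop(s, m))
    for (int k = 0; k < s->n_children; k++)
      walk(s->children[k], m);
}

/* Block 1 holds a sequential loop 2 around parallel loops 3 > 4, and a
   parallel loop 6. Only 3 and 6 are marked: 4 is inside 3's nest. */
static marks mark_demo(void) {
  stmt s5 = {5, false, false, {0}, 0};
  stmt s4 = {4, true, true, {&s5}, 1};
  stmt s3 = {3, true, true, {&s4}, 1};
  stmt s2 = {2, true, false, {&s3}, 1};
  stmt s7 = {7, false, false, {0}, 0};
  stmt s6 = {6, true, true, {&s7}, 1};
  stmt s1 = {1, false, false, {&s2, &s6}, 2};
  marks m = {{0}, 0};
  walk(&s1, &m);
  return m;
}
```

In this toy tree, loop 3 heads a depth-2 parallel nest and is recorded as a single unit; the walk never visits loop 4 separately because the filter stopped the descent at loop 3.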
static const char *fwrapper_prefix = 0
Definition at line 41 of file gpu-ify.c.
Referenced by get_clean_mod_name(), and gpu_ify_statement().
static const char *kernel_prefix = 0
These are the possible prefixes for outlined functions; they are computed from a property and the current module name.
Definition at line 38 of file gpu-ify.c.
Referenced by get_clean_mod_name(), and gpu_ify_statement().
static const char *launcher_prefix = 0
Definition at line 40 of file gpu-ify.c.
Referenced by get_clean_mod_name(), and gpu_ify_statement().
static list loop_nests_to_outline
A simple phase that outlines parallel loops onto a GPU.
Author: Ronan.Keryell@hpc-project.com
Store the loop nests found that meet the specification for execution on a GPU. A list is used rather than a set or a hash_map so that the order is always the same.
Definition at line 32 of file gpu-ify.c.
Referenced by gpu_ify(), and mark_loop_to_outline().
static const char *wrapper_prefix = 0
Definition at line 39 of file gpu-ify.c.
Referenced by get_clean_mod_name(), and gpu_ify_statement().