/*
Copyright (c) 2015, Intel Corporation

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products
      derived from this software without specific prior written
      permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/


----------------------------------------
ISx is Scalable Integer Sort Application 
----------------------------------------

* ISx is a new scalable integer sort application designed for co-design 
  in the exascale era, scalable to large numbers of nodes.

* ISx belongs to the class of bucket sort algorithms which perform an 
  all-to-all communication pattern.

* ISx is inspired by the NAS Parallel Benchmark Integer Sort and its OpenSHMEM
  implementation of University of Houston. ISx addresses identified shortcomings 
  of the NPB IS.

* ISx is a highly modular application implemented in the OpenSHMEM parallel 
  programming model and supports both strong and weak scaling studies.

* ISx uses an uniform random key distribution and guarantees load balance.  

* ISx includes a verification stage.

* ISx is not a benchmark. It does not define fixed problems that can be used 
  to rank systems. Furthermore ISx has not been optimzed for the features 
  of any particular system.

* ISx has been presented at the PGAS 2015 conference 


References:
ISx, a Scalable Integer Sort for Co-design in the Exascale Era. 
Ulf Hanebutte and Jacob Hemstad. Proc. Ninth Conf. on Partitioned Global Address Space 
Programming Models (PGAS). Washington, DC. Sep 2015. http://hpcl.seas.gwu.edu/pgas15/
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7306013

Information about the NAS Parallel Benchmarks may be found here:
https://www.nas.nasa.gov/publications/npb.html

The OpenSHMEM NAS Parallel Benchmarks 1.0a by the HPCTools Group University of Houston
can be downloaded at http://www.openshmem.org/site/Downloads/Examples


STRONG SCALING (isx.strong): Total number of keys are fixed and the number of keys per PE
are reduced with increasing number of PEs
 Invariants: Total number of keys, max key value
 Variable:   Number of keys per PE, Bucket width

WEAK SCALING (isx.weak): The number of keys per PE is fixed and the total number of keys
grow with increasing number of PEs
 Invariants: Number of keys per PE, max key value
 Variable:   Total Number of Keys, Bucket width 

WEAK_ISOBUCKET (isx.weak_iso): Same as WEAK except the maximum key value grows with the 
number of PEs to keep bucket width constant This option is provided in effort to 
keep the amount of time spent in the local sort per PE constant. Without this option,
the local sort time reduces with growing numbers of PEs due to a shrinking histogram 
improving cache performance.
 Invariants: Number of keys per PE, bucket width
 Variable:   Total number of keys, max key value


-------------------------------------------
Compiling and Executing the ISx Application
-------------------------------------------

Compilation options:

*Note: Update INCS and LDFLAGS at the top of the Makefile for your path info.
 
make
- Compiles with basic flags

make debug
- Compiles with all debug flags, including -DDEBUG which enables verbose debugging print statements.

make optimized
- Compiles with optimization flags, including -DNDEBUG, which disables assert statements.


The params.h file has various definitions that may be modified to change application options.

Usage: ./bin/isx.strong <total_num_keys>  <log_file>
       ./bin/isx.weak  <keys_per_pe> <log_file>
       ./bin/isx.weak_iso <keys_per_pe> <log_file>

The log file stores the verbose timing information for the run. Each row of the file corresponds 
to a single PE's timing results for each component of the application, for every iteration of computation. 
Reuse of the same output file will concatenate results. 

example command lines (assuming aprun) for 2^27 keys (=134217728)

Strong:
 aprun -n 24 -N 4 ./bin/isx.strong 134217728 output_strong
 
Weak:
 aprun -n 24 -N 4 ./bin/isx.weak 134217728 output_weak
 
Weak_iso:
 Note that the iso-bucket width is specified in params.h
 aprun -n 24 -N 4 ./bin/isx.weak_iso 134217728 output_weak_iso

Note: timing measurements (see timer.c) are obtained by calls to clock_gettime
with the clk_id argument set to CLOCK_MONOTONIC. However, not all systems support this clk_id.
For such situations, clk_id should be changed to CLOCK_REALTIME, which is supported by all systems.

For backwards compatibility the params.h contains #define OPENSHMEM_COMPLIANT which can be undefined to fall back to early APIs


*NOTE*: We recommend using fpro (found under /bin/collector) initially, as it will try and set up a proper launch script.

Sample Fpro Example (Run from $ESP_ROOT/bin/collector): ./fpro -j pbs -r "R1234" -n 1 -p 2 -l 1 $ESP_ROOT/examples/SHMEM/ISx/bin/isx.strong 134217728 output_strong
Note the "-l 1".  This is important, as it dictates whether or not you are running a legacy (SHMEM) app or not (iSHMEM).
This example is a legacy app, and therefore has "-l 1" set.

Please ensure the ESP_TRACE_PATH variable points to an existing directory.
Also note you should replace with your own PBS reservation #.
