Ranking of SAS Programs

Ubaid Darwaish

When I got a new computer at the office, I was eager to install SAS on it. Much of my excitement came from it being a brand spanking new machine with a much better configuration (or so I was told) than my vintage one. I opened one of my old programs and hit the run button. I was talking to a colleague while the program ran, but I couldn’t help glancing at my screen, waiting for it to finish.

Every second drained my excitement, and when the excitement reached zero the program still hadn’t finished. From there every passing second added to my frustration. When it finally finished, I was frustrated enough to go straight through the logs, and I found that it had taken 20 minutes longer than on my dear old machine. Curiosity took over. I looked at the configuration of the new machine and found that, even though it had two additional cores, it was nowhere near my old computer in terms of memory. Luckily the guy who had just assembled my new machine was still around, so I asked him to come over to my desk. I made him sit in my chair and showed him the two logs (from the old machine and the new one). He looked surprised, but he wasn’t looking at the logs; he was looking at me.

Obviously those logs made no sense to him; to me he was log blind. I tried to explain that the supposedly fast new machine he had just assembled for me was not fast at all, that it was not even equivalent to the machine I had before. He shook his head and said that was not possible. He told me it had two more CPUs than I used to have. I argued that it had less memory. He replied that this wouldn’t make a difference. I failed to convince him, and he left my desk convinced that I was mad.

The logs that SAS programs generate hold a wealth of information. A SAS programmer can tell you everything you want to know about a program from its log. But then again, you have to be a SAS programmer first. At times we wish that a business person or the infrastructure team (as in my case) could understand what is in the SAS log. There is nothing wrong with their power of understanding; the log simply isn’t written in a form that makes sense to them.

So I decided to prove my point by making the infrastructure guy understand the SAS log. For that I had to extract only the required pieces of information from the SAS logs and present them in a way that made sense to him. Isn’t that what we do: try to make sense out of data?

So here goes my macro that does it for you. But before diving into the code, I want to highlight one prime area of application. In many migration projects, when we move from one platform to another, we need to show stakeholders that the new platform is better than the legacy platform, and the comparison often has to be on performance. This macro ingests SAS logs and extracts the real time, user CPU time, system CPU time and memory. Based on these parameters I assign a rank to each program, and by comparing it with the rank the same program scores on the new platform (the new computer, in my case) you can decide whether there was a boost in performance or not. THE RANK COMPARISON MAKES SENSE FOR THE SAME PROGRAM ON TWO DIFFERENT PLATFORMS, NOT FOR TWO DIFFERENT PROGRAMS. That was the purpose I had in mind while writing this macro.
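One prerequisite is worth spelling out: the user CPU time, system CPU time and memory lines only appear in a log when the FULLSTIMER system option is on. The sketch below turns it on and shows, with placeholder values, the layout that the macro's SUBSTR calls assume (label starting in column 7, value starting in column 27). The tiny DATA step is only there to produce a note; it is not part of the original program.

```sas
/* Ask SAS to write the full set of resource statistics to the log */
OPTIONS FULLSTIMER;

/* Each step then ends with a note laid out roughly like this
   (values replaced by placeholders):

NOTE: DATA statement used (Total process time):
      real time           n.nn seconds
      user cpu time       n.nn seconds
      system cpu time     n.nn seconds
      memory              nnn.nnk
*/

/* Throwaway step, assumed here only to trigger such a note */
DATA _NULL_;
    x = 1;
RUN;
```

If the logs on your platform use different columns, the positions in the SUBSTR calls need to be adjusted to match.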

/*Fileref*/

FILENAME INDATA PIPE 'ls /home/ubaid/logs';

/*Getting all file names for fileref with file count*/

DATA FILE_LIST;
LENGTH FNAME $100.;
INFILE INDATA TRUNCOVER;
INPUT FNAME $100.;
CALL SYMPUTX('NO_OF_FILES',_N_);
RUN;

/*Macro for Ranking Starts*/

%macro Ranking_Model;

/*Run in loop for number of files in fileref*/

%do i=1 %to &NO_OF_FILES;

/*Retrieve filenames (one at a time) */

DATA _NULL_;
    SET FILE_LIST;
IF _N_=&I;
    CALL SYMPUT('FILEIN',FNAME);
RUN;

/*Read Log for file n*/

DATA WORK.LOG;
    LENGTH
        F1               $ 336      ;
    FORMAT
        F1               $CHAR336.  ;
    INFORMAT
        F1               $CHAR336.  ;
    INFILE "/home/ubaid/logs/&filein" MISSOVER  DSD ;
    INPUT
        F1               : $CHAR336.;
RUN;


/*Start reading 20 lines before the end of the log (never before line 1)*/

DATA _NULL_;
    SET LOG;
    CALL SYMPUTX('OBS',MAX(_N_-20,1));
RUN;


/*Mine the data and highlight(Flag)*/

Data X;
Set LOG(FIRSTOBS=&obs.);
    if SUBSTR(F1,7,9)='real time' THEN DO;
        real_time=SUBSTR(F1,27,15); Flag=1;
    END;

    if SUBSTR(F1,7,13)='user cpu time' THEN DO;
        user_cpu_time=SUBSTR(F1,27,15); Flag=1;
    END;

    if SUBSTR(F1,7,15)='system cpu time' THEN DO;
        sys_cpu_time=SUBSTR(F1,27,15); Flag=1;
    END;

    if SUBSTR(F1,7,6)='memory' THEN DO;
        memory=SUBSTR(F1,27,15); Flag=1;
    END;
RUN;

/*Skim the highlighted rows (actual data) from the rest (not of any use)*/

PROC SQL;
    CREATE TABLE Y AS SELECT * FROM X WHERE Flag=1;
QUIT;

/*Ranking algorithm begins*/


Data Heart;
Set Y(DROP=F1 Flag);
/*Replace 'seconds' with space*/
    real_time=TRANWRD(real_time,'seconds','');
    user_cpu_time=TRANWRD(user_cpu_time,'seconds','');
    sys_cpu_time=TRANWRD(sys_cpu_time,'seconds','');
    mem=INPUT(TRANWRD(memory,'k',''),best10.);
    
/*Extracting hours,minute,seconds*/

/*for real time*/

rt_strt=real_time;
loc_rt1=find(rt_strt,":");
if loc_rt1 >0 then do;
    _rt_x=substr(rt_strt,1,(loc_rt1-1));
    rt_nxt=substr(rt_strt,(loc_rt1+1));
    loc_rt2=find(rt_nxt,":");
    if loc_rt2 >0 then do;
        _rt_y=substr(rt_nxt,1,(loc_rt2-1));
    end;
    /*two colons: hours, minutes, seconds*/
    if loc_rt1 >0 and loc_rt2>0 then do;
        _rt_s=substr(rt_nxt,(loc_rt2+1),2);
        _rt_h=_rt_x;
        _rt_m=_rt_y;
    end;
    /*one colon: minutes, seconds*/
    if loc_rt1 >0 and loc_rt2=0 then do;
        _rt_s=substr(rt_strt,(loc_rt1+1),2);
        _rt_h=0;
        _rt_m=_rt_x;
    end;
end;
/*no colon: seconds only*/
if loc_rt1 =0 and loc_rt2 in (0,.) then do;
    _rt_s=substr(rt_strt,1,(FIND(rt_strt,".")-1));
    _rt_h=0;
    _rt_m=0;
end;

_rt_full=SUM((_rt_h*60*60),(_rt_m*60),(_rt_s));


/*user time*/


uct_strt=user_cpu_time;
loc_uct1=find(uct_strt,":");
if loc_uct1 >0 then do;
    _uct_x=substr(uct_strt,1,(loc_uct1-1));
    uct_nxt=substr(uct_strt,(loc_uct1+1));
    loc_uct2=find(uct_nxt,":");
    if loc_uct2 >0 then do;
        _uct_y=substr(uct_nxt,1,(loc_uct2-1));
    end;
    /*two colons: hours, minutes, seconds*/
    if loc_uct1 >0 and loc_uct2>0 then do;
        _uct_s=substr(uct_nxt,(loc_uct2+1),2);
        _uct_h=_uct_x;
        _uct_m=_uct_y;
    end;
    /*one colon: minutes, seconds*/
    if loc_uct1 >0 and loc_uct2=0 then do;
        _uct_s=substr(uct_strt,(loc_uct1+1),2);
        _uct_h=0;
        _uct_m=_uct_x;
    end;
end;
/*no colon: seconds only*/
if loc_uct1 =0 and loc_uct2 in (0,.) then do;
    _uct_s=substr(uct_strt,1,(FIND(uct_strt,".")-1));
    _uct_h=0;
    _uct_m=0;
end;

_uct_full=SUM((_uct_h*60*60),(_uct_m*60),(_uct_s));





/*system time*/


sct_strt=sys_cpu_time;
loc_sct1=find(sct_strt,":");
if loc_sct1 >0 then do;
    _sct_x=substr(sct_strt,1,(loc_sct1-1));
    sct_nxt=substr(sct_strt,(loc_sct1+1));
    loc_sct2=find(sct_nxt,":");
    if loc_sct2 >0 then do;
        _sct_y=substr(sct_nxt,1,(loc_sct2-1));
    end;
    /*two colons: hours, minutes, seconds*/
    if loc_sct1 >0 and loc_sct2>0 then do;
        _sct_s=substr(sct_nxt,(loc_sct2+1),2);
        _sct_h=_sct_x;
        _sct_m=_sct_y;
    end;
    /*one colon: minutes, seconds*/
    if loc_sct1 >0 and loc_sct2=0 then do;
        _sct_s=substr(sct_strt,(loc_sct1+1),2);
        _sct_h=0;
        _sct_m=_sct_x;
    end;
end;
/*no colon: seconds only*/
if loc_sct1 =0 and loc_sct2 in (0,.) then do;
    _sct_s=substr(sct_strt,1,(FIND(sct_strt,".")-1));
    _sct_h=0;
    _sct_m=0;
end;

/*Comma-separated arguments, as for real and user time, so a missing part does not blank the total*/
_sct_full=SUM((_sct_h*60*60),(_sct_m*60),(_sct_s));

run;

/*heart ends*/


DATA A;
    mem_sum=.;
    ret_sum=.;
    uct_sum=.;
    sct_sum=.;
RUN;


PROC SQL;
    UPDATE A SET mem_sum=(SELECT SUM(mem) FROM HEART);
    UPDATE A SET ret_sum=(SELECT SUM(_rt_full) FROM HEART)+1;
    UPDATE A SET uct_sum=(SELECT SUM(_uct_full) FROM HEART)+1;
    UPDATE A SET sct_sum=(SELECT SUM(_sct_full) FROM HEART)+1;
QUIT;

/*The ranking here onwards is absolutely hypothetical and for demonstration only*/
DATA B;
SET A;
    /*Numeric format 10. (not $10.) because the sums are numeric*/
    mem_=10**(length(COMPRESS(PUT(mem_sum,10.)))-2);
    ret_=10**(length(COMPRESS(PUT(ret_sum,10.)))-2);
    uct_=10**(length(COMPRESS(PUT(uct_sum,10.)))-2);
    sct_=10**(length(COMPRESS(PUT(sct_sum,10.)))-2);
RUN;

DATA C(KEEP=mem_rank ret_rank uct_rank sct_rank o_rank Program);
SET B;
    mem_rank=round((mem_sum/mem_)/10);
    ret_rank=round(ret_sum/ret_)-9;
    uct_rank=round(uct_sum/uct_)-9;
    sct_rank=round(sct_sum/sct_)-9;
    o_rank=round((sum(mem_rank,ret_rank,uct_rank,sct_rank)/4));
    Program=UPCASE(SUBSTR("&filein.",1,(LENGTH("&filein.")-4)));
RUN;



/*Ranking algorithm ends*/

/*Create final Ranking Data Set*/

%IF &i=1 %THEN %DO;
    DATA RANKING_MODEL;
        RETAIN Program;
        SET C;
    RUN;
%END;
/*If exists append*/

%ELSE %DO;
    PROC APPEND BASE=RANKING_MODEL DATA=C;
    RUN;
%END;
/*Loop over log files ends*/


%end;


/*Sort the final data set*/

PROC SORT DATA=RANKING_MODEL(RENAME=(Program=Program_Nm));
BY o_rank;
RUN;

/*Suffix for the filename*/

Data _NULL_;
call symput('DT',put(datetime(),datetime16.));
run;

 

/*Create report in PDF*/

OPTIONS NONUMBER;

ODS PDF FILE="/home/ubaid/Ranking_Report_&DT..pdf" DPI=100 COLOR=GRAY STYLE=SASWEB;

FOOTNOTE1 "Source Code/Algorithm by Ubaid Darwaish.";
FOOTNOTE2 "Generated (&_SASSERVERNAME, &SYSSCPL) on %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) at %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";

PROC REPORT DATA=RANKING_MODEL NOWD ;
COLUMN Program_Nm ret_rank sct_rank uct_rank mem_rank o_rank;
DEFINE o_rank/DISPLAY;

                    COMPUTE o_rank;

/*Conditional background formatting*/

if o_rank <=35 then call define(_row_,"style","style={background=green}");
else if 35< o_rank <=40 then call define(_row_,"style","style={background=orange}");
else if 40< o_rank then call define(_row_,"style","style={background=red}");

                    ENDCOMP;

/*LOGO inclusion*/

                    COMPUTE BEFORE _PAGE_ / LEFT  

   STYLE=[PREIMAGE='/home/ubaid/ud.png'  
          FONT_WEIGHT=BOLD 
          FONT_SIZE=5 
          FOREGROUND=cx993300]; 
   LINE "Rank Program, ©."; 

                    ENDCOMP; 
RUN;


Footnote1;
Footnote2;
ODS PDF CLOSE;


/*Program ends*/


%MEND;
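The listing defines the macro but does not show it being called. Presumably it is invoked once after the definition, along these lines; the PROC PRINT is an optional extra assumed here for a quick check and is not part of the original program.

```sas
/* Run the ranking over every log listed in FILE_LIST and write the PDF */
%Ranking_Model;

/* Optional: list the final table in the regular output as well */
PROC PRINT DATA=RANKING_MODEL NOOBS;
RUN;
```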

Sample Output of the Program:-

P.S:-
I was hunting for real-time information only, so the logic that decides the user-cpu-time, system-cpu-time, memory and overall ranks is purely for demonstration purposes. The actual aim is to extract the information and to present it. You might very well choose to display the information as is, without assigning any ranks; that will also suffice.
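If you only want the raw numbers, the one fiddly part of the extraction is turning a log time string into seconds. Here is a compact sketch of that step on its own, as an alternative to the FIND/SUBSTR logic in the macro. The data set name time_demo, the variable names and the three sample strings are made up for illustration. Unlike the macro, this version keeps the fractional seconds.

```sas
/* Hypothetical stand-alone step: convert log time strings to seconds */
DATA time_demo;
    LENGTH raw $15;
    INPUT raw $15.;

    /* Drop the word "seconds" when it is present */
    txt = STRIP(TRANWRD(raw, 'seconds', ' '));

    /* Colon-separated pieces: 1 (s), 2 (m:s) or 3 (h:m:s) */
    parts = COUNTC(txt, ':') + 1;

    /* Each piece is worth 60 times the piece to its right */
    secs = 0;
    DO p = 1 TO parts;
        secs = secs * 60 + INPUT(SCAN(txt, p, ':'), BEST12.);
    END;
    DROP p;
    DATALINES;
0.53 seconds
1:05.20
2:03:04.50
;
RUN;
```

Worked through by hand, the last sample is 2*3600 + 3*60 + 4.5 = 7384.5 seconds.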

Contributed by: Ubaid Darwaish