SAS Automation like any other automation can be used to solve numerous business problems. In this post we will focus on SAS migration, using SAS automation. They say, “Pain is inevitable but suffering is optional”. Every migration from one SAS platform to another comes with its due share of pain. There is no one who is immune to that pain, be it a SAS User , a SAS Administrator or somebody in Business expecting the reports; they all feel it at one point or the other. But how much of a pain one needs to endure can be drastically reduced, if the migration is carried out the right way. It can be reduced to an extent where the pain itself becomes optional.
When we migrate from one SAS Platform to another (from one SAS Server to another), we need to migrate the SAS content and configuration from an old platform to new platform. SAS provides us with two main utilities to make the process easier i.e. SAS Migration Utility and SAS Deployment Wizard. We can use these to migrate the metadata content, configuration files, ACTs etc.
But that is only half of the picture, the other part of the puzzle is where it gets complicated and huge migration projects are kicked off. That part is the migration of the user data, folders, files, programs, projects etc. from old platform to another. This where we can utilize the power of SAS automation.
The SAS users typically have got their own allocated space on the server that they use for storing their datasets, files, reports, programs etc. (mostly SAN/NAS). While migrating from one SAS platform to another these objects need to be migrated as well. The migration can never be complete unless these objects successfully land on the new server. Now to manually migrate these contents is not just hectic but error prone, that is where SAS automation comes to rescue.
The utility that we will create will use SAS automation to address exactly that, it will enable us to migrate the objects in an automated fashion from one server to another, creating audit books and copybooks during the process. It will automate the migration process using SAS.
Some of the items that we will be able to migrate using this utility include: SAS Datasets, Programs, Projects, Folders,Zipped Files, Zipped datasets, Text files, Excel Files, CSV files, Logs, Reports etc.
We will be using SAS Connect technology to move the SAS datasets between two servers. SAS/CONNECT software is a SAS client/server toolset that establishes connections between two computers with different operating systems. You need to ensure that the encryption algorithm between the Source and Target Server is compatible. You can read more about SAS Connect here: http://support.sas.com/documentation/cdl/en/connref/69581/HTML/default/p12j8b6w2nuxpon15xu0fqsisfkk.htm
Let us outline some of the key areas we need to focus on, and discuss how they will be addressed by this macro. Also we will discuss at a very high level how this macro works.
One needs to specify two paths to this macro. The path at source server (from where to migrate) and the path at Target Server (where we are migrating to). Once the source folder has been specified the macro will do the listing of all the contents within that folder. It will split the contents into buckets like: – Folders, Files, Datasets, and Zipped Datasets.
Folder mapping: – We assume that inside the path specified at source there will be a standard folder structure , let’s call that Level 1 folder structure (code,data,extra,logs,output). Similarly inside the path specified at Target there will be a Level 1 folder structure(programs,input,own,logs,output,reports). Now the folders (Level 1) in old server and new server might not be same. So we need to make sure that we map the folders between new and old server, such that the data is migrated to correct folders. We will do that by creating a dataset wherein we will map the folders of one server with another, and then later using that dataset.
Sub Folders: – Users might have created 100’s of sub folders inside Level 1 folders. Now these folder need to be in place in the target for data migration to happen. The macro will create the shell script to create the similar folders structure inside the Level 1 folder structure at target based on the folder mapping supplied.
Sub Folders at Source:
Sub folders created at Target:
In the example above : Datafolder at source, has corresponding folder namely input at target.
Files other than datasets: – These files will include SAS programs, text files, excel sheets etc. The macro will create a shell script to copy all of these files from source folder into target folder based on the folder mapping provided.
Datasets: – All the datasets will be identified and then transported from Source to Target. What is important to note is the fact that these datasets cannot be simply copied, but need to be transported. This is because of the OS encoding needs to change from Source encoding (AIX) to target encoding (Linux) in this case.
Zipped Datasets: – Many users have their old datasets zipped. Now these datasets can neither be copied as files because they also need to be transported (OS encoding to be changed), nor directly transported using SAS connect because they don’t have .sas7bdat extension anymore. So these datasets need to be unzipped, transported and then zipped again at target. Also the unzipped datasets at source need to be removed after transport. The macro will do exactly that.
Also what it will do for datasets is that it will generate an audit book. The audit book will contain the number of observations of a dataset at source, number of observations at target, number of variables at source, number of variables at target, path at source, path at target. This is to ensure that nothing went wrong while transporting the datasets i.e. data integrity was maintained while using SAS automation as a method for migration.
Apart from that the macro will also generate the change guide to help users with the code changes. This is where one can see the real benefit of automation, to make life easy. The code change after the migration is inevitable, especially if the paths are not the same between old and new server. To help users with that process, the change guide will tell users exactly to go to which line of which program to change the proc export or libname statement.
Please note:-
Once the folders at target have been created using the shell script. We need to notify the macro that it can proceed. We will do that by dropping a trigger file at a certain location. Once the trigger file is received the macro will proceed further.
SAS automation -Migration macro code:-
/******************************************************************************** BEGIN MACRO HEADER *************************************************************************** Name: ubaid Author: Ubaid Darwaish Contact: - Created: 2016-NOV-18 Version: 3 OS: UNIX, Linux SAS: 9.2/9.4 Language: English Sections: 0) Macro Header 4) Text mining 1) Global Declaration 5) Bucket split 2) Mapping 6) Datasets 3) Listing 7) Zipped Datasets Purpose: Migrate the datasets from one SAS server to another using SAS Automation. Disclaimer: This program is provided "as-is". The user is responsible for testing the software on their platform. The user agrees that the author will not, under any circumstances, be held accountable for any damages of any type that result from using this program. Overview: The program(not macro) will take two primary arguments from the user, the source and the target. The source will be scanned for all directories and files. The directories that are present in source in addition to the default directories (as specified in the Map dataset) will be listed and also a shell script will be prepared to create all those directories in the order they exist(at source), at target.The files inside the source directory will be classified in three buckets: Datasets, Zipped Datasets and Files (SAS Programs and other ASCII files). The reason for segregation of Datasets is primarily because they need to be converted from one OS to another. Also the zipped datasets need to be unzipped first, then converted and moved across and then finally zipped again. The program will generate the shell script that can be invoked from Target server to zip all the datasets that were originally zipped at source. Batch Mode: The program can be run in batch mode, the user can use -log option to create/redirect the log. Enterprise: Using Enterprise Guide, the user must either use PROC PRINTTO to export the log to a specific location or export the log as a step in the project. Dependency: The code needs to run from the source server . Output: Data Sets: Map - The Mapping between the Source and Target Server X - Dump of the listing of Source Folder. Y - Listing of files other than those in logs folder Drop - Listings of files in the logs folder Zed - Path/Dataset name extracted from listing dump Ze - Mapped old folder names with new folder names Z - Full qualified paths with old names being replaced with new folder names Folder - List of user created folders(not default)/ Script to create folders File - List of all files other than folders A - List of SAS datasets B - List of files that are not datasets C - List of all datasets with extensions stripped Zip - List of Zipped datasets Datasets - List of unzipped datasets Source_QC - Audit at Source S_Quality_Check - Consolidated audit at source Target_QC - Audit at Target T_Quality_Check - Consolidated audit at target Source_QC_Z - Audit at Source for zipped dataset S_Quality_Check_Z -Consolidated audit at source for zipped dataset Target_QC_Z - Audit at Target for zipped dataset T_Quality_Check_Z -Consolidated audit at target for zipped dataset Zip_Target -Script to zip datasets at target that were originally zipped Files: ud_list - Listing of files and folders in the source path ud_for_folders - Script to create directories at target ud_quality_check_s - Audit log at Source ud_quality_check_t - Audit log at Target ud_zip_target - Script to Zip datasets /* Variable Reference: Macro Var in order of occurence: rhost - Remote host, full qualified name along with the port no for SAS/Connect (The target/destination server eg: xyz.abc.com 7550) userid - Username for remote server passwrd - Password for remote server s_path - Path at source (Folder that needs to be migrated) expo_path - Path for scripts (Directory Creation/Zip) at Source t_path - Path at target (Folder where contents needs to be migrated) lib_s - Path at Source where from dataset is being copied lib_t - Path at Target where dataset is being copied to ds_name - Dataset that is being moved n1 - No of datasets n3 - No of zipped datasets source - Source path of zipped dataset target - Target path of zipped dataset file_source- Zipped dataset name file_target- Zipped dataset name without zip extension ds_name - Zipped dataset pure name staging - Fodler at source to temporary hold unzipped data rem - Full qualified path and dataset name that needs to be removed as it was unzipped (C)aution: The user needs to have full read/write permissions at both source and target.Apart from files a directory will also be created to hold zipped datasets temporarily while they are being transported * Comment: Check for errors before continuing. ; Arguments: The Macro takes no arguments as all the variables are captured globally. References ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ Revisions ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ Date Author Comments ¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯ YYYY-MM-DD III Please use this format and insert new entries above. ******************************************************************************* END MACRO HEADER *****************************************************************************/; %put %sysfunc(getoption(work)); options symbolgen; /*Target Server details for SAS Connect*/ %let rhost=abc.xyz.mycomapny.com 17551; /*Your user id*/ %let userid=uabid; /*Your password*/ %let passwrd=Pass123; OPTIONS REMOTE=rhost COMAMID=TCP ; SIGNON rhost user=&userid passwd="&passwrd"; /*Path on Source Server i.e. Source directory*/ /*End the statement with the forward slash i.e. ../xyz/*/ %Let s_Path=/home/ubaid/sas/; /*Path on Target Server i.e. Target Path*/ /*End the statement with the forward slash i.e. ../xyz/*/ %Let t_path=/sasdata/myserver/work/ubaid/; /*Level1 Mapping between two server folders*/ Data Map; length SSource STarget $15; input SSource STarget; infile datalines dlm=','; datalines; abt,projects export,xcomout sascode,jobs sasmacros,macros mart,mart saslookups,lookups sasformats,formats results,results staging,staging landing,landing NA,error NA,logs NA,lost+found NA,override extra,own NA,schedule archive,scripts NA,trigger ; run; /*Macro Starts*/ %macro ubaid; /*Retrieve color coding*/ %macro dummy;%mend dummy; /*Generate Folder for Migration stuff*/ X "mkdir &s_Path/extra/ud_4_migration"; /*Path for exporting scripts'*/ %Let expo_path=&s_path/extra/ud_4_migration; /*Getting the listing*/ DATA _NULL_; X "find &s_PATH -ls >&expo_path/ud_list.txt"; run; /*Import the file*/ DATA X; LENGTH PATH $300; INFILE "&expo_path/ud_list.txt" truncover; INPUT PATH $300.; RUN; /*Extracting Path information*/ DATA ZED(KEEP=DATASET_NAME SOURCE_PATH TP L1_FOLD); SET X; DATASET_NAME=REVERSE(SUBSTR(REVERSE(PATH),1,FIND(REVERSE(PATH),'/')-1)); _SOURCE_PATH=SUBSTR(PATH,FIND(TRIM(PATH),'/')); SOURCE_PATH=REVERSE(SUBSTR(REVERSE(_SOURCE_PATH),FIND(REVERSE(_SOURCE_PATH),'/')+1)) ; L1=SUBSTR(SOURCE_PATH,LENGTH("&s_path")+1,LENGTH(SOURCE_PATH)); L1_FOLD=TRANWRD(SUBSTR(L1,1,FIND(L1,"/")),"/",""); TP="&t_path"||SUBSTR(SOURCE_PATH,LENGTH("&s_path")+1,LENGTH(SOURCE_PATH)); RUN; PROC SQL; CREATE TABLE ZE AS SELECT A.*,B.STarget as T_FOLD FROM ZED A LEFT JOIN MAP B ON A.L1_FOLD=B.SSource; QUIT; DATA Z(Keep=DATASET_NAME SOURCE_PATH TARGET_PATH); SET ZE; IF L1_FOLD^='' and T_FOLD='' then T_FOLD="own"; L3=SUBSTR(TP,(LENGTH("&t_path")+LENGTH(L1_FOLD)+2)); TARGET_PATH=TRANWRD("&t_path"||"/"||trim(T_FOLD)||"/"||trim(l3),"/ /",""); run; /*SEPARATE OUT THE FOLDERS AND FILES*/ DATA FOLDER FILE; SET Z; IF FIND(DATASET_NAME,'.')=0 THEN OUTPUT FOLDER; ELSE OUTPUT FILE; RUN; /*Sort the data*/ PROC SORT DATA=FILE; BY DATASET_NAME; RUN; /*Identify datasets and seperate them out*/ DATA A; SET FILE; IF FIND(DATASET_NAME,'.sas7bdat')>0 ; RUN; /*Sort the data for datasets*/ PROC SORT DATA=A; BY DATASET_NAME; RUN; /*Identify the files other than datasets*/ DATA B; MERGE FILE(in=a) A(in=b); BY DATASET_NAME; IF A AND NOT B; RUN; /*Subtract the extension from the dataset name*/ DATA C; LENGTH DS_NAME $50; SET A; DS_NAME=SUBSTR(DATASET_NAME,1,(FIND(DATASET_NAME,'.')-1)); RUN; /*Identifying zipped datasets*/ DATA ZIP; SET C; IF FIND(DATASET_NAME,'.sas7bdat.gz')>0; RUN; /*Remaining datasets*/ DATA DATASETS; SET C; IF FIND(DATASET_NAME,'.sas7bdat.gz')=0; RUN; DATA FOLDER; SET FOLDER; ID=LENGTH(TARGET_PATH); COMMAND="mkdir "||"'"||STRIP(TARGET_PATH)||"/"||STRIP(DATASET_NAME)||"'"; RUN; PROC SQL; DELETE from FOLDER WHERE (DATASET_NAME in (Select SServer from Map) or DATASET_NAME=''); Quit; PROC SORT DATA=FOLDER; BY ID; RUN; PROC EXPORT DATA=FOLDER(KEEP=COMMAND) OUTFILE="&expo_path/ud_for_folders.sh" DBMS=TAB REPLACE; PUTNAMES=NO; RUN; /*B contains files, DATASETS Datasets, ZIP Zipped datasets*/ /*Wait for for the trigger file ud_trigg_fld.txt*/ %let rc=1; %do %while (&rc); data _null_; rc=system("rm &expo_path/ud_trigg_fld.txt"); call symput('rc',compress(rc)); sleep=sleep(60,1); run; %put rc=&rc; %put &systime; %put Wait for for the trigger file ud_trigg_fld.txt at &expo_path; %end; %put Trrigger file recieved.Proceeding with data copy; %put &systime; *||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*/; ********************************************************* + + + FOR DATASETS + + + *********************************************************; /* For every single dataset*/ *&&&&&&&&&*; /*No of datsets*/ DATA _NULL_; SET DATASETS; CALL SYMPUT('N1',_N_); RUN; %put NOTE: Datasets to be transported=&N1.; %do j=1 %to &n1; /*One dataset at a time, extract dataset name,Source Path,Target Path*/ DATA _NULL_; SET DATASETS; if _N_=&j; CALL SYMPUT('lib_s',TRIM(Source_Path)); CALL SYMPUT('lib_t',TRIM(Target_Path)); CALL SYMPUT('ds_name',TRIM(DS_NAME)); run; %put libs=&lib_s; %put libt=&lib_t; %put dataset=&ds_name; %put dataset no &j of &n1; /*Libname Source*/ LIBNAME source "&lib_s"; /*Quality Stats at Source*/ PROC SQL; CREATE TABLE SOURCE_QC AS SELECT MEMNAME AS DS_NAME,"&lib_s" as PATH,NOBS as NOBS_S,NVAR as NVAR_S FROM SASHELP.VTABLE WHERE LIBNAME="SOURCE" AND MEMNAME=UPCASE("&DS_NAME"); QUIT; PROC APPEND BASE=S_QUALITY_CHECK DATA=SOURCE_QC FORCE; RUN; /*SAS Connect*/ LIBNAME target "&lib_t" server=rhost; DATA target.&ds_name; SET source.&ds_name; run; PROC SQL; CREATE TABLE TARGET_QC AS SELECT MEMNAME AS DS_NAME,"&lib_t" as PATH,NOBS as NOBS_S,NVAR as NVAR_S FROM SASHELP.VTABLE WHERE LIBNAME="TARGET" AND MEMNAME=UPCASE("&DS_NAME"); QUIT; PROC APPEND BASE=T_QUALITY_CHECK DATA=TARGET_QC FORCE; RUN; %end; /*For j*/*/; PROC EXPORT DATA=S_QUALITY_CHECK OUTFILE="&expo_path/ud_quality_check_s.txt" DBMS=TAB REPLACE; PUTNAMES=YES; RUN; PROC EXPORT DATA=T_QUALITY_CHECK OUTFILE="&expo_path/ud_quality_check_t.txt" DBMS=TAB REPLACE; PUTNAMES=YES; RUN; %put NOTE: The files are avialable at &expo_path; *|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*/; ********************************************************* + + + FOR ZIPPED DATASETS + + + *********************************************************; DATA _NULL_; SET ZIP; CALL SYMPUT('N3',_N_); RUN; %put NOTE: Zipped Files to be transferred=&N3.; /*For every single file*/ %do k=1 %to &n3; /*One file at a time, extract file name,Source Path,Target Path*/ DATA _NULL_; SET ZIP; if _N_=&k; CALL SYMPUT('source',TRIM(Source_Path)); CALL SYMPUT('target',TRIM(Target_Path)); CALL SYMPUT('file_Source',TRIM(Dataset_Name)); CALL SYMPUT('file_Target',SUBSTR(TRIM(Dataset_Name),1,(LENGTH(TRIM(Dataset_Name) )-3))); CALL SYMPUT('ds_name',TRIM(DS_Name)); RUN; %Put Source=&source; %Put Target=⌖ %Put File_Source=&file_Source; %Put File_Target=&file_Target; %Put DS_Name=&DS_Name; %let staging=&expo_path; %PUT Unzipping &file_Source File &k of &n3; /*Unzipping the files */ DATA _NULL_; RC3=SYSTEM("gunzip -c &Source/&file_source > &staging/&file_target"); PUT RC3=; Run; %Let lib_s=&staging; %Let lib_t=⌖ %put dataset=&ds_name; %put dataset no &k of &n3; /*Libname Source*/ LIBNAME source "&lib_s"; /*Quality Stats at Source*/ PROC SQL; CREATE TABLE SOURCE_QC_Z AS SELECT MEMNAME AS DS_NAME,"&lib_s" as PATH,NOBS as NOBS_S,NVAR as NVAR_S FROM SASHELP.VTABLE WHERE LIBNAME="SOURCE" AND MEMNAME=UPCASE("&DS_NAME"); QUIT; PROC APPEND BASE=S_QUALITY_CHECK_Z DATA=SOURCE_QC_Z; RUN; /*SAS Connect*/ LIBNAME target "&lib_t" server=rhost; DATA target.&ds_name; SET source.&ds_name; run; PROC SQL; CREATE TABLE TARGET_QC_Z AS SELECT MEMNAME AS DS_NAME,"&lib_t" as PATH,NOBS as NOBS_S,NVAR as NVAR_S FROM SASHELP.VTABLE WHERE LIBNAME="TARGET" AND MEMNAME=UPCASE("&DS_NAME"); QUIT; PROC APPEND BASE=T_QUALITY_CHECK_Z DATA=TARGET_QC_Z FORCE; RUN; %let rem=&lib_s/&file_Target; %put &rem; DATA _NULL_; X "rm &rem"; run; %end; /*For k*/ /*Zip at Target*/ Data ZIP_Target(Keep=Script); SET ZIP; Script="gzip "||trim(target_path)||substr(trim(dataset_name),1, length(trim(dataset_name))-3); run; PROC EXPORT DATA=ZIP_Target OUTFILE="&expo_path/ud_zip_target.sh" DBMS=TAB REPLACE; PUTNAMES=NO; RUN; /*FOR FILES*/ /*Provide with System account details and also the IP address of Target Server*/ DATA FOR_FILES(KEEP=COMMAND); SET B; COMMAND="sudo scp "||STRIP(SOURCE_PATH)||"/"||STRIP(DATASET_NAME)||" systemaccount@127.100.10.78:"||STRIP(TARGET_PATH)||"/"||STRIP(DATASET_NAME); run; PROC EXPORT DATA=FOR_FILES OUTFILE="&expo_path/ud_for_files.sh" DBMS=CSV REPLACE; PUTNAMES=NO; RUN; /*Generate Copy book for Codes*/ DATA CP0; SET B; IF FIND(DATASET_NAME,'.sas')>0 ; RUN; /*No of files */ DATA _NULL_; SET CP0; CALL SYMPUT('N',_N_); RUN; %put NOTE: Total SAS Programs=&N.; /*For every single file*/ %do i=1 %to &n; /*Extract file name,path for each file*/ DATA _NULL_; SET CP0; if _N_=&i; CALL SYMPUT('path',TRIM(Source_Path)); CALL SYMPUT('file',TRIM(Dataset_Name)); run; %put path=&path; %put program name=&file; /*Read one program at a time*/ DATA READ; LENGTH CODE $300; INFILE "&path/&file." truncover; INPUT CODE $300.; RUN; /*Look out for paths*/ DATA TEMP; SET READ; CODE2=FIND(CODE,"/",(FIND(CODE,"/")+1)); Line_No=_N_; run; /*Final dataset-Skimming the comments and no paths*/ DATA PATHS(drop=CODE2); length path $200 program $100 code $200; retain path program line_no code; SET TEMP; IF CODE2 >0 and FIND(CODE,"/*")=0 and FIND(CODE,"*/")=0; Program="&file"; Path="&path"; run; /*Accumulate*/ PROC APPEND BASE=ALL_PATHS DATA=PATHS FORCE; RUN; /*thats it*/ %end; /*Export*/ PROC EXPORT DATA=ALL_PATHS OUTFILE="&expo_path/ud_copybook_code.txt" dbms=tab REPLACE; PUTNAMES=YES; RUN; /*Macro ends*/; %mend; /*Macro call*/ %ubaid();
Once we have successfully migrated content from one SAS server to another , its time to evaluate the performance of the new server when compared to legacy server. Following post can be of good help: https://www.thedarwaish.com/ranking-of-sas-programs-ubaid-darwaish/
Contributed by: Ubaid Darwaish