Using Metal C to Learn Assembler Techniques

September 16, 2013

By compiling a sample piece of C code with the XL C/C++ compiler, varying the ARCHITECTURE and TUNE compiler options, and comparing the generated assembler, we can learn both the opcodes introduced on newer hardware and some optimization techniques.

Currently (September 2013), the ARCHITECTURE value can vary from 0 to 10 and the TUNE value can vary from 0 to 10 as well. However, the TUNE value cannot be smaller than the ARCHITECTURE value. In addition, it appears that ARCHITECTURE must be at least 5 to support the METAL option. Also note that the manuals linked here show a maximum value of only 9 for each, because they have not yet been updated to show support for the EC12 and BC12 systems. The OPTIMIZE level, which can vary from 0 to 3, obviously affects the code generation as well.

So I picked an IBM sample C program that compiles successfully with the METAL option and compiled it with various option combinations, writing the generated assembler source to members of a PDS so that I could compare the results.

//*                                                                 
// JCLLIB ORDER=(CBC.SCCNPRC)                                       
//*                                                                 
//METALTST PROC CP='''ARCH(05) TUNE(05) OPT(0)''',MEMBER=A05T05O0   
//*                                                                 
//CC       EXEC EDCC,                                               
// INFILE=SYS1.SAMPLIB(CUNSCSMC),                                   
//  CPARM='METAL NOSEARCH SEARCH(/usr/include/metal/)',             
// CPARM2=&CP,                                                      
// CPARM3='LSEARCH(//''''SYS1.SCUNHF'''')'                          
//COMPILE.SYSLIN DD DSN=XXXXXXX.SRC.ASM(&MEMBER),                    
// DISP=SHR,DCB=BLKSIZE=0                                           
//*                                                                 
//         PEND                                                     
//*                                                                 
//CC EXEC METALTST,CP='''ARCH(5)  TUNE(5)  OPT(0)''',MEMBER=A05T05O0
//CC EXEC METALTST,CP='''ARCH(6)  TUNE(6)  OPT(0)''',MEMBER=A06T06O0
//CC EXEC METALTST,CP='''ARCH(7)  TUNE(7)  OPT(0)''',MEMBER=A07T07O0
//CC EXEC METALTST,CP='''ARCH(8)  TUNE(8)  OPT(0)''',MEMBER=A08T08O0
//CC EXEC METALTST,CP='''ARCH(9)  TUNE(9)  OPT(0)''',MEMBER=A09T09O0
//CC EXEC METALTST,CP='''ARCH(10) TUNE(10) OPT(0)''',MEMBER=A10T10O0
//CC EXEC METALTST,CP='''ARCH(10) TUNE(10) OPT(1)''',MEMBER=A10T10O1
//CC EXEC METALTST,CP='''ARCH(10) TUNE(10) OPT(2)''',MEMBER=A10T10O2
//CC EXEC METALTST,CP='''ARCH(10) TUNE(10) OPT(3)''',MEMBER=A10T10O3
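The member names in the JCL above encode the options (A10T10O3 is ARCH(10) TUNE(10) OPT(3)). As a minimal sketch, assuming the limits described earlier still hold, the naming scheme and the option rules could be expressed in C like this. The function names are my own and are not part of any IBM sample.

```c
#include <stdio.h>
#include <stddef.h>

/* Limits as described in this article (September 2013). They are
   assumptions here, not a statement of what a current compiler accepts. */
enum { MIN_METAL_ARCH = 5, MAX_LEVEL = 10, MAX_OPT = 3 };

/* Returns 1 if the combination obeys the rules quoted in the text:
   ARCH >= 5 for METAL, TUNE >= ARCH, both <= 10, OPT in 0..3. */
int options_valid(int arch, int tune, int opt)
{
    return arch >= MIN_METAL_ARCH && arch <= MAX_LEVEL &&
           tune >= arch && tune <= MAX_LEVEL &&
           opt >= 0 && opt <= MAX_OPT;
}

/* Builds a PDS member name such as A10T10O3 (8 characters plus NUL).
   Returns 0 on success, -1 if the options are invalid or buf is too small. */
int member_name(char *buf, size_t len, int arch, int tune, int opt)
{
    if (!options_valid(arch, tune, opt) || len < 9)
        return -1;
    snprintf(buf, len, "A%02dT%02dO%d", arch, tune, opt);
    return 0;
}
```

Zero-padding ARCH and TUNE to two digits keeps every name at exactly eight characters, so the members sort in a sensible order in the PDS directory.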

Even at ARCH(5), the compiler uses instructions I didn’t learn in school such as LARL and NILL.

I noticed first of all that the compiler nicely put the C source line number into columns 74-79 of the generated assembler so it is easy to see when instructions have been reordered. Also, instructions that have been inlined seem to be marked with a plus sign in column 73. OPT(0) leaves the generated code in the same order as the C source.  As you increase the OPT value you can see how assembler code gets moved around (and changed somewhat).
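If you wanted to detect reordering mechanically rather than by eye, a small helper could pull the line number out of each generated record. This is only a sketch: it assumes the column positions I observed above (plus sign in column 73, line number in columns 74-79), and the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Column positions are 1-based, as in the text: '+' in column 73,
   C source line number in columns 74-79. */
enum { INLINE_COL = 73, LINE_COL = 74, LINE_LEN = 6 };

/* Returns the C source line number carried by one generated assembler
   record, or -1 if the record is too short or the field is not numeric. */
long c_source_line(const char *record)
{
    size_t n = strlen(record);
    long value = 0;
    int i;

    if (n < (size_t)(LINE_COL - 1 + LINE_LEN))
        return -1;
    for (i = 0; i < LINE_LEN; i++) {
        unsigned char ch = (unsigned char)record[LINE_COL - 1 + i];
        if (!isdigit(ch))
            return -1;
        value = value * 10 + (ch - '0');
    }
    return value;
}

/* Returns 1 if column 73 holds a plus sign, which appears to mark
   inlined code, otherwise 0. */
int looks_inlined(const char *record)
{
    return strlen(record) >= (size_t)INLINE_COL &&
           record[INLINE_COL - 1] == '+';
}
```

Running c_source_line over every record of a member and looking for places where the number goes down instead of up would flag the spots where the optimizer has moved code.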

For example, line 000057 generates a lot of code:

*   CUNBCPRM MyCharParm = {CUNBCPRM_DEFAULT};                            000057
         LA    14,1                                                      000057
         STY   14,4320(0,13)           MyCharParm_tagCUNBCPRM_Version    000057
         LA    14,176                                                    000057
         STY   14,4324(0,13)           MyCharParm_tagCUNBCPRM_Length     000057
         LA    15,0                                                      000057
         STY   15,4328(0,13)           MyCharParm_tagCUNBCPRM_Res1       000057
         STY   15,4332(0,13)           MyCharParm_tagCUNBCPRM_Src_Buf_PX 000057
               tr                                                        000057
         STY   15,4336(0,13)           MyCharParm_tagCUNBCPRM_Src_Buf_AX 000057
               LET                                                       000057
         STY   15,4340(0,13)           MyCharParm_tagCUNBCPRM_Src_Buf_LX 000057
               en                                                        000057
         STY   15,4344(0,13)           MyCharParm_tagCUNBCPRM_Res2       000057
         STY   15,4348(0,13)           MyCharParm_tagCUNBCPRM_Targ_Buf_X 000057
               Ptr                                                       000057
         STY   15,4352(0,13)           MyCharParm_tagCUNBCPRM_Targ_Buf_X 000057
               ALET                                                      000057
         STY   15,4356(0,13)           MyCharParm_tagCUNBCPRM_Targ_Buf_X 000057
               Len                                                       000057
         MVIY  4360(13),0                                                000057
         MVIY  4361(13),0                                                000057
         MVIY  4362(13),0                                                000057
         MVIY  4363(13),0                                                000057
         MVIY  4364(13),0                                                000057
         MVIY  4365(13),0                                                000057

And so on. At OPT(3), this code is scattered both before and after the declaration, but still before MyCharParm is actually used on line 000083. OPT(3) also seems to make more use of MVHI instead of MVIY.

This code movement presumably takes advantage of pipelining by executing instructions when there would otherwise be a stall for a memory reference. This instruction movement is not something you would do yourself, at least not to this extent, if you were coding the assembler yourself.

As a side note, reviewing all this code generated for statement 000057 points out how worthwhile it could be to go through this exercise.  Changing the declaration to static eliminates all of this code.

*   static CUNBCPRM MyCharParm = {CUNBCPRM_DEFAULT};                     000057
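One caution before making that change everywhere: static storage behaves differently from automatic storage. The sketch below uses a hypothetical DemoParm structure as a stand-in for CUNBCPRM (the real layout is in the IBM header) to show the difference. It matters if the program, or a service it calls, updates the structure.

```c
/* Hypothetical stand-in for CUNBCPRM, for illustration only. */
typedef struct {
    int  version;
    int  length;
    char reserved[168];
} DemoParm;

#define DEMO_DEFAULT 1, 176   /* remaining members are zero-filled */

/* Automatic storage: every member is stored at run time on each call,
   which is where all the LA/STY/MVIY instructions come from. */
int use_automatic(void)
{
    DemoParm parm = { DEMO_DEFAULT };
    parm.version += 1;        /* change is lost when the function returns */
    return parm.version;
}

/* Static storage: the initial image is built at compile time, so no
   initialization code runs, but changes persist across calls and the
   function is no longer reentrant. */
int use_static(void)
{
    static DemoParm parm = { DEMO_DEFAULT };
    parm.version += 1;        /* change is still there on the next call */
    return parm.version;
}
```

Calling use_automatic twice returns 2 both times, while use_static returns 2 and then 3. So the static version saves instructions only if you can live with a single shared copy that is initialized once.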

Comparing ARCH levels

Using the ISPF COMPARE command to compare the source generated at one ARCH level at OPT(0) against the previous one, the main differences I see are between ARCH(6) and ARCH(7), where SLR/IC is replaced with LLC, and between ARCH(7) and ARCH(8), where LA/ST is replaced with MVHI.
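To make those two substitutions concrete, here is the kind of C that I would expect to produce them. These functions are hypothetical and are not taken from CUNSCSMC, and the actual instruction selection depends on the compiler level and options, so treat the comments as a description of the instructions rather than a guarantee of what you will see.

```c
#include <stdint.h>

/* Zero-extending byte load. The older sequence clears a register (SLR)
   and then inserts the byte into its low-order position (IC). LLC loads
   the byte and zero-extends it in a single instruction. */
uint32_t load_byte(const unsigned char *p)
{
    return *p;
}

/* Storing a small constant into a fullword. The older sequence loads the
   value into a register (LA) and stores the register (ST). MVHI stores a
   sign-extended 16-bit immediate directly into the fullword in storage,
   so no register is needed. */
void set_version(int32_t *field)
{
    *field = 1;
}
```

In both cases the newer instruction does the work of two older ones and, for MVHI, frees up a register as well.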

Increasing the OPT level at ARCH(10) from 0 to 1, I see a bit-setting sequence of LLC/OILL/STC replaced with a single OI, which is very nice.

There are quite a few other changes, but this was the most obvious.
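For reference, the bit-setting case corresponds to C along these lines. The structure and flag name are made up for illustration, and whether a given compiler emits OI here is an assumption on my part.

```c
/* Hypothetical flag byte and bit, for illustration only. */
#define FLAG_READY 0x80u

typedef struct {
    unsigned char flags;
} DemoBlock;

/* Sets one bit in a byte in storage. Unoptimized code loads the byte
   (LLC), ORs the bit into the register (OILL), and stores the byte back
   (STC). OI performs the OR directly on the byte in storage. */
void mark_ready(DemoBlock *blk)
{
    blk->flags |= FLAG_READY;
}
```

OI is an old instruction, so this improvement comes from the optimizer rather than from newer hardware.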

I didn’t see any changes going from OPT(1) to OPT(2). Going from OPT(2) to OPT(3), there was a lot more reordering, so much so that it is difficult to tell what else may have changed.

This is just a sampling of the insights I’ve gotten from this endeavor. I think this is a useful exercise for getting acquainted with new assembler instructions and for seeing both how well-maintained, up-to-date compilers perform optimization and the benefit of using them.
