When compiling the imgpipe benchmark for a two-cluster machine with 16 registers per cluster, the compilation never finishes (specifically, it keeps compiling jpeg/jcmaster.c). We stopped it manually after one day. With 32 or more registers there is no problem and the compilation ends successfully. Below you can find the fmmdump and the compilation command.
One possible reason is that, for this particular code, the scheduler can at some point fill the entire issue width with operations, which could use up to issuewidth*3 = 24 registers. Some more registers may be needed for intercluster copy operations. In the end we may need more than 32 registers in total. Still, by inserting extra loads and stores, or by compromising on 100% usage of the issue width, the compiler should be able to compile the code for an even smaller number of registers.
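To make the estimate above concrete, here is a minimal sketch of the arithmetic. It assumes three registers per operation (two sources and one destination), which is only my guess at a worst case and not the compiler's actual allocation model; the function name is made up for illustration.

```python
# Back-of-the-envelope estimate only; not how the compiler allocates registers.
REGS_PER_OP = 3  # assumption: two source operands plus one destination


def peak_register_estimate(issue_width, regs_per_op=REGS_PER_OP):
    """Registers touched in one cycle if every issue slot holds an operation."""
    return issue_width * regs_per_op


total = peak_register_estimate(8)        # whole machine (IssueWidth 8)
per_cluster = peak_register_estimate(4)  # one cluster (IssueWidth.0 4)
print(total, per_cluster)
```

This counts only the registers touched within a single cycle. Values that stay live across several cycles (for example while waiting on a load) and intercluster copies would push the real demand higher, which may be why the guess is not a safe bound.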
Should we wait longer for the compiler to finish? Is there a way to derive a safe number of registers that guarantees such unbounded compile times will not occur? For the given configuration, for example, issuewidth*3 does not work as a guess for this safe number.
This is the compilation command that does not terminate:
/opt/vex/FC4/bin/cc -O3 -H3 -prefetch -DVEX_RESTRICT -DJAMMED -width 2 -fmm=auto.mm -fmmdump -c -o jpeg/jcmaster.o jpeg/jcmaster.c
This is the fmmdump of the mentioned configuration:
RES: IssueWidth 8
RES: MemLoad 8
RES: MemStore 8
RES: MemPft 1
RES: IssueWidth.0 4
RES: Alu.0 4
RES: Mpy.0 2
RES: CopySrc.0 1
RES: CopyDst.0 1
RES: Memory.0 1
RES: IssueWidth.1 4
RES: Alu.1 4
RES: Mpy.1 2
RES: CopySrc.1 1
RES: CopyDst.1 1
RES: Memory.1 1
DEL: AluR.0 0
DEL: Alu.0 0
DEL: CmpBr.0 1
DEL: CmpGr.0 0
DEL: Select.0 0
DEL: Multiply.0 1
DEL: Load.0 2
DEL: LoadLr.0 3
DEL: Store.0 0
DEL: Pft.0 0
DEL: Asm1L.0 0
DEL: Asm2L.0 0
DEL: Asm3L.0 0
DEL: Asm4L.0 0
DEL: Asm1H.0 1
DEL: Asm2H.0 1
DEL: Asm3H.0 1
DEL: Asm4H.0 1
DEL: CpGrGR.0 1
DEL: CpGrBr.0 1
DEL: CpBrGr.0 0
DEL: CpGrLr.0 2
DEL: CpLrGr.0 0
DEL: Spill.0 0
DEL: Restore.0 2
DEL: RestoreLr.0 3
DEL: AluR.1 0
DEL: Alu.1 0
DEL: CmpBr.1 1
DEL: CmpGr.1 0
DEL: Select.1 0
DEL: Multiply.1 1
DEL: Load.1 2
DEL: LoadLr.1 3
DEL: Store.1 0
DEL: Pft.1 0
DEL: Asm1L.1 0
DEL: Asm2L.1 0
DEL: Asm3L.1 0
DEL: Asm4L.1 0
DEL: Asm1H.1 1
DEL: Asm2H.1 1
DEL: Asm3H.1 1
DEL: Asm4H.1 1
DEL: CpGrGR.1 1
DEL: CpGrBr.1 1
DEL: CpBrGr.1 0
DEL: CpGrLr.1 2
DEL: CpLrGr.1 0
REG: $r0 16
REG: $b0 8
REG: $b1 8
REG: $r1 16
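For comparing configurations, a small script along these lines could pull the relevant numbers out of an fmmdump. It is only a sketch: it assumes every entry is a line of three whitespace-separated fields as in the dump above, and that the `$r` entries are the general-purpose register files; `parse_fmmdump` is a hypothetical helper, not part of the VEX tools.

```python
def parse_fmmdump(text):
    """Collect 'RES:', 'DEL:' and 'REG:' entries into three dictionaries."""
    tables = {"RES": {}, "DEL": {}, "REG": {}}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a "KIND: name value" triple.
        if len(parts) != 3 or not parts[0].endswith(":"):
            continue
        kind = parts[0][:-1]
        if kind in tables:
            tables[kind][parts[1]] = int(parts[2])
    return tables


# A few lines copied from the dump above.
SAMPLE = """\
RES: IssueWidth 8
RES: IssueWidth.0 4
RES: IssueWidth.1 4
REG: $r0 16
REG: $r1 16
"""

cfg = parse_fmmdump(SAMPLE)
# Assumption: names starting with "$r" are general-purpose register files.
general = sum(n for name, n in cfg["REG"].items() if name.startswith("$r"))
print(cfg["RES"]["IssueWidth"] * 3, general)
```

For this configuration the script reports the issuewidth*3 guess next to the total number of general-purpose registers across both clusters.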
Thank you,
Onur