Pipelining & RISCs

Introduction to Pipelining

Pipelining and parallelism are 2 methods used to achieve concurrency.
·      -   Pipelining increases concurrency by dividing a computation into a number of steps.
·      -  Parallelism is the use of multiple resources to increase concurrency.

Pipelining is a key implementation technique used to build fast processors that can be seen in RISC architecture. It allows the execution of multiple instructions to overlap in time.
In a non-pipelined processing, by contrast, the next data/instruction is processed after the entire processing of the previous data/instruction is complete.



Pipelined Laundry: Start work ASAP
LESSONS:
  • Pipelining doesn’t help latency of single task, it helps throughput of entire workload
  • Multiple tasks operating simultaneously using different resources
  • Potential speedup = Number pipe stages
  • Pipeline rate limited by slowest pipeline stage
  • Unbalanced lengths of pipe stages reduces speedup
Instruction of Pipelining
Typical instruction execution sequences:
     fetch, decode, read, execute, write, etc
In a non-pipelined CPU, instructions are performed “one at a time”.ie. before an instruction is begun, the preceding instruction is completed.
To implement instruction pipelining, desirable features of (instruction set) IS:
o   all instructions same length
o   registers specified in same place in instruction
o   memory operands only in loads or stores, i.e. RISC

Two Stage Instruction Pipeline



Pipelining of Unequal Stages
Important for pipelining where stages are unequal:
     Always take the largest of the stage delay to be the cycle time
     No stage overlaps and latency must be constant
     Ensure that instruction overlap is the same as the cycle time else get timing diagram is wrong
Pipiline Performance
         n:number of instructions
         k: stages in pipeline
         t : cycletime (time in seconds needed to advance a set of instructions one stage through the pipeline)
         Tk: total time for pipelining
         T0 : total time without pipelining
Total time for equal stages
         For n instructions, with k equal stages and delay of t for each stage
Total time with no pipelining, T0 = nkt
Total time with pipelining, Tk = (k + (n-1))t

Speedup
         Speedup of a k-stage pipeline for n instructions :
                                                                        S         = T0 / Tk
                                                                = nk t  / ((k+(n-1)) t
                                                                 à k (for large n)
Throughput
         Pipelined Throughput= n/Tk (n)
         Non-pipeline Throughput = n/To (n)
Limits to Pipelining
         Factors that limits performance enhancement:
    Unequal duration/delay of stages
    Conditional branch instruction or interrupts. Ex:
        Instruction 3 is a conditional branch to instruction 15
        No instructions completed during time units 9-12. This is performance penalty incurred because we could not anticipate the branch
        Flushing of pipeline
        Pipelined operation cannot be maintained in the presence of branch or jump instructions.
Hazards as limitations to pipelining
         3 types of hazards:
-        Resource hazards : HW cannot support this combination of instructions (single person to fold and put clothes away, washer-drier)
-        Data hazards: Instruction depends on result of prior instruction still in the pipeline
-        Data dependencies example
                                                                A = B + C
                                                                D = E + A
                                                                C = G x H
-        Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
RISC: Reduced Instruction Set Computers
         Major advances in computer :
    The family concept
        Separates architecture from implementation
    Microprogrammed control unit
    Cache memory
    Solid State RAM
    Microprocessors
    Pipelining
        Introduces parallelism into fetch execute cycle
    Multiple processors
CISC and RISC
         The next step: Reduced Instruction Set Computer in processor architecture
         Key features of CISC:
    Large number of predefined instructions making high level programming languages easy to design and implement.
    Supports microprogramming to simplify computer architecture
    Key features of RISC
    Limited and simple instruction set
    Large number of general purpose registers or use of compiler technology to optimize register use.
    Emphasis on optimizing the instruction pipeline
Arguments for CISC
A rich instruction set should simplify the compiler by having instructions which match the high-level language instructions.
Since the programs are smaller in size, they have better performance



Drawbacks of CISC
-CPU complexity
-System size and cost
-Complex machine instructions may not match high-level language statements exactly, in which case they may be of little use
CISC characteristics
         Varying number of instructions per cycle
         Small number of general purpose registers
         More addressing modes
         More instruction formats : fewer instructions can be used to implement a given task
         Use microcode
         Variable length instruction
         Simplified compiler: microprogram instructions could be written to match constructs of high level languages
RISC Characteristics
         One instruction per cycle
         Register to register operations
         Few, simple addressing modes
         Few, simple instruction formats
         Hardwired design (no microcode)
         Fixed instruction format
         More compile time/effort

RISC vs. CISC