Flexible Architectures in Communication Security Application

Extracts from this document...

Introduction

Master of Science in Computer Science and Engineering Thesis: Flexible Architectures in Communication Security Application
Bryan Chong
Advanced Computer Architecture Laboratory
University of Michigan
Ann Arbor, MI 48109

Table of Contents

Acknowledgement
Abstract
Chapter 1 Introduction
  Section 1.1 Cryptography
  Section 1.2 Contribution of This Thesis
Chapter 2 The Nature of Cryptography
Chapter 3 Cipher Kernel Analysis
  Section 3.1 Cipher Analysis Tools
  Section 3.2 Cipher Throughput Analysis
  Section 3.3 Bottleneck Analysis
  Section 3.4 Cipher Relative Run Time Cost
  Section 3.5 Cipher Kernel Characterization
Chapter 4 Architectural Extensions
Chapter 5 CryptoManiac Architecture
  Section 5.1 System Architecture
  Section 5.2 Processing Element Architecture
  Section 5.3 Instruction Set Architecture
  Section 5.4 Design Methodology
  Section 5.5 The Super Optimizer
  Section 5.6 Physical Design Characteristics
Chapter 6 Performance Analysis
  Section 6.1 Performance Analysis of ISA Extensions
  Section 6.2 Performance Analysis of CryptoManiac
  Section 6.3 System Analysis of CryptoManiac
Chapter 7 Related Work
Chapter 8 Conclusions and Future Work
References

Page 5 4/22/01

Acknowledgement

Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Other people who have worked on the CryptoManiac project include Chris Weaver, for hardware design and synthesis support, and Jerome Burke and John McDonald, for earlier versions of the ISA extension code modifications.
Abstract

The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high throughput system design. ...

Middle

If the SBOX aliased bit is not set, SBOX instructions may execute in any order. As a result, these SBOX instructions need not enter the memory ordering buffer (the device that implements out-of-order load/store execution). The SBOX instructions simply enter the cache pipeline when a free port is available. With this implementation, SBOX instructions complete in 2 cycles, much faster than the 4 cycles required to implement SBOX accesses with load instructions.

Our more aggressive SBOX implementation adds four SBOX caches to the microarchitecture. SBOX caches have a single tag (the table base address), making them a one-line sector cache [16]. Each SBOX cache sector is 32 bytes in length (one data cache line). As shown in Figure 9, SBOX addresses are sent to the specified SBOX cache. The table indicator in the SBOX instruction allows the programmer to "schedule" the SBOX caches, specifying which cache contains a particular table. As a result, the underlying implementation need not implement a 4-ported 4 KB cache, but rather four faster single-ported 1 KB SBOX caches. The instruction scheduler directs SBOX instructions to the correct SBOX cache based on the table specifier in the instruction opcode.

The SBOX cache is virtually tagged, so TLB resources are only required on misses. When the virtual tag does not match, the SBOX cache is flushed and the touched sector is fetched from the data cache. When the SBOXSYNC instruction is executed, all sector valid bits are cleared, forcing subsequent SBOX instructions to re-fetch SBOX data from the data cache. On a task switch, the SBOX cache is flushed by invalidating its tag. No writeback is necessary, as SBOX caches are read-only.

The XBOX instruction implements a portion of a full 64-bit permutation. The operation takes two input registers. One register is the operand to permute; the other register is a permutation map that describes where each input operand bit is written in the destination. ...
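The SBOX cache behaviour described above (a single tag per cache, per-sector valid bits, a flush on tag mismatch, and SBOXSYNC) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the class name `SboxCache`, the methods `read_byte`, `sync`, and `flush`, the `fills` counter, and the use of a Python `bytearray` as a stand-in for the data cache are all hypothetical. Only the 32-byte sector size, the 1 KB capacity, and the tag and valid-bit rules come from the text.

```python
SECTOR_BYTES = 32                          # one data cache line, per the text
TABLE_BYTES = 1024                         # one 1 KB SBOX cache, per the text
NUM_SECTORS = TABLE_BYTES // SECTOR_BYTES  # 32 sectors per cache


class SboxCache:
    """Toy model of one single-tag, read-only SBOX sector cache (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory               # stand-in for the data cache (assumption)
        self.tag = None                    # virtual base address of the cached table
        self.valid = [False] * NUM_SECTORS
        self.sectors = [b""] * NUM_SECTORS
        self.fills = 0                     # number of sector fetches from the data cache

    def flush(self):
        """Task switch: invalidate the tag. No writeback, the cache is read-only."""
        self.tag = None
        self.valid = [False] * NUM_SECTORS

    def sync(self):
        """SBOXSYNC: clear every sector valid bit so later reads re-fetch."""
        self.valid = [False] * NUM_SECTORS

    def read_byte(self, table_base, offset):
        """Return the byte at table_base + offset, with 0 <= offset < TABLE_BYTES."""
        if self.tag != table_base:         # virtual tag mismatch: flush, then retag
            self.flush()
            self.tag = table_base
        sector, within = divmod(offset, SECTOR_BYTES)
        if not self.valid[sector]:         # fetch only the touched sector
            start = table_base + sector * SECTOR_BYTES
            self.sectors[sector] = bytes(self.memory[start:start + SECTOR_BYTES])
            self.valid[sector] = True
            self.fills += 1
        return self.sectors[sector][within]
```

In this model, four instances (for example `[SboxCache(memory) for _ in range(4)]`) would play the role of the four single-ported caches, with the table specifier of the instruction choosing the instance. If the backing memory changes after a sector has been fetched, `read_byte` keeps returning the old value until `sync()` is called, which is the situation SBOXSYNC exists to handle.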

Conclusion

Rijndael, the new AES standard, runs 2.25 times faster on a 360 MHz CryptoManiac. Our analysis of the original and optimized algorithms suggests that there is more opportunity to speed up cryptographic processing. We are considering improved functional unit designs as well as more aggressive circuit implementations.

Our results make a very strong case for the deployment of cryptographic co-processors; however, we believe the results in this paper have stronger implications for the computer architecture community as a whole. With an additional 1% area (for an Alpha 21264 design), we were able to effect a 20% performance improvement over a broad class of cipher algorithms, with individual algorithms benefiting by as much as 190%. This is a striking result considering that many commercial design teams use a rule of thumb that any optimization that returns a 1% performance improvement for 1% area is a good one. This result is further underscored by the fact that our design is completely synthesized; if the talents of an experienced design team were marshaled to this task, the resulting design would be smaller, faster, and lower power.

Lisa Wu Page 50 4/22/01

The reason for these striking results is simple: an application-specific processor design can achieve a level of efficiency that is impossible for general-purpose designs to attain. Our application-specific design contains none of the baggage necessary to execute non-cryptographic workloads, making the resulting design smaller and lower power. In addition, our limited application domain creates opportunities to optimize the implementation, yielding superior performance results.

Going forward, we are working to assess the cost of programmability in the CryptoManiac. A dedicated Rijndael implementation is under development that will be compared to the design presented in this paper. We will compare the cost of hardware programmability (FPGA), software programmability (CryptoManiac), and no programmability (a hardware-only version of Rijndael). In addition, we are developing application-specific processors for other application domains. Through this work we hope to demonstrate that application-specific optimization can be a powerful tool for improving system performance and cost. ...

This student written piece of work is one of many that can be found in our AS and A Level Computer Science section.
