Flexible Architectures in Communication Security Application

Extracts from this document...

Introduction

Master of Science in Computer Science and Engineering Thesis: Flexible Architectures in Communication Security Application
Bryan Chong
Advanced Computer Architecture Laboratory
University of Michigan
Ann Arbor, MI 48109

Table of Contents

Acknowledgement
Abstract
Chapter 1 Introduction
  Section 1.1 Cryptography
  Section 1.2 Contribution of This Thesis
Chapter 2 The Nature of Cryptography
Chapter 3 Cipher Kernel Analysis
  Section 3.1 Cipher Analysis Tools
  Section 3.2 Cipher Throughput Analysis
  Section 3.3 Bottleneck Analysis
  Section 3.4 Cipher Relative Run Time Cost
  Section 3.5 Cipher Kernel Characterization
Chapter 4 Architectural Extensions
Chapter 5 CryptoManiac Architecture
  Section 5.1 System Architecture
  Section 5.2 Processing Element Architecture
  Section 5.3 Instruction Set Architecture
  Section 5.4 Design Methodology
  Section 5.5 The Super Optimizer
  Section 5.6 Physical Design Characteristics
Chapter 6 Performance Analysis
  Section 6.1 Performance Analysis of ISA Extensions
  Section 6.2 Performance Analysis of CryptoManiac
  Section 6.3 System Analysis of CryptoManiac
Chapter 7 Related Work
Chapter 8 Conclusions and Future Work
References

Page 5 4/22/01

Acknowledgement

Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Other people who have worked on the CryptoManiac project include Chris Weaver, for hardware design and synthesis support, and Jerome Burke and John McDonald, for earlier versions of the ISA extension code modifications.

Abstract

The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high-throughput system design. ...

Middle

If the SBOX aliased bit is not set, SBOX instructions may execute in any order. As a result, these SBOX instructions need not enter the memory ordering buffer (the device that implements out-of-order load/store execution). The SBOX instructions simply enter the cache pipeline when a free port is available. With this implementation, SBOX instructions complete in 2 cycles, much faster than the 4 cycles required to implement SBOX accesses with load instructions.

Our more aggressive SBOX implementation adds four SBOX caches to the microarchitecture. SBOX caches have a single tag (the table base address), making them a one-line sector cache [16]. Each SBOX cache sector is 32 bytes in length (one data cache line). As shown in Figure 9, SBOX addresses are sent to the specified SBOX cache. The table indicator in the SBOX instruction allows the programmer to "schedule" the SBOX caches, specifying which cache contains a particular table. As a result, the underlying implementation need not implement a 4-ported 4k-byte cache, but rather four faster single-ported 1k-byte SBOX caches. The instruction scheduler directs SBOX instructions to the correct SBOX cache based on the table specifier in the instruction opcode.

The SBOX cache is virtually tagged, so TLB resources are only required on misses. When the virtual tag does not match, the SBOX cache is flushed and the touched sector is fetched from the data cache. When the SBOXSYNC instruction is executed, all sector valid bits are cleared, forcing subsequent SBOX instructions to re-fetch SBOX data from the data cache. On a task switch, the SBOX cache is flushed by invalidating its tag. No writeback is necessary, as SBOX caches are read-only.

The XBOX instruction implements a portion of a full 64-bit permutation. The operation takes two input registers. One register is the operand to permute; the other register is a permutation map that describes where each input operand bit is written in the destination. ...
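The SBOX cache behaviour described above (a single tag, per-sector valid bits, a flush on tag mismatch, SBOXSYNC, and tag invalidation on a task switch) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the class name `SboxCache`, its methods, the `memory` buffer standing in for the data cache, and the byte-granular lookup are all hypothetical and chosen for illustration.

```python
# Hypothetical software model of one SBOX cache; names and the
# byte-granular lookup are assumptions made for illustration only.
SECTOR_BYTES = 32                           # one data cache line per sector
CACHE_BYTES = 1024                          # one 1k-byte SBOX cache
NUM_SECTORS = CACHE_BYTES // SECTOR_BYTES   # 32 sectors behind a single tag


class SboxCache:
    def __init__(self, memory):
        self.memory = memory                # stand-in for the data cache
        self.tag = None                     # single tag: the table base address
        self.valid = [False] * NUM_SECTORS  # one valid bit per sector
        self.sectors = [b""] * NUM_SECTORS
        self.fetches = 0                    # sector fills, counted for inspection

    def read(self, table_base, offset):
        """Return the byte at table_base + offset, filling sectors on demand."""
        if not 0 <= offset < CACHE_BYTES:
            raise IndexError("offset outside the 1k-byte table")
        if self.tag != table_base:
            # Tag mismatch: flush every sector, then adopt the new tag.
            self.sboxsync()
            self.tag = table_base
        sector = offset // SECTOR_BYTES
        if not self.valid[sector]:
            # Fetch only the touched sector from the backing store.
            start = table_base + sector * SECTOR_BYTES
            self.sectors[sector] = bytes(self.memory[start:start + SECTOR_BYTES])
            self.valid[sector] = True
            self.fetches += 1
        return self.sectors[sector][offset % SECTOR_BYTES]

    def sboxsync(self):
        """Model SBOXSYNC: clear all sector valid bits (no writeback needed)."""
        self.valid = [False] * NUM_SECTORS

    def task_switch(self):
        """Model a task switch: invalidate the tag so the next access flushes."""
        self.tag = None


if __name__ == "__main__":
    memory = bytearray(range(256)) * 8      # 2k bytes of made-up table data
    caches = [SboxCache(memory) for _ in range(4)]  # four single-ported caches
    print(caches[0].read(0, 5), caches[0].fetches)
```

Because the model is read-only, a store to the table in `memory` is not visible through the cache until `sboxsync()` is called, which mirrors the role the text gives to the SBOXSYNC instruction.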

Conclusion

Rijndael, the new AES standard, runs 2.25 times faster on a 360 MHz CryptoManiac. Our analysis of the original and optimized algorithms suggests that there is more opportunity to speed up cryptographic processing. We are considering improved functional unit designs as well as more aggressive circuit implementations. Our results make a very strong case for the deployment of cryptographic co-processors; however, we believe the results in this paper have stronger implications for the computer architecture community as a whole. With an additional 1% area (for an Alpha 21264 design), we were able to effect a 20% performance improvement over a broad class of cipher algorithms, with individual algorithms benefiting as much as 190%. This is a striking result considering that many commercial design teams use a rule of thumb that any optimization that returns 1% performance improvement for 1% area is a good one. This result is further underscored by the fact that our design is completely synthesized; if the talents of an experienced design team were marshaled to this task, the resulting design would be smaller, faster, and lower power.

Lisa Wu Page 50 4/22/01

The reason for these striking results is simple: an application-specific processor design can achieve a level of efficiency that is impossible for general-purpose designs to attain. Our application-specific design contains none of the baggage necessary to execute non-cryptographic workloads, making the resulting design smaller and lower power. In addition, our limited application domain creates opportunities to optimize the implementation, yielding superior performance results.

Going forward, we are working to assess the cost of programmability in the CryptoManiac. A dedicated Rijndael implementation is under development that will be compared to the design presented in this paper. We will compare the costs of hardware programmability (FPGA), software programmability (CryptoManiac), and no programmability (a hardware-only version of Rijndael). In addition, we are developing application-specific processors for other application domains. Through this work we hope to demonstrate that application-specific optimization can be a powerful tool for improving system performance and cost. ...
