Flexible Architectures in Communication Security Application


Introduction

Master of Science in Computer Science and Engineering Thesis: Flexible Architectures in Communication Security Application

Bryan Chong
Advanced Computer Architecture Laboratory
University of Michigan
Ann Arbor, MI 48109

Table of Contents
Acknowledgement
Abstract
Chapter 1 Introduction
  Section 1.1 Cryptography
  Section 1.2 Contribution of This Thesis
Chapter 2 The Nature of Cryptography
Chapter 3 Cipher Kernel Analysis
  Section 3.1 Cipher Analysis Tools
  Section 3.2 Cipher Throughput Analysis
  Section 3.3 Bottleneck Analysis
  Section 3.4 Cipher Relative Run Time Cost
  Section 3.5 Cipher Kernel Characterization
Chapter 4 Architectural Extensions
Chapter 5 CryptoManiac Architecture
  Section 5.1 System Architecture
  Section 5.2 Processing Element Architecture
  Section 5.3 Instruction Set Architecture
  Section 5.4 Design Methodology
  Section 5.5 The Super Optimizer
  Section 5.6 Physical Design Characteristics
Chapter 6 Performance Analysis
  Section 6.1 Performance Analysis of ISA Extensions
  Section 6.2 Performance Analysis of CryptoManiac
  Section 6.3 System Analysis of CryptoManiac
Chapter 7 Related Work
Chapter 8 Conclusions and Future Work
References

Page 5 4/22/01

Acknowledgement
Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Other people who have worked on the CryptoManiac project include Chris Weaver, for hardware design and synthesis support, and Jerome Burke and John McDonald, for earlier versions of the ISA extension code modifications.

Abstract
The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high-throughput system design.

Middle

If the SBOX aliased bit is not set, SBOX instructions may execute in any order. As a result, these SBOX instructions need not enter the memory ordering buffer (the structure that implements out-of-order load/store execution); they simply enter the cache pipeline when a free port is available. With this implementation, SBOX instructions complete in 2 cycles, much faster than the 4 cycles required when SBOX accesses are implemented with load instructions.

Our more aggressive SBOX implementation adds four SBOX caches to the microarchitecture. Each SBOX cache has a single tag (the table base address), making it a one-line sector cache [16]. Each sector is 32 bytes long (one data cache line). As shown in Figure 9, SBOX addresses are sent to the specified SBOX cache. The table indicator in the SBOX instruction allows the programmer to "schedule" the SBOX caches by specifying which cache holds a particular table. As a result, the underlying implementation need not provide a 4-ported 4 KB cache, but rather four faster, single-ported 1 KB SBOX caches. The instruction scheduler directs each SBOX instruction to the correct SBOX cache based on the table specifier in its opcode. The SBOX cache is virtually tagged, so TLB resources are required only on misses. When the virtual tag does not match, the SBOX cache is flushed and the touched sector is fetched from the data cache. When the SBOXSYNC instruction is executed, all sector valid bits are cleared, forcing subsequent SBOX instructions to re-fetch SBOX data from the data cache. On a task switch, the SBOX cache is flushed by invalidating its tag. No writeback is necessary because SBOX caches are read-only.

The XBOX instruction implements a portion of a full 64-bit permutation. The operation takes two input registers: one holds the operand to permute, and the other holds a permutation map that describes where each input operand bit is written in the destination.
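The SBOX cache behaviour described above (a single virtual tag holding the table base address, 32-byte sectors with per-sector valid bits, a flush on tag mismatch, SBOXSYNC clearing every valid bit, and tag invalidation on a task switch) can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the CryptoManiac or Alpha hardware: the names `SBoxCache` and `xbox_permute` are hypothetical, the data cache is stood in for by a flat `bytearray`, and each table is assumed to hold 256 little-endian 32-bit entries (1 KB, matching the 1 KB SBOX caches). Because this extract does not give the XBOX map-register encoding or which portion of the permutation a single instruction performs, `xbox_permute` models only the general idea: a complete 64-bit permutation given as a list of destination bit positions.

```python
# Behavioural sketch of one SBOX cache plus a generic bit-permutation helper.
# Assumptions (not from the thesis): all names are hypothetical, the data cache
# is a flat bytearray, and each table holds 256 little-endian 32-bit entries.

SECTOR_BYTES = 32                           # one data-cache line per sector
CACHE_BYTES = 1024                          # one 256-entry x 32-bit table
NUM_SECTORS = CACHE_BYTES // SECTOR_BYTES   # 32 sectors
ENTRY_BYTES = 4


class SBoxCache:
    """One-line sector cache: a single virtual tag (the table base address)
    and one valid bit per 32-byte sector."""

    def __init__(self, data_cache: bytearray) -> None:
        self.data_cache = data_cache        # stands in for the L1 data cache
        self.tag = None                     # table base address, or None if invalid
        self.valid = [False] * NUM_SECTORS
        self.sectors = [bytes(SECTOR_BYTES)] * NUM_SECTORS
        self.sector_fills = 0               # sectors fetched from the data cache

    def lookup(self, base: int, index: int) -> int:
        """Return 32-bit entry `index` (0..255) of the table at address `base`."""
        if not 0 <= index < CACHE_BYTES // ENTRY_BYTES:
            raise ValueError("SBOX index must be in 0..255")
        if base != self.tag:
            # Tag mismatch: flush the cache and retag it with the new base.
            self.tag = base
            self.valid = [False] * NUM_SECTORS
        offset = index * ENTRY_BYTES
        sector = offset // SECTOR_BYTES
        if not self.valid[sector]:
            # Miss: fetch only the touched sector from the data cache.
            start = base + sector * SECTOR_BYTES
            self.sectors[sector] = bytes(self.data_cache[start:start + SECTOR_BYTES])
            self.valid[sector] = True
            self.sector_fills += 1
        within = offset % SECTOR_BYTES
        word = self.sectors[sector][within:within + ENTRY_BYTES]
        return int.from_bytes(word, "little")

    def sbox_sync(self) -> None:
        """SBOXSYNC: clear every sector valid bit so later lookups re-fetch."""
        self.valid = [False] * NUM_SECTORS

    def task_switch(self) -> None:
        """Invalidate the tag; SBOX caches are read-only, so no writeback."""
        self.tag = None


def xbox_permute(operand: int, dest_positions: list[int]) -> int:
    """Write bit i of the 64-bit `operand` to bit dest_positions[i] of the result.

    Hypothetical: applies a complete 64-bit permutation at once, whereas the
    XBOX instruction performs only a portion of one per execution."""
    if sorted(dest_positions) != list(range(64)):
        raise ValueError("dest_positions must be a permutation of 0..63")
    result = 0
    for src_bit, dst_bit in enumerate(dest_positions):
        if (operand >> src_bit) & 1:
            result |= 1 << dst_bit
    return result
```

In this model, reading entries 3 and 7 of a table costs a single sector fill because both lie in the first 32-byte sector, while entry 8 begins the second sector and triggers another fill. Fetching only the touched sector means a cipher that reads a handful of entries need not load the whole 1 KB table.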

Conclusion

Rijndael, the new AES standard, runs 2.25 times faster on a 360 MHz CryptoManiac. Our analysis of the original and optimized algorithms suggests that there is more opportunity to speed up cryptographic processing; we are considering improved functional unit designs as well as more aggressive circuit implementations. Our results make a very strong case for the deployment of cryptographic co-processors; however, we believe the results in this paper have stronger implications for the computer architecture community as a whole. With an additional 1% area (for an Alpha 21264 design), we were able to effect a 20% performance improvement over a broad class of cipher algorithms, with individual algorithms benefiting by as much as 190%. This is a striking result, considering that many commercial design teams use the rule of thumb that any optimization returning a 1% performance improvement for 1% additional area is a good one. The result is further underscored by the fact that our design is completely synthesized; if the talents of an experienced design team were marshaled to this task, the resulting design would be smaller, faster and lower power.

Lisa Wu Page 50 4/22/01

The reason for these striking results is simple: an application-specific processor design can achieve a level of efficiency that is impossible for general-purpose designs to attain. Our application-specific design carries none of the baggage necessary to execute non-cryptographic workloads, making the resulting design smaller and lower power. In addition, our limited application domain creates opportunities to optimize the implementation, yielding superior performance. Going forward, we are working to assess the cost of programmability in the CryptoManiac. A dedicated Rijndael implementation is under development and will be compared to the design presented in this paper.
We will compare the cost of hardware programmability (FPGA), software programmability (CryptoManiac), and no programmability (a hardware-only version of Rijndael). In addition, we are developing application-specific processors for other application domains. Through this work we hope to demonstrate that application-specific optimization can be a powerful tool for improving system performance and cost.
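The comparison with the commercial rule of thumb in this conclusion can be made explicit as a ratio of performance gained to area spent. The sketch below uses only the figures quoted above (a 20% improvement across the cipher suite for 1% additional area on an Alpha 21264 design) and assumes the rule of thumb corresponds to a break-even ratio of 1.

```latex
\frac{\Delta\,\text{performance}}{\Delta\,\text{area}}
  = \frac{20\%}{1\%} = 20
  \qquad\text{versus the rule-of-thumb break-even}\qquad
  \frac{1\%}{1\%} = 1
```

By this measure, the ISA extensions return roughly twenty times the gain per unit of area that the rule of thumb treats as acceptable.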
