Flexible Architectures in Communication Security Application

Introduction

Master of Science in Computer Science and Engineering Thesis: Flexible Architectures in Communication Security Application

Bryan Chong
Advanced Computer Architecture Laboratory
University of Michigan
Ann Arbor, MI 48109

Table of Contents

Acknowledgement
Abstract
Chapter 1 Introduction
    Section 1.1 Cryptography
    Section 1.2 Contribution of This Thesis
Chapter 2 The Nature of Cryptography
Chapter 3 Cipher Kernel Analysis
    Section 3.1 Cipher Analysis Tools
    Section 3.2 Cipher Throughput Analysis
    Section 3.3 Bottleneck Analysis
    Section 3.4 Cipher Relative Run Time Cost
    Section 3.5 Cipher Kernel Characterization
Chapter 4 Architectural Extensions
Chapter 5 CryptoManiac Architecture
    Section 5.1 System Architecture
    Section 5.2 Processing Element Architecture
    Section 5.3 Instruction Set Architecture
    Section 5.4 Design Methodology
    Section 5.5 The Super Optimizer
    Section 5.6 Physical Design Characteristics
Chapter 6 Performance Analysis
    Section 6.1 Performance Analysis of ISA Extensions
    Section 6.2 Performance Analysis of CryptoManiac
    Section 6.3 System Analysis of CryptoManiac
Chapter 7 Related Work
Chapter 8 Conclusions and Future Work
References

Page 5 4/22/01

Acknowledgement

Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Others who have worked on the CryptoManiac project include Chris Weaver (hardware design and synthesis support) and Jerome Burke and John McDonald (earlier versions of the ISA extension code modifications).
Abstract

The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high-throughput system design.

Middle

If the SBOX aliased bit is not set, SBOX instructions may execute in any order. As a result, these SBOX instructions need not enter the memory ordering buffer (the device that implements out-of-order load/store execution); they simply enter the cache pipeline when a free port is available. With this implementation, SBOX instructions complete in 2 cycles, much faster than the 4 cycles required to implement SBOX accesses with load instructions.

Our more aggressive SBOX implementation adds four SBOX caches to the microarchitecture. SBOX caches have a single tag (the table base address), making each one a one-line sector cache [16]. Each SBOX cache sector is 32 bytes long (one data cache line). As shown in Figure 9, SBOX addresses are sent to the specified SBOX cache. The table indicator in the SBOX instruction allows the programmer to "schedule" the SBOX caches by specifying which cache holds a particular table. As a result, the underlying implementation need not provide a 4-ported 4 KB cache, but rather four faster single-ported 1 KB SBOX caches. The instruction scheduler directs each SBOX instruction to the correct SBOX cache based on the table specifier in its opcode. The SBOX cache is virtually tagged, so TLB resources are required only on misses. When the virtual tag does not match, the SBOX cache is flushed and the touched sector is fetched from the data cache. When the SBOXSYNC instruction is executed, all sector valid bits are cleared, forcing subsequent SBOX instructions to re-fetch SBOX data from the data cache. On a task switch, the SBOX cache is flushed by invalidating its tag. No writeback is necessary because SBOX caches are read-only.

The XBOX instruction implements a portion of a full 64-bit permutation. The operation takes two input registers: one holds the operand to permute, and the other holds a permutation map that describes where each input operand bit is written in the destination.
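The SBOX cache behavior described above can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the CryptoManiac hardware: it assumes each SBOX table holds 256 little-endian 32-bit entries selected by an 8-bit index (consistent with the 1 KB cache size), and the class `SboxCache`, the helper `sbox`, and the `bytearray` standing in for the data cache are all hypothetical. It shows the single virtual tag, per-sector valid bits, the flush on a tag mismatch, SBOXSYNC clearing valid bits, and tag invalidation on a task switch.

```python
# Behavioral sketch of one SBOX cache: a single-tag (one-line) sector cache.
# Hypothetical assumptions: 256 entries x 4 bytes per table, little-endian
# entries, and a bytearray standing in for the data cache.

SECTOR_BYTES = 32                          # one data cache line per sector
TABLE_BYTES = 1024                         # one 1 KB table per SBOX cache
NUM_SECTORS = TABLE_BYTES // SECTOR_BYTES  # 32 sectors


class SboxCache:
    def __init__(self, data_cache: bytearray):
        self.data_cache = data_cache       # backing store used for sector fills
        self.tag = None                    # virtual table base address
        self.valid = [False] * NUM_SECTORS
        self.lines = bytearray(TABLE_BYTES)
        self.fills = 0                     # sector fetches, counted for illustration

    def read(self, base: int, index: int) -> int:
        """Return 32-bit entry `index` of the table at virtual address `base`."""
        if self.tag != base:
            # Virtual tag mismatch: flush every sector and adopt the new tag.
            self.tag = base
            self.valid = [False] * NUM_SECTORS
        offset = (index & 0xFF) * 4
        sector = offset // SECTOR_BYTES
        if not self.valid[sector]:
            # Fetch only the touched sector from the data cache.
            start = sector * SECTOR_BYTES
            src = base + start
            self.lines[start:start + SECTOR_BYTES] = self.data_cache[src:src + SECTOR_BYTES]
            self.valid[sector] = True
            self.fills += 1
        return int.from_bytes(self.lines[offset:offset + 4], "little")

    def sboxsync(self) -> None:
        """SBOXSYNC: clear all sector valid bits so later reads re-fetch."""
        self.valid = [False] * NUM_SECTORS

    def task_switch(self) -> None:
        """Flush by invalidating the tag; the cache is read-only, so no writeback."""
        self.tag = None


def sbox(caches: list, table: int, base: int, index: int) -> int:
    """Route an SBOX access to the cache named by the table specifier."""
    return caches[table].read(base, index)


# Usage: a table at 0x1000 whose entry i holds 3 * i.
memory = bytearray(8192)
for i in range(256):
    memory[0x1000 + 4 * i:0x1000 + 4 * i + 4] = (3 * i).to_bytes(4, "little")
caches = [SboxCache(memory) for _ in range(4)]
print(sbox(caches, 0, 0x1000, 5))   # 15
```

Because each cache keeps only one tag, alternating two tables through the same table specifier would flush and refill the cache on every switch, which is why the programmer is asked to place different tables in different SBOX caches.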
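The XBOX description leaves the encoding of the permutation map open, so the following sketch assumes one possible encoding purely for illustration: the map register holds eight 8-bit fields, the low 6 bits of field k give the destination position of input bit 8*b + k, and a byte selector b (a hypothetical immediate) chooses which eight operand bits a single XBOX handles. The functions `xbox`, `build_map`, and `permute64` are hypothetical names. Under these assumptions, eight XBOX results combined with OR form a full 64-bit permutation.

```python
# Hypothetical XBOX model; the map encoding and byte selector are assumptions.


def xbox(operand: int, perm_map: int, byte_sel: int) -> int:
    """Scatter operand bits 8*byte_sel .. 8*byte_sel+7 to the positions in perm_map."""
    result = 0
    for k in range(8):
        src = 8 * byte_sel + k                 # input bit handled by field k
        dst = (perm_map >> (8 * k)) & 0x3F     # destination bit read from the map
        if (operand >> src) & 1:
            result |= 1 << dst
    return result


def build_map(dest: list[int], byte_sel: int) -> int:
    """Pack dest[8*byte_sel .. 8*byte_sel+7] into one 64-bit map register."""
    perm_map = 0
    for k in range(8):
        perm_map |= (dest[8 * byte_sel + k] & 0x3F) << (8 * k)
    return perm_map


def permute64(operand: int, dest: list[int]) -> int:
    """Full 64-bit permutation: dest[i] is where input bit i lands."""
    result = 0
    for b in range(8):
        # Each XBOX writes a disjoint set of destination bits, so OR merges them.
        result |= xbox(operand, build_map(dest, b), b)
    return result


reverse = [63 - i for i in range(64)]
print(hex(permute64(0x0F, reverse)))   # 0xf000000000000000
```

The OR merge is valid only because a permutation sends distinct input bits to distinct destinations; how the actual hardware combines partial XBOX results is not stated in this extract.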

Conclusion

Rijndael, the new AES standard, runs 2.25 times faster on a 360 MHz CryptoManiac. Our analysis of the original and optimized algorithms suggests that there is more opportunity to speed up cryptographic processing. We are considering improved functional unit designs as well as more aggressive circuit implementations. Our results make a very strong case for the deployment of cryptographic co-processors; however, we believe the results in this paper have stronger implications for the computer architecture community as a whole. With an additional 1% area (for an Alpha 21264 design), we were able to effect a 20% performance improvement over a broad class of cipher algorithms, with individual algorithms benefiting by as much as 190%. This is a striking result considering that many commercial design teams use the rule of thumb that any optimization returning a 1% performance improvement for 1% area is a good one. This result is further underscored by the fact that our design is completely synthesized; if the talents of an experienced design team were marshaled to this task, the resulting design would be smaller, faster, and lower power.

Lisa Wu

The reason for these striking results is simple: an application-specific processor design can achieve a level of efficiency that is impossible for general-purpose designs to attain. Our application-specific design carries none of the baggage needed to execute non-cryptographic workloads, making the resulting design smaller and lower power. In addition, our limited application domain creates opportunities to optimize the implementation, yielding superior performance.

Going forward, we are working to assess the cost of programmability in the CryptoManiac. A dedicated Rijndael implementation is under development and will be compared to the design presented in this paper. We will compare the cost of hardware programmability (FPGA), software programmability (CryptoManiac), and no programmability (a hardware-only version of Rijndael). In addition, we are developing application-specific processors for other application domains. Through this work we hope to demonstrate that application-specific optimization can be a powerful tool for improving system performance and cost.
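The area-efficiency argument in the conclusion can be stated as a ratio of performance gain to added area, using only the figures quoted there (a 20% average improvement for 1% additional Alpha 21264 area, against the rule of thumb of 1% performance for 1% area):

```latex
\left.\frac{\Delta P}{\Delta A}\right|_{\text{ISA extensions}} = \frac{20\%}{1\%} = 20,
\qquad
\left.\frac{\Delta P}{\Delta A}\right|_{\text{rule of thumb}} = \frac{1\%}{1\%} = 1
```

By this measure the extensions return about twenty times the performance per unit of area that the rule of thumb treats as worthwhile.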
