
Flexible Architectures in Communication Security Application

Extracts from this document...

Introduction

Master of Science in Computer Science and Engineering Thesis: Flexible Architectures in Communication Security Application

Bryan Chong
Advanced Computer Architecture Laboratory
University of Michigan
Ann Arbor, MI 48109

Table of Contents
Acknowledgement
Abstract
Chapter 1 Introduction
  Section 1.1 Cryptography
  Section 1.2 Contribution of This Thesis
Chapter 2 The Nature of Cryptography
Chapter 3 Cipher Kernel Analysis
  Section 3.1 Cipher Analysis Tools
  Section 3.2 Cipher Throughput Analysis
  Section 3.3 Bottleneck Analysis
  Section 3.4 Cipher Relative Run Time Cost
  Section 3.5 Cipher Kernel Characterization
Chapter 4 Architectural Extensions
Chapter 5 CryptoManiac Architecture
  Section 5.1 System Architecture
  Section 5.2 Processing Element Architecture
  Section 5.3 Instruction Set Architecture
  Section 5.4 Design Methodology
  Section 5.5 The Super Optimizer
  Section 5.6 Physical Design Characteristics
Chapter 6 Performance Analysis
  Section 6.1 Performance Analysis of ISA Extensions
  Section 6.2 Performance Analysis of CryptoManiac
  Section 6.3 System Analysis of CryptoManiac
Chapter 7 Related Work
Chapter 8 Conclusions and Future Work
References

Page 5 4/22/01

Acknowledgement
Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Others who have worked on the CryptoManiac project include Chris Weaver (hardware design and synthesis support) and Jerome Burke and John McDonald (earlier versions of the ISA extension code modifications).
Abstract
The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high-throughput system design.

Middle

If the SBOX aliased bit is not set, SBOX instructions may execute in any order. As a result, these SBOX instructions need not enter the memory ordering buffer (the device that implements out-of-order load/store execution); they simply enter the cache pipeline when a free port is available. With this implementation, SBOX instructions complete in 2 cycles, much faster than the 4 cycles required to implement SBOX accesses with load instructions.

Our more aggressive SBOX implementation adds four SBOX caches to the microarchitecture. Each SBOX cache has a single tag (the table base address), making it a one-line sector cache [16]. Each sector is 32 bytes long (one data cache line). As shown in Figure 9, SBOX addresses are sent to the specified SBOX cache. The table indicator in the SBOX instruction allows the programmer to "schedule" the SBOX caches by specifying which cache holds a particular table. As a result, the underlying implementation need not provide a 4-ported 4 KB cache, but rather four faster single-ported 1 KB SBOX caches. The instruction scheduler directs each SBOX instruction to the correct SBOX cache based on the table specifier in its opcode.

The SBOX cache is virtually tagged, so TLB resources are required only on misses. When the virtual tag does not match, the SBOX cache is flushed and the touched sector is fetched from the data cache. When the SBOXSYNC instruction is executed, all sector valid bits are cleared, forcing subsequent SBOX instructions to re-fetch SBOX data from the data cache. On a task switch, the SBOX cache is flushed by invalidating its tag; no writeback is necessary because SBOX caches are read-only.

The XBOX instruction implements a portion of a full 64-bit permutation. The operation takes two input registers: one holds the operand to permute, and the other holds a permutation map that describes where each input operand bit is written in the destination.
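To make the SBOX cache behavior concrete, here is a small behavioral model in Python. It is a sketch under stated assumptions, not the thesis's hardware design: the 256-entry table of 32-bit words, the address computation `base + 4 * (selected byte of the index register)`, and the names `SboxCache`, `lookup`, `sboxsync`, `task_switch`, `xbox` and `permute64` are all hypothetical. The `xbox` encoding (eight 8-bit fields in the map register, each naming the destination bit for one source bit of a selected byte) is entirely an assumption, since the extract stops before the XBOX encoding is described.

```python
# Behavioral sketch of an SBOX cache and a partial-permutation (XBOX) step.
# Assumptions not taken from the thesis: 256-entry tables of 32-bit words,
# an SBOX address of base + 4 * (selected byte of the index register), and the
# XBOX map encoding used in xbox() below, which is purely hypothetical.

SECTOR_BYTES = 32                               # one data-cache line per sector
WORD_BYTES = 4                                  # 32-bit table entries (assumed)
CACHE_BYTES = 1024                              # 1 KB per SBOX cache
NUM_SECTORS = CACHE_BYTES // SECTOR_BYTES       # 32 sectors
WORDS_PER_SECTOR = SECTOR_BYTES // WORD_BYTES   # 8 words per sector


class SboxCache:
    """One-line sector cache: a single tag (the table base) plus per-sector valid bits."""

    def __init__(self, memory):
        self.memory = memory            # byte address -> word; stands in for the data cache
        self.tag = None                 # virtual table base address
        self.valid = [False] * NUM_SECTORS
        self.data = [None] * NUM_SECTORS
        self.fills = 0                  # number of sector fetches from the data cache

    def _fill(self, sector):
        start = self.tag + sector * SECTOR_BYTES
        self.data[sector] = [self.memory.get(start + i * WORD_BYTES, 0)
                             for i in range(WORDS_PER_SECTOR)]
        self.valid[sector] = True
        self.fills += 1

    def lookup(self, base, index, byte_sel):
        """Return the table entry chosen by byte `byte_sel` of `index`."""
        if base != self.tag:            # tag mismatch: flush, then adopt the new table
            self.tag = base
            self.valid = [False] * NUM_SECTORS
        offset = ((index >> (8 * byte_sel)) & 0xFF) * WORD_BYTES
        sector, within = divmod(offset, SECTOR_BYTES)
        if not self.valid[sector]:      # fetch only the touched sector
            self._fill(sector)
        return self.data[sector][within // WORD_BYTES]

    def sboxsync(self):
        """SBOXSYNC: clear every sector valid bit so later lookups re-fetch."""
        self.valid = [False] * NUM_SECTORS

    def task_switch(self):
        """Invalidate the tag; nothing is written back because the cache is read-only."""
        self.tag = None


def xbox(src, perm_map, group):
    """Hypothetical XBOX: send the 8 source bits of byte `group` to the
    destination positions packed as eight 8-bit fields of `perm_map`."""
    out = 0
    for i in range(8):
        bit = (src >> (8 * group + i)) & 1
        dest = (perm_map >> (8 * i)) & 0x3F     # destination bit 0..63
        out |= bit << dest
    return out


def permute64(src, maps):
    """A full 64-bit permutation built as the OR of eight partial XBOX results."""
    result = 0
    for group, perm_map in enumerate(maps):
        result |= xbox(src, perm_map, group)
    return result
```

In this model, changing a table entry in `memory` after it has been cached leaves `lookup()` returning the stale word until `sboxsync()` clears the valid bits, which is the situation the SBOXSYNC instruction is described as handling. Splitting the permutation into eight byte-sized `xbox` steps mirrors the idea of an instruction that performs only a portion of a 64-bit permutation.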

Conclusion

Rijndael, the new AES standard, runs 2.25 times faster on a 360MHz CryptoManiac. Our analysis of the original and optimized algorithms suggests that there is more opportunity to speed up cryptographic processing. We are considering improved functional unit designs as well as more aggressive circuit implementations. Our results make a very strong case for the deployment of cryptographic co-processors; however, we believe the results in this paper have stronger implications for the computer architecture community as a whole. With an additional 1% area (for an Alpha 21264 design), we were able to effect a 20% performance improvement over a broad class of cipher algorithms, with individual algorithms benefiting by as much as 190%. This is a striking result, considering that many commercial design teams use a rule of thumb that any optimization returning a 1% performance improvement for 1% area is a good one. This result is further underscored by the fact that our design is completely synthesized; if the talents of an experienced design team were marshaled to this task, the resulting design would be smaller, faster, and lower power.

Lisa Wu

The reason for these striking results is simple: an application-specific processor design can achieve a level of efficiency that is impossible for general-purpose designs to attain. Our application-specific design contains none of the baggage necessary to execute non-cryptographic workloads, making the resulting design smaller and lower power. In addition, our limited application domain creates opportunities to optimize the implementation, yielding superior performance. Going forward, we are working to assess the cost of programmability in the CryptoManiac. A dedicated Rijndael implementation is under development and will be compared to the design presented in this paper.
We will compare the cost of hardware programmability (FPGA), software programmability (CryptoManiac), and no programmability (a hardware-only version of Rijndael). In addition, we are developing application-specific processors for other application domains. Through this work, we hope to demonstrate that application-specific optimization can be a powerful tool for improving system performance and cost.
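The area-efficiency comparison in the conclusion can be restated as a back-of-the-envelope calculation. This is an illustrative reading, not a calculation from the thesis: it assumes "a 20% performance improvement" means a speedup of 1.20, "an additional 1% area" means an area factor of 1.01, and "benefiting by as much as 190%" means a best-case speedup of 1 + 1.90.

```latex
\begin{aligned}
\text{rule of thumb:}\quad & \frac{\Delta P}{\Delta A} = \frac{1\%}{1\%} = 1 \\
\text{reported result:}\quad & \frac{\Delta P}{\Delta A} = \frac{20\%}{1\%} = 20 \\
\text{performance per area:}\quad & \frac{1.20}{1.01} \approx 1.19 \\
\text{best-case speedup:}\quad & 1 + 1.90 = 2.90
\end{aligned}
```

Under these assumptions, the reported gain is twenty times the break-even ratio implied by the rule of thumb.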
