The Evolution of Concepts Designed to Optimize System Performance Over the Past 25 Years

CAMS 310 MUMS

Abstract

Since the invention of the first computer, engineers have been conceptualizing and implementing ways to optimize system performance. The last 25 years have seen a rapid evolution of many of these concepts, particularly cache memory, virtual memory, pipelining, and reduced instruction set computing (RISC). Individually, each of these concepts has helped to increase speed and efficiency, thus enhancing overall system performance. Most systems today make use of many, if not all, of these concepts. Arguments can be made for the importance of any one of these concepts over the others; however, the use of cache memory, or "caching," has been one of the most efficient and effective methods of increasing system performance.

Introduction

Over the past 25 years, there has been much advancement in computer systems and architecture to improve system performance.

The development of concepts such as cache memory, virtual memory, pipelining, and reduced instruction set computing (RISC) has led to increases in speed and processing power, as well as optimization of CPU usage and energy efficiency. These concepts have evolved over the years, and they continue to evolve and give rise to new concepts that enhance system performance at an almost exponential rate. Computers today are more powerful, and cheaper to manufacture and maintain, than ever before. This paper will examine the evolution of, and current trends in, improving computer system performance by exploring concepts such as cache memory, virtual memory, pipelining, and RISC, and by assessing the impact these concepts have made, and continue to make, on system performance.

Cache Memory

The invention of the microprocessor allowed computers to become smaller and faster than ever before by drastically shrinking the size of the central processing unit (CPU). At the same time, computer memory has expanded rapidly as memory chips have become cheaper to manufacture. However, these new large, low-powered random access memory (RAM) chips do not work as fast as smaller, higher-powered RAM chips. As the microprocessor evolved and increased in speed, and memory continued to expand, the large low-powered RAM chips could not keep up with the CPU. As a result, manufacturers started inserting smaller RAM chips between the CPU and the main memory to store instructions or data that were frequently accessed, allowing the CPU to function at full speed. These chips became known as cache RAM.

When an instruction or data that needs to be accessed is not in the cache, the main memory must be accessed and the CPU has to stop and wait for the instruction to be fetched, causing a delay in processing known as latency. A larger cache produces more cache hits, which keeps the CPU running at full speed. The first cache chips were external to the microprocessor; they were installed on the motherboard and controlled by a memory cache controller. These cache chips used an architecture known as "write through," which meant that for write operations the memory cache controller updated RAM immediately. Today, the cache is on the microprocessor itself, removing the slowdown of wires between chips and operating at the internal clock speed of the CPU, thus increasing the speed of execution drastically.
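A direct-mapped cache can be sketched in a few lines of code. The following Python sketch is illustrative only: the line size, cache size, and the one-cycle and one-hundred-cycle latencies are assumed round numbers, not figures from any particular processor. It models the "write through" policy described above by charging a main-memory update on every store.

    # Minimal sketch of a direct-mapped, write-through cache.
    # Sizes and latencies are assumed values for illustration only.
    LINE_SIZE = 64         # bytes per cache line
    NUM_LINES = 256        # lines in the cache
    HIT_CYCLES = 1         # assumed cache latency
    MISS_CYCLES = 100      # assumed main-memory latency

    cache = {}             # cache index -> tag currently stored there
    cycles = hits = misses = 0

    def access(address, write=False):
        """Look up one address; on a miss, fetch the line from RAM."""
        global cycles, hits, misses
        line = address // LINE_SIZE
        index, tag = line % NUM_LINES, line // NUM_LINES
        if cache.get(index) == tag:
            hits += 1
            cycles += HIT_CYCLES
        else:
            misses += 1
            cycles += MISS_CYCLES      # the CPU stalls: this is the latency
            cache[index] = tag         # load the line into the cache
        if write:
            cycles += MISS_CYCLES      # write through: RAM updated immediately

    # A loop that reuses the same small array misses only on the first
    # pass; afterward nearly every access completes at full CPU speed.
    for _ in range(10):
        for addr in range(0, 4096, 8):
            access(addr)
    print(f"hits={hits} misses={misses} total cycles={cycles}")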

The larger the size of the memory, the longer it takes to be addressed by the CPU; but the more data that can be stored in the cache, the less the main memory needs to be accessed, creating a bit of a conundrum. If the installed cache is too large, then the benefit of increased speed is lost, because the CPU will take more time to address the instructions on the chip. To remedy this, engineers created levels of cache memory using different-sized cache chips. Level 1 (L1), level 2 (L2), and level 3 (L3) cache chips are used to expand the overall size of the cache while keeping each chip small, so that instructions can be addressed quickly.
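The benefit of the hierarchy can be shown with the classic average memory access time (AMAT) calculation. The hit rates and cycle counts in this Python sketch are assumed, textbook-style round numbers, not measurements of a real system.

    # Average memory access time (AMAT) for a three-level cache hierarchy.
    # All hit rates and cycle counts are assumed illustrative values.
    L1 = {"hit_time": 1,  "hit_rate": 0.90}
    L2 = {"hit_time": 10, "hit_rate": 0.95}   # share of L1 misses caught by L2
    L3 = {"hit_time": 40, "hit_rate": 0.98}   # share of L2 misses caught by L3
    RAM_TIME = 200                            # assumed main-memory latency

    amat = (L1["hit_time"]
            + (1 - L1["hit_rate"]) * (L2["hit_time"]
            + (1 - L2["hit_rate"]) * (L3["hit_time"]
            + (1 - L3["hit_rate"]) * RAM_TIME)))
    print(f"AMAT = {amat:.2f} cycles")        # about 2.2 cycles

Even with a 200-cycle main memory, the effective access time stays near two cycles, because each small, fast level filters most requests away from the slower level below it.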

The most frequently used instructions or data are stored in the smaller, faster L1 cache, and the less frequently used data or instructions are stored in the larger, slower L2 or L3 caches, depending on the system. This method allows a large amount of data to be stored in cache while still allowing fast access. Originally, only L1 cache was installed on the microprocessor itself, while L2 cache was external. Now most chips have L1 and L2 internal, and possibly L3 as external or internal. Most microprocessors utilize two L1 caches, one for instructions and one for data (Torres, 2007). Modern cache uses an architecture known as "write back," where the CPU stores data in the cache memory and the memory cache controller only updates the RAM when a cache miss occurs.
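The difference between the two write policies can be illustrated by extending a cache model with a "dirty" flag. This is a minimal sketch under assumed latencies, not a model of any specific memory controller.

    # Sketch of a write-back cache: writes stay in the cache until the
    # line is evicted, which happens when a miss maps to the same index.
    RAM_CYCLES = 100                   # assumed main-memory latency
    cache = {}                         # cache index -> (tag, dirty)
    cycles = 0

    def access(index, tag, write=False):
        global cycles
        cycles += 1                            # assumed one-cycle cache access
        stored = cache.get(index)
        if stored is None or stored[0] != tag:     # cache miss
            if stored is not None and stored[1]:   # evicting a dirty line:
                cycles += RAM_CYCLES               #   RAM is updated only now
            cycles += RAM_CYCLES                   # fetch the requested line
            cache[index] = (tag, False)
        if write:                                  # write back: mark dirty,
            cache[index] = (tag, True)             # but do not touch RAM yet

    # 1000 writes to one line cost a single RAM update at eviction time,
    # where write-through would have paid one RAM update per write.
    for _ in range(1000):
        access(index=0, tag=1, write=True)
    access(index=0, tag=2)             # evicts the dirty line
    print(f"total cycles = {cycles}")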

Cache memory has done wonders for increasing system performance; however, cache size or the number of levels of cache is not a good indicator of cache performance. Doubling the cache size does not necessarily translate to twice the performance, just as a higher clock rate does not translate to proportionately more performance. Benchmark tests are better indicators of microprocessor speed than clock rate or cache size specifications.

Virtual Memory

The execution of programs occurs mainly in the cache memory and main RAM because they are faster than the memory on the hard drive.

Modern cache and RAM are vastly larger than ever before; however, the size of the cache and main memory is fixed. This presents a problem when a program's size increases as it executes: eventually the program would run out of memory. This would mean programmers would have to edit their code every time they changed machines or added more memory (Lempel, 1999). Early computers had small amounts of RAM because storage technology was very expensive. Programmers had to store master copies of programs on a secondary storage system and pull pieces into RAM as needed. The process of deciding which pieces to pull and where in RAM to place them was called "overlaying" (Denning, 2012).

"It was estimated that most programmers spent halloo two-thirds of their time planning overlay sequences. A reliable method of automating it had potential to increase programmer productivity and reduce debugging by several fold" (Deeding, 2012). Thus, the concept of virtual memory was born. This concept makes use of the computer's hard drive when main memory runs out.

However, the hard drive is significantly slower than RAM, so we want to keep most of the program functioning in RAM; thus, specialized hardware and software are needed to give the illusion of unlimited, fast memory (Lempel, 1999). This hardware converts a "virtual" address to a physical address in memory. Aside from virtually increasing memory size, virtual memory also provided three additional benefits: "it isolated users from each other, it allowed dynamic relocation of program pieces within RAM, and it provided read-write access control to individual pieces" (Denning, 2012). It is for these reasons that "the story of virtual memory is not simply a story of progress in automatic storage allocation; it is a story of machines helping programmers to protect information, reuse and share objects, and link software components" (Denning, 2012).
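The translation itself can be sketched simply. In the Python sketch below, the 4 KB page size and the page-table contents are assumed values for illustration; real systems do this in hardware with multi-level page tables and a translation lookaside buffer.

    # Sketch of virtual-to-physical address translation with a single-level
    # page table. Page size and table contents are assumed for illustration.
    PAGE_SIZE = 4096                 # 4 KB pages, a common choice

    # virtual page number -> physical frame number; a missing entry means
    # the page is not in RAM and must be fetched from disk (a page fault).
    page_table = {0: 7, 1: 3, 2: 11}

    def translate(virtual_addr):
        page = virtual_addr // PAGE_SIZE     # which virtual page
        offset = virtual_addr % PAGE_SIZE    # position within the page
        frame = page_table.get(page)
        if frame is None:
            raise RuntimeError("page fault: load the page from disk, retry")
        return frame * PAGE_SIZE + offset    # physical address

    print(hex(translate(0x1234)))   # page 1, offset 0x234 -> 0x3234

A page fault is where the illusion is maintained: the operating system pulls the missing page in from the hard drive, updates the table, and restarts the access, so the program never knows its memory was not all in RAM.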

Pipelining

In order to optimize system performance, "designers must find and fix bottlenecks" (Weaver, 2001). The more complex a system becomes, the more opportunities there are for bottlenecks to occur. A system that can only execute one instruction at a time would take a long time to execute multiple instructions. The solution to this is pipelining. "Pipelining is a technique used to improve the execution throughput of a CPU by using the processor resources in a more efficient manner" (Abraham, 2009). The basic principle is that each instruction is split into several independent stages.

These stages consist of fetching, decoding, executing, and writing. The CPU uses a separate module for each stage. This allows multiple instructions to be executed in parallel, instead of the CPU having to wait until an entire instruction sequence completes before beginning a new one. Pipelining plays a huge role in optimizing system performance by maximizing CPU usage and increasing the speed of execution.
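A cycle-by-cycle diagram makes the benefit concrete. The Python sketch below assumes an idealized four-stage pipeline with one cycle per stage and no hazards or stalls, which is a simplification of real hardware.

    # Sketch of an ideal 4-stage pipeline: Fetch, Decode, Execute, Write.
    # Assumes one cycle per stage and no hazards -- a simplification.
    STAGES = ["F", "D", "E", "W"]

    def schedule(num_instructions):
        """Print which stage each instruction occupies in each cycle."""
        total_cycles = num_instructions + len(STAGES) - 1
        for i in range(num_instructions):
            row = ["."] * total_cycles
            for s, name in enumerate(STAGES):
                row[i + s] = name    # instruction i enters stage s at cycle i+s
            print(f"inst{i}: " + " ".join(row))
        print(f"{num_instructions} instructions in {total_cycles} cycles "
              f"(vs {num_instructions * len(STAGES)} without pipelining)")

    schedule(5)   # 5 instructions finish in 8 cycles instead of 20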

RISC

The concept of RISC was designed in an effort to maximize system performance by reducing the number of transistors needed to store instructions and reducing the clock cycles required to execute each instruction. By reducing the number of transistors, RISC architecture makes more room for general-purpose registers, allowing a reduction in the CPU's interactions with memory (Chen, 2006). It was the RISC architecture that first made pipelining possible.
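The classic textbook contrast is a multiply on two values in memory: a CISC machine performs it in one complex, multi-cycle instruction, while a RISC machine uses several simple instructions that each complete in one cycle. The cycle counts in this Python sketch are assumed, illustrative values; real instruction timings vary widely.

    # Illustrative RISC vs. CISC comparison for "a = a * b" where a and b
    # live in memory. Cycle counts are assumed, textbook-style values.

    # CISC: one complex instruction that reads memory, multiplies, and
    # writes the result back; microcode makes it take several cycles.
    cisc_program = [("MULT mem_a, mem_b", 8)]

    # RISC: only loads and stores touch memory; each simple instruction
    # is designed to finish in one clock cycle, which suits pipelining.
    risc_program = [("LOAD  r1, mem_a", 1),
                    ("LOAD  r2, mem_b", 1),
                    ("MUL   r1, r1, r2", 1),
                    ("STORE mem_a, r1", 1)]

    for name, program in [("CISC", cisc_program), ("RISC", risc_program)]:
        cycles = sum(c for _, c in program)
        print(f"{name}: {len(program)} instructions, {cycles} cycles")

The RISC version needs more instructions, and therefore more RAM and a smarter compiler, but every instruction is uniform and single-cycle, which is exactly what keeps a pipeline full.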

It took RISC-based processors a while to become competitive in the market because of a lack of software support; however, RISC-based architecture now controls a commanding share of the mobile device market. This is largely due to the fact that the cost of RAM has dropped dramatically, and compiler technology has become more sophisticated, making the RISC use of RAM and emphasis on software ideal (Chen, 2006). RISC architecture made it easy for manufacturers to build high-performance systems at an affordable price; however, many of those systems offered high performance in terms of some parameters at the expense of others (NAS, 1991). Still, "the streamlined internal architecture of RISC processors often results in reduced instruction cycle times when compared with CISC CPUs" (Mann, 1992).

Conclusion

Since the creation of the first computer, many concepts have been developed to optimize system performance.

In the past 25 years, many new concepts have arisen and evolved, making computers exponentially faster and more powerful than ever before. The increasing desire for mobile technology has led to the need to "maximize energy-efficient performance, dynamically trading off performance and power to have the best performance while keeping power within specified limits" (Almandine, 2013). Two concepts that play a large role in this tradeoff are cache and memory behavior, "since power optimizations may jeopardize memory cell reliability" (Almandine, 2013). The growth of the World Wide Web has also increased the need to optimize system performance.