Memtest: Finding holes in the VM system
***************************************

WIP

Juan Quintela
Department of Computer Science
Universidade da Coruña
quintela@dc.fi.udc.es

Abstract: This paper describes the development of a test suite for the VM
subsystem, together with several of the resulting programs in detail. A
proposal for dealing with the bottlenecks found is also made. The suite is
called memtest. It is composed of several programs that generate different
kinds of I/O and memory load: writing big files (mmap*), using a lot of
shared memory (ipc*), and programs that do a lot of memory allocations and
frees. The suite is not used for benchmarking; it is used to find
bottlenecks. In the presentation we discuss the goals of the individual
tests, the parts of the kernel they affect, and good ways to handle the
problems they show. The tests that form the suite have been contributed by
several people. The suite is intended to be the place to put a test whenever
somebody finds a bottleneck and writes a program that shows it; that makes it
easy to check that future kernel versions do not have the same problem again.

1  Introduction
*=*=*=*=*=*=*=*=

This paper describes the development of a test suite for the VM subsystem.
The test suite gives people a single place from which to get programs that
test the system for errors, and a place to put code that exposed bugs in
previous implementations, so we can verify that we do not run into the same
problems again. Section 2 describes the birth of the memtest suite. The
following sections describe several of the tests: what they do, which
problems they found, and what solutions were used to fix those problems.

2  Previous life
*=*=*=*=*=*=*=*=*

In the beginning, the author was a happy Ph.D. student working on his
thesis. The programs related to his thesis stressed the VM layer heavily and
made his Linux machine die hard (die in the sense of an Oops). He used the
standard procedure in such cases: he wrote a nice bug report to the
linux-kernel mailing list and waited for the nice kernel hackers to fix his
problem. That was in the 2.2 kernel era. But nothing happened.

He continued working on his thesis, thinking that the problems would be
solved in 2.4 with the new memory management/page cache layer. Each time
Linus released a new kernel version he tested it; it normally solved some
problems, and others appeared. At the end (end???) of the 2.3 era (i.e.
2.3.99-preX), he found that his problems had still not been solved. He then
thought it would be a good idea to try to help the kernel hackers fix them.

At the same time, it happened that Rik van Riel came to his university to
give a couple of talks. He was the right person to show the problems to. The
author showed him the problems, and Rik asked for a small program that
reproduced the hangs. Memtest was born. Once the first programs were
written, other people asked for more programs to be included in the suite.

3  The tests
*=*=*=*=*=*=*

One important thing about almost all the tests in memtest is that they check
that the VM layer behaves well; they are not benchmarks in the speed/space
sense. It is good if these programs run fast, but the important thing is
that they should not run too badly. In the future, I will try to add some
speed tests as well, or at least give pointers to other benchmarks and
explain how to use them to search for specific bottlenecks.
4  mmap001
*=*=*=*=*=*

The mmap tests are examples based on the work I was doing for my Ph.D.
mmap001 is a test that creates a file the size of the machine's physical
memory and then writes it sequentially through an mmap() mapping (see the
sketch after section 6). In the 2.3.99-preX kernel series, a machine with
128 MB of RAM would stall for as long as 30 seconds, because the kernel
waited to start writing anything to disk until there was no free space
left. At that point it began writing the dirty pages asynchronously, but by
then every page was dirty: it was asynchronously writing out the whole
memory of the machine, and that took a long time to finish.

This test is one of the clearest examples of why memtest is not a
benchmark. The assumption behind it is that, from the system's point of
view, mmap()ing a big file and writing to it sequentially should behave much
like doing normal write()s to a file of the same size. The test does not
need to run very fast, but a normal user does not expect the whole system to
stall for minutes. Once the biggest stalls had been solved, there was still
the problem that the load average reached around 15 while running this test,
which is also not expected on a normal system.

5  mmap002
*=*=*=*=*=*

This test is a continuation of the previous one. It mmap()s a file twice the
size of memory, then writes the first half of the file sequentially. Then it
opens an anonymous mapping and copies all the data from the file into the
anonymous mapping. After that, it copies the data back into the shared file
mapping, this time into the other half of the file.

This test showed that the kernel of that time began to swap when it was
already too late. It postponed swapping, writing to disk, and so on until
there were no clean pages left; at that point the whole system behaved
really badly (read: thrashing). The problem here is that the system tried
too hard to cache pages (and especially dirty pages). Another problem was
that every process had trouble making allocations when there was only one
memory hog, which was supposedly easy to detect. In the end it turned out
that detecting the memory hog was not so easy even when there was only one,
and the problem became really nasty when there were several memory hogs.

This is another test where all the kernel's heuristics make it perform
worse: we never reuse a single page of the file, and we walk sequentially
through files bigger than physical memory. That means a system that did no
caching at all would run this test faster. But for normal use we prefer a
system that does caching, so the real problem is to detect when a file is
being accessed only sequentially and to cache less aggressively in that
case. A user cannot expect full-speed writes from mmap002, but neither does
he expect the system to start thrashing when only one process is doing
linear copies of files.

6  ipc001
*=*=*=*=*=

This test was created by Christoph Rohland to test System V shared memory.
It creates several shared memory segments and tries to exercise all the
operations on them (writes, attach, detach, ...).
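To make the discussion of sections 4 and 5 more concrete, here is a minimal
sketch of the mmap001 pattern: create a file the size of physical RAM, map
it shared, and dirty it sequentially. This is not the actual memtest source;
the file name and fill value are invented for illustration, and the sketch
assumes the file size fits in off_t on the target machine.

  /*
   * Sketch of the mmap001 pattern: create a file the size of
   * physical memory, map it, and write it sequentially.  Not the
   * real test, just the shape of the load it generates.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      /* Size of physical memory: number of pages times page size. */
      size_t size = (size_t)sysconf(_SC_PHYS_PAGES) *
                    (size_t)sysconf(_SC_PAGESIZE);
      int fd = open("mmap001.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);

      if (fd < 0 || ftruncate(fd, size) < 0) {
          perror("mmap001");
          return 1;
      }

      char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* Sequential write: every page of the mapping becomes dirty,
       * which is exactly the situation that caused the stalls. */
      memset(p, 0xaa, size);

      munmap(p, size);
      close(fd);
      unlink("mmap001.dat");
      return 0;
  }

mmap002 follows the same skeleton, with a file twice this size plus an
anonymous MAP_PRIVATE | MAP_ANONYMOUS mapping in between.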
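In the same spirit, the following sketch shows the shape of the System V
shared memory operations that ipc001 (section 6) exercises: create segments,
attach, write, detach, remove. The segment size and count here are
arbitrary; the real test covers more operations and error cases than this.

  /*
   * Sketch of a System V shared memory exercise in the style of
   * ipc001: several segments, each attached, dirtied and detached,
   * then removed.  Illustration only, not the real test.
   */
  #include <stdio.h>
  #include <string.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #define SEG_SIZE (4 * 1024 * 1024)  /* 4 MB per segment, arbitrary */
  #define NSEGS    8                  /* "several" segments, arbitrary */

  int main(void)
  {
      int i, id[NSEGS];

      for (i = 0; i < NSEGS; i++) {
          /* Create a private segment. */
          id[i] = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
          if (id[i] < 0) {
              perror("shmget");
              return 1;
          }

          /* Attach it, write every page, detach it. */
          char *p = shmat(id[i], NULL, 0);
          if (p == (void *)-1) {
              perror("shmat");
              return 1;
          }
          memset(p, 0x55, SEG_SIZE);
          shmdt(p);
      }

      /* Remove the segments so they do not outlive the test. */
      for (i = 0; i < NSEGS; i++)
          shmctl(id[i], IPC_RMID, NULL);
      return 0;
  }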
7  Future Work
*=*=*=*=*=*=*=*

There are several things that I plan to do to make the memtest suite more
useful:

- Put more documentation into the suite, to make it easy for other people to
  use it and, more importantly, to understand the results, whether they are
  right or wrong.

- Get more tests into the suite. Folks, I am accepting submissions: send me
  the tests/scripts/knowledge that you use for testing the system.

- Write more documentation.

- Make the tests more modular. The idea is to have an easy way to simulate
  real loads. I want to be able to define, in an easy way, tests like
  misc001, where one process does malloc()/free() over a third of the
  memory, uses mmap() for another third, and uses fwrite() on another
  file. That would be easy if the tests were very simple and took their
  parameters from the command line.

- Have I already mentioned documentation? It is also what would let other
  people do the previous kind of thing.

- Include benchmarks (or pointers to them) and document how to use them for
  measuring specific things. Right now everybody uses benchmarks for this,
  but each person uses their own small subset, and there is no way to know
  which benchmark is used for measuring what, nor what the correct way to
  configure it is. One example: people normally use dbench 48 to measure the
  performance of the file system and the page cache. An explanation of what
  that parameter means, what results you should expect, and the causes of
  previous pitfalls would be very useful.

- Consider the possible integration of memtest with the Linux Test Project
  (LTP, http://oss.sgi.com/projects/ltp/). Theirs is a bigger and more
  ambitious project, but it is also a bit more difficult to write a test for
  it. I am thinking about integrating both efforts (i.e., letting them do
  all the difficult work), or at least making it easy to share code.

- Create a way to run the tests non-interactively. By non-interactively I
  mean that it should be possible to detect whether the system stays
  responsive under the high load created by the tests. Right now you have to
  guess whether the system is more or less responsive after a change, and
  there is no way to run all the tests overnight and know the next morning
  whether some of them hurt interactive response. Playing music and
  listening for the skips does not work when you are out of the office.
  There is a program that does something similar for the low-latency tests;
  I could use it as a start.

8  Acknowledgements
*=*=*=*=*=*=*=*=*=*=

I want to thank several people and institutions for the help they have given
me with memtest:

- Rik van Riel: he explained a lot of things to me and helped me a lot when
  I began working on the Linux kernel.

- `#kernelnewbies' at irc.openprojects.net: there are a lot of cool people
  connected there who helped me with discussions and explanations of all the
  questions I had about the Linux kernel.

- All the people who contributed code and ideas to the suite.

- Conectiva and the Universidade da Coruña for funding an SMP test machine
  that helped me find holes, test bugs, and develop the suite (as in real
  life, almost all bugs show up much faster when you are working with SMP).