Italian-language version: Interview with Rik van Riel
Rik van Riel was born in the Netherlands in 1978. He started using GNU/Linux in 1994 and is one of the most active Linux developers today. After working as a Linux consultant for a Dutch company, he was hired by Conectiva S.A., the biggest Linux company in South America. His Linux work is devoted mainly to virtual memory management. Together with Andrea Arcangeli he was responsible for the virtual memory management of the 2.2 and 2.4 kernels.
AS: Hey Rik. First of all I would like to thank you for kindly making yourself available.
RvR: Hello. I'm at your disposal.
With kernel 2.4.10 we saw that Linus Torvalds preferred Arcangeli's VM to yours. What do you think of his decision? And why did he make it?
It was a strange situation: first Linus ignores bugfixes from me and Alan for almost a year, then he complains we "didn't send" him the bugfixes, and then of course he replaces the VM. The new VM has better performance than the old VM for typical desktop systems ... but it fails horribly on more systems than the old VM does. Red Hat, for example, cannot ship the new VM in their distribution because it'll just fall apart on the database servers some of their users run. At least now that my code is gone I no longer have to work together with Linus, which is a good thing ;)
Why is it a good thing?
With Linus out of the way, I can make a good VM. I no longer have to worry about what Linus likes or doesn't like. This is mostly important for intermediate code, where some of the "ingredients" of a VM are in place and others aren't yet. Such code can look ugly or pointless if you don't have the time to look at the design for a few days, so Linus tends to remove it ... even though it is needed to continue with development.
The new VM drew criticism from many developers, who saw in such a radical change the cause of instability. Your own words were very harsh: "Look, the problem is that Linus is being an asshole and integrating conflicting ideas into both the VM and the VFS, without giving anybody prior notice, and later blaming others." Are you still of the same mind?
Yes, though I guess I have to add that I have a lot of respect for Linus. He is a very user-unfriendly source management system, but he is also very honest about it. One thing that would fix this issue for me would be an automatic patchbot, a computer program to automatically resubmit patches to Linus. That way I can use Linus just as a source management program with intelligent feedback, and I don't have to worry about re-sending patches 50 times before they get into the kernel, because that will be automatic ;)
It seems to me that there is bad blood between you two lately, isn't there?
Actually I'm not holding a grudge or anything. The truth of the matter is that I'm just as stubborn as he is and I can't stand the way that "the Linus source control system" works. I suspect that once a patchbot is in place (a TCP-like system on top of the lossy Linus source control system) we'll both be much happier.
I hope that you can solve your problems. At a certain point even Alan Cox, after having kept using your VM, found Arcangeli's VM safe enough to use. What do you think of that?
Judging from the fact that Red Hat is busy porting my old VM to 2.4.17 to use in their kernel RPM, I guess he's gone back on that idea.
I have been following your development of the OOM killer and your reverse-mapping approach to the VM, and I find them very interesting, but perhaps difficult to implement efficiently. Could you explain to our readers how they work and why they are needed?
OK, I'll start with the OOM killer as that one is easy to explain.
Basically on a Linux system you have a certain amount of RAM and of swap. RAM is used for kernel data, for caching executables, libraries and data from the disk and for storing the memory programs use. As everybody knows, systems never have enough RAM, so we have to choose what data to keep in RAM and what data to move to swap space. However, in some situations systems do not have enough swap space and the system just doesn't have enough space to run all the programs which were started. In that situation we can either wait until somebody frees memory (not possible if everybody is waiting) or we need to free up space forcefully, for example by killing a process. Of course you do not want a random process to be killed, that could be init, syslogd or some other critical process. The OOM killer is responsible for detecting when the system is out of memory and swap and for selecting the right (read: least bad) process to kill.
The need for the reverse mapping (rmap) VM is much more complex to explain, but basically the old VM is facing a number of problems:
- we have different memory zones and need to keep free pages in each zone.
- pages can be shared by many processes, but we don't know which ones.
- we need to balance between evicting pages from the cache and evicting pages from processes.
These three problems basically boil down to one problem, we don't know how much a particular page in RAM is used or who is using it. In the old VM, for example, you can have the situation where programs are using pages in the 16 MB large ISA DMA zone, but you might need to scan all memory of all processes to free one page in that zone because you do not know who is using those pages.
The reverse mapping VM attempts to solve this problem by keeping track of who is using each page. That way, we can just scan the pages in the DMA zone and free one which isn't used much, meaning we need to scan at most the metadata for 16 MB of pages ... in practice a lot less than that. It also means we can scan processes and cache memory using the same algorithm, so it simplifies that part of the VM a lot.
Currently -rmap is still in development, but in most situations where the normal VM works ok the performance of -rmap is similar. Then there are some situations where the normal VM falls apart due to the problems described above; in those areas -rmap still has good performance.
This means that benchmarks probably won't show any big advantage of -rmap over the standard VM, but -rmap will make sure that your server can survive better if it gets slashdotted. It also restores good behaviour in a number of other situations where the standard VM just falls apart, most notably people have reported a more responsive desktop with -rmap. If you want to try it, you can download the -rmap patch at http://surriel.com/patches/ or access the bitkeeper tree at http://linuxvm.bkbits.net/.
How do you plan to solve the swap-storm problem on machines with little RAM under heavy load?
Well, let me begin with the bad news: there is no magic VM. If your programs need more RAM than the computer has and they need it all at the same time, there is nothing the VM can do. However, if they only need a smaller amount of RAM at a time, or if you have multiple programs running and some of them fit in RAM, there are some possible solutions. If you have one program and most of the time it needs less RAM than the system has, the VM can make the system perform decently just by choosing the right pages to swap out. If you have (for example) 5 programs that each need 30% of RAM, the only possible solution is to not run more than 3 at a time.
Luckily this is something where the VM could help, by just stopping two of the processes for a while and stopping two others later on, so each process gets a chance to run at full speed for part of the time. At the moment the only thing which is implemented in -rmap is the better pageout selection. Temporarily stopping some processes when the load gets too high is something I still need to work on. There is also a third mechanism, which is already partially present. When the system is low on RAM, we just don't do readahead. This means that while our disk IO is slow for a while, at least we don't make the situation worse than it is and make it possible for the pages which are in RAM to stay for a bit longer.
Some people propose, as a possible solution to the current VM's problems under high load, switching to local paging when the number of free pages is... say 15% above freepages.low, making *sure* every process has at least its working set in memory. What is your opinion about it?
That solution is very often proposed by people who don't think about the problem. However, I never found anybody capable of explaining exactly how that "solution" is supposed to fix the problem. This is one of the minor annoyances of VM development: hundreds of people come to me with "solutions" without understanding how they are supposed to work or whether they are related to the problem at hand. Occasionally people come forward with good ideas, but most of the time people just come with the same idea. If you're interested in discussing such ideas, you're always welcome to drop by in #kernelnewbies on irc.openprojects.net; just don't think you're the first person to think of a particular idea, because most likely you aren't ;)
What is the Kernelnewbies project? How much time do you spend on it?
Kernelnewbies is a project to teach people about kernel hacking, or how the kernel works. The project is mostly about the Linux kernel, but of course discussions about other kernels are welcome, too. If people want to learn about the kernel, they can drop by at http://kernelnewbies.org/ or on IRC in #kernelnewbies at irc.openprojects.net . The Kernelnewbies project seems to have trained a few new kernel hackers and testers over the last few years. My computer is in the #kernelnewbies IRC channel all the time, I often spend time in #kernelnewbies while running tests.
What is the difference between your VM and Arcangeli's?
You want me to answer that question in how many books? ;) Well, let's give the short answer. Andrea's VM is an attempt to improve the performance of the Linux VM without modifying the structure of the VM. He seems to succeed at it very well, but because he doesn't modify the structure of the VM, his VM still has the same fundamental problems as the Linux VM. My VM is an attempt to attack some of the fundamental problems the Linux VM has, at the moment still without much performance tuning.
I think that having two different VMs developed at the same time on two kernels is very interesting, because it could let us get the best of both. What do you think about that?
It's true in a way, but you have to be very careful about your definition of "best". For some people, "best" is the VM which runs fastest on a limited set of benchmarks. For other people, the "best" VM is the one which allows their computer to run, without falling into worst-case behaviour of a VM occasionally. I have to agree that it's good to have different ideas about the VM, though ;)
Everyone is free to use the VM they prefer. I think that's one reason it is called "free software". :)
This is part of free software. The other part is that people are free to combine the good parts from various VMs to construct something even better.
Sure. Lately some kernel releases have had maturity problems (I'm referring to 2.4.11). Is that just chance, as I believe, or are there other reasons?
I believe a main reason is that bugreports and bugfixes get lost all the time. Development of the Linux kernel happens on a very lossy medium, the linux kernel mailing list.
On the other hand, a bug database isn't workable either, since it would fill up with old crap before you could blink and become just as unsearchable as a mailing list archive. To fix this problem we will need a new kind of solution, since the only workable development forum is one where old stuff _does_ get out of sight quickly; otherwise developers would just be overwhelmed. A patchbot-like tool would retransmit the "lost packets" among the patches and automatically check whether they still apply. The problem would remain for bug reports, but I think that is unavoidable.
So could such problems become more frequent in the future?
That depends on what the developers do. If some system (either social or technical) is in place to make sure bugfixes don't get lost, I think the system will behave a lot better.
In your opinion, could the measures planned in the DMCA influence the development of free software? I'm referring in particular to the decision not to mention the security fixes in the changelog of 2.2.20pre11.
It could be the end of software development for the user, with software developed only to maximise profit for the "content providers". The same thing could happen to the computer as happened to the TV: hardly any interesting programs left, and programs designed to make the viewers sit through the commercials more willingly. On the other hand, computers are a very powerful tool and I doubt people would ever stop developing software just because Disney announces it wants to make more profits, so I don't think free software development will stop.
I believe the trend is to optimize Linux for very powerful machines (multiprocessor, with lots of RAM). Do you agree?
No, not at all. The embedded Linux market seems to be much more active than development of Linux on high end servers. On the other hand, the high end server improvements tend to touch more code in the core kernel while a lot of the embedded systems people just keep separate patches and have no plans of merging their code with Linus.
Don't you think this could hinder the spread of Linux in the desktop market? What does Linux still need in order to face that?
This is unrelated; the desktop market needs applications and isn't really dependent on further kernel development.
I believe that the appointment of Marcelo Tosatti as the new maintainer was an intelligent move: it will let you developers and Linus concentrate on 2.5 in the best way, so that the mistakes of 2.4 are not repeated. What is your opinion on that?
Marcelo may be able to avoid the mistakes of earlier 2.4 kernels in later 2.4 kernels, but I'm sure some of these exact same mistakes will be reintroduced during 2.5. Linus was right when he said that "[his] tree is nothing special" and "Linus doesn't scale". It is a lot more convenient and productive to do development in a separate tree, so bugfixes can be integrated quickly and Linus isn't further overloaded. I suspect we will want some semi-automated system to keep sending bugreports for already accepted code to Linus, so Linus can comfortably drop patches when he doesn't have time, knowing that they'll be sent again.
The 2.5 series has many interesting targets. And you, what are you preparing?
I'm working on making the VM better, not just more scalable to larger machines, but also adjusting itself for smaller machines. Most importantly, I want to make a VM which doesn't fall over when it is faced with a system load it wasn't optimised for, I want it to be robust.
Thank you again for your kindness in giving us the opportunity to learn more from you.