Strictly speaking this is off topic since it is not KDE related, but it should be interesting for those who use suspend to disk on Linux, especially the Tux on Ice flavor of the available suspend implementations.
I have been using Tux on Ice for years, and for just as long I have been suffering from segfaults in several programs after resuming from disk. With my old notebook I worked around this problem by flushing the swap partition (swapoff -a; swapon -a) before suspending. That works but, as you may be thinking, it is damn slow if there are hundreds of MB in the swap partition.
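Since swapoff has to pull everything in swap back into RAM, it helps to know beforehand whether it will fit. Below is a minimal sketch, not part of my original setup, that makes a rough estimate from the text of /proc/meminfo. The function names are made up for illustration, and counting MemFree + Buffers + Cached as "room" is an optimistic assumption (part of Cached may not be reclaimable).

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_flush_fits(meminfo_text):
    """Return (swap_used_kb, room_kb, fits) for a 'swapoff -a; swapon -a'.

    Assumption: free memory plus buffers and page cache approximates the
    RAM that can take the pages coming back from swap.
    """
    info = parse_meminfo(meminfo_text)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    room = info["MemFree"] + info["Buffers"] + info["Cached"]
    return swap_used, room, swap_used <= room
```

To try it on a running system, pass it open("/proc/meminfo").read() and only run the swapoff/swapon pair when the third value is True.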
When I bought my current notebook I eliminated the swap partition and started using a swap file instead. Resizing partitions is a slow process and not always safe; resizing a file is much easier (just delete it and recreate it :-)). Using a swap file was something I had wanted to do for a long time. Well, the swap partition problem was fixed but not the segfaults.
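To show what "delete it and recreate it" amounts to, here is a small sketch that only builds and prints the usual command sequence (swapoff, rm, dd, chmod, mkswap, swapon) without running anything. The path /swapfile and the helper names are hypothetical; the commands themselves need root and should be checked before use.

```python
import shlex


def recreate_swapfile_commands(path, size_mib):
    """Build (but do not run) the commands that replace a swap file."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return [
        ["swapoff", path],                      # stop using the old file
        ["rm", "-f", path],                     # delete it
        ["dd", "if=/dev/zero", f"of={path}",    # allocate the new size
         "bs=1M", f"count={size_mib}"],
        ["chmod", "600", path],                 # swap must not be world-readable
        ["mkswap", path],                       # write the swap signature
        ["swapon", path],                       # start using the new file
    ]


def format_commands(commands):
    """Render each command as a shell-safe line for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical example: a 2 GiB swap file at /swapfile.
    for line in format_commands(recreate_swapfile_commands("/swapfile", 2048)):
        print(line)
```

Printing instead of executing is deliberate: it lets you review the lines and paste them into a root shell yourself.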
Last week I looked more deeply into the problem to figure out how to solve it for good, or at least find a faster workaround. One thing I noticed in my /var/log/messages is the failed memory allocation below:
Jul 24 09:22:14 evolucao kernel: [81109.228445] [fglrx] IRQ 46 Disabled
Jul 24 09:22:15 evolucao kernel: [81109.228526] [fglrx] Preparing suspend fglrx in kernel.
Jul 24 09:22:15 evolucao kernel: [81109.228538] kworker/u:8: page allocation failure: order:10, mode:0x20
Jul 24 09:22:15 evolucao kernel: [81109.228540] Pid: 26030, comm: kworker/u:8 Tainted: P A O 3.4.6-lvs #9
Jul 24 09:22:15 evolucao kernel: [81109.228542] Call Trace:
Jul 24 09:22:16 evolucao kernel: [81109.228552] [] warn_alloc_failed+0x108/0x11d
Jul 24 09:22:16 evolucao kernel: [81109.228559] [] ? number.clone.1+0x129/0x229
Jul 24 09:22:16 evolucao kernel: [81109.228562] [] __alloc_pages_nodemask+0x61e/0x6c3
Jul 24 09:22:17 evolucao kernel: [81109.228566] [] cache_alloc_refill+0x276/0x4fb
Jul 24 09:22:17 evolucao kernel: [81109.228568] [] __kmalloc+0x9d/0x144
Jul 24 09:22:17 evolucao kernel: [81109.228617] [] ? KCL_MEM_SmallBufferAllocAtomic+0x19/0x1b [fglrx]
Jul 24 09:22:17 evolucao kernel: [81109.228640] [] KCL_MEM_SmallBufferAllocAtomic+0x19/0x1b [fglrx]
Jul 24 09:22:18 evolucao kernel: [81109.228669] [] libip_resume+0x253/0x5c0 [fglrx]
Jul 24 09:22:19 evolucao kernel: [81109.228704] [] ? mc_heap_get_reserved_blocks_info+0x17e/0x2a0 [fglrx]
Jul 24 09:22:19 evolucao kernel: [81109.228725] [] ? KCL_MEM_SmallBufferAllocAtomic+0x19/0x1b [fglrx]
Jul 24 09:22:20 evolucao kernel: [81109.228758] [] ? firegl_pm_save_framebuffer+0x204/0x300 [fglrx]
Jul 24 09:22:20 evolucao kernel: [81109.228785] [] ? firegl_cail_powerdown+0x8d/0x240 [fglrx]
Jul 24 09:22:20 evolucao kernel: [81109.228812] [] ? libip_suspend+0x22/0x50 [fglrx]
Jul 24 09:22:21 evolucao kernel: [81109.228831] [] ? ip_firegl_lseek+0xeb8/0x17ef [fglrx]
Jul 24 09:22:21 evolucao kernel: [81109.228835] [] ? pci_legacy_suspend+0x35/0xb8
Jul 24 09:22:21 evolucao kernel: [81109.228838] [] ? pci_pm_freeze+0x43/0x8b
Jul 24 09:22:21 evolucao kernel: [81109.228844] [] ? device_pm_wait_for_dev+0x24/0x24
Jul 24 09:22:21 evolucao kernel: [81109.228846] [] ? pci_pm_poweroff+0x98/0x98
Jul 24 09:22:21 evolucao kernel: [81109.228849] [] ? dpm_run_callback.clone.4+0x2a/0x58
Jul 24 09:22:22 evolucao kernel: [81109.228851] [] ? __device_suspend+0x145/0x1c4
Jul 24 09:22:22 evolucao kernel: [81109.228856] [] ? async_schedule+0x12/0x12
Jul 24 09:22:22 evolucao kernel: [81109.228858] [] ? async_suspend+0x1a/0x85
Jul 24 09:22:22 evolucao kernel: [81109.228861] [] ? async_run_entry_fn+0xa3/0x159
Jul 24 09:22:22 evolucao kernel: [81109.228865] [] ? process_one_work+0x214/0x393
Jul 24 09:22:22 evolucao kernel: [81109.228868] [] ? need_to_create_worker+0x19/0x32
Jul 24 09:22:22 evolucao kernel: [81109.228871] [] ? worker_thread+0x17e/0x243
Jul 24 09:22:23 evolucao kernel: [81109.228875] [] ? preempt_schedule+0x35/0x48
Jul 24 09:22:23 evolucao kernel: [81109.228877] [] ? manage_workers.clone.17+0x16e/0x16e
Jul 24 09:22:23 evolucao kernel: [81109.228880] [] ? kthread+0x84/0x8c
Jul 24 09:22:23 evolucao kernel: [81109.228883] [] ? kernel_thread_helper+0x4/0x10
Jul 24 09:22:23 evolucao kernel: [81109.228885] [] ? kthread_freezable_should_stop+0x4d/0x4d
Jul 24 09:22:23 evolucao kernel: [81109.228888] [] ? gs_change+0xb/0xb
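For those not used to reading these messages: "order:10" means the kernel was asked for 2^10 physically contiguous pages in one go. The sketch below pulls those numbers out of the failure line; the 4 KiB page size is an assumption (it is the usual value on x86) and the function name is made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86

_FAILURE = re.compile(
    r"(?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_alloc_failure(line):
    """Extract task name, order and request size from a failure line.

    Returns None when the line is not a page allocation failure.
    """
    match = _FAILURE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    pages = 1 << order  # an order-n request asks for 2**n contiguous pages
    return {
        "comm": match.group("comm"),
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
        "mode": int(match.group("mode"), 16),
    }
```

Under that assumption the request in the log is for 1024 pages, that is 4 MiB of contiguous memory, which is easy to fail on a machine that has been running for a while.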
You may be tempted to blame fglrx for this problem, but I use the open source driver on my old notebook, which probably still has this problem (I still own my old notebook). Moreover, in the first months after buying my current notebook I used the open source driver with it too, so I am almost sure this problem is not related to the allocation failure above.
Note: I changed to the proprietary ATI driver because only with it does my notebook's fan keep quiet without the "silent" speed button that comes with the notebook. The silent button also limits the cores' clock to almost 1 GHz below the maximum, and having to press it every time I want full CPU power or restart the notebook is really annoying. As someone who uses Gentoo, I had to press that button quite often when I used the open source driver. Unfortunately, the GPU does not report its power profile to the kernel, so the open source driver's dynamic profile does not work with it. I am stuck with the proprietary driver until someone can fix that issue. I hope this news allows someone to finally implement the dynamic profile for my GPU.
Returning to the segfault problem, some days ago I tried something different. Tux on Ice works with swap partitions, swap files and also with a dedicated file to store the image. I decided to split my 4 GB swap file into two: one 2 GB file for swap and another 2 GB file for the hibernate image. My notebook came with 4 GB of RAM and I usually do not need more than 1 GB of swap even when I run two virtual machines, Chromium, Firefox and several other programs in parallel, so I can afford a smaller swap file (again: swap files are really easy to resize :-)). Guess what? There have been no segfaults so far :-D. I also love the fact that Tux on Ice compresses the RAM image using all available cores before saving it.
I do not know if the vanilla suspend implementation in the Linux kernel also suffers from this problem. From what I could find on the Internet it may be affected as well. As far as I know, the vanilla suspend does not allow using anything but swap partitions to store the image (not even swap files). So if you have this problem you can give Tux on Ice a try.
I did a final test by suspending to disk with Chromium, VirtualBox, Amarok, KMail, Konversation, Kopete and Skype running. The free command reported "-/+ buffers/cache: 2310964 KB", which means about 59% of the available RAM was used by applications (excluding buffers and disk cache). There was also 666844 KB in my swap file. The suspend process was not that fast (about 20s). Resuming took much longer (about 2 minutes until the Plasma Desktop reappeared). There was a lot of disk activity and the swap usage went from 666844 KB to 1027240 KB and then down to 879456 KB over a period of two minutes, after which I could finally type commands in Konsole. However, everything is still working, there were no segfaults, and Amarok kept playing the same song I was listening to when I suspended my notebook :-) I am now on the fifth day running without reboots or segfaults, and the swap usage dropped to 353112 KB during this period.
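The numbers above can be cross-checked with a few lines of arithmetic. This is only a sketch: the total RAM reported by free is not quoted in this post, so it is back-computed here from the 59% figure, and the helper names are made up for illustration.

```python
def percent_used(used_kb, total_kb):
    """Share of RAM used by applications, as a percentage."""
    return 100.0 * used_kb / total_kb


def implied_total_kb(used_kb, percent):
    """Total RAM implied by a used amount and the percentage it represents."""
    return used_kb * 100.0 / percent


def deltas(samples):
    """Change between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


USED_BY_APPS_KB = 2310964                    # "-/+ buffers/cache" value from free
SWAP_SAMPLES_KB = [666844, 1027240, 879456]  # before suspend, peak, after settling

if __name__ == "__main__":
    total = implied_total_kb(USED_BY_APPS_KB, 59)  # 59% is a rounded figure
    print(f"implied total RAM: about {total:.0f} KB")
    print(f"swap changes during resume: {deltas(SWAP_SAMPLES_KB)} KB")
```

The swap changes work out to +360396 KB and then -147784 KB, so a good part of the two minutes of disk activity was the system pushing pages out to swap and pulling some of them back.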
The failed memory allocation I talked about above happened again this time and everything still works with fglrx (including KWin's effects). Well, I have found a fast suspend/resume configuration at last (as long as there is not much data in swap). If I close the memory-hungry programs (like VirtualBox and Chromium) before suspending, it suspends in about 10s and resumes in about 15s, half of that time spent running pm-utils' scripts. That is not that bad :-)