KERNEL PANIC

FATAL_ERROR: RED_BULL_RESERVOIR_EMPTY

A problem has been detected and systems have been shut down to prevent damage to your sanity.


*** STOP: 0x000000GO (0x000000RU, 0x000000ST, 0x000000SRE, 0x000000AI)


Rebooting in 5 seconds...

Originally published on an external platform.

I have been on both sides of the table—as an interviewer and an interviewee—for DevOps and SRE roles. In this post, I am consolidating a comprehensive list of questions I’ve encountered, ranging from core kernel internals to high-level system architecture.

Note: This is a living document intended to share knowledge, experience, and some of the more “fun” rabbit holes you might encounter during a technical deep dive.


🛠 Part 1: Linux Internals & Troubleshooting

Most SRE interviews begin with the fundamentals. These questions probe your understanding of how Linux actually manages hardware and processes.

1. The Linux Boot Sequence

Question: What happens when a Linux system boots, from the moment you hit the power button until you see a login prompt?

This probes knowledge of the hardware/software handoff. It covers BIOS/UEFI, the MBR/GPT, the Bootloader (GRUB), Kernel loading, and the Init system (systemd/SysVinit). Detailed Answer can be found here

2. Deep Dive: Executing ls

Question: What happens internally when you type ls in a terminal and hit Enter?

This is a classic question to see if you understand the fork() and exec() system calls.

  1. Read & Tokenize: The shell reads the input line and splits it into tokens. A toy shell might use getline() and strtok(); bash uses the readline library and its own parser.
  2. Alias Check: It checks if ls is a shell alias or built-in.
  3. Path Resolution: If not built-in, the shell searches the PATH environment variable for the binary.
  4. Forking: Once found, the shell calls fork() to create a child process. The fork() call returns 0 to the child and the child’s PID to the parent (the shell).
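  5. Execution: The child calls execve(), which replaces its process image (code, data, heap, and stack) with the ls program.
  6. Inodes: The ls utility opens the directory and reads its entries (getdents64() under the hood). Each entry maps a filename to an inode number; with options such as -l, ls also makes a stat-family call per entry to read the inode metadata.
  7. Exit: Upon completion, ls calls exit(0), which shows up in strace as exit_group(0). The kernel frees the process's resources, and the shell, which has been blocked in wait()/waitpid(), collects the exit status and prints a new prompt.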

Pro Tip: Run strace ls to watch these system calls in action.
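
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how bash is implemented: run_command is a hypothetical helper name, alias and built-in handling (step 2) is skipped, and shlex.split() and shutil.which() stand in for the shell's own tokenizer and PATH search.

```python
import os
import shlex
import shutil
import sys


def run_command(line: str) -> int:
    """Run one command line like a very simple shell and return its exit status."""
    argv = shlex.split(line)              # step 1: tokenize the input line
    if not argv:
        return 0
    path = shutil.which(argv[0])          # step 3: search PATH for the binary
    if path is None:
        return 127                        # conventional "command not found" status

    sys.stdout.flush()                    # avoid duplicating buffered output in the child
    pid = os.fork()                       # step 4: returns 0 in the child, child's PID in the parent
    if pid == 0:
        try:
            os.execv(path, argv)          # step 5: replace the child's process image
        finally:
            os._exit(126)                 # only reached if exec itself failed

    _, status = os.waitpid(pid, 0)        # step 7: the parent reaps the child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)      # shell convention for "killed by a signal"
```

Calling run_command("ls -l /tmp") prints the listing and returns ls's exit status, which is what the shell exposes as $?.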

3. Linux Inodes

Question: Explain what an Inode is and what information it contains.

An Inode (index node) is a data structure that stores metadata about a file (size, owner, permissions, disk block pointers) but not the filename or the actual data. Detailed Answer is available here
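
To make the "metadata but not the filename" point concrete, here is a small sketch using os.lstat(). The helper names inode_info and demo_hard_link are made up for this example. It creates a file and a hard link to it; both names resolve to the same inode, which is why the filename cannot live inside the inode.

```python
import os
import stat
import tempfile


def inode_info(path: str) -> dict:
    """Return a few of the metadata fields the kernel keeps in a file's inode."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,
        "permissions": stat.filemode(st.st_mode),
        "hard_links": st.st_nlink,
    }


def demo_hard_link() -> tuple:
    """Create a file plus a hard link and return the inode info for both names."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as fh:
            fh.write("hello\n")
        os.link(original, alias)  # a second directory entry pointing at the same inode
        return inode_info(original), inode_info(alias)
```

On the command line, ls -i and stat show the same fields.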

4. Crash vs. Panic

Question: What is the difference between an application Crash and a Kernel Panic?

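  • Crash: Affects a single process. It is usually triggered by the hardware and delivered by the kernel: the process does something illegal (e.g., accessing memory it does not own), the CPU raises a trap, and the kernel sends the process a fatal signal. Common signals include SIGSEGV (segmentation fault), SIGBUS (bus error), and SIGILL (illegal instruction). A process can also terminate itself deliberately by calling abort(), which raises SIGABRT. The rest of the system keeps running.
  • Panic: Affects the whole machine. It is initiated by the kernel itself, which calls panic() when it hits an unrecoverable error in its own code or data structures and decides to halt (or reboot) abruptly to prevent data corruption.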
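
A process-level crash is easy to observe from the parent's side. In this sketch the child sends itself SIGSEGV instead of actually dereferencing a bad pointer (an assumption made to keep the example short and safe), and the function names are illustrative. The parent survives and can see which signal ended the child.

```python
import signal
import subprocess
import sys


def crash_child() -> int:
    """Start a child process that dies from SIGSEGV and return its return code."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run([sys.executable, "-c", child_code])
    return result.returncode  # subprocess reports "killed by signal N" as -N


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Calling describe(crash_child()) in the parent reports the signal name, and the parent carries on running afterwards.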

5. The /proc Virtual Filesystem

Question: Explain the purpose of the /proc directory.

/proc is a virtual (pseudo) filesystem that acts as a window into the kernel’s data structures.

  • /proc/[pid]: One directory per running process, named after its PID.
  • /proc/self: A symbolic link to the /proc/[pid] directory of the process accessing it.
  • /proc/[pid]/maps: Shows the memory mappings (address space) of a process.
  • /proc/[pid]/cmdline: The command line of a process, with arguments separated by NUL bytes. (The top-level /proc/cmdline holds the kernel boot parameters instead.)
  • /proc/[pid]/environ: Shows the environment variables a process was started with.
  • /proc/[pid]/fd: Contains symbolic links to every file descriptor the process has open.
  • /proc/locks: Lists all current system-wide file locks.
  • /proc/sys/fs/file-nr: Displays the number of allocated file handles, the number of unused handles, and the system-wide maximum.
  • /proc/sys/vm: Contains files to tune the virtual memory subsystem.
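
Because these entries are readable like ordinary files, a few lines of code are enough to inspect a process. The sketch below is Linux-only and the helper names are hypothetical. Note that parse_cmdline drops empty arguments, which is a simplification.

```python
import os


def parse_cmdline(raw: bytes) -> list:
    """Split the NUL-separated contents of a cmdline file into arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def parse_file_nr(text: str) -> dict:
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def snapshot_self() -> dict:
    """Collect a few facts about the calling process from /proc (Linux only)."""
    with open("/proc/self/cmdline", "rb") as fh:
        cmdline = parse_cmdline(fh.read())
    # Each entry in /proc/self/fd is named after an open file descriptor number.
    open_fds = sorted(int(name) for name in os.listdir("/proc/self/fd"))
    with open("/proc/sys/fs/file-nr") as fh:
        file_nr = parse_file_nr(fh.read())
    return {"pid": os.getpid(), "cmdline": cmdline, "open_fds": open_fds, "file_nr": file_nr}
```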

📈 Part 2: Performance & Troubleshooting

6. The “Invisible” Full Disk

Question: You get a “filesystem is full” error, but df shows free space. How do you troubleshoot?

There are two primary culprits:

  1. Inode Exhaustion: Run df -i. If you have millions of tiny files, you may run out of inodes even if disk space is available.
  2. Unlinked Open Files: A large file was deleted, but a process still holds its file descriptor open. The space isn’t reclaimed until the process closes the file or restarts. Use lsof +L1 to find these.
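
Both culprits can be reproduced in a short sketch. inode_usage is a rough stand-in for df -i built on os.statvfs(), and unlinked_but_open shows that a deleted file keeps its blocks for as long as a descriptor stays open. Both names are hypothetical, and some filesystems report 0 for the inode totals.

```python
import os
import tempfile


def inode_usage(path: str = "/") -> dict:
    """Rough equivalent of `df -i` for the filesystem that holds path."""
    vfs = os.statvfs(path)
    return {"total": vfs.f_files, "free": vfs.f_ffree, "used": vfs.f_files - vfs.f_ffree}


def unlinked_but_open(size: int = 4096) -> tuple:
    """Delete a file while keeping it open; return (link_count, size_still_held)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        os.unlink(path)        # the name is gone, so ls and du no longer see the file
        st = os.fstat(fd)      # but the inode and its data are still allocated
        return st.st_nlink, st.st_size
    finally:
        os.close(fd)           # only now can the filesystem reclaim the space
```

A link count of 0 on an open descriptor is exactly what lsof +L1 searches for.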

7. Performance Toolkit

Question: What are the first tools you use to analyze a slow Linux machine?

Use the “First 60 Seconds” approach:

  • uptime (Check load averages)
  • dmesg | tail (Look for kernel errors/OOM kills)
  • vmstat 1 (Check memory and process states)
  • mpstat -P ALL 1 (Check CPU balance)
  • pidstat 1 (Identify resource-hungry processes)
  • iostat -xz 1 (Check disk I/O latency)
  • free -m (Check memory availability)
  • sar -n DEV 1 (Check network throughput)
  • top (General process overview)

Read more at Netflix Tech Blog

8. Linux Filesystems

Question: Explain the difference between common Linux filesystems.

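Linux supports nearly 100 filesystem types. Common ones include ext4 (the default on many distributions), XFS (high performance for large files and parallel I/O), Btrfs (copy-on-write with snapshots), and ZFS (copy-on-write with integrated volume management, available on Linux through OpenZFS). Detailed Guide to Filesystems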

9. Kernel Space vs. User Space

Question: What is the difference between Kernel Space and User Space?

User Space is where applications run, while Kernel Space is restricted to the core OS. Applications use libraries like libc to make System Calls to the Kernel to perform protected operations (like writing to disk). Detailed Answer
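
As a small illustration of that boundary, Python's os.pipe(), os.write(), and os.read() are thin wrappers over the libc functions of the same names, which in turn issue the pipe, write, and read system calls. In this sketch (pipe_roundtrip is a made-up name) the bytes leave user space, sit in a kernel buffer, and come back.

```python
import os


def pipe_roundtrip(payload: bytes) -> bytes:
    """Push bytes through a kernel pipe buffer and read them back."""
    read_fd, write_fd = os.pipe()              # the kernel allocates the pipe buffer
    try:
        written = os.write(write_fd, payload)  # user space -> kernel space
        return os.read(read_fd, written)       # kernel space -> user space
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Running a script that calls this under strace shows each of those calls crossing into the kernel. Keep the payload smaller than the pipe buffer (typically 64 KiB on Linux), otherwise the write blocks.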

10. Troubleshooting High I/O

Question: How would you troubleshoot a system with high I/O wait?

Troubleshooting I/O Wait Guide


🧠 Part 3: Memory, Processes & Concurrency

11. Processes vs. Threads

Question: What are the fundamental differences between a process and a thread?

A process is an isolated program with its own memory space and a PCB (Process Control Block). Threads are “lightweight processes” that share the memory space of their parent process, making them faster to create but riskier due to shared state. Process Management Deep Dive
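
The difference in memory sharing can be seen directly. In this sketch (the function name is illustrative) a thread and a forked child both increment the same dictionary entry. The thread's change is visible to the parent; the child's change lands in its own copy of the address space and is lost when it exits.

```python
import os
import threading


def shared_vs_isolated() -> dict:
    """Show that a thread shares the parent's memory while a forked child does not."""
    data = {"value": 0}

    def bump():
        data["value"] += 1

    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    after_thread = data["value"]     # the thread modified the shared object

    pid = os.fork()
    if pid == 0:
        try:
            bump()                   # modifies only the child's private copy
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    after_child = data["value"]      # unchanged in the parent

    return {"after_thread": after_thread, "after_child": after_child}
```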

12. Memory Management & Status

Question: Explain Kernel Memory Management and Task Status.

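Linux manages memory in pages: each process gets its own virtual address space, the kernel maps pages to physical frames on demand, and under memory pressure it reclaims page cache or swaps anonymous pages out to disk. A task is always in one of a few states, shown in the STAT column of ps and top: Running or runnable (R), Interruptible Sleep (S), Uninterruptible Sleep (D, usually waiting for I/O), Stopped (T), or Zombie (Z, exited but not yet reaped by its parent).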
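
The state of a task can be read from /proc/[pid]/stat, where it is the single-letter field right after the command name. The sketch below is Linux-only, the helper names are hypothetical, and the table covers only the states listed above.

```python
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}


def parse_state(stat_line: str) -> str:
    """Extract the one-letter state code from a /proc/[pid]/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the last ")" rather than on whitespace.
    after_comm = stat_line.rpartition(")")[2]
    return after_comm.split()[0]


def task_state(pid="self") -> str:
    """Return a readable state for the given PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        code = parse_state(fh.read())
    return STATE_NAMES.get(code, f"other ({code})")
```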

13. Concurrency & Race Conditions

Question: What is a Race Condition in a Linux context?

A race condition occurs when multiple processes or threads access shared data simultaneously, and the final result depends on the timing of their execution. Concurrency and Race Conditions
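
A classic instance is an unsynchronized read-modify-write on a shared counter. In the sketch below (class and function names are made up for illustration), unsafe_increment can lose updates when threads interleave between the read and the write, while safe_increment serializes the update with a lock. Whether the unsafe version actually loses updates on a given run depends on thread scheduling.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, then write: another thread may update self.value in between,
        # and its update is then overwritten (a lost update).
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def run(increment_name: str, n_threads: int = 4, n_iter: int = 10_000) -> int:
    """Call the chosen increment method from several threads; return the final count."""
    counter = Counter()
    increment = getattr(counter, increment_name)

    def worker():
        for _ in range(n_iter):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

run("safe_increment") always returns n_threads * n_iter; run("unsafe_increment") can return less.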

14. Stack vs. Heap

Question: Explain Stack and Heap memory.

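  • Stack: Fast, automatic allocation, with one stack per thread. The compiler emits code that moves the stack pointer on function entry and exit. It stores local variables, return addresses, and call frames, and its size is bounded (commonly 8 MiB by default on Linux; see ulimit -s).
  • Heap: Dynamic memory allocation used for larger data structures and for data that must outlive the function that created it. Requires manual management or Garbage Collection.

Stack vs Heap Comparison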

15. Memory Leaks

Question: Define a Memory Leak.

  • Naive Definition: Failure to release unreachable memory. Detected by tools like Valgrind or managed by Garbage Collection.
  • Subtle Definition: Failure to release reachable memory that is no longer needed. This is much harder to detect and can still occur in garbage-collected languages.
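
The subtle kind is easy to write in a garbage-collected language. In this sketch (both classes are hypothetical) LeakyCache keeps every result reachable forever, so the collector can never free anything, while BoundedCache evicts the least recently used entry once it grows past max_size.

```python
from collections import OrderedDict


class LeakyCache:
    """Keeps every result forever: entries stay reachable, so the GC cannot help."""

    def __init__(self):
        self._entries = {}

    def get(self, key, compute):
        if key not in self._entries:
            self._entries[key] = compute(key)
        return self._entries[key]

    def __len__(self):
        return len(self._entries)


class BoundedCache:
    """Same interface, but evicts the least recently used entry past max_size."""

    def __init__(self, max_size: int):
        self._entries = OrderedDict()
        self._max_size = max_size

    def get(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        value = compute(key)
        self._entries[key] = value
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return value

    def __len__(self):
        return len(self._entries)
```

In real code, functools.lru_cache(maxsize=...) gives the bounded behavior without a hand-written class.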

16. Interrupts

Question: How does Linux handle hardware and software interrupts?

Interrupts Deep Dive

17. Load Average

Question: What do the numbers in uptime actually mean?

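On Linux, load average is an exponentially damped moving average, over the last 1, 5, and 15 minutes, of the number of tasks that are running, runnable, or in Uninterruptible sleep (usually waiting on disk I/O). Read the numbers relative to the CPU count: a load of 4 means something very different on a 4-core machine than on a 64-core one. Read the definitive Load Average guide by Brendan Gregg.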
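
The same numbers are exposed in /proc/loadavg. A rough sketch for reading them and putting them in context follows; the helper names are hypothetical, and dividing by the CPU count is only a rule of thumb, since tasks blocked on I/O also count towards load on Linux.

```python
import os


def parse_loadavg(text: str) -> dict:
    """Parse a /proc/loadavg line such as '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
        "runnable_now": int(runnable),   # scheduling entities runnable right now
        "total_tasks": int(total),       # scheduling entities that currently exist
    }


def load_per_cpu(load: float, cpus=None) -> float:
    """Normalize a load figure by the number of CPUs (defaults to this machine's)."""
    cpus = cpus or os.cpu_count() or 1
    return load / cpus
```

For a quick check without parsing, os.getloadavg() returns the three averages as a tuple.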

18. What happens when you curl?

Question: Explain the flow of data when you execute curl www.google.com.

This covers DNS resolution, TCP handshakes, TLS negotiation, and HTTP request/response cycles. What Happens When… (Detailed Flow)
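
Those stages map onto a handful of socket calls. The sketch below is a simplified illustration, not what curl does internally: build_request and fetch_status_line are made-up names, the User-Agent value is invented, and redirects, proxies, and HTTP/2 are ignored. Note that curl www.google.com with no scheme uses plain HTTP on port 80, so the TLS step only applies when the URL is https://.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "User-Agent: sketch/0.1\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_status_line(host: str, use_tls: bool = False, timeout: float = 5.0) -> str:
    """Resolve, connect, optionally negotiate TLS, send a GET, return the status line."""
    port = 443 if use_tls else 80
    # 1. DNS resolution: hostname -> socket address
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)                 # 2. TCP three-way handshake
        if use_tls:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)  # 3. TLS handshake
        sock.sendall(build_request(host))  # 4. HTTP request
        data = b""
        while b"\r\n" not in data:         # 5. read until the status line is complete
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        return data.split(b"\r\n", 1)[0].decode("latin-1")
    finally:
        sock.close()
```

Running curl -v against the same host shows the same stages in its verbose output.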


🔗 High-Value Resources

Here are the best resources I recommend for any technical interview preparation:

  1. Facebook Production Engineer prep
  2. LinkedIn Prep Wiki
  3. SRE Interview Handbook
  4. Engineering Manager Prep
  5. Google SWE Interview Tips
  6. Amazon SWE Preparation
  7. Troubleshooting Bottlenecks & Leaks
  8. Linux Performance Analysis (Brendan Gregg)
  9. Awesome Scalability Guide

I’ll keep tracking these and updating this guide. Stay tuned!

Happy Troubleshooting and Best of luck!
