Linux for Site Reliability Engineers
Master the Linux skills that power production systems. From your very first terminal command to writing shell scripts that automate operations — explained in plain English for absolute beginners.
Prerequisites & Getting Started
Welcome! Before we dive into Linux, let's make sure you have everything you need. Think of this section as packing your backpack before a hike — spending five minutes here saves hours of frustration later.
What is a Terminal?
Imagine your computer has two faces. The first face is the one you normally see — icons, windows, and a mouse cursor. The second face is the terminal: a plain text window where you type commands and the computer types back. SREs spend most of their day in this second face.
Getting a Linux Environment
You need a Linux environment to practice. Here are your options:
- Windows users: Install WSL 2 (Windows Subsystem for Linux). Open PowerShell as administrator and run wsl --install. Restart, then open "Ubuntu" from the Start menu.
- macOS users: Open the built-in Terminal app. macOS is Unix-based, so most basic commands work the same way, though some options differ from their Linux (GNU) versions and Linux-specific files such as /etc/os-release are absent.
- Anyone: Use Google Cloud Shell or AWS CloudShell for an instant browser-based Linux terminal (each requires an account with that cloud provider).
- VirtualBox / VMware: Download and run Ubuntu as a virtual machine on any OS.
Your First Commands — Verify Your Setup
Once you have a terminal open, type these to confirm things are working:
# Print the version of bash installed on your system
bash --version
# Find out which Linux distribution you're on
cat /etc/os-release
# Print a friendly hello world
echo "Hello, SRE world!"
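The same checks can be wrapped in a small reusable function. This is only a sketch: distro_name is a hypothetical helper name, and it assumes the os-release file contains a PRETTY_NAME line, as it does on mainstream distributions.

```shell
# distro_name [FILE]: print the PRETTY_NAME value from an os-release style file.
# FILE is optional and defaults to /etc/os-release (hypothetical helper, not a standard command).
distro_name() {
    file=${1:-/etc/os-release}
    # If the file is missing (for example on macOS), report "unknown" instead of failing noisily.
    [ -r "$file" ] || { echo unknown; return 1; }
    # Lines look like: PRETTY_NAME="Ubuntu 22.04.4 LTS"
    sed -n 's/^PRETTY_NAME=//p' "$file" | tr -d '"'
}

# Report the shell version and the distribution in one line.
echo "bash ${BASH_VERSION:-not detected}, distro: $(distro_name)"
```

Keeping the file path as a parameter makes the function easy to try against a copy of the file without touching the real system.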
What You Should Already Know
This tutorial assumes zero Linux knowledge. Basic computer literacy (using a browser, creating files) is all you need. No programming experience required — though curiosity helps enormously.
[DIAGRAM: Side-by-side comparison of GUI vs Terminal showing the same "list files" operation — clicking in Finder vs typing ls in a terminal]
Quiz Section 0 — 10 Questions
1. What is a shell in the context of Linux?
- A. A protective outer layer of the Linux kernel
- B. A command-line interpreter that reads your commands and passes them to the OS
- C. A graphical user interface for Linux
- D. A type of Linux filesystem
✓ Correct Answer: B
A shell (like bash or zsh) is the program that reads what you type and asks the OS kernel to carry it out. The kernel does the heavy lifting; the shell is the translator between you and the kernel.
2. What is the best way for a Windows user to try Linux without replacing their OS?
- A. Buy a new computer pre-loaded with Linux
- B. Use Windows Subsystem for Linux (WSL 2)
- C. Delete Windows and install Linux
- D. Use Notepad to simulate Linux commands
✓ Correct Answer: B
WSL 2 runs a real Linux kernel inside a lightweight virtual machine on Windows. You get a genuine Linux terminal alongside Windows with no rebooting needed — the ideal beginner setup.
3. What does CLI stand for?
- A. Computer Language Interface
- B. Command Line Interpreter
- C. Command Line Interface
- D. Computer Linux Interface
✓ Correct Answer: C
CLI stands for Command Line Interface: the text-based way to interact with an OS by typing commands, as opposed to a GUI (Graphical User Interface), which uses icons and a mouse.
4. Which of the following is NOT a Linux distribution?
- A. Ubuntu
- B. CentOS
- C. macOS
- D. Debian
✓ Correct Answer: C
macOS is Apple's proprietary OS with BSD Unix roots, not a Linux distribution. It shares many Unix conventions (which is why many commands overlap), but its kernel is XNU (the core of Darwin), not Linux.
5. What is a terminal emulator?
- A. A physical hardware device required to run Linux
- B. A software program providing a text-interface window to the shell
- C. A type of Linux kernel module
- D. A network protocol for remote access
✓ Correct Answer: B
A terminal emulator (like GNOME Terminal, iTerm2, or Windows Terminal) is simply the window/app you open to get a shell prompt. It "emulates" the hardware terminals of the past in software.
6. What is the relationship between the Linux kernel and a Linux distribution?
- A. They are the same thing: kernel and distro are synonyms
- B. The kernel is the core; a distribution bundles it with package manager, utilities, and tools
- C. Each distribution has its own completely different kernel
- D. Distributions are older than the kernel
✓ Correct Answer: B
Think of the kernel as the engine and a distribution as the complete car. Ubuntu, Fedora, and Debian are all built on the Linux kernel (each ships its own version and patches), but each wraps it with different tools, package managers, and default software.
7. What does "open source" mean in the context of Linux?
- A. The OS has no security features because the code is public
- B. Anyone can view, modify, and distribute the source code under an open license
- C. Linux is only free for personal, non-commercial use
- D. The operating system is incomplete and community-maintained patches are required
✓ Correct Answer: B
Open source means the source code is publicly available for anyone to study, modify, and redistribute. This transparency allows security researchers worldwide to audit it — often making it more secure, not less.
8. Which shell is the most common default on Linux distributions?
- A. zsh
- B. fish
- C. bash
- D. csh
✓ Correct Answer: C
bash (Bourne Again Shell) is the lingua franca of Linux administration and scripting. Most server distros default to bash. macOS switched to zsh as default in Catalina, but bash is still the server standard.
9. What is a virtual machine (VM) in the context of learning Linux?
- A. A simulated internet browser
- B. Software that emulates a complete computer, letting you run Linux inside your current OS
- C. A type of RAM extension used by Linux
- D. A remote server accessible only via paid subscription
✓ Correct Answer: B
A VM uses software (VirtualBox, VMware, UTM) to simulate complete hardware. You install Linux inside this simulated computer. You can break things and restore from a snapshot — perfect for learning without risk.
10. Which major cloud providers offer free-tier Linux virtual machines?
- A. Only AWS
- B. Only AWS and GCP
- C. None: cloud VMs always cost money
- D. AWS, GCP, and Azure all offer free-tier Linux VMs
✓ Correct Answer: D
AWS (EC2 t2.micro), GCP (e2-micro), and Azure (B1s) have all offered free-tier Linux instances, which are great for practicing on the same kind of cloud infrastructure used in production. Free-tier terms and time limits change, so check each provider's current offer.
Introduction to Linux & SRE
Before we type a single command, let's understand why Linux and SRE belong together. Knowing the "why" transforms memorizing commands into understanding a craft.
What is Linux?
Linux is an operating system kernel that Linus Torvalds created in 1991 as a personal project while studying at the University of Helsinki. An OS kernel is the core software that manages your computer's hardware: CPU, memory, disk, and network. It's the bridge between your programs and your hardware.
Linux runs the large majority of the world's web servers (a commonly cited figure is over 96% of the top one million), underpins Android, runs all of the world's top 500 supercomputers, and hosts most cloud infrastructure. When you stream a video, send a message, or make an online payment, you're almost certainly interacting with a Linux server.
What is Site Reliability Engineering (SRE)?
SRE is a discipline born at Google in the early 2000s. The idea: treat operations problems like software engineering problems. Instead of manually clicking through dashboards to fix issues, write code to automate them. Instead of hoping a system stays up, engineer it to be reliable from the start.
An SRE's core responsibilities include: keeping systems running (reliability), responding to incidents (on-call), writing automation to reduce toil, and capacity planning to handle growth.
Key SRE Concepts
- SLI (Service Level Indicator): A metric that measures reliability — e.g., "request success rate"
- SLO (Service Level Objective): A target for that metric — e.g., "99.9% of requests must succeed"
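- SLA (Service Level Agreement): A contract with customers that promises a level of service and spells out the consequences (such as refunds or credits) if it is missed; the promised level is usually looser than the internal SLO
- Error Budget: The allowed amount of unreliability (1 - SLO). If your SLO is 99.9%, your budget is 0.1%, which is about 8.76 hours of downtime per year (0.001 × 8,760 hours).
- Toil: Manual, repetitive operational work that grows as the service grows. SREs aim to automate it away.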
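The error budget arithmetic is easy to check from the terminal. The sketch below uses awk for the floating-point math; error_budget_hours is a hypothetical helper name, and it assumes the SLO is an availability percentage measured over a window of whole days.

```shell
# error_budget_hours SLO_PERCENT [DAYS]
# Prints the downtime (in hours) allowed by an availability SLO over DAYS days (default 365).
error_budget_hours() {
    awk -v slo="$1" -v days="${2:-365}" \
        'BEGIN { printf "%.2f\n", (100 - slo) / 100 * days * 24 }'
}

error_budget_hours 99.9        # prints 8.76 (one year at 99.9%)
error_budget_hours 99.9 30     # prints 0.72 (a 30-day window at 99.9%)
```

Passing the window length as a parameter matters in practice because many teams track budgets over 28 or 30 days rather than a full year.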
[DIAGRAM: SRE Responsibility Wheel showing Reliability, Scalability, Automation, Monitoring, Incident Response, and Capacity Planning as equal segments]
Why Linux Specifically?
Linux is free, open source, highly configurable, and runs on everything from Raspberry Pis to mainframes. It's the default OS for Docker containers, Kubernetes nodes, and virtually all cloud VMs. An SRE without Linux skills is like a chef without a knife.
Quiz Section 1 — 10 Questions
11. Who created the Linux kernel?
- A. Bill Gates
- B. Steve Jobs
- C. Linus Torvalds
- D. Richard Stallman
✓ Correct Answer: C
Linus Torvalds created the Linux kernel in 1991 as a student project at the University of Helsinki. Richard Stallman created the GNU project (many Linux tools), but the kernel itself is Torvalds' work.
12. What does SRE stand for?
- A. System Reliability Engineer
- B. Site Reliability Engineer
- C. Server Runtime Environment
- D. Software Release Engineering
✓ Correct Answer: B
SRE = Site Reliability Engineer. The role was pioneered at Google (~2003) and described in the "SRE Book" (freely available at sre.google/books).
13. What is the primary goal of Site Reliability Engineering?
- A. To write the most complex software possible
- B. To ensure systems are reliable, scalable, and efficient using software engineering
- C. To replace all operations teams with robots
- D. To avoid using cloud services
✓ Correct Answer: B
SRE applies software engineering discipline to operations: automate toil, build reliability in, define SLOs, and use error budgets to balance feature velocity with reliability.
14. What is an SLO?
- A. Software Licensing Order
- B. System Load Optimizer
- C. Service Level Objective
- D. Server Log Output
✓ Correct Answer: C
An SLO (Service Level Objective) is a target for a reliability metric, like "99.9% of HTTP requests succeed" or "p99 latency under 200ms." It's the internal commitment SREs work to maintain.
15. Approximately what percentage of the world's web servers run Linux?
- A. Around 10%
- B. Around 50%
- C. Over 96%
- D. Around 70%
✓ Correct Answer: C
By a commonly cited estimate, Linux powers over 96% of the top 1 million web servers. This dominance is why Linux skills are non-negotiable for SREs who manage production infrastructure.
16. What is a daemon in Linux?
- A. A type of computer virus
- B. A background service process that runs without direct user interaction
- C. A system administrator user account
- D. A type of hardware firewall
✓ Correct Answer: B
Daemons run silently in the background: nginx (web), sshd (SSH), chronyd (time sync), and mysqld (database) are all daemons. Many daemon names end in "d" (sshd, chronyd, mysqld), though not all do (nginx).
17. What is the role of the Linux kernel?
- A. To provide a graphical desktop environment
- B. To manage hardware resources and provide a standardized interface for software
- C. To browse the internet
- D. To store user files and documents
✓ Correct Answer: B
The kernel is the OS core. It manages CPU scheduling, memory allocation, device drivers, and filesystem access. All programs go through the kernel to access hardware — it's the gatekeeper.
18. Which of the following is a core SRE responsibility?
- A. Designing marketing campaigns
- B. Responding to production incidents and ensuring system availability
- C. Writing company HR policies
- D. Managing social media presence
✓ Correct Answer: B
SRE responsibilities include on-call incident response, monitoring, capacity planning, writing automation, and post-incident reviews (blameless postmortems).
19. What is a "runbook" in SRE context?
- A. A type of Linux configuration file
- B. A book about marathon training
- C. A documented set of step-by-step procedures for handling specific operational tasks or incidents
- D. A testing framework for shell scripts
✓ Correct Answer: C
Runbooks document "what to do when X happens." They allow any on-call engineer to handle known incident types, even without deep expertise in that system.
20. What does "uptime" measure on a Linux system?
- A. The time of day when the server is fastest
- B. How long the system has been running continuously since last reboot
- C. The maximum internet connection speed
- D. The number of users currently logged in
✓ Correct Answer: B
Run uptime to see how long since last reboot plus load averages. High uptime generally indicates stability. Run uptime -p for a human-readable format.
Linux Filesystem & Navigation
Think of the Linux filesystem as a single giant tree. There's one root (written as /), and everything branches from it — unlike Windows, which has separate trees per drive (C:\, D:\).
The Filesystem Hierarchy Standard (FHS)
Linux follows a standard directory structure so every engineer knows where to find things on any distro:
- /: Root; the top of the entire tree
- /home: User home directories (your personal space)
- /etc: System configuration files (nginx config, SSH config, etc.)
- /var: Variable data: logs (/var/log), databases, caches
- /usr: User binaries and libraries (installed programs)
- /bin, /sbin: Essential system commands (ls, cp, grep)
- /tmp: Temporary files, typically cleared on reboot or after a set age
- /proc: Virtual filesystem: live window into the running kernel
- /dev: Device files (disks, terminals represented as files)
Essential Navigation Commands
# Where am I?
pwd
# /home/alice
# List files (l=long format, a=all including hidden, h=human sizes)
ls -lah
# Change directory
cd /var/log # absolute path
cd ../lib # relative path (one level up, then into lib: /var/lib)
cd ~ # go home
cd - # go back to previous directory
# Read files
cat /etc/os-release # dump whole file
head -n 20 app.log # first 20 lines
tail -n 50 app.log # last 50 lines
less /var/log/syslog # page through (q to quit)
# Create and manipulate
mkdir -p /opt/app/logs # create nested dirs
touch config.yaml # create empty file
cp -r src/ dest/ # copy recursively
mv old_name new_name # move / rename
rm -rf /tmp/old_data/ # CAREFUL: recursive force delete
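A safe way to practice the create, copy, move, and delete commands is inside a throwaway directory. The following is a minimal sketch; practice_sandbox is a hypothetical function name and the file names are made up for illustration.

```shell
# Practise create/copy/move/delete inside a throwaway directory.
practice_sandbox() {
    sandbox=$(mktemp -d) || return 1             # fresh empty directory under the temp area
    (
        cd "$sandbox" || exit 1
        mkdir -p app/logs                        # nested directories in one step
        touch app/config.yaml                    # empty file
        cp -r app app_backup                     # recursive copy
        mv app/config.yaml app/settings.yaml     # rename
        rm -rf app_backup/logs                   # recursive delete, confined to the sandbox
        find . | LC_ALL=C sort                   # list what is left
    )
    rm -rf "$sandbox"                            # remove the whole sandbox afterwards
}

practice_sandbox
```

Because every path is relative to the sandbox, a typo in the rm -rf line cannot reach real data.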
[DIAGRAM: Linux Filesystem Hierarchy Tree — root / branching into /home, /etc, /var, /usr, /bin, /tmp, /proc, /dev with brief purpose labels]
Quiz Section 2 — 10 Questions
21. What does the pwd command do?
- A. Change the current directory
- B. Print the current working directory path
- C. Create a new password
- D. List all files in the current directory
✓ Correct Answer: B
pwd stands for "Print Working Directory." It tells you exactly where you are in the filesystem, like a GPS coordinate for your current location in the directory tree.
22. What does / (a single forward slash) represent?
- A. A division operator in Linux scripts
- B. The home directory of the root user
- C. The root directory: the top level of the entire filesystem
- D. The current directory
✓ Correct Answer: C
/ alone is the root directory — the absolute top of the Linux filesystem. All other paths branch from here. The root user's home is /root (different from /).
23. What is the primary purpose of the /etc directory?
- A. Store user home directories
- B. Store system-wide configuration files
- C. Store temporary runtime files
- D. Store compiled executable programs
✓ Correct Answer: B
/etc is the SRE's most-visited directory. It contains /etc/nginx/, /etc/ssh/sshd_config, /etc/passwd, /etc/fstab, and hundreds of other config files.
24. What does ls -la display?
- A. Only hidden files (those starting with a dot)
- B. Only directories in the current path
- C. All files including hidden ones, in long format with permissions, ownership, and size
- D. Files sorted by last access time
✓ Correct Answer: C
-l = long format (permissions, link count, owner, group, size, date), -a = all files including those starting with . (hidden). Add -h for human-readable sizes.
25. What type of data lives in /var?
- A. Variable-length programs
- B. User-specific variable configuration files
- C. Variable data that changes frequently: logs, databases, caches, mail spools
- D. Virtual machine disk storage
✓ Correct Answer: C
/var (variable) holds data that grows and changes: /var/log (logs), /var/lib (databases), /var/cache, /var/spool/mail. SREs watch /var/log and /var disk usage constantly.
26. What command shows the first 10 lines of a log file?
- A. tail /var/log/syslog
- B. head /var/log/syslog
- C. top /var/log/syslog
- D. start /var/log/syslog
✓ Correct Answer: B
head shows the first N lines (default 10). tail shows the last N lines. For live monitoring of growing logs, use tail -f (follow mode).
27. What is the difference between an absolute path and a relative path?
- A. Absolute paths are always longer than relative paths
- B. Absolute paths start from root (/) and are location-independent; relative paths depend on your current directory
- C. Relative paths can only be used by root users
- D. There is no practical difference
✓ Correct Answer: B
Absolute: /etc/nginx/nginx.conf — works from anywhere. Relative: ../logs/app.log — depends on where you currently are. Scripts should use absolute paths to avoid surprises.
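One common way for a script to avoid relative-path surprises is to resolve a directory to its absolute form first. This is a sketch only; abs_dir is a hypothetical helper name, and it assumes the directory exists and is accessible.

```shell
# abs_dir DIR: print the absolute, symlink-free form of a directory path.
abs_dir() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" 2>/dev/null && pwd -P )
}

# A script can anchor itself to its own location instead of the caller's current directory.
script_dir=$(abs_dir "$(dirname -- "$0")")
echo "running from: $script_dir"
```

With script_dir in hand, a script can refer to "$script_dir/config.yaml" and behave the same no matter where it is launched from.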
28. What does the /proc directory contain?
- A. Programming source code for system tools
- B. Processor-specific binary modules only
- C. A virtual filesystem providing a live window into the running kernel and process information
- D. Permanent process log archives
✓ Correct Answer: C
/proc exists only in memory — it's generated by the kernel on the fly. Try cat /proc/cpuinfo, cat /proc/meminfo, or cat /proc/1/status (info on PID 1).
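Because /proc entries are plain text, ordinary text tools can extract values from them. A minimal sketch, assuming the usual "MemTotal: <number> kB" line format; mem_total_mib is a hypothetical helper name.

```shell
# mem_total_mib [FILE]: print total memory in MiB from a meminfo-style file.
# FILE defaults to /proc/meminfo; the parameter exists so the function can be tried on a sample file.
mem_total_mib() {
    file=${1:-/proc/meminfo}
    # The line of interest looks like: "MemTotal:       16314580 kB"
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 }' "$file"
}

# Only query the live kernel data where /proc is available (it is not on macOS).
if [ -r /proc/meminfo ]; then
    echo "total memory: $(mem_total_mib) MiB"
fi
```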
29. What command creates a new directory at /var/logs?
- A. create /var/logs
- B. mkdir /var/logs
- C. newdir /var/logs
- D. touch /var/logs
✓ Correct Answer: B
mkdir (make directory). To create nested paths at once, add -p: mkdir -p /var/logs/app/2024. touch creates an empty file or updates an existing file's timestamps; it does not create directories.
30. What does cd .. do?
- A. Goes to the root directory /
- B. Goes to your home directory
- C. Moves one directory level up to the parent directory
- D. Goes to the /usr directory
✓ Correct Answer: C
.. always refers to the parent directory. cd .. goes up one level. cd ../../ goes up two levels. cd ~ or just cd goes to your home directory.
File Permissions & Ownership
Linux was designed from day one as a multi-user system. Permissions are the security layer that determines who can do what with every file and directory. Getting permissions wrong is one of the most common causes of both security breaches and "permission denied" headaches.
Reading Permission Strings
When you run ls -l, you see a permission string like -rwxr-xr--. Here's how to decode it:
- Position 1: File type (- = file, d = directory, l = symlink)
- Positions 2–4: Owner permissions (r=read, w=write, x=execute)
- Positions 5–7: Group permissions
- Positions 8–10: Others (everyone else) permissions
Octal Notation
Permissions can also be expressed as numbers: r=4, w=2, x=1. Sum them for each group: rwx=7, r-x=5, r--=4. So chmod 755 sets owner=rwx, group=r-x, others=r-x.
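The r=4, w=2, x=1 sums can be turned into a small conversion routine. This is an illustrative sketch: perm_to_octal is a hypothetical helper name, and it expects the nine permission characters without the leading file-type character.

```shell
# perm_to_octal PERMS: convert nine permission characters (e.g. rwxr-xr-x) to octal digits.
perm_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do
        # Take one three-character group: owner, then group, then others.
        triplet=$(printf '%s' "$perms" | cut -c "${start}-$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="${result}${digit}"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x   # prints 755
perm_to_octal rw-------   # prints 600
```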
# Change permissions (numeric)
chmod 755 deploy.sh # rwxr-xr-x (owner all, group/others r+x)
chmod 600 ~/.ssh/id_rsa # rw------- (owner r+w only — required for SSH keys!)
chmod 644 config.yaml # rw-r--r-- (owner r+w, others read-only)
# Change permissions (symbolic)
chmod +x script.sh # add execute for everyone
chmod g-w file.txt # remove write from group
chmod o= secret.conf # remove all permissions from others
# Change ownership
chown alice file.txt # change owner
chown alice:devteam file.txt # change owner AND group
chown -R www-data /var/www/html # recursive chown
# Run as root temporarily
sudo systemctl restart nginx
sudo -i # open a root shell (use sparingly)
[DIAGRAM: Permission string breakdown — annotated "-rwxr-xr--" with color-coded boxes showing file type, owner, group, and others sections, plus octal equivalents]
Quiz Section 3 — 10 Questions
31. In the output -rwxr-xr--, who has write permission?
- A. Owner, group, and others all have write permission
- B. Only the group has write permission
- C. Only the owner has write permission
- D. Owner and group both have write permission
✓ Correct Answer: C
Reading left to right: -(file) | rwx(owner: read+write+exec) | r-x(group: read+exec only) | r--(others: read only). Only the owner has the w bit.
32. What does chmod 755 script.sh set?
- A. Deletes the file
- B. Owner gets rwx (7), group gets r-x (5), others get r-x (5)
- C. Changes ownership to user with UID 755
- D. Creates a backup copy with those permissions
✓ Correct Answer: B
7 = 4+2+1 = rwx, 5 = 4+0+1 = r-x. Result: -rwxr-xr-x. The owner can do everything; group and others can read and execute (run) but not modify.
33. What is the purpose of sudo?
- A. Switch to the default user account
- B. Safely delete system files
- C. Execute a single command with elevated (root) privileges, with logging
- D. Check disk usage of system directories
✓ Correct Answer: C
sudo (originally "superuser do") runs one command as root without permanently switching user. Each use is logged (in /var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL-family systems), creating an audit trail, which is far safer than logging in as root.
34. What does chown alice:devops file.txt do?
- A. Changes the permissions of file.txt
- B. Changes owner to alice and group to devops
- C. Creates a copy owned by alice in the devops directory
- D. Displays who currently owns the file
✓ Correct Answer: B
chown user:group sets both owner and group simultaneously. Use -R to apply recursively to a directory. Common pattern: sudo chown -R www-data:www-data /var/www/
35. What special permission makes an executable run with the file owner's privileges rather than the caller's?
- A. Sticky bit
- B. SGID (Set Group ID)
- C. SUID (Set User ID)
- D. Execute bit
✓ Correct Answer: C
SUID is why a regular user can run passwd (owned by root) to change their own password — the program runs with root's privileges to write to /etc/shadow. Look for s in the owner execute bit in ls -l.
36. Which file contains user account information in Linux?
- A. /etc/users
- B. /etc/passwd
- C. /home/users.txt
- D. /var/accounts
✓ Correct Answer: B
/etc/passwd stores: username:x:UID:GID:comment:home:shell. The x means the actual hashed password is in /etc/shadow (readable only by root). /etc/passwd itself is world-readable by design.
37. What does chmod +x script.sh do?
- A. Removes execute permission from the script
- B. Adds execute permission for owner, group, and others
- C. Makes the script executable only by root
- D. Changes the file extension to .exe
✓ Correct Answer: B
+x adds execute for all three (owner, group, others). Without execute permission, running ./script.sh gives "Permission denied". Either chmod +x a new script or run it through the interpreter explicitly (bash script.sh).
38. What does the sticky bit on a directory like /tmp do?
- A. Makes the directory impossible to delete
- B. Only the file owner (or root) can delete or rename their own files within it, even if others have write access to the directory
- C. Makes all files inside the directory hidden
- D. Prevents new files from being created in the directory
✓ Correct Answer: B
The sticky bit (shown as t in ls -ld /tmp) prevents users from deleting each other's files in a shared directory. Without it, anyone with write access to /tmp could delete all files there.
39. What is the common default umask, and what permissions result on new files?
- A. 000: all permissions granted to new files
- B. 022: new files get 644 (rw-r--r--) and new directories get 755
- C. 777: full permissions given to all new files
- D. 111: only execute permissions on new files
✓ Correct Answer: B
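The umask lists the permission bits to clear from the default mode (666 for files, 777 for directories). With 022 this works out the same as subtraction: 666 - 022 = 644 and 777 - 022 = 755. So files: rw-r--r--, dirs: rwxr-xr-x. Only the owner can write; others are read-only. For other values the bitwise rule matters: umask 033 on a file gives 644, not 633.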
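The resulting modes can be checked with shell arithmetic. Strictly speaking, the umask bits are cleared from the default mode with a bitwise operation; for 022 the answer happens to match subtraction. A minimal sketch, where mode_after_umask is a hypothetical helper name:

```shell
# mode_after_umask BASE UMASK: clear the umask bits from the base mode (both given in octal).
mode_after_umask() {
    # A leading 0 makes shell arithmetic read the digits as octal.
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

mode_after_umask 666 022   # prints 644 (new file)
mode_after_umask 777 022   # prints 755 (new directory)
mode_after_umask 666 033   # prints 644; plain subtraction would wrongly suggest 633
```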
40. How do you give the owner read and write access but zero permissions to everyone else?
- A. chmod 644 file
- B. chmod 777 file
- C. chmod 600 file
- D. chmod 400 file
✓ Correct Answer: C
6=rw(4+2), 0=nothing. chmod 600 → rw-------. This is required for SSH private keys — SSH refuses keys that are group- or world-readable. 400 is read-only (for owner), no write.
Process Management
Every running program is a process. SREs live and breathe process management — it's how you find what's eating CPU, kill runaway jobs, manage services, and schedule automated tasks.
Key Concepts
- PID (Process ID): A unique number the kernel assigns to each running process
- PPID (Parent PID): Every process has a parent; PID 1 (systemd) is the ancestor of all user-space processes
- Process States: Running (R), Sleeping (S), Uninterruptible sleep (D, usually waiting on I/O), Stopped (T), Zombie (Z)
- Signal: A notification sent to a process (SIGTERM=15 asks nicely to quit; SIGKILL=9 forces quit)
- Daemon: A background service process (often named with a trailing 'd': sshd, chronyd, mysqld)
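The "ask nicely first, force only if needed" signal pattern can be sketched as a function. This is an illustration under assumptions: stop_process is a hypothetical helper name, the default five-second grace period is arbitrary, and the caller must have permission to signal the target PID.

```shell
# stop_process PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds, then fall back to SIGKILL.
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL had to be sent.
stop_process() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0     # nothing to signal: already gone
    waited=0
    while kill -0 "$pid" 2>/dev/null; do          # kill -0 only tests whether the PID can be signalled
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null         # grace period over: force termination
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, stop_process 1234 10 would give PID 1234 ten seconds to shut down cleanly before it is force-killed.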
The systemd Service Manager
Modern Linux uses systemd as its init system (PID 1). It starts, stops, and monitors all system services. Think of it as the "manager of managers" — it bootstraps everything else at boot time.
# View running processes
ps aux # snapshot: all users, detailed format
top # live view (q=quit, k=kill, P=sort by CPU)
htop # better top (install separately)
# Send signals to processes
kill 1234 # send SIGTERM (15) to PID 1234 — ask nicely
kill -9 1234 # send SIGKILL — force terminate
pkill nginx # kill by process name
# systemd service management
systemctl status nginx
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx
sudo systemctl enable nginx # start at boot
sudo systemctl disable nginx # don't start at boot
# Background jobs
./long_task.sh & # run in background
jobs # list background jobs
fg %1 # bring job 1 to foreground
# Cron — schedule tasks
crontab -e # edit your cron jobs
# Format: minute hour day month weekday command
# 0 2 * * * /opt/scripts/backup.sh # run at 2am daily
[DIAGRAM: Process lifecycle state machine — arrows between Created, Running, Sleeping, Stopped, Zombie, and Terminated states with triggering signals labeled]
Quiz Section 4 — 10 Questions
41. What command gives a dynamic, real-time view of running processes?
- A. ps aux
- B. top
- C. list
- D. proc --live
✓ Correct Answer: B
top refreshes every ~3 seconds by default. ps aux takes a static snapshot. htop is a friendlier, colorful alternative to top with mouse support.
42. What does PID stand for?
- A. Process Instruction Data
- B. Primary Interface Device
- C. Process ID
- D. Processor Interface Definition
✓ Correct Answer: C
PID (Process ID) uniquely identifies a running process. On most modern distros PID 1 is systemd (init on older systems; inside a container it is usually the container's main process). PIDs can be reused after a process exits.
43. What does ps aux show?
- A. All running processes with detailed info across all users
- B. Only processes owned by the current user
- C. Only kernel/system processes
- D. Only network-related processes
✓ Correct Answer: A
a=all users' processes, u=user-friendly format (shows username, %CPU, %MEM), x=include processes without a controlling terminal (daemons). Pipe to grep: ps aux | grep nginx
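Beyond grep, the columns of ps aux can be ranked to find the busiest processes. A minimal sketch, assuming the standard column order (USER, PID, %CPU, ... COMMAND); top_cpu is a hypothetical helper name.

```shell
# top_cpu [N]: read `ps aux` output on stdin and print "%CPU PID COMMAND" for the N busiest processes.
top_cpu() {
    # NR > 1 skips the header line; column 3 is %CPU, 2 is PID, 11 is the command.
    awk 'NR > 1 { print $3, $2, $11 }' | sort -rn | head -n "${1:-3}"
}
```

Typical usage would be ps aux | top_cpu 5. Reading from stdin keeps the function easy to try on saved output from an incident.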
44. What signal does kill -9 <PID> send?
- A. SIGTERM (15): graceful termination request
- B. SIGKILL (9): immediate, forceful termination; cannot be caught
- C. SIGHUP (1): reload configuration
- D. SIGSTOP (19): pause the process
✓ Correct Answer: B
SIGKILL is the "nuclear option." It kills the process instantly, bypassing any cleanup handlers. Always try kill (SIGTERM) first to allow graceful shutdown; only escalate to kill -9 if needed.
45. What is systemd?
- A. A text editor for system configuration
- B. The modern init system (PID 1) and service manager on most Linux distros
- C. A disk defragmentation daemon
- D. A network configuration tool
✓ Correct Answer: B
systemd is PID 1 — the first process started by the kernel. It bootstraps the system, manages services via unit files, handles logging (journald), and manages sockets, timers, and mounts.
46. What does systemctl status nginx show?
- A. Starts the nginx web server
- B. Current status, PID, recent logs, and whether nginx is enabled at boot
- C. Stops nginx immediately
- D. Shows nginx configuration file contents
✓ Correct Answer: B
systemctl status is your first command during any incident involving a service. It shows whether it's active/inactive, shows the recent log tail, and lists the PID and memory usage.
47. What is the difference between kill and pkill?
- A. kill is more powerful than pkill
- B. kill targets by PID; pkill targets by process name pattern
- C. pkill only works on root-owned processes
- D. They are functionally identical
✓ Correct Answer: B
kill 1234 signals PID 1234. pkill nginx signals all processes whose name matches "nginx" — useful when multiple worker processes exist. Use pgrep nginx first to verify matches.
48. What does running a command with & at the end do?
- A. Runs the command as root
- B. Runs the command in the background, returning your shell prompt immediately
- C. Connects two commands with a pipe
- D. Redirects output to a file
✓ Correct Answer: B
Example: ./backup.sh & runs the backup while you continue working. Use jobs to see background jobs, fg %1 to bring job 1 forward, bg %1 to resume a stopped job in background.
49. What is a cron job?
- A. A type of process signal in Linux
- B. A persistent background daemon process
- C. A scheduled task that runs automatically at a specified time or interval
- D. A type of compressed log file
✓ Correct Answer: C
Cron is the Linux task scheduler. SREs use it for automated backups, health checks, database cleanups, and report generation. Edit with crontab -e; list with crontab -l.
50. What does the cron expression 0 2 * * * mean?
- A. Every 2 minutes
- B. Every 2 hours
- C. At 2:00 AM every day
- D. At minute 0, twice per day
✓ Correct Answer: C
Cron format: minute hour day-of-month month day-of-week. 0 2 * * * = minute 0, hour 2, every day of the month, every month, every day of the week = 2:00 AM daily. Memorize: "minute, hour, dom, month, dow".
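To make the five-field layout concrete, a crontab line can be split into labelled parts. This is a sketch for learning purposes only: describe_cron is a hypothetical helper name, and it labels the fields without validating them.

```shell
# describe_cron LINE: label the five schedule fields and the command of one crontab entry.
describe_cron() {
    # read splits on whitespace; the last variable receives the rest of the line.
    # A here-document is used so the * characters are not expanded into file names.
    read -r minute hour dom month dow command <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$command"
}

describe_cron '0 2 * * * /opt/scripts/backup.sh'
# prints: minute=0 hour=2 day-of-month=* month=* day-of-week=* command=/opt/scripts/backup.sh
```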
Networking Essentials
SREs live on the network. Many production incidents (slow APIs, failing health checks, cascading timeouts) call for network diagnosis. This section covers the commands you'll likely run within the first few minutes of such an incident.
Key Networking Concepts
- IP Address: Your server's unique network identifier (like a postal address)
- Port: A logical endpoint (0–65535) identifying a specific service: 22=SSH, 80=HTTP, 443=HTTPS, 3306=MySQL
- DNS: Domain Name System; translates a name such as www.google.com into an IP address such as 142.250.80.46
- TCP vs UDP: TCP provides reliable, ordered delivery (web, SSH); UDP is lighter and faster but does not guarantee delivery (video streaming, DNS)
Core Networking Commands
# Show network interfaces and IPs
ip addr show # modern (replaces ifconfig)
ip a # shorthand
# Test connectivity
ping -c 4 google.com # send 4 packets, test reachability
traceroute google.com # show every router hop
mtr google.com # combined ping+traceroute (install separately)
# Check ports and connections
sudo ss -tulpn # TCP/UDP listening ports with process names (without sudo, only your own processes are named)
ss -s # summary statistics
# HTTP/API testing
curl -I https://api.example.com # headers only
curl -s -o /dev/null -w "%{http_code}" https://api.example.com # just status code
wget -q https://example.com/file.zip # download file
# DNS lookup
dig google.com # full DNS query
dig +short google.com # just the IP
nslookup google.com # simpler DNS lookup
# Local DNS override
cat /etc/hosts # static hostname-to-IP mappings
# Firewall (Ubuntu)
sudo ufw status verbose
sudo ufw allow 443/tcp
sudo ufw deny 23/tcp
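The static mappings in /etc/hosts are simple enough to query with awk. A minimal sketch: hosts_lookup is a hypothetical helper name, and it only reads the file, whereas real name resolution may also consult DNS and other sources.

```shell
# hosts_lookup NAME [FILE]: print the first IP address mapped to NAME in a hosts-format file.
# FILE defaults to /etc/hosts.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }                  # skip whole-line comments
        {
            sub(/#.*/, "")                         # drop trailing comments
            for (i = 2; i <= NF; i++)              # field 1 is the IP, the rest are names
                if ($i == name) { print $1; exit }
        }' "${2:-/etc/hosts}"
}
```

On most systems hosts_lookup localhost prints a loopback address such as 127.0.0.1, depending on the file's contents.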
[DIAGRAM: Network request lifecycle — client → DNS resolver → authoritative DNS → server IP → TCP handshake → HTTP request → response, with port numbers labeled at each stage]
Quiz Section 5 — 10 Questions
51. What command shows the IP address of a Linux system's network interfaces?
- A. ipconfig
- B. ip addr show or ip a
- C. network show
- D. showip --all
✓ Correct Answer: B
ip addr show (or ip a) is the modern command. The older ifconfig is deprecated on many distros. On Windows, the equivalent is ipconfig — don't confuse them.
52. What does the ping command test?
- A. The exact download speed of the connection
- B. Whether a host is reachable and the round-trip time for ICMP packets
- C. The number of users connected to a server
- D. The HTTP response status of a web server
Reveal Answer
✓ Correct Answer: B
ping sends ICMP echo-request packets and reports round-trip time. Note: some servers block ICMP, so no ping response doesn't always mean the host is down.
53. What command shows active network connections and listening ports?
- A. connections --list
- B. ports --show
- C. ss (or legacy netstat)
- D. network --connections
Reveal Answer
✓ Correct Answer: C
ss -tulpn: t=TCP, u=UDP, l=listening, p=show process, n=numeric (no DNS lookup). Essential for "what's listening on port 8080?"
54. What is a network port?
- A. A physical connector on the server's network card
- B. A logical number (0–65535) identifying a specific service or application on a host
- C. A type of network cable standard
- D. A measure of network bandwidth capacity
Reveal Answer
✓ Correct Answer: B
Ports are like apartment numbers in a building (IP = building address). Key ports to memorize: 22=SSH, 80=HTTP, 443=HTTPS, 3306=MySQL, 5432=PostgreSQL, 6379=Redis, 27017=MongoDB.
55. What does curl https://api.example.com/status do?
- A. Downloads the response to a file named status
- B. Opens the URL in your default web browser
- C. Makes an HTTP GET request and prints the response body to stdout
- D. Only tests if the host is online, like ping
Reveal Answer
✓ Correct Answer: C
curl is essential for API testing. Key flags: -X POST (method), -H "Content-Type: application/json" (header), -d '{"key":"val"}' (body), -s (silent), -w "%{http_code}" (print status code).
56. What is DNS?
- A. Dynamic Network Security
- B. Domain Name System: translates human-readable domain names to IP addresses
- C. Data Network Service
- D. Distributed Node Structure
Reveal Answer
✓ Correct Answer: B
DNS is the internet's phone book. DNS failures cause widespread outages (even if your servers are perfectly healthy). Always check DNS first when services seem "down" but ping works by IP.
57. What does traceroute google.com reveal?
- A. Google's physical server location
- B. Each network hop (router) packets travel through, with latency per hop
- C. Google's server configuration and software versions
- D. Available bandwidth to Google's servers
Reveal Answer
✓ Correct Answer: B
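traceroute shows the routers between you and the destination. If latency jumps at hop 8 and stays high on every later hop, the problem likely sits between hops 7 and 8; a spike at one hop that disappears further along usually just means that router is slow to answer probes. * * * means no reply arrived before the timeout, often because the router drops or rate-limits its ICMP replies.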
58. What is /etc/hosts used for?
- A. Listing all hosts currently connected to the local network
- B. Static hostname-to-IP mappings that take priority over DNS
- C. Storing SSH host fingerprints for known servers
- D. Configuring which network interface to use by default
Reveal Answer
✓ Correct Answer: B
/etc/hosts is consulted before DNS under the default resolver order (the hosts line in /etc/nsswitch.conf). Add 127.0.0.1 myapp.local to resolve myapp.local locally without touching DNS. Also useful for blocking domains by mapping them to 0.0.0.0.
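To make the file format concrete, here is a rough sketch of a lookup against a hosts-format file. lookup_host is a hypothetical helper written for this example; it runs on a throwaway file, so the real /etc/hosts is never modified.

```bash
#!/usr/bin/env bash
set -euo pipefail

# lookup_host is a hypothetical helper: print the IP mapped to a name
# in a hosts-format file (default: /etc/hosts).
lookup_host() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v name="$name" '
    { sub(/#.*/, "") }                 # strip comments
    { for (i = 2; i <= NF; i++)        # field 1 is the IP, the rest are names
        if ($i == name) { print $1; exit } }
  ' "$file"
}

# Demo on a temporary file.
demo=$(mktemp)
printf '%s\n' '127.0.0.1 localhost' '127.0.0.1 myapp.local  # dev override' > "$demo"
lookup_host myapp.local "$demo"   # prints 127.0.0.1
rm -f "$demo"
```

On a real system, getent hosts myapp.local answers the same question through the system's actual resolver order, so prefer it when debugging.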
59. What does wget https://example.com/script.sh do?
- A. Checks if the file exists without downloading it
- B. Downloads the file to the current directory
- C. Opens the file in a browser
- D. Streams the file content without saving
Reveal Answer
✓ Correct Answer: B
wget downloads files. It retries on failure, supports resuming interrupted downloads (-c), and can mirror entire websites (-r). Use -O filename to specify output filename.
60. What does UFW stand for and what does it do?
- A. Universal File Watcher: monitors file system changes
- B. Uncomplicated Firewall: a user-friendly front-end to iptables for managing network firewall rules
- C. Unix File Wrapper: for packaging and compressing files
- D. Update Firmware Wizard: for hardware upgrades
Reveal Answer
✓ Correct Answer: B
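UFW wraps iptables, whose own syntax is complex. Typical commands: sudo ufw allow 22/tcp, sudo ufw deny 3306, sudo ufw enable. On a remote server, allow SSH before enabling the firewall or you can lock yourself out. On RHEL/CentOS, use firewalld instead.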
System Monitoring & Performance
Performance problems are stealth attackers — they sneak up slowly, then cause sudden outages. SREs monitor systems continuously and investigate the "Four Golden Signals": latency, traffic, errors, and saturation. Linux gives you powerful CLI tools to observe all of these.
Understanding Load Average
Load average is the average number of processes that are running, waiting for a CPU, or blocked in uninterruptible I/O, measured over the last 1, 5, and 15 minutes. Run uptime to see it. Rule of thumb: if load average stays above your number of CPU cores, the system is overloaded.
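The rule of thumb can be sketched as a comparison between a load value and a core count. load_status is a hypothetical helper; the "ok" and "overloaded" labels are illustrative, not a standard.

```bash
#!/usr/bin/env bash
set -euo pipefail

# load_status is a hypothetical helper: compare a load average to a core count.
load_status() {
  local load="$1" cores="$2"
  # awk handles the floating-point comparison that [ ] cannot do.
  awk -v l="$load" -v c="$cores" 'BEGIN {
    if (l > c) print "overloaded"; else print "ok"
  }'
}

load_status 0.52 4   # prints ok
load_status 6.10 4   # prints overloaded
```

On a live Linux host, feed it real values with load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".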
The Four Key Resources to Monitor
- CPU: top, mpstat, vmstat
- Memory: free -h, top, /proc/meminfo
- Disk I/O: iostat, df -h, du -sh
- Network: ss, iftop, nethogs
# CPU and load
uptime # load averages: 1min 5min 15min
nproc # number of CPU cores
vmstat 2 5 # 5 snapshots, every 2 seconds
mpstat -P ALL 1 # per-CPU statistics
# Memory
free -h # RAM and swap usage
cat /proc/meminfo # detailed memory info
# Disk space and usage
df -h # disk space on all filesystems
df -ih # inode usage (can fill up separately from space!)
du -sh /var/log/* # size of each item in /var/log
du -sh /* 2>/dev/null # find what's filling root filesystem
# Disk I/O
iostat -xz 2 # extended disk stats every 2s
# Kernel messages (hardware issues, OOM killer)
dmesg -T # kernel ring buffer with timestamps
dmesg -T | grep -i oom # find OOM killer events
SRE Pro Tip: Disk Full = Instant Outage
Set up alerts when / or /var exceeds 80% full. A full disk stops applications from writing logs, accepting connections, or even running. Many outages trace back to a forgotten log file filling the disk.
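A minimal sketch of such a check, assuming the POSIX output format of df -P (usage percentage in column 5, mount point in column 6). over_threshold is a hypothetical helper, and mount points containing spaces would need extra handling.

```bash
#!/usr/bin/env bash
set -euo pipefail

# over_threshold is a hypothetical helper: read `df -P` output on stdin and
# print every mount point whose usage exceeds the given percentage.
over_threshold() {
  local limit="$1"
  awk -v limit="$limit" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # "91%" -> 91
    if (use + 0 > limit) print $6, use "%"
  }'
}

# Sample input in df -P format (made-up numbers).
over_threshold 80 <<'EOF'
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 37449000   3703736      91% /
/dev/sdb1        103081248 41232499  61848749      40% /var
EOF
# prints: / 91%
```

In practice you would run df -P | over_threshold 80 from cron or a monitoring agent and alert whenever it prints anything.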
[DIAGRAM: SRE monitoring dashboard showing CPU%, memory usage bar, disk usage bar, load average trend chart, and network traffic — with alert thresholds marked in red]
Quiz Section 6 — 10 Questions
61. What does "load average" represent in Linux?
- A. Average number of users logged in over time
- B. Average number of processes running or waiting to run (for CPU or uninterruptible I/O) over 1, 5, and 15 minutes
- C. Average CPU clock frequency
- D. Average network traffic in MB/s
Reveal Answer
✓ Correct Answer: B
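Load average counts processes that are actively running, waiting for CPU, or stuck in uninterruptible I/O. On a 4-core server, a load average of 4.0 means the cores are roughly fully occupied; above 4, processes are queueing. High load with idle CPUs usually points at slow disk or network storage. uptime shows: "load average: 0.52, 0.58, 0.59"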
62. What command shows disk space usage of mounted filesystems?
- A. du -h
- B. disk --show
- C. df -h
- D. space --all
Reveal Answer
✓ Correct Answer: C
df = disk free. -h = human-readable. Shows each mounted partition with total, used, available, and use%. Run this first in any disk-related incident. du is for finding which files/dirs are consuming space.
63. What does du -sh /var/log show?
- A. The number of files in /var/log
- B. The total disk space consumed by the /var/log directory
- C. The last modification date of /var/log
- D. Individual size of each log file
Reveal Answer
✓ Correct Answer: B
du = disk usage. -s = summarize (show one total, not each subdir), -h = human readable. Remove -s to see each subdirectory, or add -a to include individual files: du -ah /var/log | sort -rh | head -20 finds the biggest logs.
64. What does free -h show?
- A. Free disk space on all partitions
- B. Free CPU cycles available for new processes
- C. Total, used, and available RAM and swap memory
- D. Free inodes remaining on the filesystem
Reveal Answer
✓ Correct Answer: C
free shows RAM and swap. The "available" column is more useful than "free" — it includes reclaimable cache. Low available memory + high swap usage = system under memory pressure.
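The "available" figure comes from the MemAvailable line in /proc/meminfo. As a rough sketch, the hypothetical helper below turns it into a percentage; the sample numbers are invented for the demo.

```bash
#!/usr/bin/env bash
set -euo pipefail

# mem_available_pct is a hypothetical helper: percentage of RAM the kernel
# estimates is available, read from a meminfo-format file.
mem_available_pct() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%d\n", avail * 100 / total }
  ' "$file"
}

# Demo with made-up values in the /proc/meminfo layout.
sample=$(mktemp)
printf '%s\n' 'MemTotal:       16000000 kB' \
              'MemFree:          500000 kB' \
              'MemAvailable:    4000000 kB' > "$sample"
mem_available_pct "$sample"   # prints 25
rm -f "$sample"
```

Called with no argument on a Linux host, it reads the live /proc/meminfo. Note how MemFree in the sample is far lower than MemAvailable, which is why "free" alone is a misleading alarm signal.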
65. What is swap space in Linux?
- A. A feature to hot-swap between Linux distributions without rebooting
- B. Disk space used as overflow memory when RAM is full; much slower than RAM
- C. A type of high-speed filesystem format
- D. A method for switching between user sessions
Reveal Answer
✓ Correct Answer: B
Swap prevents immediate OOM kills when RAM fills up, but disk-backed swap is orders of magnitude slower than RAM. Heavy swap activity causes "thrashing": the system slows dramatically because it spends its time moving pages between RAM and disk. Add RAM or fix memory leaks instead of relying on swap.
66. What does OOM stand for and what happens when Linux triggers it?
- A. Out Of Memory: the kernel selects and kills a process to free RAM
- B. Order Of Magnitude: a cloud cost analysis metric
- C. Overhead Of Microservices: a performance tuning term
- D. Out Of Modules: a kernel panic indicator
Reveal Answer
✓ Correct Answer: A
The OOM killer picks the process with the highest badness score (visible in /proc/PID/oom_score), which mostly tracks memory use and can be tuned per process through oom_score_adj. Evidence appears in dmesg: "Out of memory: Killed process 1234 (java)". This usually means your app is leaking memory or the host is under-provisioned.
67. Which commands provide per-2-second CPU and I/O statistics?
- A. top -d 2 only
- B. vmstat 2 only
- C. iostat 2 only
- D. Both vmstat 2 and iostat 2 are useful for this
Reveal Answer
✓ Correct Answer: D
vmstat 2 shows CPU (us/sy/id/wa), memory, and swap activity. iostat -xz 2 focuses on disk device utilization. Both print headers then repeated rows — the first row is always since-boot averages; ignore it.
68. What does a high iowait percentage in CPU stats indicate?
- A. Too many processes competing for CPU time
- B. CPU is idle waiting for I/O (disk read/write) operations to complete
- C. A network connectivity problem is present
- D. RAM is about to be exhausted
Reveal Answer
✓ Correct Answer: B
High iowait (>10-20%) means your disk is a bottleneck. The CPU has work to do but keeps waiting for slow disk operations. Solutions: use SSDs, optimize database queries, add caching, or increase disk I/O capacity.
69. What command watches a log file for new content in real time?
- A. watch file.log
- B. tail -f file.log
- C. live file.log
- D. stream --follow file.log
Reveal Answer
✓ Correct Answer: B
tail -f (follow) prints new lines as they are appended. tail -F also handles log rotation (reopens the file by name). Combine with grep: tail -f app.log | grep ERROR
70. What is the purpose of dmesg?
- A. Display disk S.M.A.R.T. health data
- B. Show the kernel ring buffer: hardware detection, driver errors, OOM events since boot
- C. Display memory diagnostics and DIMM health
- D. Show systemd service daemon messages only
Reveal Answer
✓ Correct Answer: B
dmesg -T (with timestamps) is essential for diagnosing hardware failures, network card resets, disk errors, and OOM kills. Use dmesg -T | grep -iE "error|fail|warn" to filter noise. On systems that restrict access to the kernel log, run it with sudo.
Log Management & Analysis
Logs are the black box flight recorder of your systems. When something breaks at 3 AM, logs tell you what happened, when, and why. SREs spend significant time reading, parsing, and alerting on logs. Mastering log analysis tools is not optional — it's survival.
Key Log Files
- /var/log/syslog (Debian/Ubuntu) or /var/log/messages (RHEL/CentOS): general system messages
- /var/log/auth.log: authentication events such as SSH logins, sudo usage, failed attempts
- /var/log/kern.log: kernel messages
- /var/log/nginx/access.log and error.log: web server logs
The Holy Trinity of Log Analysis: grep, awk, sed
# grep — search/filter lines
grep "ERROR" app.log
grep -i "error" app.log # case insensitive
grep -c "ERROR" app.log # count matching lines
grep -v "DEBUG" app.log # exclude DEBUG lines
grep -r "password" /etc/ # recursive search
# awk — column/field processing
awk '{print $1}' access.log # print first column (IP addresses)
awk '{print $9}' access.log # print HTTP status codes
awk '$9 == "500"' access.log # filter lines where col 9 = 500
# sed — stream text transformations
sed 's/ERROR/CRITICAL/g' app.log # replace all occurrences (stdout only)
sed -i 's/old/new/g' config.txt # in-place edit (modify the file!)
# Combining tools with pipes
tail -n 1000 access.log | awk '$9 == 500 {print $1}' | sort | uniq -c | sort -rn
# → "which IPs caused the most 500 errors in the last 1000 requests?"
# journalctl (systemd journal)
journalctl -u nginx --since "1 hour ago"
journalctl -p err -n 50 # last 50 error-level entries
journalctl --since "2024-01-15 14:00" --until "2024-01-15 15:00"
[DIAGRAM: Linux log pipeline — app writes to /var/log → logrotate compresses old logs → centralized log aggregation (ELK/Splunk/Loki) → alert rules → PagerDuty notification]
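The "which IPs caused the most errors" question can be wrapped in a reusable function. The sketch below assumes the default combined access-log layout (client IP in field 1, status in field 9); top_error_ips is a hypothetical name and the log lines are made-up samples. It matches on the status field itself, so a 500 that appears in a URL or a byte count is not counted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# top_error_ips is a hypothetical helper: count 5xx responses per client IP
# from access-log lines on stdin (field 1 = IP, field 9 = status).
top_error_ips() {
  awk '$9 ~ /^5[0-9][0-9]$/ { print $1 }' | sort | uniq -c | sort -rn |
    awk '{ print $2, $1 }'
}

top_error_ips <<'EOF'
10.0.0.1 - - [15/Jan/2024:14:00:01 +0000] "GET /api HTTP/1.1" 500 120 "-" "curl/8.0"
10.0.0.2 - - [15/Jan/2024:14:00:02 +0000] "GET /img/500.png HTTP/1.1" 200 1500 "-" "curl/8.0"
10.0.0.1 - - [15/Jan/2024:14:00:03 +0000] "GET /api HTTP/1.1" 502 120 "-" "curl/8.0"
10.0.0.3 - - [15/Jan/2024:14:00:04 +0000] "GET /api HTTP/1.1" 503 120 "-" "curl/8.0"
EOF
# prints:
# 10.0.0.1 2
# 10.0.0.3 1
```

The second sample line contains "500" twice but has status 200, so it is correctly ignored. Usage on a real log: tail -n 1000 access.log | top_error_ips.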
Quiz Section 7 — 10 Questions
71. Where are system log files typically stored in Linux?
- A. /home/logs
- B. /var/log
- C. /etc/logs
- D. /tmp/logs
Reveal Answer
✓ Correct Answer: B
/var/log is the FHS-mandated log directory. Learn this by heart: ls /var/log should be the second thing you run (after systemctl status) during an incident.
72. What does journalctl -u nginx --since "1 hour ago" do?
- A. Starts nginx and captures its output for 1 hour
- B. Shows nginx service logs from the past hour
- C. Restarts nginx every hour automatically
- D. Checks if nginx has been running for exactly 1 hour
Reveal Answer
✓ Correct Answer: B
journalctl queries systemd's binary journal. -u=unit (service), --since/--until filter by time. Add -f to follow live, -e to jump to end, -x for extra explanations.
73. What does grep "ERROR" /var/log/app.log do?
- A. Replaces all "ERROR" text with an empty string
- B. Counts the number of lines containing "ERROR"
- C. Prints all lines that contain the word "ERROR"
- D. Deletes lines containing "ERROR" from the file
Reveal Answer
✓ Correct Answer: C
grep = Global Regular Expression Print. It filters; it never modifies the file. Common flags: -i (case-insensitive), -c (count), -v (invert/exclude), -n (line numbers), -A 3 (3 lines after match).
74. What does grep -r "password" /etc/ do?
- A. Finds files named "password" in /etc
- B. Recursively searches all files in /etc for lines containing "password"
- C. Creates a file named "password" in /etc
- D. Changes all passwords found in /etc
Reveal Answer
✓ Correct Answer: B
grep -r (recursive) walks the entire directory tree searching every file. Useful for finding hardcoded credentials in config files — a common security audit step. Add -l to show only filenames.
75. What does tail -n 100 /var/log/syslog | grep ERROR do?
- A. Gets the last 100 lines from syslog, then shows only lines containing "ERROR"
- B. Counts exactly 100 errors in syslog
- C. Deletes the last 100 lines containing errors
- D. Watches syslog until it finds 100 new error messages
Reveal Answer
✓ Correct Answer: A
The pipe (|) passes stdout of one command as stdin to the next. This is the fundamental Unix philosophy: compose small single-purpose tools into powerful pipelines. Master pipes and you master Linux.
76. What is log rotation?
- A. Viewing logs in ascending vs descending order
- B. Automatically archiving, compressing, and deleting old log files to prevent disk full
- C. Switching which log file is written to every minute
- D. Encrypting log files for security compliance
Reveal Answer
✓ Correct Answer: B
logrotate is configured in /etc/logrotate.conf and /etc/logrotate.d/. It renames app.log → app.log.1, compresses to app.log.1.gz, and deletes after N rotations. Without it, logs fill the disk.
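To show the renaming scheme, here is a deliberately simplified stand-in for what logrotate does. rotate_log is a hypothetical function, not part of logrotate; the real tool also compresses old files and can signal the application to reopen its log.

```bash
#!/usr/bin/env bash
set -euo pipefail

# rotate_log is a hypothetical, simplified sketch of log rotation:
# shift app.log.N -> app.log.N+1, drop anything past "keep", start a fresh file.
rotate_log() {
  local log="$1" keep="$2" i
  rm -f "$log.$keep"                          # oldest rotation falls off the end
  for (( i = keep - 1; i >= 1; i-- )); do
    if [ -f "$log.$i" ]; then mv "$log.$i" "$log.$((i + 1))"; fi
  done
  if [ -f "$log" ]; then mv "$log" "$log.1"; fi
  : > "$log"                                  # new empty log for the app
}

# Demo in a temporary directory.
dir=$(mktemp -d)
echo "day one" > "$dir/app.log"
rotate_log "$dir/app.log" 2
echo "day two" > "$dir/app.log"
rotate_log "$dir/app.log" 2
cat "$dir/app.log.1"   # prints: day two
cat "$dir/app.log.2"   # prints: day one
rm -rf "$dir"
```

With keep set to 2, a third rotation would delete the "day one" file, which is how rotation puts a ceiling on disk usage.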
77. What does awk '{print $1}' access.log do?
- A. Prints the first character of each line
- B. Prints the first whitespace-separated field (column) of each line
- C. Counts total lines in access.log
- D. Searches for the literal pattern "$1" in access.log
Reveal Answer
✓ Correct Answer: B
In nginx access logs using the default combined format, $1=client IP, $7=request path, $9=HTTP status code. awk '{print $9}' access.log | sort | uniq -c | sort -rn counts status codes for instant traffic analysis.
78. What does wc -l /var/log/nginx/access.log do?
- A. Shows the file size in words
- B. Counts the number of lines in the log file
- C. Counts the number of words in the log file
- D. Compresses the log file using LZ4
Reveal Answer
✓ Correct Answer: B
wc = word count. -l=lines, -w=words, -c=bytes. Since each log entry is one line, wc -l tells you total request count. awk '$9 == 500' access.log | wc -l counts 500 errors; a plain grep "500" would also match byte counts and URLs that happen to contain 500.
79. What does sed 's/ERROR/CRITICAL/g' app.log do?
- A. Searches for files named ERROR or CRITICAL
- B. Outputs a modified version with ERROR replaced by CRITICAL (non-destructive by default)
- C. Deletes all lines containing ERROR or CRITICAL
- D. Creates a backup of app.log called ERROR.log
Reveal Answer
✓ Correct Answer: B
sed (stream editor) processes to stdout without modifying the file. The s/from/to/g syntax is a substitute command (g=global, all occurrences). Add -i flag to edit the actual file in place.
80. What is the purpose of /var/log/auth.log?
- A. Stores application authentication tokens and API keys
- B. Records all authentication events: SSH logins, sudo usage, failed login attempts
- C. Contains a list of authorized system administrators
- D. Stores PAM module configuration for authentication
Reveal Answer
✓ Correct Answer: B
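Monitor auth.log for brute-force attempts: grep "Failed password" /var/log/auth.log | awk '{print $11}' | sort | uniq -c | sort -rn shows top attacking IPs. With the traditional syslog timestamp format, the IP is field 11 only for existing usernames; lines reading "invalid user NAME" push it to field 13. On RHEL/CentOS, this file is /var/log/secure.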
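Because the IP's field position depends on the wording of each line, a sturdier approach is to anchor on the text around the address. The sketch below is one way to do that; failed_login_ips is a hypothetical helper, and the sample lines are invented but follow the usual sshd "Failed password for ... from IP port N" wording.

```bash
#!/usr/bin/env bash
set -euo pipefail

# failed_login_ips is a hypothetical helper: count failed SSH password
# attempts per source IP from auth-log lines on stdin.
failed_login_ips() {
  # Capture the address between "from" and "port" so the result does not
  # depend on whether the line says "for root" or "for invalid user admin".
  { grep 'Failed password' || true; } |
    sed -nE 's/.* from ([0-9a-fA-F.:]+) port .*/\1/p' |
    sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Made-up sample lines using documentation IP ranges.
failed_login_ips <<'EOF'
Jan 15 14:00:01 web1 sshd[101]: Failed password for root from 203.0.113.7 port 4242 ssh2
Jan 15 14:00:02 web1 sshd[102]: Failed password for invalid user admin from 203.0.113.7 port 4243 ssh2
Jan 15 14:00:03 web1 sshd[103]: Failed password for root from 198.51.100.9 port 5151 ssh2
Jan 15 14:00:04 web1 sshd[104]: Accepted publickey for alice from 192.0.2.10 port 6000 ssh2
EOF
# prints:
# 203.0.113.7 2
# 198.51.100.9 1
```

Usage on a real host: sudo cat /var/log/auth.log | failed_login_ips | head.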
Shell Scripting for SRE
Shell scripting transforms you from someone who types commands to someone who builds tools. It's how SREs automate runbooks, build health checks, create deployment pipelines, and eliminate toil. A single well-written script can save hours every week, forever.
Script Anatomy
Every good bash script starts with the same two lines, a shebang and the strict-mode options:
#!/usr/bin/env bash
# ↑ Shebang: tells the OS to run this with bash
set -euo pipefail
# ↑ Safety: -e exit on error, -u error on unset vars, -o pipefail propagate pipe failures
# Variables
SERVER="web-prod-01"
MAX_RETRIES=3
# Command substitution
DATE=$(date +%Y%m%d)
UPTIME=$(uptime -p)
# Conditionals
if [ -f "/etc/nginx/nginx.conf" ]; then
echo "nginx config exists"
elif [ -d "/etc/apache2" ]; then
echo "apache found instead"
else
echo "no web server config found"
fi
# For loop
for server in web1 web2 web3; do
echo "Checking $server..."
ping -c 1 "$server" &> /dev/null && echo " OK" || echo " UNREACHABLE"
done
# Functions
check_service() {
local svc="$1"
systemctl is-active --quiet "$svc" && echo "$svc: running" || echo "$svc: STOPPED"
}
check_service nginx
check_service postgresql
# Error handling with trap
cleanup() { echo "Cleaning up..."; rm -f /tmp/my_lockfile; }
trap cleanup EXIT # always run on exit
trap "echo 'Interrupted!'; exit 1" INT # handle Ctrl+C
# Redirects
./script.sh > output.log 2>&1 # stdout + stderr to file
./script.sh >> output.log 2>&1 # append
./noisy_cmd.sh 2>/dev/null # discard stderr
[DIAGRAM: Shell script control flow — boxes showing: shebang → set -euo → variable declarations → function definitions → main logic (if/for/while) → cleanup trap, with arrows showing error exit paths]
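The pieces above (variables, functions, conditionals, exit codes) combine naturally into a retry loop. This is a minimal sketch: retry and the RETRY_DELAY variable are hypothetical names introduced here, and the fixed delay is a simplification.

```bash
#!/usr/bin/env bash
set -euo pipefail

MAX_RETRIES=3

# retry is a hypothetical helper: run a command until it succeeds or the
# attempt budget is spent. Returns 0 on success, 1 if every attempt failed.
retry() {
  local max="$1" attempt=1 delay="${RETRY_DELAY:-1}"
  shift
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"     # fixed pause; real scripts often back off exponentially
  done
}
```

Usage: retry "$MAX_RETRIES" ping -c 1 web1. Because the command runs as the condition of until, a failing attempt does not trip set -e; only the final return 1 signals failure to the caller.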
Quiz Section 8 — 10 Questions
81. What is a shebang line in a shell script?
- A. A comment block explaining the script's purpose
- B. The first line (#!/bin/bash) that tells the OS which interpreter to use
- C. A special chmod +x embedded in the script
- D. A line that imports external libraries into the script
Reveal Answer
✓ Correct Answer: B
The shebang #! on line 1 tells the kernel what interpreter to use. #!/bin/bash uses bash. #!/usr/bin/env python3 uses whatever python3 is in $PATH. Without it, the script uses your current shell — unreliable.
82. What does $? represent in a shell script?
- A. A variable containing the script's filename
- B. The exit code (return status) of the last executed command
- C. A wildcard matching any single character
- D. The process ID of the current shell
Reveal Answer
✓ Correct Answer: B
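$? is 0 for success, non-zero for failure. Check it right after a critical operation: if [ $? -ne 0 ]; then echo "Failed!"; fi. With set -e, a non-zero exit stops the script before that check runs, so test the command directly instead: if ! some_command; then echo "Failed!"; fi.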
83. What does set -e do at the top of a shell script?
- A. Enables debug mode, printing each command before execution
- B. Causes the script to exit immediately if any command returns a non-zero exit code
- C. Sets the script to run with elevated root privileges
- D. Enables extended glob patterns for filename matching
Reveal Answer
✓ Correct Answer: B
set -e (errexit) is "fail fast." Without it, a failed command is silently ignored and the script continues — potentially causing catastrophic cascading failures. Always include it. Debug mode is set -x.
84. What does set -u do in a bash script?
- A. Updates all variables to their most recent values
- B. Treats referencing an unset variable as a fatal error and exits
- C. Makes the script run as a unique non-duplicate process
- D. Sets the user context under which the script runs
Reveal Answer
✓ Correct Answer: B
set -u (nounset) catches typos in variable names. Without it, $DIRECORY (typo) silently expands to empty string — potentially running rm -rf / instead of rm -rf /$DIRECTORY. A career-saving option.
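The difference is easy to observe safely with echo instead of rm. The sketch below runs the same misspelled expansion in two child shells, one without and one with -u.

```bash
#!/usr/bin/env bash

# DIRECORY (the typo) is deliberately never set; both child shells see it as unset.
unset DIRECORY

# Without nounset the typo silently becomes an empty string.
without_u=$(bash -c 'echo "/${DIRECORY}"')

# With -u the same expansion is a fatal error, so the echo never runs.
if bash -u -c 'echo "/${DIRECORY}"' 2>/dev/null; then
  with_u="expanded"
else
  with_u="rejected"
fi

echo "without -u: '$without_u'"   # prints: without -u: '/'
echo "with -u:    $with_u"        # prints: with -u:    rejected
```

For destructive commands, an extra guard is the ${VAR:?} expansion: rm -rf "/${DIRECTORY:?}" aborts with an error if DIRECTORY is unset or empty, even in scripts that forgot set -u.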
85. What is the correct bash syntax to check if a file exists?
- A. if file_exists "/path/to/file"; then
- B. if [ -f "/path/to/file" ]; then
- C. if exists("/path/to/file"); then
- D. if "/path/to/file" != null; then
Reveal Answer
✓ Correct Answer: B
Test operators: -f=regular file exists, -d=directory, -e=any type, -r=readable, -w=writable, -x=executable, -s=non-empty file. Always quote variables inside [ ].
86. What does 2>&1 mean in shell scripting?
- A. Redirect output to two files simultaneously
- B. Redirect stderr (fd 2) to wherever stdout (fd 1) is currently going
- C. Multiply output by 2 before writing
- D. Create two separate output streams
Reveal Answer
✓ Correct Answer: B
File descriptors: 0=stdin, 1=stdout, 2=stderr. cmd > file.log 2>&1 captures both stdout and stderr to file.log. Order matters: stdout redirect first, then merge stderr into it. &> file.log is a bash shorthand.
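Why order matters is easiest to see by running both orders side by side. In this sketch, emit is a hypothetical stand-in for any command that writes to both streams.

```bash
#!/usr/bin/env bash
set -euo pipefail

# emit is a hypothetical command that writes one line to each stream.
emit() { echo "to-stdout"; echo "to-stderr" >&2; }

both=$(mktemp)
only_out=$(mktemp)

emit > "$both" 2>&1        # stdout -> file, then stderr -> same file
emit 2>&1 > "$only_out"    # stderr -> old stdout (terminal), then stdout -> file

echo "correct order captured $(grep -c '' "$both") lines"      # 2 lines
echo "reversed order captured $(grep -c '' "$only_out") line"  # 1 line
```

In the reversed form, 2>&1 copies stderr to wherever stdout pointed at that moment (the terminal), and only afterwards is stdout moved to the file, so the error line never reaches it.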
87. In for server in web1 web2 web3; do, how do you reference the current value?
- A. $server
- B. {server}
- C. $(server)
- D. @server
Reveal Answer
✓ Correct Answer: A
All bash variables are referenced with $. Use ${server} when adjacent to other text (e.g., ${server}_backup). Quote variables in double quotes to handle spaces: "$server".
88. What does trap do in shell scripting?
- A. Catches and handles specific signals or exit events, running cleanup code
- B. Prevents users from terminating the script with Ctrl+C
- C. Traps all output to a log file automatically
- D. Creates a debugging breakpoint in the script
Reveal Answer
✓ Correct Answer: A
trap "cleanup" EXIT ensures cleanup always runs, even on error exit. Use trap "cleanup" ERR to run only on error. Patterns: delete temp files, release locks, send failure notifications.
89. What is the purpose of $(command) in bash?
- A. Creates an independent subshell
- B. Command substitution: executes the command and uses its output as a value
- C. Tests if the command exists before running it
- D. Runs the command in the background
Reveal Answer
✓ Correct Answer: B
Command substitution: TODAY=$(date +%Y%m%d) stores today's date. HOSTNAME=$(hostname) stores server name. Backticks (`cmd`) do the same but don't nest easily, so prefer $().
90. Which combination of set options provides the safest bash script settings?
- A. set -e only
- B. set -eu
- C. set -euo pipefail
- D. set -x only
Reveal Answer
✓ Correct Answer: C
set -euo pipefail is bash strict mode: -e=exit on error, -u=error on unset variables, -o pipefail=pipeline fails if any stage fails (without this, false | true succeeds). The gold standard for production scripts.
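The false | true case can be checked directly. This short sketch leaves out set -e on purpose so both pipelines run and their exit codes can be compared.

```bash
#!/usr/bin/env bash

# Default behaviour: a pipeline reports only the LAST command's exit status.
set +o pipefail
false | true
default_status=$?      # 0: the failure of `false` is hidden

# With pipefail: the pipeline fails if ANY stage fails.
set -o pipefail
false | true
pipefail_status=$?     # 1

echo "default=$default_status pipefail=$pipefail_status"
# prints: default=0 pipefail=1
```

This matters for pipelines such as a backup command piped into gzip: without pipefail, a crashed first stage still looks like success because gzip exits 0.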
System Administration & Security
This final section covers the operational duties that SREs perform regularly: managing users and access, keeping systems patched, securing SSH, scheduling tasks, and knowing the warning signs of a compromised system.
User Management
Linux is a multi-user system. Every service should run as its own dedicated user, not as root. Follow the principle of least privilege: give accounts only the permissions they need.
SSH Security — Your Primary Attack Surface
SSH is almost always exposed to the internet. Every misconfigured SSH server is an invitation. Use key-based authentication, disable password auth, and use non-standard ports only as an obscurity measure (not security).
Package Management
Keep systems patched. Unpatched vulnerabilities are the primary vector for server compromises. Set up unattended security updates on all servers.
# User management
sudo useradd -m -s /bin/bash -G sudo alice # create user with home, bash, sudo group
sudo passwd alice # set password
sudo usermod -aG docker alice # add to docker group
sudo userdel -r alice # delete user and home dir
id alice # show UID, GID, groups
# sudoers: always use visudo to edit!
sudo visudo
# alice ALL=(ALL:ALL) NOPASSWD: /bin/systemctl restart nginx
# SSH key-based authentication setup
ssh-keygen -t ed25519 -C "alice@company.com" # generate key pair
ssh-copy-id alice@server.example.com # copy public key to server
ssh -i ~/.ssh/id_ed25519 alice@server # connect with specific key
# Harden SSH (/etc/ssh/sshd_config)
# PasswordAuthentication no
# PermitRootLogin no
# PubkeyAuthentication yes
# Package management (Debian/Ubuntu)
sudo apt update && sudo apt upgrade -y
sudo apt install -y nginx
sudo apt autoremove -y
# Package management (RHEL/CentOS/Fedora)
sudo dnf update -y
sudo dnf install -y nginx
# Secure file transfer
scp -i ~/.ssh/id_ed25519 file.txt user@server:/tmp/
rsync -avz --progress /local/dir/ user@server:/remote/dir/
Security Red Flags — Know These
On a compromised server you may see: unexpected user accounts in /etc/passwd, unfamiliar processes in ps aux, new cron jobs in crontab -l, modified system binaries, unexpected listening ports in ss -tulpn, unusual outbound connections in ss -tnp, or thousands of failed login attempts in /var/log/auth.log.
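The account check is simple to script. The two helpers below are hypothetical names for this sketch; they read a passwd-format file (seven colon-separated fields, with the UID in field 3 and the login shell in field 7).

```bash
#!/usr/bin/env bash
set -euo pipefail

# extra_root_accounts is a hypothetical helper: list accounts other than
# "root" that have UID 0 in a passwd-format file.
extra_root_accounts() {
  awk -F: '$3 == 0 && $1 != "root" { print $1 }' "${1:-/etc/passwd}"
}

# login_accounts (also hypothetical) lists accounts whose shell is not
# nologin or false, i.e. accounts that can open an interactive session.
login_accounts() {
  awk -F: '$7 !~ /(nologin|false)$/ { print $1 }' "${1:-/etc/passwd}"
}
```

Run with no argument, both read /etc/passwd. Any output from extra_root_accounts deserves immediate investigation; the output of login_accounts is worth comparing against the list of people who should have shell access.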
[DIAGRAM: SSH key-based authentication flow — client generates key pair → public key uploaded to server's authorized_keys → SSH connection with private key → server verifies match → authenticated session granted]
Quiz Section 9 — 10 Questions
91. What does useradd -m -s /bin/bash john do?
- A. Modifies user john's existing shell
- B. Creates a new user "john" with a home directory and bash as their login shell
- C. Adds john to the sudo group
- D. Creates a service account named john with no shell access
Reveal Answer
✓ Correct Answer: B
-m creates /home/john, -s /bin/bash sets bash as login shell. Without -m, Debian/Ubuntu create no home directory (other distros may create one by default). For service accounts (no login), use -s /usr/sbin/nologin (/sbin/nologin on RHEL-family systems).
92. What is the safest way to edit the sudoers file?
- A. nano /etc/sudoers
- B. vim /etc/sudoers
- C. visudo
- D. cat > /etc/sudoers
Reveal Answer
✓ Correct Answer: C
visudo locks the file and validates the syntax before saving. A syntax error in sudoers can lock out ALL sudo access system-wide, potentially requiring single-user mode recovery. Never edit sudoers directly.
93. What does ssh-keygen -t ed25519 generate?
- A. An SSL/TLS certificate for a web server
- B. An SSH key pair (public + private) using the modern Ed25519 algorithm
- C. A new SSH server configuration
- D. An encrypted disk partition
Reveal Answer
✓ Correct Answer: B
Creates ~/.ssh/id_ed25519 (private — keep secret!) and ~/.ssh/id_ed25519.pub (public — share freely). Ed25519 is faster and more secure than RSA-2048. Set a strong passphrase on the private key.
94. What is the purpose of ~/.ssh/authorized_keys?
- A. A list of SSH servers this user is allowed to connect to
- B. Stores public keys of users allowed to log into this account via SSH key auth
- C. Encrypted password storage for SSH
- D. SSH connection history and access logs
Reveal Answer
✓ Correct Answer: B
During SSH auth, the server checks whether the client can prove possession of a private key matching one of the public keys in authorized_keys. The file must be owned by the user and not writable by group or others (600 is the usual choice). One public key per line.
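A permission check along these lines can be scripted. The sketch below assumes GNU stat (the stat -c form found on Linux); check_key_perms is a hypothetical helper that flags files writable by group or others.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_key_perms is a hypothetical helper: report whether a file is writable
# by group or others. Assumes GNU stat (`stat -c`), as shipped on Linux.
check_key_perms() {
  local file="$1" mode
  mode=$(stat -c '%a' "$file")
  # Octal bit 020 = group write, 002 = other write.
  if (( (8#$mode & 8#022) != 0 )); then
    echo "unsafe ($mode)"
  else
    echo "ok ($mode)"
  fi
}
```

Usage: check_key_perms ~/.ssh/authorized_keys. If it reports unsafe, chmod 600 ~/.ssh/authorized_keys and chmod 700 ~/.ssh are the usual fixes.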
95. What does apt update && apt upgrade -y do on Ubuntu?
- A. Downloads and installs only the latest kernel
- B. Refreshes the package list from repos, then upgrades all installed packages non-interactively
- C. Only applies security patches, skipping feature updates
- D. Upgrades Ubuntu to the next major version
Reveal Answer
✓ Correct Answer: B
apt update refreshes the local package index. apt upgrade installs newer versions of installed packages. -y auto-confirms. For major version upgrades, use do-release-upgrade.
96. What does scp user@remote:/path/file.txt . do?
- A. Creates a symbolic link from the remote file to the current directory
- B. Securely copies a file from the remote server to the current local directory
- C. Checks if a file exists on the remote server without copying
- D. Syncs the current directory to the remote server
Reveal Answer
✓ Correct Answer: B
scp (Secure Copy Protocol) uses SSH for encrypted file transfer. . means current directory. For syncing directories with delta updates, prefer rsync -avz — it only transfers changed files.
97. What is the principle of least privilege?
- A. Granting users maximum permissions to ensure they can always complete work
- B. Giving users and processes only the minimum permissions necessary for their tasks
- C. Allowing only the root user to perform any action on the system
- D. Ensuring all users have equal permissions
Reveal Answer
✓ Correct Answer: B
Least privilege limits blast radius when an account is compromised. If a web server process can only read /var/www/html, a successful attack can't read /etc/shadow or other sensitive files. Apply to users, services, and API keys.
98. What does crontab -e do?
- A. Executes all cron jobs immediately
- B. Opens the current user's crontab file for editing
- C. Lists all system-wide cron jobs from all users
- D. Deletes all scheduled cron jobs immediately
Reveal Answer
✓ Correct Answer: B
crontab -e opens your personal cron table in $EDITOR. crontab -l lists current jobs, crontab -r removes all (careful!). System-wide jobs live in /etc/cron.d/ and /etc/crontab.
99. What is SSH port forwarding (tunneling) used for?
- A. Making SSH connections faster by pre-allocating bandwidth
- B. Securely accessing services on remote networks through an encrypted SSH tunnel
- C. Automatically forwarding email attachments over SSH
- D. Creating automatic backup SSH connection pools
Reveal Answer
✓ Correct Answer: B
ssh -L 5432:db-internal:5432 user@bastion forwards localhost:5432 through the bastion to port 5432 on db-internal. This lets you use local tools (psql, pgAdmin) against internal databases without exposing them to the internet.
100. Which of the following is a key sign of a potentially compromised Linux server?
- A. Disk space being used by log files
- B. Legitimate users logged in during business hours
- C. Unexpected new user accounts, unfamiliar processes, or suspicious auth.log entries like mass failed SSH logins from unknown IPs
- D. The server having a high uptime value
Reveal Answer
✓ Correct Answer: C
Red flags: unknown accounts in /etc/passwd, strange processes in ps aux, new cron jobs, unexpected outbound connections in ss, modified system binaries, or thousands of failed SSH attempts in auth.log. Set up Fail2Ban and AIDE for automated detection.
You've completed all 100 questions!
You now have a solid foundation in Linux for SRE work. Here's your roadmap for what to learn next:
🐳 Containers
Docker, Kubernetes — the modern deployment platform for SREs
📊 Observability
Prometheus, Grafana, ELK Stack — metrics, logs, and traces
🏗️ Infrastructure as Code
Terraform, Ansible, Puppet — automate everything at scale