Creating a blue team SOC lab is an interesting exercise – conceptually, you believe you have things figured out. You start thinking you can put together a lab from all the industry-standard names: Elasticsearch, Logstash and Kibana (ELK) for centralised logging and search. You think about Shuffle and how you can use it for security orchestration and automated response. Case management? Surely that will be handled by TheHive. These were the names I was familiar with and exposed to — through Reddit threads, job boards and peers — and so I wrote them down.
Here is what the original stack looked like on paper:
| Tool | Purpose | RAM Requirement |
|---|---|---|
| VirtualBox | Virtualisation (Type 2 hypervisor) | ~2–4 GB (host OS overhead) |
| OPNsense | Firewall | ~2 GB |
| Elasticsearch | Log indexing and search | 4–8 GB |
| Logstash | Log aggregation and parsing | 1–2 GB |
| Kibana | Dashboard and GUI | 1–2 GB |
| Shuffle | SOAR — automated response | 4–8 GB |
| TheHive | Case management | 4–6 GB |
| Total | | ~18–32 GB |
The reality? The machine I was deploying to only has 16 GB of RAM — it's an old laptop that needed to find a new lease of life.
The maths is clear. Even at its lowest specification, this blue team stack wants more than my laptop's hardware can provide. Factor in Elasticsearch actually indexing events and Shuffle running workflows, and 100% RAM utilisation becomes realistic. In speaking to some friends, it became clear that this would be a liability — a lab that is constantly swapping memory to disk is effectively a broken security tool. You cannot trust a system that is quietly dying.
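The shortfall is easy to sanity-check. The figures below are the low end of each tool's range from the planning table, not measured values:

```python
# Minimum idle RAM requirements (GB) — the low end of each
# tool's range from the planning table above.
requirements = {
    "Host OS overhead": 2,
    "OPNsense": 2,
    "Elasticsearch": 4,
    "Logstash": 1,
    "Kibana": 1,
    "Shuffle": 4,
    "TheHive": 4,
}

available = 16  # GB of physical RAM on the laptop

minimum_total = sum(requirements.values())
print(f"Minimum stack footprint: {minimum_total} GB")  # 18 GB
print(f"Available RAM:           {available} GB")
print(f"Shortfall at idle:       {minimum_total - available} GB")  # 2 GB
```

Two gigabytes over budget before a single event has been indexed, and that is the most optimistic reading of the table.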
I decided that I would have to revisit those tools in the future and reassess my options. At the core of it, I want an always-on environment that is stable. I had to go back to the drawing board.
Proxmox over alternatives
Knowing the bloat that comes from installing a full operating system underneath everything, I found myself exploring Type 1 hypervisors — a crucial decision, since the hypervisor forms the base of the entire stack.
A hypervisor is a software layer that lets one physical machine run multiple isolated operating systems simultaneously. There are two categories and the difference matters for a lab that is meant to run continuously.
A Type 2 hypervisor sits on top of an existing operating system — think along the lines of VirtualBox. You would normally have a Windows or Linux installation on your machine, install the hypervisor on top of that, and then install your respective software inside the virtual machines it creates. This approach is perfectly acceptable and makes a lot of sense if you are spinning up testing environments or test-driving a new OS — anything for occasional use. But the problem comes down to the layering. The host OS is consuming RAM and CPU cycles constantly, even when you are not using it. On a 16 GB machine that is meant to be running a firewall, SIEM, SOAR and a visualisation layer at the same time, that host OS overhead is expensive.
A Type 1 hypervisor runs directly on the hardware with no operating system underneath it — it is the OS. Every resource the machine has is available to the hypervisor and the guests it manages, with nothing wasted on a general-purpose OS layer that you would barely interact with.
Proxmox is a Type 1 hypervisor built on Debian Linux. It runs on the bare metal, provides a web-based management interface accessible from any browser on the same network, and supports both virtual machines and Linux containers in the same interface.
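As a sketch of what this looks like in practice, here is how an LXC container might be created from the Proxmox shell. The container ID, hostname, resource figures and template name are all illustrative and will depend on your own setup:

```shell
# Create a lightweight LXC container from the Proxmox host shell.
# VMID 201, the hostname and the template path are placeholders.
pct create 201 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname wazuh-lab \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr1,ip=dhcp

pct start 201
```

The same web UI exposes all of this graphically, but the CLI makes it obvious how little ceremony a container involves compared to a full VM.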
It becomes easy to see the winner when it comes to constrained hardware.
There are other specific considerations in choosing Proxmox over something like ESXi, and it mainly comes down to two things: Proxmox came as a recommendation and I did not want to pay for anything at this point.
Proxmox is open source, free to use, and the community repository provides full package access without a subscription key. As I am starting out, I am pleased that the web UI has everything I need, and it comes with LXC container support. This is important as Linux containers share the same kernel as Proxmox, reducing the RAM overhead. I also found that there is a large, active home-lab community which is useful when you are developing these instances on a weekend or in the evenings — having a knowledge base is really helpful.
Why OPNsense over pfSense
There is no great complexity to this choice — it is what I discovered first. That said, it is worth noting that pfSense seems to be the go-to for a number of people, but when I started researching the differences for this blog post, I found some interesting considerations that I did not know at the time.
One argument is that OPNsense releases on a predictable schedule — about every two months, with clear version numbers and public changelogs. pfSense, on the other hand, has reportedly blurred the line between its community edition and its commercial product, and its release cycle has slowed as a result. The general recommendation seems to be that, for a home lab, an actively maintained codebase with transparent releases is the better call.
Having never used pfSense, I cannot compare the two directly, but I found OPNsense's web UI clean. In my configuration phase, I had no misunderstandings about what I was looking at — even if the terminology eluded me at the time. There is an argument to be made that when learning to configure firewalls, NAT entries and VLAN assignments, the quality of the interface has a real effect on how quickly you learn. I chalk this up to good design.
As an aside, I did read that when using OPNsense, it is best to use Zenarmor for traffic visualisation and Suricata for IDS/IPS — apparently they integrate better into the architecture of OPNsense. I will have to report back when I am using those tools actively.
RAM(p)ing it up
The original stack needed a minimum of 18 GB of RAM to run at idle — already beyond the 16 GB threshold. Realistically, it is worse than that. Elasticsearch is a Java application and its documentation recommends allocating no more than half of the machine's RAM to its JVM heap, leaving the rest for the operating system's file cache. An 8 GB heap would therefore mean the entire 16 GB machine existed, in effect, to serve Elasticsearch alone. Logstash adds another 1–2 GB and Kibana another 1–2 GB. The ELK stack, when properly configured, wants 10–12 GB on its own.
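For context, the heap ceiling is set explicitly in Elasticsearch's configuration. Following the half-of-RAM guidance on a 16 GB host would look something like this, in `jvm.options` or a file under `jvm.options.d/`:

```
# Pin the JVM heap to 8 GB — half of a 16 GB machine.
# Xms and Xmx should match to avoid resizing pauses at runtime.
-Xms8g
-Xmx8g
```

The other 8 GB is not idle; Elasticsearch leans heavily on the OS file cache for fast searches, which is exactly why the half-RAM guidance exists.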
This is unfeasible, as it only leaves 4–6 GB for OPNsense, Shuffle, TheHive, and the hypervisor itself — preventing any meaningful headroom required to actually run the system under load.
The revised stack
Here is what the pivot looked like on paper:
| Function | Original Tool | Revised Tool | RAM Saved |
|---|---|---|---|
| Firewall | pfSense CE | OPNsense 26.1 | — |
| SIEM + Indexer | ELK Stack (3 separate services) | Wazuh All-in-One | ~4 GB |
| SOAR | Shuffle (Docker swarm, multi-container) | n8n (single container) | ~2–3 GB |
| Log visualisation | Kibana (standalone) | Grafana + Loki | ~1 GB |
Welcome to Wazuh
I had only heard about Wazuh from a friend before I started exploring building my own home lab. It was a tool they used frequently, and it was the first option I thought of when I needed to rethink my approach.
Wazuh is a security platform built specifically for the monitoring use case that ELK is typically configured for in a SOC context. Its indexer is a fork of OpenSearch — a purpose-built tool for security event data. Running Wazuh All-in-One via the official install script deploys the manager, indexer and dashboard as a single unified service — not three separate components that each need their own memory allocation.
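For reference, the all-in-one deployment is driven by Wazuh's official installation assistant. The version path below is indicative; check the current Wazuh quickstart guide before running anything like this:

```shell
# Download and run the Wazuh installation assistant in
# all-in-one mode (-a): manager, indexer and dashboard
# are deployed together on a single host.
curl -sO https://packages.wazuh.com/4.9/wazuh-install.sh
sudo bash ./wazuh-install.sh -a
```

One script, one host, one memory allocation to budget for — which is precisely the point on constrained hardware.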
The result is that this stack delivers equivalent detection and alerting capability — log collection, anomaly detection, rule-based alerting, compliance dashboards — at about half the RAM cost. The drawback is that it does not provide the full flexibility of a raw ELK deployment where you are building every index pattern and dashboard from scratch. But I think that is acceptable right now — as part of Phase 1, I want to spend more time building accurate detection logic rather than maintaining configuration.
The revised stack — explained
OPNsense — security desk at the entrance
Every packet entering or leaving the lab passes through OPNsense first. It decides what is allowed, what is blocked, and what gets logged. It is the digital boundary between the home network and the isolated lab environment. Nothing reaches the internal lab without OPNsense permitting it.
Wazuh — the CCTV system
Once traffic is inside the lab, Wazuh watches it. It collects logs from every device and container in the environment, runs these logs against detection rules, and raises alerts when something looks wrong. Where OPNsense controls the perimeter, Wazuh is the internal monitor.
n8n — the guard who acts on the alert
Wazuh raises an alert and n8n decides what to do about it — send a notification, create a ticket, trigger an automated response. Fundamentally, n8n is a workflow automation tool; it is not a security product by design, but it handles SOAR logic well. Structurally, n8n deploys as a single lightweight container and does not require a Docker swarm to run. Shuffle, the tool it replaces, needed 2–3 GB to operate — n8n runs under 512 MB at idle.
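A minimal single-container deployment looks roughly like the following, based on n8n's Docker documentation. The volume name and the 512 MB memory cap are my choices, not requirements:

```shell
# Run n8n as a single container with a hard memory ceiling.
# Port 5678 is n8n's default web interface; workflow data
# persists in the named volume.
docker run -d --name n8n \
  --restart unless-stopped \
  --memory 512m \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n
```

The hard `--memory` cap means a runaway workflow degrades n8n alone rather than starving Wazuh of RAM.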
Grafana + Loki — the manager's dashboard
Grafana provides the visualisation layer: time-series dashboards, alert trend charts, system health panels. Loki is its log aggregation backend and runs considerably leaner than Elasticsearch. Combined, they provide the "what is happening right now" view that a SOC operator would look at throughout a shift.
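A minimal sketch of how the pair might be wired together with Docker Compose follows. The image tags and ports are the defaults at the time of writing and are worth verifying:

```yaml
# Loki stores and indexes logs; Grafana queries it as a data source.
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"   # Loki's HTTP API
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"   # Grafana web UI
    depends_on:
      - loki
```

Inside Grafana, Loki is then added as a data source pointing at the Loki container's port 3100, and dashboards are built on top of that.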
Tailscale — the secure key to the building
The lab runs on a machine connected to a domestic LTE router operating under CGNAT, which means there is no public IP address and no way to port forward from the internet. Tailscale creates an encrypted WireGuard mesh between the lab machine, the desktop, and any other enrolled device, so the entire lab is reachable remotely without exposing anything to the public internet. It runs as a daemon on the Proxmox host and uses almost no resources.
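Getting Tailscale onto the Proxmox host is a two-step affair. The install script URL is Tailscale's official one, though it is worth confirming against their current Linux install documentation:

```shell
# Install the Tailscale daemon on the Proxmox host (Debian-based).
curl -fsSL https://tailscale.com/install.sh | sh

# Authenticate and join the tailnet; this prints a login URL
# to open in a browser on any already-enrolled device.
sudo tailscale up
```

Once enrolled, the Proxmox web UI is reachable over the tailnet from the desktop without a single port exposed to the internet.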
AdGuard
Before any device in the lab even attempts to connect to a website or service, it has to ask AdGuard for the address (DNS). AdGuard acts as the first filter — if a device tries to call home to a known tracking server, a malicious domain, or an intrusive ad network, AdGuard simply says "address not found." By sinkholing these requests at the DNS level, it stops threats before a single packet of malicious content is even downloaded.
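You can see the sinkhole behaviour directly with a DNS lookup. The resolver address below is a placeholder for wherever AdGuard is listening in your lab, and the exact response depends on the configured blocking mode:

```shell
# Query AdGuard directly for a domain on a blocklist.
# 10.10.10.53 is a placeholder for the AdGuard instance's lab IP.
dig @10.10.10.53 tracker.example.com +short
# A blocked domain typically comes back as 0.0.0.0 (null IP mode)
# or as an empty/NXDOMAIN response, depending on the blocking mode.
```

Compare the same query against a public resolver such as 1.1.1.1 and the difference in behaviour is immediately visible.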
The RAM budget that makes this work:
| Component | Allocation | Notes |
|---|---|---|
| Proxmox Host OS | 1.5 GB | Web GUI, bridge management, Tailscale daemon |
| OPNsense VM | 1.5 GB | Firewall and NAT |
| Linux Mint XFCE (Jump Box VM) | 2.0 GB | Browser and terminal access to lab GUI |
| Wazuh All-in-One LXC | 4.0 GB | Manager, Indexer, and Dashboard |
| n8n SOAR LXC | 2.0 GB | SQLite backend, lightweight footprint |
| Headroom / System Buffer | 5.0 GB | Stability margin; future Grafana + Loki |
| Total | 16.0 GB | |
The 5 GB headroom is deliberate — you will notice that I have not included AdGuard in the RAM table above, as it was something I decided to add later. If I recall correctly, I allocated 1 GB to it, leaving 4 GB spare. Memory pressure causes the kernel to start swapping processes to disk, which introduces latency across the whole stack. In a professional environment exhausting that margin would be unacceptable; in a home lab the stakes are lower, but the goal remains the same: keep the lab as trustworthy as possible.
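The budget arithmetic, including the later AdGuard addition, checks out. The allocations below are taken from the table above:

```python
# Revised RAM budget (GB), taken from the allocation table above.
budget = {
    "Proxmox host OS": 1.5,
    "OPNsense VM": 1.5,
    "Linux Mint jump box VM": 2.0,
    "Wazuh all-in-one LXC": 4.0,
    "n8n LXC": 2.0,
    "Headroom / buffer": 5.0,
}

total = sum(budget.values())
assert total == 16.0  # exactly the physical RAM on the machine

# AdGuard was added later, with roughly 1 GB carved out of the headroom.
headroom_after_adguard = budget["Headroom / buffer"] - 1.0
print(f"Total allocated:        {total} GB")
print(f"Headroom after AdGuard: {headroom_after_adguard} GB")  # 4.0 GB
```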
The architecture principle that matters most
| Layer | Component | Responsibility |
|---|---|---|
| 1 | Physical hardware | Raw compute and connectivity |
| 2 | Proxmox | Virtualisation, bridge networking, snapshots |
| 3 | OPNsense | Firewalling, NAT, DHCP |
| 4 | AdGuard | DNS filtering |
| 5 | Wazuh | Detection and alerting |
| 6 | n8n | Automated response |
| 7 | Grafana + Loki | Visualisation |
| 8 | Tailscale | Remote access |
It is important to note that each layer has a single responsibility, and this contributes meaningfully to good security practice. When a layer does one thing, you can reasonably determine what that layer should and should not see. When a layer does more than one thing, troubleshooting and debugging become considerably more difficult.
A good way of demonstrating this principle is the network bridge architecture. Proxmox creates two virtual network bridges — vmbr0 and vmbr1. vmbr0 connects to the USB-C Ethernet adapter — my physical link to the home network. vmbr1 has no physical interface at all — it exists only inside the machine's memory as a completely virtual switch that no device on the home network can reach directly.
OPNsense sits across both bridges — one interface on vmbr0 facing the home network and one on vmbr1 facing the internal lab. Every packet of data that moves between those two worlds passes through OPNsense. There is no other path. This is what makes the firewall a real security boundary rather than a decoration — vmbr1 cannot be reached by bypassing OPNsense, as it is not physically connected to anything that would allow that. This becomes particularly useful when examining vulnerable machines, malware, or other suspicious tools.
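On the Proxmox side, this architecture is just a few lines in `/etc/network/interfaces`. The physical interface name and the addresses are placeholders and will differ per machine:

```
# vmbr0 — bridged to the physical NIC, faces the home network.
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.50/24
    gateway 192.168.1.1
    bridge-ports enx00e04c680001   # USB-C Ethernet adapter (name varies)
    bridge-stp off
    bridge-fd 0

# vmbr1 — no physical ports at all: a purely internal switch.
auto vmbr1
iface vmbr1 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
```

The `bridge-ports none` line on vmbr1 is the whole security argument in one setting: with no physical member port, the only way in or out of the lab segment is through the OPNsense VM that straddles both bridges.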
This is what separates a lab where the firewall is genuinely enforcing policy from a lab where the firewall is present but irrelevant.
The full technical playbook for Phase 1 — architecture decisions, every configuration file, the failure log, and the Phase 2 roadmap will be made available in due time on my GitHub Lab page.