Lekuo RTL9101 M.2 to 8-Port SATA HBA Review: True Multi-Port Architecture Without Multiplexing Compromises
Introduction
The evolution of M.2-based storage expansion continues with this Lekuo RTL9101-powered SATA HBA card, representing a fundamental architectural shift from previous generation designs. While superficially similar to existing M.2 SATA expansion solutions, this card delivers eight SATA 6Gbps ports through dual SFF-8087 connectors with a critical distinction: native port implementation rather than port multiplication. For homelab administrators and storage enthusiasts working within M.2 form factor constraints, this represents a meaningful upgrade in both performance predictability and power management sophistication.
You can purchase this HBA via the link below:
https://lekuo.shop/products/lekuo-8-port-m-2-pcie-3-0-sata-expansion-card
What’s in the box
The card and the two included SFF-8087 to SATA breakout cables ship in an unassuming cardboard box; Lekuo even includes an M.2 mounting screw, which is a nice touch.
Hardware Overview
This M.2 card leverages the Realtek RTL9101 PCIe-to-SATA bridge chipset, a design that natively supports nine independent SATA 3.1 controllers. Due to M.2 form factor limitations, the card implements eight of these controllers through two SFF-8087 connectors, each delivering four SATA 6Gbps ports. The card operates over a PCIe 3.0 x2 interface, providing 16Gbps of theoretical bandwidth shared across all connected drives.
Key Specifications:
Chipset: Realtek RTL9101
Interface: PCIe 3.0 x2 (M.2 M-Key)
SATA Ports: 8 (via 2x SFF-8087)
SATA Standard: SATA 3.1 (6Gbps per port)
Controller Type: AHCI 1.31 compliant
Onboard Storage: Customized SPI flash with embedded AHCI driver
The Critical Architectural Difference
Understanding what sets this card apart requires examining how previous generation M.2 SATA HBAs achieved multi-port expansion. The widely used JMB585+JMB575 solution provides five native SATA ports from the JMB585 controller, then uses the JMB575 port multiplier chip to convert one of those ports into four additional ports. This means four drives share the bandwidth of a single SATA connection, creating potential bottlenecks during simultaneous operations.
The RTL9101 takes a fundamentally different approach. This chipset natively implements nine independent SATA controllers without relying on port multiplication. Each of the eight exposed ports operates as a true SATA controller with dedicated resources. Bandwidth distribution happens dynamically at the PCIe level rather than through fixed port multiplication hierarchies. This architecture delivers more predictable performance when multiple drives are active simultaneously.
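To make the architectural difference concrete, here is a back-of-the-envelope comparison in Python. The figures (~600 MB/s of usable payload per SATA 6Gbps link after 8b/10b encoding, ~1,970 MB/s usable over PCIe 3.0 x2 after 128b/130b encoding) are nominal link-rate assumptions, not measurements from this card:

```python
# Back-of-the-envelope: worst-case per-drive sequential bandwidth when
# four drives are active at once, under the two architectures.

SATA_LINK_MBPS = 6000 * 8 / 10 / 8         # 6 Gbps link, 8b/10b encoding -> 600 MB/s
PCIE_3_X2_MBPS = 2 * 8000 * 128 / 130 / 8  # two 8 GT/s lanes, 128b/130b -> ~1969 MB/s

# JMB585+JMB575 style: four drives multiplexed behind ONE shared SATA link.
per_drive_multiplied = SATA_LINK_MBPS / 4                   # ~150 MB/s each

# RTL9101 style: four native controllers sharing the PCIe link directly.
per_drive_native = min(SATA_LINK_MBPS, PCIE_3_X2_MBPS / 4)  # ~492 MB/s each

print(f"port-multiplied: ~{per_drive_multiplied:.0f} MB/s per drive")
print(f"native (RTL9101): ~{per_drive_native:.0f} MB/s per drive")
```

The native-controller estimate of ~492 MB/s per drive lines up reasonably well with the four-drive sequential results measured later in this review.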
Power Management and Advanced Features
The RTL9101 brings an impressive array of power management capabilities that exceed what's typically found in budget storage controllers:
PCIe Power States:
L0: Active state
L0s: Rapid entry/exit low power state
L1: Deeper power saving with higher latency
L1 + CLKREQ#: Advanced low power with Rx/Tx and PLL shutdown
L1 substates: L1.1 and L1.2 for granular power optimization
Additional Capabilities:
Latency Tolerance Reporting (LTR)
Link Power Management (L1.off and L1.snooze)
PCI MSI and MSI-X interrupt handling
SATA Link Power Management with automatic partial-to-slumber transitions
For homelab environments running 24/7, these power management features can contribute to meaningful energy savings, particularly in systems with spindown-capable drives or during periods of low storage activity.
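As a rough illustration of why link power management matters for always-on systems, the sketch below estimates annual controller energy with and without low-power idle states. Every wattage and duty-cycle figure here is an illustrative assumption, not a measured value for the RTL9101:

```python
# Illustrative annual-energy estimate for a storage controller that spends
# most of its life idle. All wattages and the duty cycle are assumed.

HOURS_PER_YEAR = 24 * 365

def annual_kwh(active_w, idle_w, active_fraction):
    """Energy over a year given a simple active/idle duty cycle."""
    avg_w = active_w * active_fraction + idle_w * (1 - active_fraction)
    return avg_w * HOURS_PER_YEAR / 1000

# Assumption: 10% active duty cycle, 3 W while active in both cases.
no_lpm   = annual_kwh(active_w=3.0, idle_w=2.5, active_fraction=0.10)  # pinned at L0
with_lpm = annual_kwh(active_w=3.0, idle_w=0.5, active_fraction=0.10)  # L1/L1.2 idle

print(f"without LPM: {no_lpm:.1f} kWh/yr, with LPM: {with_lpm:.1f} kWh/yr")
```

Even with these modest assumed numbers, the idle-state savings compound to double-digit kilowatt-hours per year, which is the point of the L1 substates in a 24/7 box.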
AHCI and Driver Implementation
The RTL9101 implements AHCI 1.31 compliance with the controller driver stored in onboard SPI flash. This embedded driver approach means the card can initialize and enumerate drives before operating system boot, enabling boot-from-card scenarios and simplifying driver management across different operating systems.
Supported Features:
Native Command Queuing (NCQ) for optimized command ordering
First-party DMA transfers
PIO transfer support
Full SATA 1.5G/3G/6G speed negotiation
Performance Characteristics and Bandwidth Distribution
With PCIe 3.0 x2 providing 16 GT/s of raw signaling, usable bandwidth works out to roughly 1.97 GB/s per direction after 128b/130b encoding overhead. Distributed across eight SATA ports, each running a 6Gbps link (about 600 MB/s of usable payload after SATA's 8b/10b encoding), the PCIe interface becomes the limiting factor during heavy multi-drive workloads.
However, the RTL9101's native controller architecture means this bandwidth sharing happens intelligently at the PCIe transaction level. Unlike port-multiplied solutions where four drives compete for a single SATA link's bandwidth before even reaching the PCIe bus, each RTL9101 port can independently negotiate for PCIe bandwidth based on actual drive activity.
In practical terms: sequential operations on multiple drives will still be constrained by the PCIe 3.0 x2 ceiling, but random I/O patterns and mixed workloads will see more equitable bandwidth distribution compared to port-multiplied architectures.
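The ceiling described above can be sketched numerically. A hedged estimate, assuming ~600 MB/s usable per SATA link and ~1,970 MB/s usable over PCIe 3.0 x2 (nominal figures, not measurements):

```python
# How many simultaneously-sequential drives fit under the PCIe 3.0 x2 ceiling?
PCIE_BUDGET_MBPS = 1970   # ~usable PCIe 3.0 x2 bandwidth per direction (assumed)
SATA_USABLE_MBPS = 600    # ~usable SATA 6Gbps payload after 8b/10b (nominal)

for drives in range(1, 9):
    # Each drive gets the smaller of its own SATA ceiling or a fair PCIe share.
    per_drive = min(SATA_USABLE_MBPS, PCIE_BUDGET_MBPS / drives)
    print(f"{drives} drive(s): ~{per_drive:.0f} MB/s each, "
          f"~{per_drive * drives:.0f} MB/s aggregate")
```

Under these assumptions, three drives can still run at the SATA ceiling while a fourth pushes every drive down to roughly 490 MB/s, which matches the scaling behavior observed in the benchmarks below.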
Performance Testing
Testing was conducted using CrystalDiskMark with four Intel DC S3610 SATA SSDs. These enterprise-grade drives were chosen for their consistent performance characteristics and ability to saturate SATA 6Gbps links, properly stressing the RTL9101's PCIe 3.0 x2 bandwidth ceiling. Two testing profiles were used: Peak Performance preset (high queue depth) to establish maximum throughput capabilities, and Real World preset (Q1T1) to evaluate typical homelab workload behavior.
For the base system, I went with a Kamrui Hyper H2 mini PC equipped with an Intel Core i9-11900H.
Test 1: Peak Performance Profile (High Queue Depth)
The Peak Performance preset uses SEQ1M Q8T1 for sequential operations and RND4K Q32T1 for random operations, representing maximum stress conditions.
Sequential Performance (SEQ1M Q8T1)
| Configuration | Read (MB/s) | Write (MB/s) | Aggregate Read | Aggregate Write |
|---|---|---|---|---|
| Single Drive | 523.64 | 442.86 | 523.64 MB/s | 442.86 MB/s |
| Two Drives | 528.68, 529.31 | 434.70, 434.92 | 1,057.99 MB/s | 869.62 MB/s |
| Three Drives | 528.98, 531.74, 531.92 | 429.66, 430.66, 428.95 | 1,592.64 MB/s | 1,289.27 MB/s |
| Four Drives | ~467 each | ~406 each | ~1,868 MB/s | ~1,624 MB/s |
Analysis: Sequential performance scaling demonstrates the RTL9101's native controller architecture working as designed. With one and two drives, each drive operates at or near its maximum SATA performance ceiling (~530 MB/s read). The PCIe 3.0 x2 interface provides sufficient bandwidth for two drives operating simultaneously without throttling.
At three drives, we see aggregate throughput approaching 1.6 GB/s read and 1.3 GB/s write, still with minimal per-drive performance degradation. The four-drive scenario shows the PCIe bandwidth ceiling materializing—aggregate throughput approaches the theoretical ~2 GB/s limit of PCIe 3.0 x2 (after encoding overhead), requiring bandwidth distribution across drives. Individual drive performance drops to approximately 467 MB/s read and 406 MB/s write, representing roughly 89% and 92% of single-drive performance respectively.
Critically, this bandwidth distribution is remarkably even across all four drives, demonstrating the advantage of the RTL9101's native controller approach over port-multiplied architectures.
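The per-drive retention figures quoted above can be reproduced directly from the table; a quick sanity check in Python, using the values from the sequential results:

```python
# Per-drive performance retention in the four-drive sequential test,
# relative to the single-drive baseline (values copied from the table above).
single_read, single_write = 523.64, 442.86
four_read, four_write = 467.0, 406.0   # approximate per-drive four-drive values

read_retention = four_read / single_read
write_retention = four_write / single_write
print(f"read: {read_retention:.0%}, write: {write_retention:.0%}")
```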
Random 4K Performance (Q32T1)
| Configuration | Read (MB/s) | Write (MB/s) | Read IOPS | Write IOPS | Read Latency (μs) | Write Latency (μs) |
|---|---|---|---|---|---|---|
| Single Drive | 210.70 | 182.41 | 51,441 | 44,532 | 619 | 714 |
| Two Drives | 173.87, 180.77 | 140.40, 145.02 | 42,447, 44,133 | 34,276, 35,406 | 528, 528 | 528, 526 |
| Three Drives | 115.53, 143.10, 121.41 | 92.89, 119.14, 99.62 | 28,205, 34,937, 29,640 | 22,677, 29,085, 24,322 | 883, 738, 816 | 950, 807, 860 |
| Four Drives | 97.84, 82.99, 156.62, 77.30 | 72.21, 61.29, 126.37, 62.90 | 23,887, 20,260, 38,237, 18,872 | 17,628, 14,964, 30,852, 15,357 | 917, 1,244, 675, 1,254 | 984, 1,328, 758, 1,322 |
Aggregate Random Performance:
- Two Drives: 86,580 read IOPS / 69,682 write IOPS
- Three Drives: 92,782 read IOPS / 76,084 write IOPS
- Four Drives: 101,256 read IOPS / 78,801 write IOPS
Analysis: The random I/O results reveal more complex behavior under queue depth stress. With two drives operating simultaneously, per-drive IOPS drops to approximately 82-86% of single-drive performance, but aggregate IOPS increases significantly while maintaining consistent sub-530μs latencies.
The three-drive scenario shows more pronounced performance variation between drives, with one drive achieving 34,937 read IOPS while others settle around 28-29K IOPS. Latencies remain relatively controlled in the 738-883μs range for reads.
The four-drive test exposes the limitations of running Q32 across multiple drives simultaneously over a PCIe 3.0 x2 link. Performance distribution becomes notably uneven, with one drive achieving 38,237 read IOPS (156.62 MB/s) while others drop to 18,872-23,887 IOPS. Latency variance increases dramatically, ranging from 675μs to 1,254μs for reads.
This behavior likely stems from several factors:
- Queue depth saturation - Q32 per drive means 128 outstanding commands competing for PCIe bandwidth
- AHCI command slot contention - The controller is managing command distribution across multiple drives under extreme queue pressure
- PCIe transaction arbitration - The RTL9101 must prioritize and schedule PCIe transactions from nine potential controllers under heavy load
It's worth noting that Q32 represents an unrealistic sustained workload for most consumer and homelab scenarios. Real-world storage workloads rarely maintain queue depths this high across multiple drives simultaneously.
Test 2: Real-World Workload Profile (Q1T1)
To complement the peak performance testing, a second round of benchmarks was conducted using CrystalDiskMark's Real World preset, which employs Q1T1 (Queue Depth 1, Thread 1) for both sequential and random operations. This represents typical consumer and homelab workload patterns far more accurately than the high-queue-depth torture testing. The mixed workload test uses a 70% read / 30% write distribution.
Sequential Performance - Real World (SEQ1M Q1T1)
| Configuration | Read (MB/s) | Write (MB/s) | Mixed 70/30 (MB/s) |
|---|---|---|---|
| Single Drive | 464.17 | 428.85 | 381.20 |
| Two Drives | 467.80, 463.23 | 428.49, 431.91 | 378.73, 380.17 |
| Three Drives | 460.85, 464.21, 468.82 | 421.42, 421.12, 423.49 | 378.10, 380.26, 380.23 |
| Four Drives | 402.95, 414.61, 444.52, 435.90 | 401.01, 402.22, 414.44, 413.40 | 373.09, 372.12, 373.60, 377.04 |
Aggregate Throughput:
- Two Drives: 931.03 MB/s read / 860.40 MB/s write / 758.90 MB/s mixed
- Three Drives: 1,393.88 MB/s read / 1,266.03 MB/s write / 1,138.59 MB/s mixed
- Four Drives: 1,697.98 MB/s read / 1,631.07 MB/s write / 1,495.85 MB/s mixed
Analysis: At Q1T1, the sequential performance characteristics paint a very different picture than the peak performance tests. With one, two, and three drives, performance remains remarkably consistent across all drives, with minimal variance. Each drive maintains near-baseline performance, demonstrating that the PCIe 3.0 x2 bandwidth is more than adequate for moderate sequential workloads.
The four-drive scenario shows the first signs of bandwidth constraints, but notably only in pure read operations, where individual drive speeds drop to the 403-445 MB/s range. Write performance remains surprisingly strong at 401-414 MB/s per drive, very close to baseline. Most impressive is the mixed 70/30 workload performance: all four drives maintain 372-377 MB/s, roughly 2% below single-drive performance.
This demonstrates an important characteristic of the RTL9101 under realistic loads—when drives aren't simultaneously hitting peak sequential speeds, the controller efficiently distributes available PCIe bandwidth with minimal overhead.
Random 4K Performance - Real World (Q1T1)
| Configuration | Read (MB/s) | Write (MB/s) | Mixed (MB/s) | Read IOPS | Write IOPS | Mixed IOPS |
|---|---|---|---|---|---|---|
| Single Drive | 30.80 | 72.11 | 34.15 | 7,520 | 17,604 | 8,337 |
| Two Drives | 29.32, 29.52 | 59.70, 61.48 | 33.60, 33.23 | 7,157, 7,206 | 14,575, 15,010 | 8,203, 8,112 |
| Three Drives | 28.17, 28.32, 28.74 | 52.95, 54.37, 57.48 | 32.50, 32.69, 32.79 | 6,875, 6,913, 7,015 | 12,928, 13,275, 14,032 | 7,936, 7,981, 8,005 |
| Four Drives | 26.27, 26.65, 28.28, 27.52 | 46.02, 47.52, 56.11, 51.35 | 31.55, 31.83, 32.56, 32.22 | 6,414, 6,505, 6,905, 6,718 | 11,234, 11,602, 13,699, 12,536 | 7,701, 7,771, 7,949, 7,866 |
Aggregate Random Performance:
- Two Drives: 14,363 read IOPS / 29,585 write IOPS / 16,315 mixed IOPS
- Three Drives: 20,803 read IOPS / 40,235 write IOPS / 23,922 mixed IOPS
- Four Drives: 26,542 read IOPS / 49,071 write IOPS / 31,287 mixed IOPS
Random 4K Latency - Real World (Q1T1)
| Configuration | Read (μs) | Write (μs) | Mixed (μs) |
|---|---|---|---|
| Single Drive | 132 | 56 | 119 |
| Two Drives | 139.56, 138.61 | 68, 66 | 121, 123 |
| Three Drives | 145, 144, 142 | 77, 75, 71 | 125, 125, 125 |
| Four Drives | 155, 153, 144, 148 | 88, 86, 72, 79 | 129, 128, 125, 126 |
Analysis: The Q1T1 random I/O results reveal the RTL9101's true character under realistic operating conditions—and it's excellent. Unlike the erratic behavior seen under Q32 stress testing, Q1T1 workloads show consistent, predictable performance scaling.
Per-drive IOPS decreases gradually and proportionally as more drives are added, which is expected behavior as they share PCIe bandwidth. However, the aggregate IOPS scales nearly linearly: moving from one to four drives delivers 3.5x read IOPS, 2.8x write IOPS, and 3.8x mixed workload IOPS. This demonstrates efficient PCIe transaction management without the bottlenecking you'd see in port-multiplied architectures.
Latency behavior is particularly noteworthy. Even with four drives operating simultaneously, read latency increases only about 14% (from 132μs to a 150μs average), write latency about 45% (from 56μs to an 81μs average), and mixed workload latency just 7% (from 119μs to a 127μs average). These are remarkably small latency penalties given that PCIe bandwidth is being dynamically shared across four independent SATA controllers.
The consistency of latency across all four drives in the four-drive test (144-155μs for reads, 72-88μs for writes) indicates fair scheduling and arbitration—no single drive is being starved or prioritized.
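One way to quantify the fairness difference between the two test profiles is the spread (max/min ratio) of per-drive read latency in the four-drive runs, using the values from the two latency tables above:

```python
# Per-drive 4K read latencies (µs) from the four-drive tests above.
q32_read_lat = [917, 1244, 675, 1254]   # Peak Performance preset (RND4K Q32T1)
q1_read_lat  = [155, 153, 144, 148]     # Real World preset (RND4K Q1T1)

def spread(lats):
    """Max/min ratio: 1.0 means perfectly even latency across drives."""
    return max(lats) / min(lats)

print(f"Q32 spread: {spread(q32_read_lat):.2f}x")   # highly uneven under stress
print(f"Q1T1 spread: {spread(q1_read_lat):.2f}x")   # near-uniform at realistic depth
```

The Q32 run shows nearly a 1.9x gap between the fastest and slowest drive, while the Q1T1 run stays within about 8%, which is the fair-arbitration behavior described above in one number.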
Performance Testing Summary
The dual testing approach reveals the RTL9101's complete performance profile. Under peak performance conditions (Q8/Q32), the card demonstrates its ability to approach the PCIe 3.0 x2 bandwidth ceiling with sequential workloads while showing some arbitration complexity under extreme random I/O queue depths. Under real-world conditions (Q1T1), the controller exhibits exactly the behavior homelab users need: consistent, predictable performance with minimal latency overhead.
For common use cases—file serving, media streaming, backup operations, RAID array workloads—users can expect:
- Consistent per-drive performance with minimal variance between drives
- Efficient bandwidth utilization approaching the PCIe 3.0 x2 theoretical ceiling
- Low latency overhead even with all eight ports populated and active
- Predictable scaling as additional drives become active
The mixed 70/30 workload results are particularly telling: even with four drives active, each maintains 92-99% of single-drive performance. This is the metric that matters for NAS workloads, where reads and writes occur simultaneously across multiple drives.
The Q1T1 testing reveals what matters most for typical homelab deployments: the RTL9101 handles realistic workloads with grace and predictability. The dramatic performance variance seen in Q32 testing simply doesn't manifest under normal operating conditions, making this card well-suited for its intended use cases despite the M.2 PCIe connection's bandwidth constraints.
Addressing Common Misconceptions About M.2 SATA Expansion Cards
The homelab and NAS communities have developed strong opinions about SATA expansion solutions, with conventional wisdom heavily favoring traditional PCIe HBA cards over M.2-based alternatives. While this guidance made sense during the era of port-multiplied M.2 cards, the RTL9101's native controller architecture warrants revisiting these assumptions. Let's address some common Reddit-tier criticisms with actual data and practical considerations.
Myth 1: "You'll run into issues sooner rather than later as they have cheap port multipliers on them"
Reality: The RTL9101 doesn't use port multiplication. This chipset natively implements nine independent SATA controllers—no multiplexing, no shared bandwidth at the SATA level. The performance testing demonstrates this clearly: under Q1T1 real-world workloads, all four drives maintained 92-99% of single-drive performance with consistent, predictable latency distribution.
The "cheap port multiplier" criticism was entirely valid for JMB585+JMB575 solutions, where four ports literally shared a single SATA link's bandwidth. The RTL9101 architecture eliminates this bottleneck. Bandwidth sharing happens at the PCIe level, where the controller intelligently arbitrates between nine independent SATA controllers competing for PCIe 3.0 x2 lanes—fundamentally different from four drives fighting over a single 6Gbps SATA connection.
Myth 2: "SATA will make you kick yourself. Get an IT-flashed HBA card. They're probably close to the same price"
Reality: This advice ignores PCIe slot economics and platform limitations.
Consider the real-world constraints:
AMD Platforms: Zen 3 desktop CPUs (outside the G-series APUs) have no integrated graphics, and although Zen 4 added a basic iGPU, most builds still dedicate the primary PCIe x16 slot to a discrete GPU. Your second slot is typically x4 electrically and runs through the chipset, adding latency. Getting x8/x8 bifurcation across both slots requires premium motherboards with higher-end chipsets.
Intel Platforms: While integrated graphics exist, most homelab builders still install discrete GPUs for transcoding, gaming, or compute workloads. The second PCIe slot—if it exists and isn't blocked by a massive GPU cooler—is often x4 and chipset-connected.
The M.2 Advantage: Most motherboards provide 2-3 M.2 slots. One runs NVMe for the OS, leaving 1-2 slots available that provide direct CPU-connected PCIe lanes without consuming the precious few full-length slots. An M.2 SATA card doesn't force you to choose between storage expansion and other PCIe devices.
Yes, a traditional HBA provides more bandwidth (x8 or x16 versus x2), but it costs you a slot you might not have to spare. The RTL9101 card occupies real estate you probably weren't using anyway.
Myth 3: "Don't get the 9-port version, some ports are slow because it shares lanes. The 6-port is good"
Reality: This fundamentally misunderstands how the RTL9101 works. All nine SATA controllers are native implementations—there's no hierarchy where some ports are "fast" and others are "slow." The ASM1166-based 6-port cards use a similar native controller approach, but they're limited to six ports because that's what the ASM1166 chipset supports.
The RTL9101 provides nine native controllers but this particular M.2 card only exposes eight via dual SFF-8087 connectors due to form factor constraints. Each of those eight ports has equal standing in the controller's PCIe transaction arbitration. The performance testing proves this—examine the four-drive Q1T1 results where latency variance across drives was minimal (144-155μs reads, 72-88μs writes).
The actual limitation is PCIe bandwidth, not port hierarchy. With PCIe 3.0 x2 providing ~2 GB/s, you'll hit this ceiling when multiple drives operate at peak sequential speeds simultaneously. But this affects all ports equally, not selectively.
Myth 4: "Get the HBA, have a better life. Shortcuts like this are not worth it"
Reality: Define "better life." If you're building a 24-bay enterprise storage server, absolutely get a proper LSI 9300/9400-series HBA. But for homelab scenarios with 4-8 SATA drives, the M.2 solution offers tangible advantages:
Thermal Management: Server-grade HBAs were designed for 1U/2U chassis with dedicated high-CFM airflow. In a typical ATX case or compact homelab build, these cards run hot because the airflow just isn’t there. The LSI SAS2008/3008 chipsets idle at 60-70°C and can thermal throttle without active cooling. The RTL9101 requires no heatsink whatsoever—it's passively cooled and generates minimal heat.
Power Consumption: Traditional HBAs draw 5-15W at idle depending on model and attached drives. The RTL9101's sophisticated power management (L0s, L1, L1.1, L1.2 substates) and modern process node result in significantly lower power consumption. For 24/7 homelab operation, this compounds over months and years.
Noise: Related to cooling—if your HBA needs a 40mm fan screaming at 6,000 RPM to stay cool, your "better life" includes constant server noise in your living space.
Cable Management: Two SFF-8087 cables versus four individual SATA cables or multiple SAS breakout cables makes for cleaner routing in compact cases.
The "shortcut" framing is misguided. This is an engineering tradeoff, not a compromise. You're trading raw PCIe bandwidth (which you may not need) for slot availability, thermal efficiency, and power consumption—all of which matter in homelab contexts.
Use Case Fit
This card occupies an interesting niche in the storage expansion landscape:
Ideal Applications:
- Multi-drive RAID arrays where even bandwidth distribution matters
- Backup target aggregation using multiple mechanical drives
- Media server storage pools with concurrent streaming workloads
- NAS expansion in systems with available M.2 slots but limited PCIe slots
- Homelab storage tiers requiring moderate aggregate throughput
Less Suitable For:
- High-performance NVMe replacement (obviously)
- Workloads requiring >2GB/s aggregate throughput
- Scenarios where individual drive performance must hit maximum speeds simultaneously
- Systems requiring boot reliability from external HBA (though AHCI support theoretically enables this)
Installation and Compatibility Notes
The M.2 form factor introduces some practical considerations. Ensure your M.2 slot provides PCIe lanes rather than SATA-only M.2 slots (common on some mini PC motherboards). You'll need the included SFF-8087 to SATA breakout cables, which add cable management complexity in compact builds. The card will occupy valuable M.2 real estate that could otherwise house NVMe storage.
Most modern operating systems recognize the AHCI controller without additional driver installation.
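On Linux, a quick way to confirm the card enumerated as a PCIe AHCI controller (rather than sitting in a SATA-only M.2 slot, where it won't appear at all) is to look for it in `lspci` output. The sketch below filters a captured `lspci` dump; the sample device string is hypothetical, and real output will vary with your distribution and the card's reported IDs:

```python
import re

def find_ahci_controllers(lspci_output: str) -> list[str]:
    """Return lspci lines describing SATA/AHCI controllers."""
    return [line for line in lspci_output.splitlines()
            if re.search(r"SATA controller|AHCI", line)]

# Hypothetical captured output from `lspci` (real IDs and bus numbers will differ):
sample = """\
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1
02:00.0 SATA controller: Realtek Semiconductor Co., Ltd. Device 9101 (prog-if 01 [AHCI 1.0])"""

for line in find_ahci_controllers(sample):
    print(line)
```

In practice you would pipe live output in (`lspci | grep -i sata` achieves the same thing); if no SATA controller line appears after installing the card, the M.2 slot is likely SATA-only or lane-starved.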
Conclusion
The RTL9101-based M.2 SATA HBA represents a thoughtful evolution in M.2 storage expansion, addressing the fundamental bandwidth sharing limitations of port-multiplied designs. While PCIe 3.0 x2 bandwidth constraints still exist, the native nine-controller architecture ensures more predictable performance characteristics across connected drives.
For homelab builders seeking to maximize SATA port density within M.2 form factor constraints, this card offers a meaningful upgrade over JMB585+JMB575 solutions, particularly in workloads involving simultaneous drive activity. The sophisticated power management features add further value for always-on deployments.
This isn't a performance powerhouse, obviously. PCIe slot-based HBAs with x4 or x8 links will always deliver higher aggregate throughput, but within the M.2 ecosystem, the RTL9101 architecture sets a new baseline for multi-port SATA expansion done properly.
Join me in the discussion on Reddit here:
https://www.reddit.com/r/mctk/comments/1p9hliu/lekuo_rtl9101_m2_to_8port_sata_hba_review_true/
And if you want to pick one up, the product page is here:
https://lekuo.shop/products/lekuo-8-port-m-2-pcie-3-0-sata-expansion-card