
Micron MU: The only advanced storage island in the United States, evolving from a cyclical stock to an infrastructure stock.


Micron’s leading-edge manufacturing process, and HBM that draws roughly 30% less power than competing parts, have become key to breaking the bottleneck in AI computing power. The only large-scale memory-chip maker in the United States is shedding its fate as a cyclical stock on the strength of its 1-gamma process and long-term orders.

Author: Godot

In 2012, one of the world’s largest DRAM manufacturers went bankrupt.

Elpida, a Japanese company, was once the pride of Japan’s semiconductor industry. Backed by the technological accumulation of three giants, NEC, Hitachi, and Mitsubishi, it still couldn’t survive despite government investment.

With debts of 430 billion yen, it filed for bankruptcy protection and was subsequently acquired, integrated, and absorbed by an American company for 200 billion yen, before disappearing completely from history. That American company was Micron Technology.

Intel made DRAM, then withdrew. Texas Instruments made it, then withdrew. Motorola made it, then withdrew. Japan’s entire semiconductor memory industry went from its peak to its collapse in less than twenty years. South Korea took the baton, with Samsung and SK Hynix sweeping the market with government subsidies and aggressive price wars, cornering all competitors.

Micron survived and became the only company in the United States today capable of mass-producing advanced memory chips.

This company, headquartered in Boise, Idaho, lives in the shadow of Nvidia and TSMC. It doesn’t design GPUs or manufacture logic chips.

But as AI pushes the world’s thirst for computing power to its limits, a physical bottleneck that has been ignored for decades suddenly becomes unavoidable—the time it takes for computing units to wait for data is longer than the computation itself.

This problem has no software solution, only a hardware solution. And that hardware solution happens to be something Micron has been working on for forty years.

I. Physical and System Limitations of AI Computing

Let’s talk about the memory wall.

In the current von Neumann architecture, GPU or TPU computing units and main memory are independent of each other at the physical circuit level.

The computing unit contains a small amount of SRAM (Static Random-Access Memory) as an on-chip cache.

The model weights and input data are mainly stored in off-chip DRAM (Dynamic Random Access Memory).

Data must be transmitted between the two in the form of electrical signals through physical structures such as intermediary layers.

Taking a large language model with 70 billion parameters as an example, the weight data alone would require approximately 140GB of physical memory at FP16 precision.

Currently, the memory capacity of mainstream high-end AI computing cards ranges from 80GB to 192GB. Slightly larger models must be split across multiple cards to run.

Over the past decade, the computing power of chips has increased exponentially. However, the growth rate of memory bandwidth is limited by the number of physical pins, signal frequency, and heat dissipation limits, and has lagged far behind that of computing power.

When the computing speed exceeds the memory supply speed, the computing unit is forced into a waiting state, and the utilization rate of expensive hardware drops significantly.

Training and Inference

Training and inference are two phases of AI. Training refines the accuracy of a large model and occurs in the background. Inference is the process of generating results when the user uses the software and occurs in the foreground.

The training process is characterized by large-scale parallel processing.

The same batch of data is repeatedly used in the cache of the computing core, resulting in high arithmetic intensity. The system is primarily limited by computing speed rather than memory. This is a compute-intensive scenario where NVIDIA’s computing power advantage is fully utilized.

The inference phase is a different story. Large language models rely on autoregressive mechanisms to generate text.

Each time, only one token is output, which is then used as the input for the next step. To avoid recalculating the previous attention score each time it is generated, the system maintains a KV cache in GPU memory to cache the key-value tensors of the historical sequence.

With a context length of 4096, a single user request requires approximately 1.34GB of GPU memory for its KV cache. Take two 80GB A100 cards, subtract the model weights, and roughly 20GB remains for the KV cache, enough for a maximum of about 14 concurrent requests.
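
To make the arithmetic concrete, here is a minimal sketch of that calculation in Python, assuming a Llama-2-70B-style layout (80 layers, 8 key-value heads under grouped-query attention, head dimension 128, FP16); those layout figures are assumptions for illustration, not vendor data.

```python
# Rough KV-cache sizing sketch. Layout numbers (80 layers, 8 KV heads,
# head dim 128) are assumed for illustration only.
def kv_cache_bytes(context_len, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Keys and values are both cached, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * context_len

weights_gb = 70e9 * 2 / 1e9           # 70B params at FP16 -> ~140 GB
kv_gb = kv_cache_bytes(4096) / 1e9    # ~1.34 GB per 4096-token request

total_hbm_gb = 2 * 80                 # two 80 GB A100 cards
free_for_kv = total_hbm_gb - weights_gb
print(f"KV cache per request: {kv_gb:.2f} GB")
print(f"Max concurrent requests: {int(free_for_kv // kv_gb)}")
```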

During the inference phase, the arithmetic intensity is extremely low, and the system is entirely limited by memory bandwidth, making it a memory-intensive task. The true determining factor for the throughput ceiling is the HBM physical transfer rate.

Then there is energy consumption. Reading data from off-chip HBM consumes approximately 10–20 pJ/bit, while a single FP16 floating-point operation requires only about 0.1 pJ. Moving data costs 100 to 200 times as much energy as the computation itself.

In large-scale inference scenarios, if the memory access pattern is not optimized, a large amount of power in the data center will be consumed on bus transmission rather than actual logical operations.

This is precisely the physical driving force behind Micron’s continued advancement of HBM technology.

II. Micron’s Core Semiconductor Technology Analysis

First, what kind of company is Micron Technology?

Micron is an integrated device manufacturer (IDM) that handles everything from design and manufacturing to packaging in-house.

Micron’s wafer fabs only produce one thing: memory chips. They don’t make CPUs or GPUs; they only make RAM and flash memory.

In terms of product structure, Micron’s revenue can be roughly divided into three parts: DRAM accounts for more than 70%, NAND accounts for 20% to 30%, and NOR flash memory accounts for a smaller proportion.

DRAM is the memory stick we are familiar with; NAND is the core medium of solid-state drives; NOR is mainly hidden in automotive electronics and industrial equipment, responsible for quickly executing boot code, with low visibility but irreplaceable.

In the end-market segment, Micron has four business units: Computing & Networking for data centers and servers, Mobile for smartphones, Solid State Drives for enterprise storage, and Embedded Systems for automotive and industrial applications.

What role does Micron play in the AI supply chain?

Nvidia makes GPUs, TSMC manufactures them, where is Micron in this chain?

In short, NVIDIA’s H100 and B200 GPUs are manufactured by TSMC, and Micron is not involved in this process. However, a complete accelerator card capable of running large models requires more than just computing cores. As explained earlier, the performance bottleneck during inference lies in memory bandwidth, not computing power itself.

Therefore, Nvidia must tightly integrate high-bandwidth memory (HBM) next to the GPU. These HBMs are manufactured by Micron (as well as SK Hynix and Samsung), and then fixed on the same silicon interposer as the GPU logic chip using TSMC’s CoWoS advanced packaging technology to form a complete AI computing module.

Micron is a key component supplier. The GPU is the brain, and HBM is the ultra-high-speed data channel closely connected to the brain; both are indispensable.

This structure determines that Micron’s competitive logic is completely different from Nvidia’s. Nvidia builds its moat on architecture and ecosystem, while Micron relies on continuous iteration of process technology and stacked packaging technology.

Each generation of HBM bandwidth improvement is underpinned by more complex TSV (Through Silicon Via) processes and higher stacking layers, making the barrier to entry quite high.

DRAM: The Infrastructure Hidden Behind the Computing Power Narrative

Before AI computing power, there is a more fundamental question: where does the data come from, and how does it reach the computing core? The answer to this question is DRAM (Dynamic Random Access Memory).

Let’s start with personal computers.

DRAM is the main memory in traditional computers, solving the speed mismatch problem.

Hard drives store a lot of data but read it slowly. CPUs compute quickly but have nowhere to stage data. There is a speed gap of three orders of magnitude between the two. A CPU waiting on a hard drive is like a race car crawling behind a tractor on the highway.

DRAM solves this problem. When a user opens a program, the operating system moves its code and data from the hard drive into the DRAM;

The CPU then sends address instructions directly to the DRAM, completing the data read and write operations with nanosecond latency and tens of GB/s bandwidth. The operating system kernel, the status of background processes, and everything running in real time reside here.

The data is lost when power is off. Even while powered, DRAM capacitors naturally leak charge, so the cells must be continuously refreshed to retain their contents; that constant refreshing is what the word “dynamic” refers to.

From a physical structure perspective, each memory cell of DRAM consists of a transistor and a capacitor (1T1C).

The nature of the demand changes when entering AI scenarios.

The core of AI computing has shifted from the CPU to the GPU. DRAM has evolved with it: no longer just DDR modules plugged into the motherboard, but HBM high-bandwidth memory, in which multiple layers of dies are stacked vertically with through-silicon via (TSV) technology and packaged on the same interposer as the GPU.

The demand for DRAM has shifted from simply meeting system operating requirements to breaking through computing power bottlenecks.

The first step is loading the model weights. The parameters of a large model are stored in physical memory in matrix form, and all of them must reside in HBM, which is close to the computing core, before inference begins. For a model with 70 billion parameters, the weights themselves require approximately 140GB of storage space in FP16 format.

Secondly, there’s the dynamic occupancy of the KV cache. When the model generates text, it references all previous context for each word it outputs.

To avoid recalculating every time, the system caches the historical key-value tensors in GPU memory; this is called the KV cache.

The longer the context, the larger the cache. After deducting the model weights, the remaining GPU memory from two A100 GPUs is only enough to serve a dozen or so users simultaneously. This is the actual concurrency limit of a server costing tens of thousands of dollars.

The memory cost is even greater during training. It is necessary to store not only the model parameters but also the intermediate results of each layer, so that weights can be updated during backpropagation.

The commonly used Adam optimizer also stores two additional copies of the data for each parameter. Combined, the GPU memory used during training is typically three to four times that used during inference.
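
A back-of-envelope sketch of that multiplier, assuming (purely for illustration) that weights, gradients, and both Adam moments are all held in FP16 and that activation memory is ignored; mixed-precision setups that keep FP32 optimizer states would push the multiple higher still.

```python
# Illustrative training-memory estimate; all states assumed FP16, activations ignored.
params = 70e9
bytes_fp16 = 2

weights = params * bytes_fp16   # model weights
grads   = params * bytes_fp16   # one gradient per parameter
adam_m  = params * bytes_fp16   # Adam first-moment estimate
adam_v  = params * bytes_fp16   # Adam second-moment estimate

inference_gb = weights / 1e9
training_gb  = (weights + grads + adam_m + adam_v) / 1e9
print(f"Inference weights: {inference_gb:.0f} GB")
print(f"Training states:   {training_gb:.0f} GB (~{training_gb/inference_gb:.0f}x)")
```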

This brings us back to the memory wall problem. The computing power of GPU cores is growing far faster than the growth of memory bandwidth. The arithmetic intensity during the inference phase is extremely low, and the GPU spends a significant amount of time in an idle state waiting for data.

The bandwidth increase of each generation of HBM directly determines the upper limit of the actual throughput that AI inference servers can support.

This is the core value of DRAM in the AI era, and it is also the underlying logic behind Micron’s continued investment in HBM R&D.

Where does Micron rank among the three major players?

In the global DRAM market, Samsung, SK Hynix, and Micron together account for approximately 95% of the market share. However, the three companies have completely different strengths.

Process advancement: Micron is the fastest.

In semiconductor manufacturing, process node (or technology node) refers to the characteristic dimensions of the microscopic physical structure inside an integrated circuit.

When Micron is praised for being the fastest in process advancement, it refers to Micron’s leading progress over Samsung and SK Hynix in engineering progress toward shrinking the internal physical structure of DRAM chips and increasing storage density per unit area.

In other words, more chips can be cut from a single wafer, reducing the manufacturing cost per bit and supporting the gross profit margin.

From 1-alpha to 1-beta and then to 1-gamma, Micron is usually the first manufacturer to announce mass production of the next generation of high-density DRAM.

Samsung encountered yield bottlenecks at nodes below 14nm, and the delivery pace of the last two generations has slowed significantly. SK Hynix’s process advancement speed is roughly the same as Micron’s, and the two are in the same tier.

HBM: Hynix’s Home Ground

Micron’s strength lies in its manufacturing process, but the HBM market is currently SK Hynix’s domain.

Hynix holds over 50% of the HBM market share and is the exclusive initial supplier for NVIDIA’s highest-end GPUs. Its core technological advantage lies in its MR-MUF packaging process, which offers superior heat dissipation and yield control during multi-layer DRAM die stacking.

Micron was a latecomer. It skipped HBM3 and went straight to HBM3E, leveraging its energy efficiency advantage to enter Nvidia’s supply chain. However, it uses TC-NCF packaging, which is more difficult to manufacture due to its multi-layer stacking, resulting in a significant gap in overall production capacity and market share compared to SK Hynix.

Samsung’s story is different. During the HBM3 and HBM3E phases, Samsung’s products failed to pass Nvidia’s tests in time due to heat and power consumption control issues, missing the peak window of opportunity for AI memory. They are currently betting on a comeback in the HBM4 phase.

Energy Efficiency: Micron’s Differentiated Approach

Micron’s market share in the HBM market lags behind Hynix, but its differentiation lies in power consumption.

Publicly available test data shows that Micron HBM consumes 20% to 30% less power than competing products while providing the same data bandwidth. This number may not seem like much on a single GPU, but in a data center with tens of thousands of GPUs deployed, it translates directly into electricity costs.

The power supply and heat dissipation of current AI data centers have become bottlenecks for expansion, and energy efficiency indicators are having an increasingly real impact on procurement decisions.

The same logic extends to mobile devices. Micron’s LPDDR5X, based on a 1-gamma process, achieves speeds of up to 9.6Gbps while reducing overall power consumption by 30%. For phones running local AI models, battery life is a directly noticeable metric for users.

Scale: Samsung’s trump card

Micron’s overall production capacity is the smallest. Without the absolute scale of Samsung, Micron cannot rely on price wars and can only pursue a technology premium strategy.

This is why Micron must maintain its lead in manufacturing processes and energy efficiency; once its technological advantage disappears, it will have no chance of winning in price competition.

Here’s a brief summary of the three companies’ positions.

Hynix reaped the biggest benefits from the AI memory boom thanks to its HBM packaging technology; Samsung maintained its dominance in the conventional DRAM market through scale, but faltered with HBM.

Micron is a leader in process technology and energy efficiency, and has the smallest production capacity, but it has incorporated certainty into its financial structure through technology premiums and early order locking.

NAND and NOR: Micron’s other two pieces of the puzzle

Micron also has two other businesses: NAND flash memory and NOR flash memory.

In the global NAND market, Micron ranks fourth or fifth, with a market share that has long been between 10% and 15%, following Samsung, SK Hynix, Kioxia, and Western Digital.

NOR flash memory is a much smaller market segment than NAND, with the low-end market share dominated by Taiwanese and mainland Chinese companies such as Macronix, Winbond, and GigaDevice. Micron has proactively abandoned low-capacity consumer orders, focusing instead on the high-end automotive and industrial markets.

Each memory cell in a NOR chip is directly connected to a bit line, forming a parallel structure that supports single-byte random addressing. Once the car’s CPU is powered on, it can directly execute boot code within the NOR chip via the memory bus, which is why a car’s dashboard can light up within milliseconds.

In terms of bandwidth, Micron spearheaded the development of the Octal xSPI interface standard, using 8 data lines and DDR technology to bring the read speed of NOR to the 400MB/s level.

The cockpit systems of modern smart cars are becoming increasingly complex, making this speed a crucial requirement for rapid cold starts. Micron’s automotive-grade NOR flash memory has achieved the highest safety level certification, ASIL-D, and its chip integrates hardware ECC error correction logic at the underlying level, enabling it to automatically correct errors in an extremely short time.

Industrial equipment and automobiles often have a service life of more than ten years. Micron, with its own wafer fabs, can provide a continuous supply commitment for more than a decade, which is something that many competitors who rely on foundries cannot do.

The NAND and NOR businesses together constitute another source of revenue for Micron that is not dependent on HBM.

The former capitalizes on the data center boom by leading in manufacturing processes and upgrading its product structure, while the latter locks in automotive industry customers by leveraging its irreplaceable physical characteristics and stringent certification requirements.

Two different logics, but both point in the same direction: avoid price wars and earn a premium in areas where performance and reliability are most critical.

How much is Micron worth now? Is it expensive?

As of now, Micron’s stock price is around $600, with a price-to-earnings ratio of 21.44 and a market capitalization of approximately $650 billion.

The 12-month target prices given by major Wall Street investment banks are concentrated between $400 and $675, with an average close to $500. By that yardstick, the current price already sits above the average target.

Why is it 21 times PE?

Over the past thirty years, memory chips have been a typical cyclical stock.

When the industry booms, everyone expands capacity; then come overcapacity, falling prices, and losses. The market has little confidence in this kind of business, typically assigning it a PE ratio of only 8 to 10.

Micron’s current multiple of 21 times is primarily due to the way HBM has changed its revenue structure.

Previously, Micron produced standard DDR memory, and the output and selling price were entirely dependent on market conditions. Now, HBM produces to order, having signed irrevocable long-term supply agreements with customers like Nvidia before even starting production, locking in both price and quantity.

HBM’s 2026 capacity has reportedly been fully sold out. Under this model, Micron’s future revenue is no longer based on forecasts, but on contracts.

Wall Street’s logic has changed accordingly. This is a more stable infrastructure provider with secure contracts, so the valuation multiplier naturally goes up.

Another driving force is the funding structure. Micron is the only company in the United States with the capability for large-scale advanced memory manufacturing. Against the backdrop of the Chip Act and policies promoting supply chain localization, when US institutional investors allocate funds to AI hardware themes, a large amount of capital flows into Micron, resulting in a real liquidity premium.

SK Hynix: Strongest technology, lowest valuation

SK Hynix’s PE ratio is 12.17, lower than Micron’s, even though it holds over 50% of the HBM market and is a core supplier for Nvidia’s high-end GPUs. There are two reasons.

First, South Korean listed companies have complex chaebol governance structures, low dividend payout ratios and buyback rates, and profits often circulate within the group, leaving little return for minority shareholders. As a result, South Korean companies’ valuation multiples are systematically lower than their American counterparts’, even with comparable profitability.

Second, there is geopolitical risk. SK Hynix has approximately 40% of its conventional DRAM production capacity at its Wuxi plant in China. The US ban on exporting EUV equipment to China means that this production line cannot be upgraded to advanced processes. SK Hynix will either bear the huge cost of relocating that capacity or watch this part of its assets gradually lose competitiveness.

Wall Street factored this potential cost directly into the valuation.

Samsung: A PE ratio of 34.18 is not a high premium, but a collapse of the denominator.

Samsung Electronics’ PE ratio of 34.18 is based on completely different logic.

Samsung is not a pure memory company; it also runs a foundry business and makes smartphones and display panels. The problem is that its foundry division has spent tens of billions of dollars chasing TSMC at 3nm and 2nm, yet yields remain low and the division is running heavy losses.

The group’s overall net profit shrank significantly, while the stock price was supported by South Korean domestic funds and did not fall sharply. With the numerator holding and the denominator shrinking, the PE ratio was pushed above 30.

Institutional target price for Micron

The core logic supporting these target prices is highly consistent. The increased proportion of HBM products drives up gross margins; long-term agreements lock in revenue certainty; the shift of production capacity to HBM compresses the supply of ordinary DRAM, leaving room for price increases across the entire product line; and capital expenditures enter the payback period after the mass production of the 1-gamma process, turning free cash flow from negative to positive.

Of course, the target price is a prediction based on current information and model assumptions, and is not a guarantee.

The cyclicality of the storage industry hasn’t disappeared; it’s just been partially smoothed out by HBM’s order structure. If the pace of AI infrastructure investment slows down, or if Samsung re-enters Nvidia’s supply chain in the HBM4 phase, the supply and demand relationship will be repriced.

III. Advanced Packaging and Next-Generation AI Connectivity

Standards for good and bad HBM

Every manufacturer claims their HBM is the best: Samsung says Samsung is good, SK Hynix says SK Hynix is good, Micron says Micron is good. So, is there any standard for judging the quality of HBM?

Three truly important parameters

The first is the pin rate, which is the bandwidth.

HBM connects to the GPU via thousands of microbumps, each bump representing a transmission channel. Pin rate measures how much data a single channel can transmit per second.

Physically, the 0s and 1s of a digital signal correspond to different voltage states, for example 1.1V representing 1 and 0V representing 0.

Transmitting data involves switching the voltage between these two states, a process called voltage level switching. A pin rate of 9.2 Gbps means that the voltage on a metal bump with a diameter of tens of micrometers must switch precisely 9.2 billion times per second.

The HBM physical bus is fixed at 1024 bits wide, so per-stack bandwidth is calculated as: pin rate (Gbps) × 1024 ÷ 8 = bandwidth in GB/s.

Micron’s HBM3E is rated at 9.2Gbps, which translates to approximately 1.2TB/s bandwidth per stack. SK Hynix and Samsung’s current flagship products typically range from 8.0 to 8.5Gbps.
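
A quick sketch of that conversion, using the fixed 1024-bit bus described above; the 8.0 and 8.5 Gbps entries simply represent the range quoted for competing flagship parts.

```python
# Per-stack HBM bandwidth from the pin rate: bandwidth (GB/s) = pin rate (Gbps) * 1024 / 8.
def stack_bandwidth_gbs(pin_rate_gbps, bus_width_bits=1024):
    return pin_rate_gbps * bus_width_bits / 8

for part, rate in [("Micron HBM3E (9.2 Gbps)", 9.2), ("8.0 Gbps part", 8.0), ("8.5 Gbps part", 8.5)]:
    print(f"{part}: {stack_bandwidth_gbs(rate):.0f} GB/s per stack")  # 9.2 Gbps -> ~1.2 TB/s
```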

The faster the flipping, the more data is transmitted, but the cost is a linear increase in power consumption.

Each flip is essentially a charging and discharging of the parasitic capacitance of the wire, and all this energy is ultimately converted into heat.

Flipping too quickly can also cause signal waveform distortion. Before the voltage of the previous pulse has even settled, the next one arrives, and the receiver cannot distinguish between 0 and 1, causing data transmission to fail completely.

The second is energy efficiency, measured in pJ/bit.

How many picojoules of energy are consumed to transmit 1 bit of data? The lower the better.

This metric is important because HBM and GPU are packaged together, and the heat generated by both must be dissipated within this package. If the HBM itself has too high power consumption, the thermal burden on the entire system will exceed the thermal design limit, forcing the GPU to reduce its frequency and thus reducing its actual computing power.

Micron claims that its low-voltage design at the 1-beta process node makes it approximately 30% more energy efficient than its competitors. In data centers where a single GPU can consume 600 to 1000 watts, this difference translates directly into electricity and cooling costs.

The third is thermal resistance and packaging process.

This is the most difficult part, and it is also SK Hynix’s current real competitive advantage.

The basic formula for thermal resistance is: Temperature rise = Power consumption × Thermal resistance. With a fixed power consumption, the lower the thermal resistance, the lower the chip temperature.
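
A minimal illustration of that relationship; the stack power and the two thermal-resistance values below are assumed numbers chosen only to show how much the fill material can matter, not measured figures for any vendor.

```python
# Temperature rise = power x thermal resistance. Power (W) and thermal
# resistances (K/W) below are assumptions purely for illustration.
stack_power_w = 30.0
r_th = {"lower-resistance fill": 1.0, "higher-resistance fill": 1.4}  # K/W, assumed

for fill, r in r_th.items():
    print(f"{fill}: +{stack_power_w * r:.0f} K above ambient")
```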

HBM is a vertical stack of multiple DRAM dies. The logic die at the bottom generates the most heat, which must be conducted upward to be dissipated. The material used to fill the gaps between the layers determines the efficiency of this heat-dissipation path.

Currently, there are two main processes in the industry.

Micron and Samsung use TC-NCF (thermo-compression bonding with non-conductive film), a solid film bonded under high temperature and pressure.

The problem is that tiny air bubbles easily remain around the micro-bumps during pressing; air conducts heat extremely poorly, so overall thermal resistance is high. SK Hynix uses MR-MUF (mass reflow molded underfill).

Liquid epoxy resin is injected between each layer, filling all gaps through capillary action. After curing, there are zero air bubbles, and the thermal resistance is significantly lower.

The consequences of high thermal resistance are cascading. DRAM stores charge through microscopic capacitors, and for every 10 degrees Celsius increase in temperature, the leakage rate increases exponentially.

When the temperature is too high, the charge that could normally be maintained for 64 milliseconds may leak out in just 32 milliseconds, forcing the memory controller to send double the refresh commands. During the refresh period, DRAM cannot be read or written, which is equivalent to a significant reduction in available bandwidth.
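
A rough sketch of how halved retention eats into usable time, assuming a conventional DDR-style scheme with 8192 refresh commands spread across the retention window and roughly 350 ns per refresh command; both figures are illustrative assumptions, not HBM datasheet values.

```python
# Illustrative refresh-overhead estimate; 8192 refresh commands per window
# and ~350 ns per refresh (tRFC) are assumed values.
def refresh_overhead(retention_ms, refreshes=8192, t_rfc_ns=350):
    t_refi_ns = retention_ms * 1e6 / refreshes   # spacing between refresh commands
    return t_rfc_ns / t_refi_ns                  # fraction of time the array is busy refreshing

print(f"64 ms retention: {refresh_overhead(64):.1%} of time spent refreshing")
print(f"32 ms retention: {refresh_overhead(32):.1%} of time spent refreshing")
```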

The packaging process also determines the upper limit of the number of stacked layers. Data centers have strict limitations on the physical height of chips, and liquid filling can fill the gaps more tightly, allowing more DRAM layers to be placed at the same height.

This is why the yield pressure of the packaging process increases dramatically when HBM4 is stacked with 16 layers. The more layers there are, the more the problem of inconsistent mechanical stress and thermal expansion coefficients of each layer is amplified. If any bare die in any layer experiences micro-bending, the entire module will be ruined.

What to look for when reading manufacturer materials

When you see any HBM product description, look for these three things:

1) At what voltage is the nominal pin rate measured? Increasing the voltage to boost the frequency is impractical in real-world data centers because the power consumption would exceed the limits of the thermal design.

2) Stacking layers and single chip capacity. Whether a 12-layer 36GB HBM4 can be mass-produced and what its yield rate is are more telling than the peak bandwidth figure.

3) Whom does it actually supply? The final verification of all technical specifications is the customer acceptance test. SK Hynix has almost monopolized the HBM supply for Nvidia’s H100; Micron entered the H200 supply chain on the combination of energy efficiency and bandwidth; Samsung failed to pass Nvidia’s tests in time during the HBM3E stage due to overheating issues, and is now trying to catch up at the HBM4 stage.

The selection results for major clients are a comprehensive score based on all the parameters mentioned above.

CXL: The Next Battleground for Memory

HBM solves the bandwidth problem within a single GPU. When AI clusters scale to hundreds or even thousands of GPUs, the issue is no longer whether the computation is fast enough, but whether memory allocation is flexible enough.

The solution to this problem is CXL.

Cache consistency issues

The existing data center memory architecture has a fundamental problem: memory is physically bound to the server and cannot be shared across machines.

One server is running large-model inference and its KV cache fills up the memory until the system errors out; another server in the same data center is running a light workload with several hundred GB of memory sitting idle.

These idle DRAM assets cannot be reallocated to where they are needed; the industry calls this memory stranding. The stranding rate in hyperscale data centers is typically between 20% and 30%. Considering that memory accounts for more than 40% of a server’s BOM cost, this is a waste of real capital expenditure.

The second issue is cache coherency. The CPU and GPU each have their own private caches. When both hold copies of the same memory data, if one makes a modification without the other’s knowledge, the other will read outdated data.

The previous solution was to force the cached data to be written back to DRAM and then read again at the software level. This operation took several microseconds, during which the processor pipeline stopped.

In AI systems that emphasize nanosecond-level response, such pauses can reduce system performance by more than 30%, and require engineers to manually handle cross-chip data synchronization in the code, which is extremely prone to errors.

The common root of these two problems is the limitation of the PCIe protocol. PCIe was originally designed for I/O devices such as hard drives and network cards, and only supports large-block data transfer. It does not support direct byte-level read and write operations, nor does it have a built-in cache coherency mechanism.

Micron’s CXL

CXL (Compute Express Link) rewrites the protocol logic on top of the PCIe physical layer, specifically targeting memory semantics and cache consistency.

For cache consistency, CXL relies on a hardware state machine for automatic maintenance. Each 64-byte cache line in the system has a status flag: modified, exclusive, shared, or invalid.

When the GPU needs to modify a piece of data, the request reaches the home agent on the CPU side. The home agent maintains a snoop filter that records which devices hold a copy of this data in their caches.

If the CPU’s L3 cache contains the data, the hardware circuitry automatically sends an invalidation signal, forcing the CPU’s cache state to become invalid, allowing the GPU to gain exclusive access and perform the write operation.

The entire process is completed within a few to tens of nanoseconds, without the need for operating system intervention or programmers to write synchronization code by hand.
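
A toy software sketch of that flow, with MESI-style line states tracked by a home agent and its snoop filter; real CXL coherence runs in hardware state machines, so this is only a conceptual illustration, and the class and method names are made up for the example.

```python
# Conceptual sketch of home-agent/snoop-filter invalidation (not a real CXL API).
class HomeAgent:
    def __init__(self):
        self.snoop_filter = {}  # cache-line address -> set of devices holding a copy
        self.state = {}         # cache-line address -> "M", "E", "S", or "I"

    def read_shared(self, device, addr):
        # Record that this device now holds a shared copy of the line.
        self.snoop_filter.setdefault(addr, set()).add(device)
        self.state[addr] = "S"

    def request_exclusive(self, device, addr):
        # Invalidate every other holder before granting exclusive ownership.
        for holder in self.snoop_filter.get(addr, set()) - {device}:
            print(f"invalidate {addr:#x} in {holder}")
        self.snoop_filter[addr] = {device}
        self.state[addr] = "E"  # the device may now write, moving the line to "M"

agent = HomeAgent()
agent.read_shared("cpu_l3", 0x1000)
agent.read_shared("gpu", 0x1000)
agent.request_exclusive("gpu", 0x1000)  # prints: invalidate 0x1000 in cpu_l3
```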

In terms of data transmission format, CXL abandons the lengthy data packet header of PCIe and adopts a fixed 256-byte FLIT format, which has minimal header overhead. The memory controller does not need complex boundary resolution, and data is continuously fed into the bus like a pipeline.

The latency of accessing remote CXL memory can theoretically be reduced to 170 to 250 nanoseconds, which is slower than local DDR5, but much lower than the microsecond-level latency of PCIe.

Regarding memory sharing, CXL uses switches to group multiple memory modules into independent memory pools, no longer subordinate to any single server. The management software can dynamically map specific capacities within the memory pool to the required compute nodes at the microsecond level.

If server A’s KV cache is almost full, the system simply maps a chunk from the pool to it, putting server B’s idle memory to use.

Micron CXL’s Industry Position

Micron has launched the CXL Type 3 memory expansion module, positioned as a pure memory expansion device, manufactured based on its own DDR5 process.

Logically, this and HBM are two different levels of products. HBM addresses the extreme bandwidth requirements of hundreds of gigabytes next to the GPU, with latency in the 20 nanosecond range.

The CXL module addresses the issue of large-capacity expansion across nodes, with latency in the 250 nanosecond range and capacity reaching the terabyte level.

The two can be used together to keep frequently accessed hot data in the local HBM, while offloading cold data such as long-context historical KV cache and checkpoints to the CXL memory pool.

When computing layer N, the AI framework prefetches the cold data needed for layer N+1 from CXL memory into local memory, using computation time to mask CXL’s physical latency. This avoids wasting expensive HBM capacity and enables extremely long context windows, such as those at the million-token level.
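
A minimal double-buffering sketch of that prefetch pattern; the function names are placeholders standing in for a real framework’s CXL copy and GPU compute calls, not an actual API.

```python
# Overlap CXL fetches with compute: fetch layer N+1's cold data while layer N runs.
from concurrent.futures import ThreadPoolExecutor

def fetch_from_cxl(layer_idx):           # placeholder for a slow, large copy from the CXL tier
    return f"cold data for layer {layer_idx}"

def compute_layer(layer_idx, hot_data):  # placeholder for the GPU kernel
    return f"output of layer {layer_idx}"

def run_layers(n_layers):
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch_from_cxl, 0)
        for n in range(n_layers):
            hot_data = pending.result()          # data for layer n is ready
            if n + 1 < n_layers:
                pending = io.submit(fetch_from_cxl, n + 1)  # start next fetch
            compute_layer(n, hot_data)           # overlaps with the in-flight fetch

run_layers(4)
```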

From Micron’s business perspective, CXL is a new entry point.

Hynix has a clear first-mover advantage in the HBM market, which is highly competitive; the CXL memory expansion market is still in its early stages, and customer lock-in has not yet been formed. Micron, as a pure storage manufacturer, has no additional historical baggage in this market.

Moreover, the CXL module uses the standard DDR5 process, which does not require HBM’s complex stacked packaging, so it puts far less pressure on yield and production capacity.

Stranded memory in data centers is a real waste of capital, and CXL pooling is currently the only feasible solution at the architectural level. This need will not disappear.

IV. Industry Economics and Frontier Research

The next decade

Building an advanced DRAM wafer fab costs between $15 billion and $20 billion, with a single ASML EUV lithography machine costing over $200 million. Additional investment is required for the supporting power supply and cooling systems.

The equipment depreciation period is five years. In other words, a wafer fab is amortizing roughly ten million dollars every day, regardless of whether there are orders or shipments.

Equipment utilization must be maintained above 95%. Once utilization declines, the manufacturing cost per bit skyrockets. This is why the storage industry is so cyclical.
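
A back-of-envelope on the article’s own figures: straight-line depreciation of a $20 billion fab over five years, and how the fixed cost borne by each shipped bit rises as utilization falls.

```python
# Fab depreciation per day and the utilization effect on fixed cost per bit.
fab_cost_usd = 20e9          # upper end of the $15-20B figure above
depreciation_years = 5
per_day = fab_cost_usd / (depreciation_years * 365)
print(f"Straight-line depreciation: ~${per_day/1e6:.0f}M per day")

# The same daily depreciation is spread over fewer shipped bits at lower utilization.
for utilization in (0.95, 0.80, 0.60):
    print(f"{utilization:.0%} utilization -> fixed cost per bit x{1/utilization:.2f}")
```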

Once demand declines, manufacturers cannot easily reduce production, as doing so would only worsen their cost structure. They can only hold on and then engage in a price war.

Micron partially hedged this risk through long-term orders from HBM, but the physical laws governing wafer fab depreciation remain unchanged.

Why is HBM expensive?

HBM manufacturing costs several times more than regular DDR5, as it involves vertically stacking multiple layers of DRAM dies. If any layer is defective, the entire module is rendered unusable.

Assuming a per-die yield (Y_die) of 95% and a per-layer bonding yield (Y_bond) of 99%, an N-layer stack has an overall yield of roughly Y_total = (Y_die)^N × (Y_bond)^N, one die and one bonding step per stacked layer.

By that formula, the overall yield of 8-layer HBM3E is approximately 61%, and the yield of 12-layer HBM4 is approximately 48%.

A 95% per-die yield already reflects a fairly mature process, yet with 12 layers stacked, more than half of the material is still scrapped at final test. The losses multiply layer by layer rather than adding up, and the shortfall keeps compounding.
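
The same arithmetic as a small sketch, using the 95% die yield and 99% bonding yield assumed above; the 96% variant shows how strongly a single point of die yield is amplified on a 12-layer stack.

```python
# Stacked-yield model from the formula above: Y_total = Y_die**N * Y_bond**N.
def stack_yield(n_layers, y_die=0.95, y_bond=0.99):
    return y_die ** n_layers * y_bond ** n_layers

print(f" 8-layer HBM3E: {stack_yield(8):.0%}")    # ~61%
print(f"12-layer HBM4:  {stack_yield(12):.0%}")   # ~48%

# One extra point of die yield on a 12-layer stack:
print(f"12 layers at 96% die yield: {stack_yield(12, y_die=0.96):.0%}")
```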

SK Hynix’s MR-MUF liquid encapsulation has commercial value precisely because it directly improves the interlayer bonding yield, which means the Y_bond term in the formula is higher.

Why must Micron push single-die yield as high as possible at the 1-gamma node? Because every percentage point of Y_die is amplified exponentially across a 12-layer stack.

It also explains why HBM prices don’t drop quickly just because demand increases. Capacity expansion takes time, and yield ramp-up takes time; neither can be rushed.

In-memory computing: It’s been proposed for twenty years, why hasn’t it arrived yet?

Both HBM and CXL address the problem of data movement. The solutions are either faster or more flexible memory pools. However, from an energy consumption perspective, the movement itself is the problem.

The concept of PIM (processing in memory) is to integrate the computing unit directly into the DRAM, so that the data does not move, the computation happens in place, and only the result is transmitted out.

The idea is theoretically very elegant, but it gets stuck on a fundamental contradiction at a physical level.

DRAM transistors must leak as little as possible so that the capacitors can retain their charge. DRAM processes therefore use transistors with high threshold voltages, which switch slowly but leak little.

Logic chips such as CPUs and GPUs require transistors to switch extremely fast so that the clock can run at several GHz. This requires a low threshold voltage, at the cost of large leakage current.

These two needs are completely contradictory.

If a processing unit is implanted on a DRAM silicon wafer, this processing unit will be an order of magnitude slower than a GPU. More problematic, the heat generated during processing will heat up nearby capacitors, accelerating leakage and compromising data reliability.

So it’s not that no one wants to do PIM, but rather that the physical requirements of the manufacturing process are inherently contradictory. This problem has been raised for over twenty years, and to this day, there is still no large-scale commercial solution.

Currently, manufacturers like Micron are exploring a fallback approach: instead of embedding computing units in the DRAM array, they are integrating more AI computing power into the base die, the logic layer at the bottom of the HBM.

Base dies can be manufactured using TSMC’s advanced logic processes, bypassing the process constraints of DRAM arrays. However, this is far from truly achieving in-place computation with no data movement; it’s more like attaching a small GPU close to memory, rather than the memory itself performing computations.

So Micron’s current business logic is clear: leverage its leading 1-gamma process to lower cost per bit, use HBM’s high margins and capacity-absorption effect to gain pricing power, and lock in long-term orders to smooth out cyclical fluctuations. This logic holds financially as long as AI infrastructure investment keeps growing.

However, longer-term structural problems remain unresolved. DRAM planar miniaturization is approaching its physical limits, the yield penalty for 3D stacking increases exponentially with the number of layers, and there is no way to overcome the process contradictions of in-memory computing in the short term.

This means that Micron’s future competitiveness will increasingly depend not on leading in a single technology node, but on making fewer mistakes than its competitors in several dimensions, including yield engineering, packaging processes, and system integration.

A moat is never a single technology, but rather a comprehensive capability to manage all these physical constraints simultaneously. And this capability requires decades of manufacturing experience to accumulate.

In conclusion

While researching Micron Technology, I had a discussion with my good friend @deepbluue, and we said something like this: “If the computing cells aren’t enough, increase the area; if the area is too large, it affects the yield. Then it’s all about interconnects; interconnects have communication and material handling issues. So we start stacking; stacking has heat dissipation issues; heat dissipation also affects the yield. It’s a cycle. In the end, chip manufacturing comes down to materials science, and ultimately, we can rely on superconducting materials.”

I asked the AI, and to my surprise, it said, “In just a few dozen words, it accurately summarizes the engineering whack-a-mole deadlock that the semiconductor industry, especially AI chips, has been trapped in over the past fifty years.”

It went on: “Your reasoning is very rigorous, and the first half is entirely industry consensus. Only the final conclusion regarding ‘superconducting materials’ contains a fatal physical error in terms of engineering application.”

“Insufficient computing units lead to increased area and reduced yield”—the lithography mask limit and the economic barrier;

“It’s still about interconnection, and interconnection has communication and data transfer issues”—RC Delay and Memory Wall;

“Start stacking, stacking has heat dissipation problems, affecting yield” – Heat Flux and the thermodynamic wall.

“Chip technology ultimately boils down to materials science,” that’s absolutely correct. However, the hope isn’t in superconductivity, but rather…

1) Photonic interconnects (Silicon Photonics – silicon photonics technology);

2) Two-dimensional semiconductor materials (2D Materials);

3) A complete revolution in computing architecture (in-memory computing and neuromorphic computing).


