The Breakthrough Lies in “Moving the Focus Away from DRAM” - The Paradigm Shift in AI Implementation Driven by RAG and LLMs
Confronting the Challenge of Updating and Increasing Data
AI adoption is advancing rapidly across industries, including manufacturing. This trend highlights a challenge: how to keep the steadily growing volumes of data incorporated into large language models (LLMs) up to date. What approaches are most effective for addressing this issue and advancing the practical implementation of AI? This topic was discussed by Daisuke Okanohara, Co-founder and CEO of Preferred Networks, Inc., and Masayuki Arakawa, Chief Specialist, Flash Storage Strategy Department, SSD Division, Kioxia Corporation.
The rapid spread of generative AI started in 2023 when ChatGPT caught the world’s attention. Since then, large language models (LLMs) have continued to rapidly advance by increasing the number of parameters, enhancing their inference capabilities, and achieving breathtaking improvements in performance. In business settings, initiatives to feed LLMs with diverse datasets, aimed at enhancing productivity and creating new value, are accelerating. However, as practical deployment advances, another challenge has surfaced: the limits on the volume of data that LLMs can effectively handle.
The internal documents, contracts, laws, regulations, market data, and other information that companies want LLMs to “learn” are not only vast in scope, but constantly being updated. The key to advancing broad implementation of AI will be how to effectively address the vast increases in the amount of data that requires updating.
What approach will be effective for overcoming this “data barrier”? Daisuke Okanohara, co-founder and CEO of Preferred Networks, Inc., a company that develops domestic Japanese-language LLMs and specialized AI semiconductors, discussed this challenge with Masayuki Arakawa, Chief Specialist, Flash Storage Strategy Department, SSD Division, Kioxia Corporation (Kioxia).
LLMs entering a phase which demands “practicality” as the volume of data to be ingested is rapidly increasing
Arakawa remarks, “Over the past few years, competition among hyperscalers has intensified, and LLMs have become astonishingly intelligent over this short span of time,” expressing amazement at the speed of progress. At the same time, he points out that the axis of competition surrounding LLMs is shifting from the pursuit of “intelligence” toward the pursuit of “practical utility.”
Okanohara adds, “To date, improvements in LLM performance were driven by ‘intelligence,’ that is, whether the model could answer questions that a human being could not. However, as the implementation of LLMs advances, particularly in business settings, the models must deliver on more practical aspects, such as how much data related to actual work they can ingest or how strictly they can follow instructions.”
The issue here is that the amount of data an LLM must handle increases as practicality is pursued. Companies want to feed a wide range of data into the models, including internal documents, manuals, survey results, industry practices, laws and regulations, market trends, and international affairs. Furthermore, this information is constantly being updated.
The need for an approach that separates “intelligence” from “knowledge”
It is unrealistic to embed all “social knowledge [data]”—which continues to grow steadily and requires constant updating—directly into LLMs, considering the associated costs and the frequency of model retraining required. Moreover, the data from which AI can extract the greatest value often contains confidential information. Arakawa comments, “Going forward, it may become necessary to separate ‘knowledge [data]’ from ‘intelligence [LLM processing],’ allowing LLMs to concentrate on their core strength—the ability to think.”
One leading approach is to actively utilize vector databases, which store industry data, corporate data, and other external proprietary information, within RAG (Retrieval-Augmented Generation), a technology that increases the response accuracy of generative AI.
Okanohara points out that it is not necessary for an LLM to memorize all information for it to demonstrate intelligence. “An LLM is a mixture of the functions of storing information and processing it. However, for LLMs to process information—that is, to realize intelligence—they do not need to retain the majority of information within the model itself. Most information can be stored as external memory, which the LLM can reference while performing advanced processing.”
RAG and LLMs evolving and connecting
Okanohara continues by describing the many advantages of using an external memory such as a RAG vector database. One key benefit is that users can fully control the information that the LLM references.
“For example, it is very difficult to incorporate situational changes, such as the revision of a company’s strategy, into an LLM. However, the inability of humans to control such changing information is very problematic for the practical application of AI. The approach of handling external information in a human-manageable form would be ideal, and RAG is a good step toward achieving that goal.”
Arakawa adds, “To derive valuable reasoning from LLMs, it is essential to reference reliable information, but we have not yet fully reached that stage. Even when authoritative information or knowledge is required, there have been cases where reasoning was based on sources such as book reviews written by the general public. In RAG, it is important for users to have sufficient control over where the LLM retrieves information from. To achieve this, having a proprietary vector database is an effective approach. Ideally, the evolution of LLMs will be closely linked with the advancement of RAG and vector databases.”
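The separation of “knowledge” from “intelligence” described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the `ExternalMemory` class, the `build_prompt` helper, and the bag-of-words `embed` function are all hypothetical stand-ins (a real system would use an embedding model and a vector database), and the sample documents are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption; a real
    system would call an embedding model here)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ExternalMemory:
    """Hypothetical in-memory stand-in for a proprietary vector database."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Inserting under an existing id replaces the old knowledge.
        self._docs[doc_id] = (text, embed(text))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self._docs.values(),
                        key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, memory: ExternalMemory, k: int = 1) -> str:
    """Retrieve context from external memory and hand it to the LLM as text."""
    context = "\n".join(memory.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


memory = ExternalMemory()
memory.upsert("strategy", "The company strategy focuses on domestic sales")
memory.upsert("holiday", "The office is closed on national holidays")
# A strategy revision becomes a data update, not a model retraining run.
memory.upsert("strategy", "The company strategy focuses on overseas expansion")

prompt = build_prompt("What is the company strategy", memory)
print(prompt)
```

The point of the sketch is the control it gives the user: what the model can reference is decided entirely by what is placed in `memory`, and revising a document changes the next answer without touching the model.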
SSDs as a “knowledge vessel” for the accumulation of increasing data
Meanwhile, vector databases are fundamentally built on the premise of using the GPU or host DRAM. This leads to a bottleneck for further advancing the social implementation of AI. The reason is that DRAM is extremely expensive compared to flash memory and the SSDs enabled by such memory, and there are limits to how much DRAM capacity can be increased. It is not well suited for storing the rapidly increasing amounts of data required.
“Since RAG is still in the early stages of adoption, DRAM constraints are not yet a pressing issue in practice,” notes Arakawa. For example, a vector database used at the departmental level within a large enterprise typically contains on the order of 100 million vectors, which corresponds to roughly 450–500 GB of capacity. As this size can fit within the DRAM of a high-end server, he points out that, at the proof-of-concept stage, many organizations may not yet feel a strong sense of urgency regarding capacity limitations.
However, the situation completely changes when it comes to real-world deployment. The reason is that broad implementation means monetization. To derive practical value, the volume of data handled increases dramatically to reach a scale of 1 billion vectors—or a capacity of 4.4 to 4.5 TB—for an entire company. The limits of DRAM are becoming apparent in terms of capacity, cost, and supply.
Furthermore, Arakawa adds, “Leading-edge service providers are aiming for vector databases on the order of 10 billion or 100 billion vectors and, in the near future, even 1 trillion vectors. There are two primary drivers behind the expected growth toward the trillion-vector scale. The first is Agentic AI, which is predicted to see rapid adoption. The second is Video RAG, an emerging technology, and other forms of multimodal data. Web conferencing is commonly used in business today, and Video RAG utilizes the video, audio, and subtitles from web conferences for RAG.”
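The capacity figures quoted above imply a per-vector footprint that can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and, for the projection, assumes that the footprint per vector stays constant as the database grows, which the article does not state and which may not hold in practice. The function names are illustrative.

```python
GB = 10**9   # decimal units assumed
TB = 10**12


def bytes_per_vector(total_bytes: int, n_vectors: int) -> float:
    """Average storage footprint implied by a capacity figure."""
    return total_bytes / n_vectors


def projected_capacity_tb(n_vectors: int, per_vector_bytes: float) -> float:
    """Capacity in TB if every vector costs per_vector_bytes (assumption)."""
    return n_vectors * per_vector_bytes / TB


# Figures quoted in the article (low and high end of each range).
department = (bytes_per_vector(450 * GB, 100_000_000),
              bytes_per_vector(500 * GB, 100_000_000))
company = (bytes_per_vector(4400 * GB, 1_000_000_000),
           bytes_per_vector(4500 * GB, 1_000_000_000))

print("department scale:", department, "bytes per vector")
print("company scale:   ", company, "bytes per vector")

# Extrapolation to the scales mentioned for leading-edge providers.
for n in (10**10, 10**11, 10**12):
    low = projected_capacity_tb(n, company[0])
    high = projected_capacity_tb(n, company[1])
    print(f"{n:>16,} vectors: {low:,.0f} to {high:,.0f} TB")
```

Under these assumptions the quoted figures work out to roughly 4.4 to 5 KB per vector, and a trillion-vector database would need on the order of 4,400 to 4,500 TB, that is, several petabytes.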
Arakawa explains that SSDs serve as the “vessel of knowledge,” enabling the storage of massive volumes of vector data while addressing the limitations of DRAM. To promote the effective use of SSDs in this context, Kioxia has developed KIOXIA AiSAQ™ software.
This software implements an ANNS (Approximate Nearest Neighbor Search) algorithm for large-scale vector databases stored on SSDs. It searches for the necessary data directly on the SSD without relying on DRAM.
KIOXIA AiSAQ software was released as open source, and in December 2025, it was announced that the Milvus open-source vector database officially adopted the technology (version 2.6.4 and later).
“KIOXIA AiSAQ technology is like the engine of a car. It is a large-capacity engine, but without a vehicle to carry it, you cannot take a drive on the road of RAG. The vehicle, in this analogy, is the vector database application. By making the technology open, we aim to enable various vendors’ vector database applications to incorporate the KIOXIA AiSAQ engine, thereby promoting the utilization of SSDs in RAG,” says Arakawa.
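The general idea of searching vectors where they sit on storage, rather than first loading the whole database into DRAM, can be illustrated with a deliberately simple sketch. This is not the KIOXIA AiSAQ algorithm and it is not an approximate search: it is an exhaustive, exact nearest-neighbor scan over a binary file, shown only to make the memory behavior concrete. The file layout, `DIM`, and the function names are assumptions made for the example.

```python
import heapq
import os
import struct
import tempfile

DIM = 4                               # toy dimensionality (assumption)
RECORD = struct.Struct(f"<{DIM}f")    # one float32 vector per fixed-size record


def write_vectors(path: str, vectors) -> None:
    """Store vectors back to back in a flat binary file."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def search_on_disk(path: str, query, k: int = 2, batch: int = 1024) -> list[int]:
    """Exact top-k by squared L2 distance, reading `batch` records at a time.

    Working memory is bounded by `batch` and `k`, not by the file size.
    """
    best: list[tuple[float, int]] = []   # heap of (-distance, index)
    idx = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * batch)
            if not chunk:
                break
            for vec in RECORD.iter_unpack(chunk):
                dist = sum((a - b) ** 2 for a, b in zip(vec, query))
                heapq.heappush(best, (-dist, idx))
                if len(best) > k:
                    heapq.heappop(best)   # drop the current farthest candidate
                idx += 1
    # Largest negated distance first, i.e. nearest vector first.
    return [i for _, i in sorted(best, reverse=True)]


vectors = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0, 1.0),
    (0.9, 1.0, 1.0, 1.1),
    (5.0, 5.0, 5.0, 5.0),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, vectors)
    nearest = search_on_disk(path, (1.0, 1.0, 1.0, 1.0), k=2, batch=2)

print(nearest)
```

A full scan like this reads every record for every query, which is exactly what an ANNS index is designed to avoid; the sketch only shows that the search itself does not require the vectors to be resident in DRAM.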
SSDs + AiSAQ are driving the evolution of LLMs and a new wave of innovation
Okanohara states that with a system like KIOXIA AiSAQ, the barrier to experimenting with LLMs will become significantly lower.
“LLM development also requires a process of trial and error using new ideas. With new models appearing almost every day, at a pace that requires prototyping a new model every one to two weeks, having a well-prepared RAG platform optimized for SSDs will likely make development easier for LLM designers.”
Okanohara continues, “The practical implementation of AI brings challenges related to cost and power consumption. In attempting to resolve these problems, I feel that we are finally able to choose the optimal storage configuration that includes not only DRAM but also flash memory and SSDs.
“Until now, the development of LLMs proceeded without considering costs or power to determine whether they would truly be useful. However, as the number of users increases to hundreds of millions or billions of people, and the introduction of AI begins in earnest in the industrial sector as well, I believe that it will become essential to utilize flash memory. In addition, since flash memory is strictly hardware, the software needed to connect it as an AI system will also require a process of trial and error. Therein lie the seeds of innovation, and that process is expected to create new technologies and forms of ingenuity.”
Okanohara recalls his experience transitioning from DRAM-centric designs to flash memory in search engine development. When flash memory became available for use at that time, the transition from DRAM to flash memory happened in one to two years. “If flash memory can be applied to DRAM-based systems in the world of AI systems as well, the possibility of a paradigm shift exists.”
Kioxia is currently developing the KIOXIA LC9 Series of high-capacity enterprise SSDs for generative AI. The highest capacity model offers 245.76 TB per unit, which is enough to store a vector database with a scale of 20 billion vectors. Kioxia expects that these SSDs can significantly reduce TCO (Total Cost of Ownership), including power consumption.
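Taking the stated figure of 20 billion vectors per 245.76 TB drive at face value, the number of drives needed for a given database size follows from a ceiling division. The sketch below assumes decimal terabytes and ignores replication, RAID overhead, and free-space headroom, none of which the article discusses; the names are illustrative.

```python
DRIVE_BYTES = 245_760 * 10**9          # 245.76 TB in decimal bytes
VECTORS_PER_DRIVE = 20_000_000_000     # figure stated in the article


def drives_needed(n_vectors: int) -> int:
    """Minimum drive count by capacity alone (ceiling division)."""
    return -(-n_vectors // VECTORS_PER_DRIVE)


# Storage budget per vector implied by the two stated figures.
budget_per_vector = DRIVE_BYTES // VECTORS_PER_DRIVE

print("implied budget per vector:", budget_per_vector, "bytes")
for n in (10**9, 10**10, 10**11, 10**12):
    print(f"{n:>16,} vectors -> {drives_needed(n)} drive(s)")
```

On these numbers, a trillion-vector database would occupy about 50 such drives by capacity alone. The implied budget of 12,288 bytes per vector is the article's own ratio; the article does not break down what that budget contains.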
Arakawa concludes, “For information to generate true value and contribute to society through the real-world deployment of generative AI, the intelligence of the LLM and the knowledge of the RAG must work hand in hand like close partners. This requires large-scale vector databases that serve as ‘vessels of knowledge,’ and I believe that SSDs and Kioxia are the ones who will make it possible.”
- Company names, product names, and service names may be trademarks of third-party companies.
Reprinted from: EE Times Japan
Translated from the March 2, 2026, edition of EE Times Japan
This article was translated with permission from EE Times Japan.
Department names and titles are as of the time of the interview.