Integrated Infrastructure System for Simulation, Data, Learning, and Inference, 1 Set

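Procuring entity: The University of Tokyo (National University Corporation)
Location: Bunkyo-ku, Tokyo
Publication date: 2 February 2025
Delivery deadline: -
Bidding start date: -
Bid opening date: -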


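Notice of Request for Submission of Materials: Integrated Infrastructure System for Simulation, Data, Learning, and Inference, 1 Set

We plan to introduce the products described below and hereby invite the submission of materials concerning this introduction.

3 February 2025 (Reiwa 7)
FUJII Teruo, President, The University of Tokyo (National University Corporation)

Procuring entity code: 415
Location code: 13
No. 12

1 Procurement details
(1) Classification of the products: 14
(2) Products to be introduced and quantity: Integrated Infrastructure System for Simulation, Data, Learning, and Inference, 1 Set
(3) Type of procurement: Rent
(4) Purpose of introduction: At the Information Technology Center of the University, which is a Joint Usage/Research Center, this system will provide large-scale, ultra-high-speed computing capability for a broad range of academic research at universities and other institutions. By organically linking computation, data, learning, and inference, it will serve a wide range of applications, including use as a platform that supports the transformation of science and the creation of new science through AI for Science and, ultimately, the realization of Society 5.0.
(5) Scheduled time of introduction: March of fiscal year Reiwa 8 (FY2026) or later
(6) Basic requirements of the procurement:
A The system as a whole must satisfy the following requirements.
  a The system consists of the "Data and Inference Infrastructure (Infrastructure System for Data Exploitation Platform)" part, the "Simulation and Learning Infrastructure" part, and storage. Compute nodes and storage must be usable transparently with each other.
B The "Data and Inference Infrastructure" part must satisfy the following requirements.
  a It must satisfy the following hardware requirements.
    ① Compute nodes consist of general-purpose CPU nodes, data analysis nodes, and inference nodes. They must be able to communicate with the outside of the system at a total bandwidth of 800 Gbps or more.
    ② The total memory bandwidth of the general-purpose CPU nodes must be 350 TB/s or more, and their total memory capacity must be 100 TiByte or more.
    ③ The total theoretical peak performance of the general-purpose CPU nodes (double-precision floating point) must be 3.0 PFLOPS or more. They must not be equipped with compute accelerators.
    ④ The total memory capacity of the compute accelerators in the data analysis nodes must be 8.2 TiByte or more.
    ⑤ The total theoretical peak performance of the compute accelerators in the data analysis nodes (floating point with FP4 or higher precision, without considering sparsity) must be 260 PFLOPS or more.
    ⑥ The total memory capacity of the compute accelerators in the inference nodes must be 8.2 TiByte or more.
    ⑦ The total theoretical peak performance of the compute accelerators in the inference nodes (floating point with FP4 or higher precision, without considering sparsity) must be 260 PFLOPS or more.
    ⑧ Each compute node must have an NVMe SSD with a physical capacity of 3.0 TByte or more.
    ⑨ The inter-node network interface of the compute nodes must provide 400 Gbps or more per node for general-purpose CPU nodes, and 400 Gbps or more per compute accelerator for data analysis nodes and inference nodes. It must be possible to transfer the contents of each compute accelerator's main memory directly, without going through the main memory of the general-purpose CPU.
    ⑩ The external network interface of the compute nodes must provide 100 Gbps or more per node.
  b It must satisfy the following software requirements.
    ① It must provide container management based on Kubernetes, and a web portal for its administration.
    ② It must provide a web portal for project management.
C The "Simulation and Learning Infrastructure" part must satisfy the following requirements.
  a It must satisfy the following hardware requirements.
    ① Compute nodes consist of simulation nodes and learning nodes.
    ② The total theoretical peak performance of the simulation nodes (double-precision floating point) must be 250 PFLOPS or more.
    ③ The total memory bandwidth of the simulation nodes must be 18 PByte/s or more.
    ④ The total theoretical peak performance of the learning nodes (floating point with FP4 or higher precision, without considering sparsity) must be 2.6 EFLOPS or more.
    ⑤ The inter-node network interface of the compute nodes must provide 400 Gbps or more per compute accelerator. It must be possible to transfer the contents of each compute accelerator's main memory directly, without going through the main memory of the general-purpose CPU.
    ⑥ Each compute node must have an NVMe SSD with a physical capacity of 3.0 TByte or more.
    ⑦ The bandwidth between the simulation nodes and the learning nodes must be 5.0 TByte/s or more.
  b It must satisfy the following software requirements.
    ① A Linux operating system must run.
    ② For general-purpose CPUs, compilers supporting Fortran 2008, C11, and C++17 or later must be provided, with automatic SIMD vectorization and the OpenMP API (version 4.5 or higher). For compute accelerators, compilers supporting Fortran 2008, C11, and C++17 or later must be provided, with automatic parallelization, the OpenACC API (version 2.7 or higher), or the OpenMP API (version 5.0 or higher).
    ③ A communication library conforming to MPI 3.1 or later must be provided.
    ④ A Python implementation must be provided.
    ⑤ Highly optimized numerical libraries and learning libraries must be provided.
    ⑥ A batch job system must be provided. It must be possible to run a single job that uses both the simulation nodes and the learning nodes at the same time.
    ⑦ A container system must be provided.
    ⑧ The management servers for the "Simulation and Learning Infrastructure" part must be built from the general-purpose CPU nodes of the "Data and Inference Infrastructure" part.
D The storage of the system must satisfy the following requirements.
  a It must satisfy the following hardware requirements.
    ① As fast storage, a highly reliable storage system with a capacity of 20 PByte or more must be provided. It must be accessible from the compute nodes of the "Simulation and Learning Infrastructure" part at a transfer rate of 1.2 TB/s or more.
    ② As archive storage, a highly reliable storage system with a capacity of 20 PByte or more and a highly reliable tape archive system with a capacity of 5 PByte or more must be provided. It must also be readable and writable from the other supercomputer systems on campus.
  b It must satisfy the following software requirements.
    ① Areas on the fast storage must be mountable as a parallel file system from all compute nodes of the system, with POSIX access. File compression must be supported.
    ② Areas on the fast storage must be accessible as AWS S3 (Amazon Web Service Simple Storage Service) compatible object storage from all compute nodes of the system and from outside the system. File compression must be supported. It is desirable that files can be referenced transparently with item ① above.
    ③ Areas on the fast storage must be connectable as block devices from the compute nodes of the "Data and Inference Infrastructure" part using the NVMe over Fabrics protocol.
    ④ Areas on the archive storage must be mountable as a file system from all compute nodes of the system, with POSIX access. They must also be accessible from outside the system as an online storage service. File compression must be supported. Hierarchical management that includes the tape archive system must be possible.
    ⑤ Functions for managing user and group information must be provided, together with functions for mapping this information to, and providing it for, the "Data and Inference Infrastructure" part and the "Simulation and Learning Infrastructure" part respectively.
E The interconnect of the system must satisfy the following requirements.
  a It must satisfy the following hardware requirements.
    ① The bandwidth between the "Data and Inference Infrastructure" part and the "Simulation and Learning Infrastructure" part must be 3.5 TByte/s or more.
  b It must satisfy the following software requirements.
    ① The compute nodes of the "Simulation and Learning Infrastructure" part must be able to communicate with the outside of the system via the "Data and Inference Infrastructure" part.
F The power consumption of the entire installed system, including the power for the cooling equipment, must be 4.5 MVA or less. Power capacity, cooling, and installation method must be designed so that heat is removed sufficiently even when CPUs, compute accelerators, memory, and disk devices operate continuously. General-purpose CPUs and compute accelerators must be water-cooled. The footprint, excluding cooling equipment, must be 370 square meters or less. The footprint of cooling equipment installed outdoors must be 500 square meters or less.

2 Method of submitting materials and comments
We invite the submission of general reference materials on the products in 1(2) above, comments on the requirements in 1(6), and materials on the libraries that can be provided.
(1) Deadline for submission: 17:00, 17 March 2025 (materials sent by mail must arrive by this time)
(2) Submit to: WADA Kazuhiro, Accounting Team, Information Strategy Group, Information Systems Department, The University of Tokyo, 6-2-3 Kashiwanoha, Kashiwa-shi, Chiba 277-0882, TEL 070-1531-4283

3 Distribution of the explanatory document
An explanatory document on the introduction will be distributed to suppliers who respond to this notice.
(1) Distribution period: 3 February 2025 to 17 March 2025
(2) Place of distribution: same as 2(2) above

4 Briefing session
A briefing session on the introduction will be held on the basis of this notice.
(1) Date and time: 14:00, 10 February 2025
(2) Venue: online briefing via Zoom

5 Other
Details of this introduction plan are given in the explanatory document. The contents of this notice are provisional and may change.

6 Summary
(1) Classification of the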
products to be procured: 14
(2) Nature and quantity of the products to be rented: Integrated Infrastructure System for Simulation, Data, Learning, and Inference, 1 Set
(3) Type of the procurement: Rent
(4) Basic requirements of the procurement:
A “Integrated Infrastructure System for Simulation, Data, Learning, and Inference” must satisfy the following specifications:
  a This system consists of “Data and Inference Infrastructure (Infrastructure System for Data Exploitation Platform)” part, “Simulation and Learning Infrastructure” part, and Storage. Compute nodes and Storage must be able to be used transparently with each other.
B “Data and Inference Infrastructure” part of “Integrated Infrastructure System for Simulation, Data, Learning, and Inference” must satisfy the following specifications:
  a It must satisfy the following hardware specifications:
    ① Compute nodes of the part consist of the general-purpose CPU nodes, the data analysis nodes, and the inference nodes. They must be able to communicate with the outside of the system at a total bandwidth of 800 Gbps or more.
    ② Total memory bandwidth of the general-purpose CPU nodes must be 350 TByte/sec or more, and total memory capacity must be 100 TiByte or more.
    ③ Total theoretical peak performance of the general-purpose CPU nodes (by double-precision floating point arithmetic) must be 3.0 PFLOPS or more. Compute accelerators must not be employed.
    ④ Total memory capacity of compute accelerators in the data analysis nodes must be 8.2 TiByte or more.
    ⑤ Total theoretical peak performance of compute accelerators in the data analysis nodes (by floating point arithmetic with FP4 or higher precision, without considering sparsity) must be 260 PFLOPS or more.
    ⑥ Total memory capacity of compute accelerators in the inference nodes must be 8.2 TiByte or more.
    ⑦ Total theoretical peak performance of compute accelerators in the inference nodes (by floating point arithmetic with FP4 or higher precision, without considering sparsity) must be 260 PFLOPS or more.
    ⑧ Each compute node must be equipped with NVMe-connected SSD(s) with a physical capacity of 3.0 TByte or more.
    ⑨ The network interface of the interconnect employed by compute nodes must be 400 Gbps or more per node for the general-purpose CPU node, and 400 Gbps or more per accelerator for the data analysis node and the inference node.
      It must be possible to transfer the data in the main memory of each compute accelerator directly, without going through the main memory of the general-purpose CPU.
    ⑩ External network interfaces of compute nodes must be 100 Gbps or more per node.
  b It must satisfy the following software specifications:
    ① The system must include container management functionality based on Kubernetes. The system must provide a web portal for its administration.
    ② The system must provide a web portal for project management.
C “Simulation and Learning Infrastructure” part of “Integrated Infrastructure System for Simulation, Data, Learning, and Inference” must satisfy the following specifications:
  a It must satisfy the following hardware specifications:
    ① Compute nodes of the part consist of the simulation nodes and the learning nodes.
    ② Total theoretical peak performance (by double-precision floating point arithmetic) of the simulation nodes must be 250 PFLOPS or more.
    ③ Total memory bandwidth of the simulation nodes must be 18 PByte/sec or more.
    ④ Total theoretical peak performance (by floating point arithmetic with FP4 or higher precision, without considering sparsity) of the learning nodes must be 2.6 EFLOPS or more.
    ⑤ The network interface of the interconnect employed by compute nodes must be 400 Gbps or more per accelerator. It must be possible to transfer the data in the main memory of each compute accelerator directly, without going through the main memory of the general-purpose CPU.
    ⑥ Each compute node must be equipped with NVMe-connected SSD(s) with a physical capacity of 3.0 TByte or more.
    ⑦ The simulation nodes and the learning nodes must be interconnected by a network with a bandwidth of 5.0 TByte/sec or more.
  b It must satisfy the following software specifications:
    ① A Linux operating system must run.
    ② For the general-purpose CPUs, Fortran 2008, C11, and C++17 or later must be supported, with automatic SIMD vectorization and OpenMP API (Version 4.5 or higher).
      For the compute accelerators, Fortran 2008, C11, and C++17 or later must be supported, with automatic parallelization, OpenACC API (Version 2.7 or higher), or OpenMP API (Version 5.0 or higher).
    ③ A communication library conforming to MPI 3.1 or later must be provided.
    ④ Python language must be supported.
    ⑤ Highly optimized math libraries and learning libraries must be provided.
    ⑥ A batch job system must be provided. It must be possible to run a single job that simultaneously uses both the simulation nodes and the learning nodes.
    ⑦ A container system must be provided.
    ⑧ The management servers for “Simulation and Learning Infrastructure” part must be configured using the general-purpose CPU nodes of the “Data and Inference Infrastructure” part.
D Storage of “Integrated Infrastructure System for Simulation, Data, Learning, and Inference” must satisfy the following specifications:
  a It must satisfy the following hardware specifications:
    ① Fast Storage System must be highly reliable with 20 PByte or more capacity. Fast Storage System must be accessible from the compute nodes of “Simulation and Learning Infrastructure” part with a bandwidth of 1.2 TB/s or more.
    ② Archive Storage System must be highly reliable, with 20 PByte or more capacity for a storage system and 5 PByte or more capacity for a tape archive system. Archive Storage System must also be readable and writable from other Supercomputer Systems in ITC/U.Tokyo.
  b It must satisfy the following software specifications:
    ① The area on Fast Storage System must be mountable as a parallel file system from all the compute nodes of this system, and POSIX access must be available. It must have file compression functions.
    ② The area on Fast Storage System must be accessible as AWS S3 (Amazon Web Service Simple Storage Service) compatible Object Storage by all the compute nodes of this system and from outside this system. It must have file compression functions. It is desirable that files can be referenced transparently with Item ① above.
    ③ It must be possible to connect to the area on Fast Storage System as block devices using the NVMe-over-Fabrics protocol from the compute nodes in “Data and Inference Infrastructure” part.
    ④ The area on the Archive Storage System must be mountable as a file system from all the compute nodes of this system, and POSIX access must be available. It must also be accessible as an online storage service from outside this system. It must have file compression functions. It must support hierarchical management, including the tape archive system.
    ⑤ Storage System must have functions for managing user and group information. It must have functions for mapping this information to, and providing it for, “Data and Inference Infrastructure” part and “Simulation and Learning Infrastructure” part respectively.
E Interconnect of “Integrated Infrastructure System for Simulation, Data, Learning, and Inference” must satisfy the following specifications:
  a It must satisfy the following hardware specifications:
    ① The bandwidth between “Data and Inference Infrastructure” part and “Simulation and Learning Infrastructure” part must be at least 3.5 TByte/sec.
  b It must satisfy the following software specifications:
    ① The compute nodes in “Simulation and Learning Infrastructure” part must be able to communicate with the outside of the system via the “Data and Inference Infrastructure” part.
F Overall maximum power consumption except the cooling system must be 4.5 MVA or less. The power capacity, cooling facility, and system assembly are required to be carefully designed so that the system is kept cool even if CPU, accelerator, memory, and disks are fully and continuously operated. General-purpose CPUs and accelerators must be water-cooled. The footprint of the entire system except the cooling system must be equal to or less than 370 square meters. The footprint of the cooling system located outdoors must be equal to or less than 500 square meters.
(5) Time limit for the submission of the requested material: 17:00, 17 March 2025
(6) Contact point for the notice: WADA Kazuhiro, Accounting Team, Information Strategy Group, Information Systems Department, The University of Tokyo, 6-2-3 Kashiwanoha, Kashiwa-shi, Chiba 277-0882 Japan, TEL 070-1531-4283
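
For readers preparing comments on the basic requirements, a few of the numeric thresholds listed under (4) can be collected into a simple checklist. The following is only a minimal sketch under stated assumptions: the dictionary keys, the function name check_proposal, and the example figures are hypothetical and not part of the notice, only a subset of the requirements is covered, and the power limit is entered as the bare 4.5 MVA figure from requirement F.

```python
# Lower bounds taken from the notice; the key names are hypothetical.
MINIMUMS = {
    "cpu_nodes_fp64_pflops": 3.0,       # B a (3)
    "cpu_nodes_mem_bw_tb_s": 350.0,     # B a (2)
    "cpu_nodes_mem_tib": 100.0,         # B a (2)
    "data_analysis_fp4_pflops": 260.0,  # B a (5)
    "inference_fp4_pflops": 260.0,      # B a (7)
    "simulation_fp64_pflops": 250.0,    # C a (2)
    "simulation_mem_bw_pb_s": 18.0,     # C a (3)
    "learning_fp4_eflops": 2.6,         # C a (4)
    "fast_storage_pb": 20.0,            # D a (1)
    "fast_storage_bw_tb_s": 1.2,        # D a (1)
}

# Upper bounds from requirement F; the key names are hypothetical.
MAXIMUMS = {
    "power_mva": 4.5,
    "footprint_m2": 370.0,
    "outdoor_cooling_m2": 500.0,
}


def check_proposal(proposal):
    """Return the sorted keys that the proposal fails to meet or omits."""
    failures = []
    for key, minimum in MINIMUMS.items():
        value = proposal.get(key)
        if value is None or value < minimum:
            failures.append(key)
    for key, maximum in MAXIMUMS.items():
        value = proposal.get(key)
        if value is None or value > maximum:
            failures.append(key)
    return sorted(failures)


# Hypothetical proposal that sits exactly on every threshold except one.
example = {**MINIMUMS, **MAXIMUMS}
example["learning_fp4_eflops"] = 2.0  # below the 2.6 EFLOPS minimum
print(check_proposal(example))  # ['learning_fp4_eflops']
```

A missing figure is treated as a failure so that an incomplete proposal is flagged rather than silently accepted.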
