2005/8/12

■UCBの研究教育について

　□UCB CSの高性能コンピューティング教育について
（July 20, 2005）

　今月は、UCBの高性能計算関連の講義を調査しました。
調査対象は、Demmel教授とYelik教授の講義内容です。

　●講義の調査に関する概観

　以下の点が概観できる。

　（１）スペシャリスト養成目的の講義

　日本の大学院では、たとえ専門科目とはいえ、以下に示す
ようなきわめて専門性の高い講義による課題は出さないと思
われる。

　これは、日本の大学院はジェネラリストを養成することを
目的としているのに対して、米国のそれは、スペシャリスト
を養成することを目的としているから、と推察できる。

　（２）研究プロジェクトに関連する演習

　講義においても、学科や研究室の枠組みを超えた研究テー
マの集合である「研究プロジェクト」．．．注１）に関連す
る内容で、講義の単位が取れるようになっている。(．．．注２)

　これは、スペシャリスト養成を目的としていることからも、
その目的と合致している。

　注１：
　ここで用いている「プロジェクト」は２つの種類がある。

　まず、対外的な予算獲得を目的に形成される＜研究プロジェクト＞
である。この研究プロジェクトは、学科、研究室の枠組みを超えて
研究目的に合致する教員、PD、大学院生で構成される。

　これとは別に、講義の単位を取得するための＜講義プロジェクト＞
が存在する。この講義プロジェクトは、学科、研究室の枠組みを超え
て構成されるが、大学院生にのみにより構成されている。講義で扱う
研究テーマは同一であるが、講義プロジェクト構成員が所属する研究
プロジェクトのテーマは多くの場合、同一でない。

　注２：
　この研究プロジェクトによる内容で講義の単位が取れるというのは、
UCBに限定したことではないようである。
　　　
　BeBOPプロジェクトには、イリノイ大学に学部のとき在籍していた
大学院生がいる。その学生に、イリノイ大学での講義形式を聴いたと
ころ、イリノイ大学でもUCBと同様の研究プロジェクトベースの講義
になっていたそうである。
　　　
　ただし、講義プロジェクトの課題は提出が難しいことから、大きな
単位が与えられるのに対し、割り当て課題を解くと小さな単位が与え
られるという＜オプション＞がある。

　●Prof. James Demmelの講義

　　○行った講義の概略：
Teaching for Spring 2005: CS 267, Applications of Parallel Computers
Teaching for Fall 2004: Math 221, Matrix Computations
Teaching for Spring 2004: Math 55, Discrete Mathematics
Teaching for Spring 2002: Math 128a, Numerical Analysis
Teaching for Spring 2001: CS 170, Efficient Algorithms and Intractable
Problems
　　(co-taught by Prof. Jonathan Shewchuk)
Teaching for Fall 2000: Math 55, Discrete Mathematics
Teaching for Fall 1999: CS 170, Efficient Algorithms and Intractable
Problems
Teaching for Fall 1999: Math 221, Matrix Computations
Teaching for Spring 1999: CS 267, Applications of Parallel Computers
Teaching for Fall 1998: Math 128a, Numerical Analysis
Teaching for Spring 1998: CS 170, Efficient Algorithms and Intractable
Problems
Teaching for Fall 1997: Math 221, Matrix Computations

　数値線形代数、離散数学、並列計算応用、の内容が主である。

　○授業「並列計算機の応用」
　U.C. Berkeley CS267 Home Page Applications of Parallel Computers
　Spring 2005 Monday and Wednesday 9:30-11, 405 Soda
　（週２回、各１時間３０分）

　　・講義資料：
　CS 267 Lectures: Spring 2005

　Lecture 1: Introduction
　Lecture 2: High Performance Programming on a Single Processor:
Memory Hierarchies, Matrix Multiplication, and Automatic
Performance Tuning
　Lecture 2 (cont): Automatic Performance Tuning (continuation of Lecture 2)
　Lecture 3: Parallel Machines and Programming Models
　Lecture 4: Shared Memory Machines and Programming;
Example: Sharks and Fish Sharks and Fish code to discuss in
lecture
　Lecture 5: Distributed Memory Machines and Programming Guest Lecture
on MPI Progamming by Bill Saphir, Chief Architect,
National Energy Research Scientific Computing Center (NERSC)
　Lecture 6 Sources of Parallelism and Locality in Simulation - part 1
　Lecture 7 (part 1): Sources of Parallelism and Locality in Simulation -
part 2
　Lecture 7 (part 2): Tricks with Trees
　Lecture 8: Dense Linear Algebra - Parallel Matrix Multiply
　Lecture 9: Dense Linear Algebra - Parallel Gaussian Elimination
　Lecture 10: Graph Partitioning
　Lecture 11: Floating Point Arithmetic
　Lecture 12: Guest Lecture by Xiaoye Li: Sparse Linear Algebra
　Lecture 13: Guest Lecture by Katherine Yelick: Programming in UPC
　Lecture 14: Solving PDEs - the Poisson Equation (1)
　Lecture 15: Solving PDEs - the Poisson Equation (2)
　Lecture 16: Guest Lecture by David Skinner on Parallel Application
Scaling,
Performance and Efficiency
　Lecture 17: Hierarchical Methods for the N-Body Problem
　Lecture 18: Load Balancing
　Lecture 19: Guest lecture by Teresa Head-Gordon on Computational Biology
　Lecture 20: Guest lecture by Michael Wehner on Climate Modeling
　Lecture 21: Automatic Performance Tuning - Sparse Matrix Kernels
　Lecture 22: Guest lectures by Osni Marques and Tony Drummond on the ACTS
Collection of Advanced CompuTational Software. The software
itself is here.
　Lecture 23: Guest lecture by Julian Borrill on Computational Astrophysics
- Analyzing Cosmic Background Radiation
　Lecture 24: Guest lecture by Katherine Yelick on The Digital Human
　Lecture 25: Unstructured Multigrid, based on work of Mark Adams
　Lecture 26: Guest lecture by Phil Colella on Adaptive Mesh Refinement,
part 1 and part 2.
　Lecture 27: Guest lecture by David Anderson on Volunteer Computing, the
technology
behind SETI@home.
　Lecture 28: Guest lecture by David Bailey, Chief Technologist of NERSC,
on Performance Needs and Tools for High Performance Computers
(pdf)
Home page for Performance Evaluation Research Center (PERC)
PERC survey paper on Status of Performance Research (pdf)
Summary of PERC work on Performance Tools (Word)
Paper by David Bailey and Allan Snavely on Performance Modeling
(pdf)
　Lecture 29: Guest lecture by Wes Bethel, leader of the LBNL Visualization
Group,
on Parallelism in Graphics

　この講義資料から、並列計算機の基礎的な内容から、最先端の研究内容まで
カバーしている。さらに最先端の研究をしている研究者を招待し講義が
構成されている。ゲスト研究者としては、LBL、LBNLの研究員が多い。

　講義内容は、並列計算機アーキテクチャの基礎から、並列プログラミングの基礎、
線形代数の基礎、最先端の数値計算アルゴリズムや自動チューニングなどの
HPC分野の最先端の研究内容まで及ぶ。

　成績のつけ方は、各人が選んだテーマに関するレポートである。しかし、普通の
レポートではなく、提出されたレポートが会議の論文として通るぐらいの
高いクオリティがある。

　このレポートは、講義により決められた（ただし、教員が与えるのでなく、自分で
講義と関連する内容のテーマを決める）講義プロジェクトの報告である。各人の
研究内容を良くするため、各人の課題に対する＜ポスター発表＞をする時間が
設けられている。

----------------------

　○講義「応用数値線形代数」
Applied Numerical Linear Algebra
UC Berkeley Mathematics Department Math 221
Room Change starting Friday Sep 10: 405 Soda

　　・講義資料：
Handouts
　Course Overview in Postscript or PDF
　　：概略や参考文献が載っている
　Class Survey . Please fill this out and email it to
njw@math.berkeley.edu if you did not get the copy in class.
　　：講義を受ける人の名前や、各人の研究テーマが数値計算に関連するか、
　　　　および、どの計算機言語に詳しいかなどのアンケート
　Class Roster
　Guest Lecture by Beresford Parlett on the MRRR Algorithm
　　　for the symmetric tridiagonal eigenproblem.
　Guest Lecture by Christof Voemel on the MRRR Algorithm
　　　for the symmetric tridiagonal eigenproblem.
　　：最先端の数値計算アルゴリズムMRRRについて、
　　　　研究者を呼んでゲスト講義
　An Introduction to the Conjugate Gradient Method Without
　　　the Agonizing Pain by Jonathan Shewchuk, is a very easy to
　　　understand description of one of the most popular iterative methods
　　　for solving A*x=b. In contrast to the terse treatment in the course
text book,
　　　you might want to see Shewchuk's answer to the question "How could
fifteen lines
　　　of code take fifty pages to explain?"
　Lectures notes on Multigrid, in either html or powerpoint.

Textbook
　Applied Numerical Linear Algebra by J. Demmel, published by SIAM, 1997.
　List of Errata for the Textbook
　(This will be updated during the semester. Suggestions welcome!)

　　・割り当て課題：
Homework assignments for Math 221
Homework #1 due Wednesday, Sept 8:
　From Chapter 1: 1,2,3,4,5,10,11
　Variation on Question 12 from Chapter 1:
　Answer the question as stated just for addition and multiplication.
　：テキストの問題を解く
　Write down an algorithm for complex division a/b.
　：アルゴリズムを書く
　Assume your algorithm runs using IEEE double precision arithmetic.
　Let a = 2^512 + i*2^512 (which is roughly equal to 1.341 * 10^154 + i *
1.341 * 10^154),
　and b=a, so a/b = 1. What answer does your algorithm compute?
　：計算精度に関する考察を書く、プログラム実験付き
　(you can write a very short program to try it, or just reason based on on
the rules of
　IEEE arithmetic.) If it does not compute something close to 1, propose how
you might improve it.
　...
　From Chapter 2: modification of problem 14:
　Rather than modifying existing LAPACK or CLAPACK codes as
　　in the book's version of question 2.14, we will instead do this
assignment in Matlab,
　　as described below.
　　：Matlabを用いて、LAPACKやCLAPACKのコードを改良する
　Your assignment is to implement Gaussian Elimination with Complete
Pivoting (GECP) in Matlab,
　　and evaluate it as described here. On the class webpage you will find
four programs,
　　：完全枢軸選択付きのLU分解をMatlabで書く

　この講義では、割り当て課題があるので、レクチャーの度合いが強い。

　だが、講義では最先端数値計算アルゴリズムMRRRについて、研究者を招待して
講義を行っている。この点で、講義を受けるだけで最先端の研究内容を知ることが
できることを特徴としている。

　また、割り当て課題のレベル（専門性）はきわめて高い。Matlabが使えることを
前提としているが、この演習について、数値計算分野外の学生が行うことは
きわめて困難である。

結論としてこの講義も、数値計算関連のテーマを研究課題として持っている
大学院生を前提にして、講義が構成されているといってよい。
　　
---------------------------------------

　●Prof. Katherine Yelickの講義

　　○行ってきた講義の概略：
CS4: Introduction to Computing for Engineers
　(Fall 2004)
CS267: Introduction to Parallel Computation
　(Spring 98, Fall 2001, Spring 2004)
CS294-8: Principles of Fault Tolerant Computing
CS298-1: Systems Seminar (Fall 2000)
CS61B: Data Structures and Advanced Programming
　(Spring 2001, Fall 98, Fall 97)
CS263: Design of Programming Languages

　並列計算機の応用、計算機システム、計算機言語の講義が主である。

　　○講義：並列計算機の応用
CS 267 + ENGIN 233: Applications of Parallel Computers
(Lectures: Mon-Wed 1:00-2:30pm in 405 Soda)
(Discussions: 1:00-2:00 in 310 Soda)
（約５ヶ月間、１時間３０分の講義が週２回あるほかに、
　１時間の討論の機会がある？）

　　　・講義内容：
1 Wed 1/21 Overview of Parallel Computing
2 Mon 1/26 Uniprocessor Memory Hierarchies
3 Wed 1/28 Matrix Matrix Multiplication on Uniprocessors
4 Mon 2/2 Machines and Tools (Jason Riedy)
5 Wed 2/4 Overview of Machines and Models
6 Mon 2/9 Shared Memory Machines
7 Wed 2/11 Distributed Memory Machines
8 Wed 2/18 Message Passing Programming (MPI) (part1, part2)
9 Mon 2/23 Shared Memory Programming (Sharks & Fish)
10 Mon 3/1 Sources of Parallelism and Locality
11 Wed 3/3 Unified Parallel C (UPC)
12 Fri 3/5 Titanium
13 Mon 3/8 Parallel Matrix Multiply
14 Wed 3/11 Dense Linear Algebra
15 Mon 3/15 Poisson: Jacobi, CG, SOR, FFT
16 Wed 3/18 Sparse Matrix-Vector Multiplication
17 Mon 3/29 Evaluating Parallel Applications (Lenny Oliker)
18 Wed 3/31 Tools for Parallel Computing (Tony Drummond and Osni Marques)
19 Mon 4/5 Poisson: Multigrid
20 Wed 4/7 Fluid flow in Biological Systems
21 Mon 4/12 Dynamic Load Balancing
22 Wed 4/14 Graph Partitioning
23 Mon 4/19 Tree-Based Computations
24 Wed 4/21 Computational Astrophysics (Julian Borrill)
25 Mon 4/26 Datamining and Sorting
26 Wed 4/28 Grid Computing
27 Mon 5/3 Climate Modeling
28 Wed 5/5 Trends in High Performance Computing
29 Mon 5/10 Poster Session (1-3pm)

　この講義では、計算機アーキテクチャの基礎から、MPIなどの
並列プログラム、数値計算アルゴリズムの基礎、などを行っている。
この講義でも、研究者を招いて招待講義がある。また、Poster Session
がある。

　Prof.Demmelの講義内容と比べると、GRIDやDynamic Load Balancingなどの
項目があることから、計算機システム寄りの講義となっている。

　　・割り当て課題
Homework 0 (1/21 - 1/28) - Literature Study: results
　：各人の持っているテーマと本講義の関連をまとめる
Homework 1 (1/28 - 2/11) - Optimizing Matrix Multiply
　：行列積コードの高速化。キャッシュブロッキングなどの技術を利用。
Homework 2 (2/21-3/8) - Particle Simulation (Fish)
　：The Fish Particle Simulation (FPS)を使った演習。MPI (distributed memory)
　pthreads (threaded, shared memory) 、OpenMP (vectorized / threaded shared
memory)
　などを利用する、プログラム演習。
Homework 3 (3/17-4/2) - Four Problems in Two Languages (UPC and Titanium)
　：NAS　Parallelベンチマークを用いて、UPCやTitaniumというPCクラスターを利用
した
　　演習。性能評価による問題発見能力を見る問題。
Final Projects (proposals due 4/14, posters due 5/10, papers due 5/12)
　：各人のテーマに関する報告

　この演習も専門性がきわめて高く、並列処理や数値計算分野以外の学生が
この講義の演習を行うのは容易ではない。

　　・講義プロジェクト最終報告
　Final Projects

　BLAST Implementation on BEE2
　　　- Chen Chang
　PFLAMELET; An Unsteady Flamelet Solver for Parallel Computers
　　　- Fabrizio Bisetti
　Parallel Pattern Matcher
　　　- Frank Gennari, Shariq Rizvi, and Guille Diez-Canas
　Parallel Simulation in Metropolis
　　　- Guang Yang
　A Survey of Performance Optimizations for Titanium Immersed Boundary
Simulation
　　　- Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando
Solar
　Parallelization of oopd1
　　　- Jeff Hammel
　Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code
　　　- Amir Kamil, Ben Schwarz, and Jimmy Su
　Communication Savings With Ghost Cell Expansion For Domain Decompositions
　　Of Finite Difference Grids
　　　- C. Zambrana Rojas and Mark Hoemmen
　Parallelization of Phylogenetic Tree Construction
　　　- Michael Tung
　UPC Implementation of the Sparse Triangular Solve and NAS FT
　　　- Christian Bell and Rajesh Nishtala
　Widescale Load Balanced Shared Memory Model for Parallel Computing
　　　- Sonesh Surana, Yatish Patel, and Dan Adkins

　研究プロジェクトの色合いが濃いテーマが、多数伺える。

　レポートの内容はきわめて高く、多くのレポートは、
このまま国際学会に通るレベルにあると思われる。

---------------------------------------------------------------

■UCBの教授の研究活動について---その２ (July 22, 2005)

　UCBには、固有値問題でとても著名なProf.Beresford Neill Parlett
がいます。彼はとても高齢で、現在８０歳以上だと思われます。
彼のポジションは、名誉教授（Professor Emeritus）ですが、
大学内にオフィスをもつことが許されています。おそらく、給与も
大学からもらっている可能性があります。

　本日、定例のLAPACKセミナーに出席し、MRRRという新アルゴリズムの
SR8000上での性能報告を行いました。結論としては、予想していたとおり
二分法ルーチンのメイン部分がベクトル化できないことが判明し、
それがボトルネックとなっていると思われます。

これを解決するには、二分法をベクトル計算機向きに改善した＜多分法＞
というアルゴリズムが知られているのですが、それを実装することが
考えられます。

この多分法について、速度は速いのですが、精度が二分法より悪くなる
可能性があり、それが数理的には興味があるところです。

この話に関連して、本日のLAPACKセミナーにProf.Parlettが遅れて
参加してきたので、本日の研究報告のまとめをProf.Demmelが行いました。
Prof．Parlettは多分法に興味があるらしく、セミナー終了後に私の居室で
話をしていただきました。

その話は、まだ出版はしていないが、今扱っているアルゴリズムに関連する
新しいアルゴリズムがある、ということでした。興味があるなら、私の
オフィスをたずねてください、とのことでした。

個人的には、固有値問題の＜神様＞といえるProf.Parlettと話せただけでも
光栄でしたが、彼が８０歳を超えてもなお、すばらしいアルゴリズムの
アイデアを持っていることに感銘を受けました。

ここで、居室に来る前に、最初に彼からいくつか奇妙な質問をされました。
それは、「君は誰から給料をもらっているのか？」とか、「Prof.Demmelには
いわないでくれ」とかいう発言です。私は「日本政府から」と答えました。

後になって、その意味が分かってきました。というのは現在、Prof.Parlettは、
大学院生を持っていなく、さらに、（おそらく）研究資金を取ってきていない
（もしくは年齢制限から取れない）ため、アイデアはあっても、それを
証明するための人材が不足しているから勧誘に来た、ということだと思います。

個人的には、彼のすばらしいアイデアに興味があるので、夏休み明けに彼の
オフィスを訪問して、できるだけの協力をしたいと思います。

しかし、彼ほどの高齢の大家になってもなお、研究（論文出版）をし続けなく
てはならないという背景（．．．注１）があるようにも解釈できます。
そうだとすると、信じがたいことです。

UCBには、何らかの厳しい研究に対する評価システムがあるのではないか
と思われる事象でした。

（もちろん本件は、義務はなくとも有用なアイデア後世に残すために、
　研究ができる人材を探していることも要因としてあります。）

注１．．．彼は、MRRRの開発に貢献したI.Dhillonの指導教官でした。
　　既に報告したように、MRRRの著名な論文は、Parlettを第一著者
　　として出版されています。おそらくこれは、彼の成果として出版する
　　必要があったから、とも考えられます。また別の観点として、
　　MRRRはParlettのアイデアに対する貢献が大きかった（であろう）
　　ことも否定できません。

--------------------------------------------------------------------------

■米国の生活について

　□観光など

　おかげさまで、平日は研究、週末は観光という充実した日々を過ごさせ
て頂きました。

　Free Wayにのれるようになったので、土日は郊外に観光に出れるように
なりました。

　そこで、サンフランシスコ市内はもとより、サンフランシスコ北部の
Sausalitoという海辺の町や、バークレーの北にあるワインで有名なナパ
などを観光しました。

　特にSausalitoはお勧めです。ゴールデン・ゲート・ブリッジが海越し
や山越しに見えて美しいので、サンフランシスコ市内観光よりもお勧め
です。

　□子供の変化

　プリスクールに３ヶ月も平日毎日通うと、完全でないにしろ、子供が
英語を何かしら話すようになります。

　数のカウントや挨拶、YesやNoの返事は、英語で喋るようになりました。
発音はいいです。やっぱり発音は、小さいうちに身に着けないとだめであ
ることを痛感しました。

　発音が良すぎるので、時々何を言っているかよくわからないこともあり
ます。（もっとも、英語と日本語を混合して喋るので、何語をしゃべって
いるのかわからないことが主な原因ですが。）