Academic Lecture Announcement: Analysis and Optimization of Parallel Data Access on Big Data File Systems

KEY SPEAKER: Dr. Jun Wang (University of Central Florida, USA)
TALK PLACE: Room 229, Information Building
TALK TIME: December 18, 2015, 13:30
TALK TITLE: Analysis and Optimization of Parallel Data Access on Big Data File Systems
TALK ABSTRACT:
In this work, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure in which edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of this graph, so as to maximize data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.
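The matching idea in the abstract can be illustrated with a small sketch. This is not the Opass algorithm itself, whose details are not given here; it is a plain capacity-limited bipartite matching, written under the assumption that each block should go to a process running on a node that holds a replica, and that no process may take more than a fixed quota of blocks. The function name `assign_blocks`, its parameters, and the example cluster are all hypothetical.

```python
from collections import defaultdict


def assign_blocks(block_locations, process_node, quota):
    """Assign every block to one process, preferring local reads.

    block_locations: dict block id -> set of node names storing a replica
    process_node:    dict process id -> node name the process runs on
    quota:           max blocks matched locally per process (balance constraint)
    Returns (assignment, local_count).
    """
    # Bipartite edges: block -> processes that could read it locally.
    local_procs = {
        b: [p for p, n in process_node.items() if n in nodes]
        for b, nodes in block_locations.items()
    }
    owned = defaultdict(list)  # process -> blocks currently matched to it

    def try_place(block, visited):
        # Augmenting-path search (Kuhn's algorithm) with per-process capacity.
        for p in local_procs[block]:
            if p in visited:
                continue
            visited.add(p)
            if len(owned[p]) < quota:
                owned[p].append(block)
                return True
            # Process is full: try to move one of its blocks elsewhere.
            for other in list(owned[p]):
                if try_place(other, visited):
                    owned[p].remove(other)
                    owned[p].append(block)
                    return True
        return False

    unmatched = [b for b in block_locations if not try_place(b, set())]
    local_count = sum(len(blocks) for blocks in owned.values())

    # Blocks without a free local slot are read remotely by the
    # least-loaded process (an assumed fallback policy).
    for b in unmatched:
        p = min(process_node, key=lambda q: len(owned[q]))
        owned[p].append(b)

    assignment = {b: p for p, blocks in owned.items() for b in blocks}
    return assignment, local_count


if __name__ == "__main__":
    # Hypothetical 4-node cluster with one process per node.
    blocks = {"b0": {"n0", "n1"}, "b1": {"n0"}, "b2": {"n0", "n2"}, "b3": {"n1", "n2"}}
    procs = {"p0": "n0", "p1": "n1", "p2": "n2", "p3": "n3"}
    assignment, local = assign_blocks(blocks, procs, quota=1)
    print(assignment, local)
```

In this example, node n3 holds no replicas, so at most three of the four blocks can be read locally when each process takes one block. The fallback stays within the quota only if the quota times the number of processes is at least the number of blocks.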
SPEAKER BIOGRAPHY:
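Dr. Jun Wang is currently director of the Computer Architecture and Storage Laboratory in the Department of Electrical Engineering and Computer Science at the University of Central Florida, USA. He is a recipient of the National Science Foundation CAREER Award (NSF CAREER AWARD) and the Department of Energy Early Career Principal Investigator Award (DOE EARLY CAREER PRINCIPAL INVESTIGATOR AWARD). He has published more than 80 papers in leading journals and top conferences in related fields, including IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems (12 papers in total, 11 of them as corresponding or first author), HPDC, ICS, EuroSys, Middleware, FAST, and IPDPS. His papers have been cited many times by leading researchers worldwide, including those at UIUC, Microsoft Research, and IBM T. J. Watson Research. According to Google Scholar, his publications have been cited more than 7,000 times (as of January 31, 2015). Over the past five years, the laboratory he leads has led 7 research projects and participated in more than ten in total, receiving over 5 million US dollars in US federal research funding. Prof. Wang currently holds three NSF research projects and one NASA research project. He most recently led an NSF project through to completion, funded at nearly 400,000 US dollars over three years, to develop a next-generation cloud computing platform that effectively supports high-performance big data analytics applications; he was the principal and sole investigator on this project. Dr. Wang has served as an NSF panelist (11 times in total) and as a reviewer of research projects for the US Department of Energy and a US health agency. He also serves on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and the International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), and on the program committees of many international conferences. He was the organizer of the first international conference on storage, virtualization, performance, and energy (SPEED2008), conference chair of the 10th IEEE conference on Networking, Architecture, and Storage (NAS), storage track chair of the 7th IEEE NAS, and a Cyber Physical System Cloud panelist at the 23rd IEEE ICCCN. All eight PhD students who graduated under his supervision work at leading US IT companies, including Google, Apple, Microsoft, and EMC.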

(For specific details of the lecture, the notice on the digital platform prevails.)
