GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when choosing hardware for training. In particular, CPU servers could be beneficial if training on CPUs were more efficient, since they incur fewer hardware update costs and make better use of existing infrastructure.

This paper makes three contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs and a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow, and we explore several case studies in which we identified performance issues and then optimized the Intel® Extension for PyTorch, resulting in an overall 2x training performance increase for the RetinaNet-ResNext50 model. Third, we show how to leverage the visualization capabilities of ProfileDNN, which enabled us to pinpoint bottlenecks and create a custom focal loss kernel that was two times faster than the official reference PyTorch implementation.

}, year = {2023}, journal = {Journal of Machine Learning Theory, Applications and Practice}, volume = {1}, month = {04/2023}, url = {https://www.journal.riverpublishers.com/index.php/JMLTAP/article/view/268}, doi = {https://doi.org/10.13052/jmltapissn.2022.003}, }