An Automatic Array Distribution Technique for Multi-Bank Memory of High Performance IoT Systems

AUTHORS

Jungseok Cho, Computer Engineering, Yeungnam Univ., South Korea
Jonghee M. Youn, Electrical & Electronic Engineering, Sunchon National Univ., South Korea
Doosan Cho, Electrical & Electronic Engineering, Sunchon National Univ., South Korea

ABSTRACT

Mobile devices designed for IoT exploit a variety of system optimization techniques to maximize performance while reducing power consumption. These techniques apply to the communication modules, the memory system, and the central processing unit. Most of them are developed and applied at the system design stage; few are applied at the system integration stage. At the integration stage, the major power-consuming parts are the communication subsystem and the memory subsystem. Because communication behavior depends on many variables in the network environment, only a limited set of techniques is available for it, whereas for memory a large benefit can be obtained depending on the technique applied. The memory structures of mobile and IoT systems can be classified in many ways; we focus on multi-bank memory, which divides one large memory into several smaller memories. Multi-bank memory can reduce operating power consumption and supports parallel memory accesses, which improves performance, and it is often used in commercial products. To realize these benefits, the compiler must generate memory access instructions and place data across the banks properly, so system performance is determined by the quality of the code the compiler generates. In this paper, we introduce a compiler optimization technique for multi-bank memory that addresses this limitation. The proposed technique reduces energy consumption by up to 20% in multi-bank memory systems.
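To make the data placement problem concrete, the following is a minimal sketch of one way a compiler pass might distribute arrays over banks. It is not the technique proposed in the paper; it assumes a simple greedy coloring of a conflict graph, and the names `distribute_arrays`, `count_conflicts`, and `access_groups` are hypothetical. Arrays accessed together are steered into different banks so that their accesses can proceed in parallel.

```python
from itertools import combinations


def distribute_arrays(access_groups, num_banks):
    """Greedily assign each array to a memory bank (illustrative only).

    access_groups: list of sets; each set names arrays accessed together
                   (for example, in the same loop iteration).
    Returns a dict mapping array name -> bank index.
    """
    # Build a conflict graph: an edge joins arrays that are accessed together.
    conflicts = {}
    for group in access_groups:
        for name in group:
            conflicts.setdefault(name, set())
        for a, b in combinations(sorted(group), 2):
            conflicts[a].add(b)
            conflicts[b].add(a)

    placement = {}
    # Place the most constrained arrays first; ties are broken by name.
    for name in sorted(conflicts, key=lambda n: (-len(conflicts[n]), n)):
        used = {placement[m] for m in conflicts[name] if m in placement}
        free = [bank for bank in range(num_banks) if bank not in used]
        if free:
            placement[name] = free[0]
        else:
            # Every bank already holds a conflicting array:
            # fall back to the bank with the fewest arrays so far.
            load = [0] * num_banks
            for bank in placement.values():
                load[bank] += 1
            placement[name] = load.index(min(load))
    return placement


def count_conflicts(access_groups, placement):
    """Count pairs of co-accessed arrays that ended up in the same bank."""
    return sum(
        1
        for group in access_groups
        for a, b in combinations(sorted(group), 2)
        if placement[a] == placement[b]
    )


if __name__ == "__main__":
    groups = [{"x", "y"}, {"y", "z"}]
    banks = distribute_arrays(groups, num_banks=2)
    print(banks, count_conflicts(groups, banks))
```

In this sketch, a remaining conflict means two co-accessed arrays share a bank and their accesses would have to be serialized. A real compiler would also weigh array sizes, bank capacities, and access frequencies, none of which are modeled here.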

 

KEYWORDS

Energy consumption, IoT system, Multi-bank memory, Memory structure, Compiler technique, System optimization

CITATION

  • APA:
    Cho, J., Youn, J. M., & Cho, D. (2019). An Automatic Array Distribution Technique for Multi-Bank Memory of High Performance IoT Systems. World Journal of Wireless Devices and Engineering, 3(1), 15-20. doi:10.21742/WJWDE.2019.3.1.03
  • Harvard:
    Cho, J., Youn, J. M. and Cho, D. (2019). "An Automatic Array Distribution Technique for Multi-Bank Memory of High Performance IoT Systems". World Journal of Wireless Devices and Engineering, 3(1), pp. 15-20. doi:10.21742/WJWDE.2019.3.1.03
  • IEEE:
    [1] J. Cho, J. M. Youn, and D. Cho, "An Automatic Array Distribution Technique for Multi-Bank Memory of High Performance IoT Systems," World Journal of Wireless Devices and Engineering, vol. 3, no. 1, pp. 15-20, Nov. 2019.
  • MLA:
    Cho, Jungseok, Jonghee M. Youn, and Doosan Cho. "An Automatic Array Distribution Technique for Multi-Bank Memory of High Performance IoT Systems." World Journal of Wireless Devices and Engineering, vol. 3, no. 1, Nov. 2019, pp. 15-20, doi:10.21742/WJWDE.2019.3.1.03

ISSUE INFO

  • Volume 3, No. 1, 2019
  • ISSN (print): 2207-5968
  • ISSN (electronic): 2207-5976
  • Published: Nov. 2019
