Big Data and the Internet of Things (IoT): Opportunities and Challenges
Thiab Taha, Computer Science Department, University of Georgia, Athens, GA, USA ([email protected])
The 8th International Conference on Information Technology (ICIT 2017): Internet of Things, May 17-18, 2017, Amman, Jordan
What is the Internet of Things (IoT)?
https://www.nsf.gov/pubs/2017/nsf17072/nsf17072.jsp?WT.mc_id=USNSF_25&WT.mc_ev=click
IoT is generally understood to refer to the internetworking of physical devices that contain electronics, sensors, actuators, and software, and that are able to collect and exchange data about, and in some cases interact with, the physical environment. In brief: the IoT refers to devices that collect and transmit data via the internet.
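As a minimal illustration of "collect and transmit," the sketch below packages a simulated sensor reading as JSON, the form in which many IoT devices post data to a cloud endpoint. The device ID and field names here are hypothetical, not from any particular platform.

```python
import json
import time

def make_reading(device_id, temperature_c):
    """Package a sensor reading as a JSON payload (field names are illustrative)."""
    return json.dumps({
        "device": device_id,
        "ts": int(time.time()),          # Unix timestamp of the reading
        "temperature_c": temperature_c,  # the measured value
    })

# A real device would POST this payload to its cloud endpoint over HTTP or MQTT.
payload = make_reading("greenhouse-01", 21.5)
print(payload)
```

The transmit step is deliberately left out: what varies between IoT platforms is the transport (HTTP, MQTT, CoAP), while the collect-and-package step above is common to nearly all of them.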
http://www.dyogram.com/2017/04/internet-of-things-iot-market-potential-trends-in-2017-and-beyond/
Internet of Things (IoT)
The IoT faces several challenges, including security, privacy, scalability, design complexity, safety resulting from the lack of human control over systems, software flexibility, etc. Another major challenge of IoT is processing, storing, and analyzing the large amounts of data that come from so many different sources. On the other hand, the IoT has many applications that are extremely useful in our daily life, such as smart cars, smart cities, home appliances and security, health
tracking wearable devices, weather monitors, etc. The IoT has the potential to increase efficiency, accuracy, safety, and convenience through increased interconnection and intelligence of the integrated physical and computing environments. It has the potential to impact every aspect of daily life.
Internet of Things (IoT)
At the same time, the trend toward connecting "everything" is rapidly expanding the perimeter, or surface of concern, that must be secured, and the potential for exposure of personally identifiable information.
Protecting the Internet of Things
While the car-hacking work garnered the most public attention, there are other important security weaknesses. Tadayoshi Kohno was an author on the first publications demonstrating the security risks of wirelessly reprogrammable pacemakers and defibrillators. Doctors disabled the wireless mechanism in former US Vice President Dick Cheney's pacemaker to thwart hacking. Kohno stresses that the benefits of these devices outweigh the security risks and that patients should have no qualms about using them. However, he believes that device manufacturers must improve the security of current and future devices.
Given the projected impact of IoT in nearly every industry sector, including healthcare, agriculture/farming, manufacturing, energy, transportation, communication, security, finance, clothing, and sports, foundational precompetitive research is important to enable designs and applications that meet critical performance, security, and privacy guarantees.
Internet of Things (IoT)
ABI Research's latest data on the Internet of Everything (IoE) show that there are more than 10 billion wirelessly connected devices in the market today, with over 30 billion devices expected by 2020 (https://www.abiresearch.com/press/more-than-30-billion-devices-will-wirelessly-conne/). This year we will have 4.9 billion connected things (forbes.com).
"The emergence of standardized ultra-low-power wireless technologies is one of the main enablers of the IoE, with semiconductor vendors and standards bodies at the forefront of the market push, helping to bring the IoE into reality," said Peter Cooney, practice director. "The year 2013 is seen by many as the year of the Internet of Everything, but it will still be many years until it reaches its full potential. The next 5 years will be pivotal in its growth and establishment as a tangible concept to the consumer."
Gartner, Inc. forecasts that 8.4 billion connected things will be in use worldwide in 2017, up 31 percent from 2016, and that the number will reach 20.4 billion by 2020. Total spending on endpoints and services will reach almost $2 trillion in 2017. Regionally, Greater China, North America, and Western Europe are driving the use of connected things, and the three regions together will represent 67 percent of the overall Internet of Things (IoT) installed base in 2017.
Consumer Applications to Represent 63 Percent of Total IoT Applications in 2017
The consumer segment is the largest user of connected things, with 5.2 billion units in 2017, which represents 63 percent of the overall number of applications in
use (see Table 1). Businesses are on pace to employ 3.1 billion connected things in 2017. "Aside from automotive systems, the applications that will be most in use by consumers will be smart TVs and digital set-top boxes, while smart electric meters and commercial security cameras will be most in use by businesses," said Peter Middleton, research director at Gartner. http://www.gartner.com/newsroom/id/3598917
Gartner, Inc. (NYSE: IT) is the world's leading information technology research and advisory company.

Table 1: IoT Units Installed Base by Category (Millions of Units)

Category                     2016      2017      2018       2020
Consumer                     3,963.0   5,244.3   7,036.3    12,863.0
Business: Cross-Industry     1,102.1   1,501.0   2,132.6    4,381.4
Business: Vertical-Specific  1,316.6   1,635.4   2,027.7    3,171.0
Grand Total                  6,381.8   8,380.6   11,196.6   20,415.4

To give you some perspective on IoT being the next big thing, here is what analysts are predicting for the IoT market:
Bain Capital (an investment firm) predicts that by 2020 annual revenues could exceed $470B for the IoT vendors selling hardware, software, and comprehensive solutions.
McKinsey & Company (a consulting firm) estimates the total IoT market size in 2015 was up to $900M, growing to $3.7B in 2020, with a potential economic impact of $2.7T to $6.2T by 2025.
General Electric predicts that investment in the Industrial Internet of Things (IIoT) will top $60 trillion during the next 15 years.
IHS Markit (a market research firm) forecasts that the IoT market will grow from an installed base of 15.4 billion devices in 2015 to 30.7 billion devices in 2020 and 75.4 billion in 2025.
https://www.facebook.com/MaciejKranzInnovation/posts/1819969151611805
Roundup of Internet of Things Forecasts and Market Estimates, 2016
Below is the global market potential (in billions) of IoT devices owned from 2015 to 2025. In those 10 years, that market will increase by 489.55% globally . . . Wowzer! We really should care about IoT because empirical data show a promising future and exponential market growth. With 75.4 billion IoT devices in the world by 2025, every person and business will own and use many devices. We should be actively engaged and knowledgeable about this space. Not only should we be engaged from an investing perspective, but IoT can totally change your life or business for the better.
In addition to smart meters, applications tailored to specific industry verticals (including manufacturing field devices, process sensors for electrical generating plants, and real-time location devices for healthcare) will drive the use of connected things among businesses through 2017, with 1.6 billion units deployed. From 2018 onwards, cross-industry devices, such as those targeted at smart buildings (including LED lighting, HVAC, and physical security systems), will take the
lead as connectivity is driven into higher-volume, lower-cost devices. In 2020, cross-industry devices will reach 4.4 billion units, while vertical-specific devices will amount to 3.2 billion units.
Business IoT Spending to Represent 57 Percent of Overall IoT Spending in 2017
While consumers purchase more devices, businesses spend more. In 2017, in terms of hardware spending, the use of connected things among businesses will drive $964 billion (see Table 2). Consumer applications will amount to $725 billion in 2017. By 2020, hardware spending from both segments will reach almost $3 trillion.

Table 2: IoT Endpoint Spending by Category (Millions of Dollars)

Category                     2016        2017        2018        2020
Consumer                     532,515     725,696     985,348     1,494,466
Business: Cross-Industry     212,069     280,059     372,989     567,659
Business: Vertical-Specific  634,921     683,817     736,543     863,662
Grand Total                  1,379,505   1,689,572   2,094,881   2,925,787

Source: Gartner (January 2017)
http://www.gartner.com/newsroom/id/3598917
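The "57 percent" business share quoted above follows directly from the two 2017 hardware-spending figures in the Gartner release; a quick arithmetic check:

```python
# Check the "57 percent" business share of 2017 IoT hardware spending,
# using the figures quoted from the Gartner press release.
business = 964   # $ billions, business hardware spending in 2017
consumer = 725   # $ billions, consumer hardware spending in 2017

share = business / (business + consumer)
print(f"Business share of 2017 IoT hardware spending: {share:.0%}")  # 57%
```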
IHS Automotive: The number of cars connected to the Internet worldwide will grow more than sixfold, to 152 million in 2020 from 23 million in 2013.
Navigant Research: The worldwide installed base of smart meters will grow from 313 million in 2013 to nearly 1.1 billion in 2022.
Morgan Stanley: Driverless cars will generate $1.3 trillion in annual savings in the United States, with over $5.6 trillion of savings worldwide.
Machina Research: Consumer electronics M2M connections will top 7 billion in 2023, generating $700 billion in annual revenue.
On World: By 2020, there will be over 100 million Internet-connected wireless light bulbs and lamps worldwide, up from 2.4 million in 2013.
Juniper Research: The wearables market will exceed $1.5 billion in 2014, double its value in 2013.
Farmers are more closely monitoring crops with the help of sensor networks to ensure a better yield, and factory owners are monitoring operations to spot maintenance issues without requiring costly shutdowns. Major contractors have begun to add sensors to buildings and other large infrastructure as they're being built, hooking them up to simulation engines to spot flaws, inefficiencies, and costly over-engineering before the problems are baked into the design. A few forward-looking cities such as Singapore are using IoT to monitor water networks for leaks, and the shipping industry is beginning to add sensors to crates of perishable food or medicines.
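Monitoring applications like the ones above often reduce to watching a stream of readings for out-of-range values. A minimal sketch, with made-up sensor names and thresholds:

```python
# Minimal threshold-based monitoring over sensor readings.
# The sensor names and limits are illustrative, not from any real deployment.
LIMITS = {
    "soil_moisture": (20.0, 60.0),   # acceptable range, percent
    "motor_vibration": (0.0, 5.0),   # acceptable range, mm/s
}

def check(reading):
    """Return a list of alert strings for any out-of-range values."""
    alerts = []
    for sensor, value in reading.items():
        lo, hi = LIMITS[sensor]
        if not lo <= value <= hi:
            alerts.append(f"{sensor}: {value} outside [{lo}, {hi}]")
    return alerts

alerts = check({"soil_moisture": 12.0, "motor_vibration": 2.1})
print(alerts)  # soil_moisture is below its lower limit
```

In a real deployment the interesting engineering is elsewhere: readings arrive continuously over the network, and alerts must trigger actions (irrigation, a maintenance ticket) rather than just a printout.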
NSF: IoT is expected to become ubiquitous, with implementations in the smart home (management of energy use, control of appliances, monitoring of food and other consumables); consumer applications (health and fitness monitoring, condition diagnosis); manufacturing and industrial settings (supply chain management, robotic manufacturing, quality control, health and safety compliance); utility grids and other critical infrastructure (grid optimization, automated fault diagnosis, automated cyber security monitoring and response); and automotive/transportation (optimization for driving conditions, assessing driver alertness, collision/accident avoidance, managing vehicle health). Market verticals that are potentially impacted by innovations in this area include Connected Cities and Homes, Smart Transportation, Smart Agriculture, Industrial IoT, and Retail IoT. In the home, energy savings lead the way, followed by security and remote monitoring and automation gizmos such as a timed sprinkler system. Proposals are encouraged that address key challenges across the full range of IoT applications.
On March 29, 2017, the National Science Foundation (NSF) issued a Dear Colleague Letter (DCL) that encourages collaborations between industry, academe, and government in research related to IoT specifically and, more broadly, cyber-physical systems. The aim is to establish Industry-University Cooperative Research Centers (IUCRCs) that, in collaboration with their industry partners, are capable of collectively addressing large-scale and cross-disciplinary challenges in the broad context of IoT.
NSF therefore welcomes and encourages proposals in response to the IUCRC program solicitation, NSF 17-516, in the areas outlined in this DCL.
NSF: Potential areas of precompetitive research that are of interest include, but are not limited to:
- Mobile technologies and applications;
- Healthcare and biomedical technologies;
- Smart grids and energy management;
- IoT platforms, sensors, controls, and actuators;
- Agriculture and farming-based applications;
- Smart city/community applications;
- Transportation and traffic management systems;
- Industrial and manufacturing applications;
- Metrics, measurements, and benchmarking;
- Standards, practices, and policies (e.g., legal, regulatory); and
- Trust, security, and privacy in IoT.
NSF: The successful realization of an IoT-enabled world will thus depend not only on solving technical and engineering challenges, but will also require significant collaboration among academe, industry, and government to develop thoughtful and well-crafted standards, practices, and policies (including legal and regulatory) that take into account the complexities and societal implications of the IoT. To this end, any proposed IUCRC in any area related to IoT must
include a clear and compelling plan to address relevant trust, security, and privacy issues within the overall mission of the proposed Center.
Technical challenges also remain before IoT will reach its true potential. Yet all the key technologies have passed the thresholds required for substantial ROI. Sensors, wireless radios, and processors are getting smaller, cheaper, and more power efficient. The hard part is hooking it all up. There are a good number of open-source software projects for the Internet of Things (https://www.linux.com/news/21-open-source-projects-iot):
Home Assistant (https://home-assistant.io/) -- This up-and-coming grassroots
project offers a Python-oriented approach to home automation. See our recent profile on Home Assistant.
Mainspring (http://www.m2mlabs.com/framework) -- M2MLabs' Java-based framework is aimed at M2M communications in applications such as remote monitoring, fleet management, and smart grids. Like many IoT frameworks, Mainspring relies heavily on a REST web service, and offers device configuration and modeling tools.
Physical Web/Eddystone (https://google.github.io/physical-web/) -- Google's Physical Web enables Bluetooth Low Energy (BLE) beacons to transmit URLs to your smartphone.
It's optimized for Google's Eddystone BLE beacon, which provides an open alternative to Apple's iBeacon. The idea is that pedestrians can interact with any supporting BLE-enabled device, such as parking meters, signage, or retail products.
The Thing System (http://thethingsystem.com/) -- This Node.js-based smart home steward software claims to support true automation rather than simple notifications. Its self-learning AI software can handle many collaborative M2M actions without requiring human intervention. The lack of a cloud component provides greater security, privacy, and control.
ThingSpeak (https://thingspeak.com/) -- The five-year-old
ThingSpeak project focuses on sensor logging, location tracking, triggers and alerts, and analysis. ThingSpeak users can tap a version of MATLAB for IoT analysis and visualizations without buying a license from MathWorks.
Title: Tech Insider Webinar (05/11/2017): Test and Reliability Challenges in the Internet of Things: The Internet of Things (IoT) is an extremely fragmented market and can be defined to include anything from sensors to small servers; more than 30 billion of them are expected by 2020. It has become crucial for today's IoT chips to use a range of new solutions during the design stage to ensure robustness of manufacturing test, field reliability, and security. Design-for-test (DFT) engineers need to use new test and reliability solutions to enable power reduction during test, concurrent test, isolated debug and diagnosis, pattern porting, calibration, and uniform access. Moreover, the per-unit price of IoT devices remains a key factor in high-volume production. Thus, minimizing test cost while accommodating these technical issues is a major challenge for IoT. This webinar, besides discussing the key trends and challenges of IoT, will cover solutions to handle the wide range of potential robustness challenges during all periods of the IoT lifecycle, from design to post-silicon bring-up, volume production, and in-system operation.
https://en.wikipedia.org/wiki/Big_data
Big data is a term for data sets that are so large or complex that traditional data-processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem." Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on." Scientists, business executives, practitioners of medicine, advertising, and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology, and environmental research.
Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial sensing (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of
data are generated.
Visualization of daily Wikipedia edits, created by IBM: at multiple terabytes in size, the text and images of Wikipedia are an example of big data.
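The doubling-every-40-months figure above implies roughly exponential growth; a quick sketch of what that rate compounds to:

```python
# Per-capita storage capacity roughly doubles every 40 months (the Wikipedia
# figure quoted above). Compute the implied growth multiplier over a decade.
DOUBLING_MONTHS = 40

def growth_multiplier(months):
    """Capacity growth over `months` at one doubling per 40 months."""
    return 2 ** (months / DOUBLING_MONTHS)

decade = growth_multiplier(120)  # 120 months = 10 years = 3 doublings
print(f"10-year growth: {decade:.0f}x")  # 8x
```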
Big data can be described by the following characteristics:
Volume - The sheer size of the data; "big" refers first of all to volume. (18.9 billion network connections; 6 billion people have cell phones; 2.5 quintillion bytes of data are created every day; 40 zettabytes of data will be created in 2020. Taken from http://www.ibmbigdatahub.com/infographic/four-vs-big-data.)
Variety - Different forms of data types (structured, unstructured, text, multimedia).
Velocity - The speed at which data are generated, and how fast they must be processed to meet demand. Analysis of streaming data may require responses within milliseconds to seconds.
Veracity - The quality of the data being captured can vary greatly, and the accuracy of analysis depends on the veracity of the source data. (Uncertainty of data: how much of the data is accurate?)
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected, and correlated in order to grasp the information they are supposed to convey. This is termed the complexity of big data.
Big data challenges
- Difficult to scale computing performance and storage capacity with the increased data size.
- 80% of data is unstructured, and it is growing 15 times faster than structured data.
- Data is expected to grow to 40 zettabytes by 2020; total data in the WWW was estimated at 4 zettabytes.
- The challenge is that much of the insight and actionable pattern lies in unstructured data.
- Data comes from a variety of sources: sensors, logs, social media, pictures, videos, transaction records, etc.
- The goal is to expose key insights for improving customer experiences, enhancing marketing effectiveness, and mitigating financial risks.
CISE Distinguished Lecture Series - Kathy Yelick - May 20, 2015
Kathy Yelick's research is in programming languages, compilers, and algorithms for parallel machines.
She earned her Ph.D. in Electrical Engineering and Computer Science from MIT and has been a professor at UC Berkeley since 1991, with a joint research appointment at LBNL since 1996. She was Director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and currently leads the Computing Sciences directorate at LBNL.
In the same way that the Internet has combined with web content and search engines to revolutionize every aspect of our lives, the scientific process is poised to undergo a radical transformation based on the ability to access, analyze, and merge large, complex data sets. Scientists will be able to combine their own data with that of other scientists to validate models, interpret experiments, re-use and re-analyze data, and make use of sophisticated mathematical analyses and simulations to drive the discovery of relationships across data sets. This
scientific web will yield higher-quality science, more insights per experiment, an increased democratization of science, and a higher impact from major investments in scientific instruments.
At the same time, the traditional growth in computing performance is slowing, starting with the flattening of processor clock speeds, but eventually also in transistor density. Not only will these trends limit our ability to field some of the largest systems, e.g., exascale computers; the cost in hardware, infrastructure, and energy will also limit the growth in computing capacity per dollar at all scales. Fundamental research questions exist in computer science to extend the limits of current computing technology through new architectures, programming models, and algorithms, but also to explore options for
post-Moore computing. While the largest computing capabilities have traditionally been focused on modeling and simulation, some of the data analysis problems arising from scientific experiments will also require huge computational infrastructure. Thus, a sophisticated understanding of the workload across analytics and simulations is needed to understand how future computer systems should be designed and how technology and infrastructure from other markets can be leveraged.
In her talk, she gave some examples of how science disciplines such as biology, materials science, and cosmology are changing in the face of their own data explosion, and how mathematical analyses, programming models, and workflow tools can enable different types of scientific exploration.
This will lead to a set of open questions for computer scientists due to the scale of the data sets, the data rates, inherent noise and complexity, and the need to fuse disparate data sets. Rather than being at odds with scientific simulation, many important scientific questions will only be answered by combining simulation and observational data, sometimes in a real-time setting. Along with scientific simulations, experimental analytics problems will drive the need for increased computing performance, although the types of computing systems and software configurations may be quite different.
HPC
- In a non-distributed architecture, data is stored on a central server and applications access this central server.
- More compute power and storage are added as data grows.
- Querying against huge, centrally located data makes the system slow and inefficient, and performance suffers.
HPC meets big data
- Analytics methods are being applied to established HPC domains in industry, government, and academia.
- High-end commercial analytics is pushing up into HPC (e.g., PayPal).
- The journey from science to industry/commerce can be relatively short.
HPC and big data: the boundary is dissolving
- An IDC (International Data Corporation) study shows that two-thirds of HPC sites are performing big data analysis.
- HPC vendors are increasingly targeting commercial markets, while commercial vendors are seeing HPC requirements.
- The goal is to successfully bring the two data-intensive computing paradigms together, without reinventing the wheel, by producing environments that have the performance of HPC and the usability and flexibility of the commodity big data stack. IDC has termed this HPDA.
HPDA: data-intensive simulation and analytics
HPDA = tasks involving sufficient data volumes and algorithmic complexity to require HPC resources:
- structured data, unstructured data, or both;
- regular (e.g., Hadoop) or irregular (e.g., graph) access patterns;
- smarter mathematical algorithms;
- higher security and more realism.
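To make the regular-versus-irregular distinction concrete, the sketch below contrasts a regular, Hadoop-style pass over uniform records with an irregular graph traversal, whose memory accesses depend on the data itself. The records and graph are made up.

```python
from collections import deque

# Regular pattern: a uniform map-then-reduce pass over records (Hadoop-style).
records = ["error", "ok", "error", "ok", "ok"]
error_count = sum(1 for r in records if r == "error")  # map + reduce in one pass

# Irregular pattern: breadth-first traversal of a graph, where the access
# pattern follows the structure of the data (toy adjacency list).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def bfs(start):
    """Return the set of nodes reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

print(error_count)       # 2
print(sorted(bfs("a")))  # ['a', 'b', 'c', 'd']
```

The first pattern partitions trivially across nodes; the second does not, which is why graph analytics is the harder half of HPDA.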
Factors driving HPDA
High complexity:
- Allows companies to aim more complex, intelligent questions at their data infrastructures, an advantage in today's increasingly competitive markets.
- Useful for discovering unknown patterns and relationships in data, e.g., fraud detection, or revealing hidden commonalities within millions of archived medical records.
- A transition from static searches to higher-value, dynamic pattern discovery.
High time-criticality:
- Information that is not available quickly has little or no value; a weather report for tomorrow is useless if it's unavailable until the day after tomorrow. Data analysis using HPC technology addresses this problem.
Robust growth in HPC and HPDA
- The individual computers in a cluster are called nodes. A typical cluster size is between 16 and 64 nodes, or from 64 to 768 cores.
- Clusters are connected via a high-speed interconnect fabric, typically InfiniBand.
- According to IDC, the server market will continue to grow at a rate of 7.3% CAGR from 2011 to 2016, generating about $14.6 billion in revenues by 2016.
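CAGR figures like the ones quoted here compound annually; a small sketch of how a compound annual growth rate projects a value forward, using the 7.3% rate from the IDC server-market forecast above:

```python
# Project a value forward at a compound annual growth rate (CAGR).
def project(value, cagr, years):
    """Value after `years` of growth at `cagr` (e.g., 0.073 for 7.3%)."""
    return value * (1 + cagr) ** years

# Sanity check: 7.3%/year for 5 years multiplies a value by about 1.42,
# not by 5 * 7.3% = 36.5% -- the growth compounds.
multiplier = project(1.0, 0.073, 5)
print(f"5-year multiplier at 7.3% CAGR: {multiplier:.2f}")  # 1.42
```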
HPDA
- Storage will continue to be the fastest-growing segment within HPC, growing nearly 9% through 2016 and projected to become a $5.6 billion market.
- IDC forecasts that the market for HPDA servers will grow at 13.3% CAGR from 2012 to 2016 and will approach $1.4 billion in 2017 revenue.
- HPDA storage revenue will near $800 million by 2017, with growth of 18.1% CAGR through 2016.
Software for Building HPDA Solutions
- The ability to ingest data at high rates, then use analytics software to create competitive or innovation advantages.
- Use of Hadoop with HPC infrastructure.
- High-performance, scalable systems to support near-real-time analytics and HPDA capabilities.
- Reduced processing time for the growing volumes of data in today's distributed computing environments.
IT organizations are using Hadoop as a cost-effective data factory for collecting, managing, and filtering large data sets.
Hadoop overview
Hadoop is an open-source software framework that processes data-intensive workloads with large volumes of distributed data. A Hadoop configuration is based on three parts:
- the Hadoop Distributed File System (HDFS);
- the Hadoop MapReduce application model; and
- Hadoop Common.
The initial design goal was to use commodity technologies to form large clusters capable of providing cost-effective, high I/O performance.
Hadoop HDFS has these key characteristics:
- It uses large files.
- It performs large block-sequential reads for analytics processing.
- It accesses large files using sequential I/O in append-only mode.
- HDFS splits large files into blocks and distributes them across servers.
- HDFS replicates each data block on more than one server.
- Data and processing are distributed; processing is done through MapReduce.
Big Data Consulting Services and Training Center at UGA: The primary goal of this project is to establish a Big Data Consulting Services and Training Center at the University of Georgia (
http://research.franklin.uga.edu/bigdata/) with the following goals in mind: To learn about the needs of the researchers across disciplines at the University of Georgia who are involved with big data; To gather and organize information concerning the local and national resources available for collecting, processing, storing, accessing, sharing, and curating such data; To educate researchers on the available resources; To provide a web-based, searchable, and sustainable resource that contains the necessary information in the form of tutorials, applications, human contacts, and videos on the local and national resources and how to access and use them. The consulting services will emphasize three broad areas: (1) data management, (2) high performance computing, and (3) cross-disciplinary coordination around big data.
1. Data Management
In the area of data management, it is becoming increasingly evident that the simple storage provisioning that has worked in the past may no longer be sufficient. Data management certainly involves storage, but also a host of other issues, including those associated with data ownership, curation, history, and organization; dynamic data streams; data access, availability, and security; and ontologies, formatting, and standards. The consulting services will focus on identifying and organizing resources to help researchers account for these issues, thus enabling them to develop strong data management plans; to effectively discover and evaluate discipline-specific repositories; and to structure data for maximum efficiency. A data management plan documents the processes for handling the flow of data from collection through analysis, including
software and hardware systems, as well as quality control and validation of these systems. The center will collaborate with the University of Georgia Libraries to address data management issues. The library website will be expanded and integrated with the center website. http://guides2.galib.uga.edu/subject-guide/21-Data-Management-Plans
2. High Performance Computing
In the area of high performance computing, our aim will be to assist researchers in identifying the most appropriate resource for addressing their research and educational needs. Further, the center will assist in training researchers on available software for high performance computing environments, such as MPI (message passing interface),
OpenMP, CUDA (compute unified device architecture), and Hadoop, which supports the processing of large data sets in a distributed computing environment. We will do so through a focus on MapReduce, a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. In addition, the center will assist researchers in exploring a variety of national resources, including the Extreme Science and Engineering Discovery Environment (XSEDE), the Titan supercomputer at Oak Ridge National Laboratory, and the open-source Integrated Rule-Oriented Data System (iRODS) for managing, sharing, publishing, and preserving digital data, as well as the University of Georgia's own dedicated Hadoop environments in the Georgia Advanced Computing Research Center (GACRC) and the Information Technology Services (ITS) group at the Terry College of Business.
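The MapReduce model mentioned above is easy to sketch in miniature: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The single-process word count below mimics that structure; a real Hadoop job would distribute each phase across the cluster rather than run it in one Python process.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data meets hpc", "big data big analytics"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

The appeal of the model is that map and reduce are pure functions over independent chunks, so the framework can split the input across machines, run the phases in parallel, and re-run any piece that fails.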
3. Cross-Disciplinary Coordination around Big Data
The center will actively promote its services throughout the University of Georgia. This effort will complement the many efforts throughout the university that address big data, which include a semi-annual interdisciplinary big data workshop (organized by GACRC, the Office of the VP for Research, the Office of Enterprise Information Technology, and the Computer Science and Management Information Systems Departments). Also, UGA has already won a number of grants on issues associated with big data, including two recent NSF CAREER awards focusing on dealing with big data (Krashen & Perdisci) and an NSF-funded Research Coordination Network (RCN) for understanding the management issues associated with cyberinfrastructure (Berente). Also, UGA is a leader in energy informatics, a new field involving the use of big data to solve energy problems (and, among other things, involving a public-private partnership in this effort known as the Georgia Energy Informatics Cluster, or GEIC). Finally, researchers at the university have begun looking into ways that big data can be leveraged in the social sciences. In UGA's Terry College of Business, for example, the Management Information Systems (MIS) department has developed a
formal arrangement with Hortonworks (a Hadoop commercialization and service organization)[4] and is initiating a project with XSEDE leadership to investigate techniques for eliciting multidimensional social network data from unstructured data sources. The Big Data Consulting Services and Training Center can act as a catalyst to bring these multiple initiatives more closely together, and to leverage the variety of disparate efforts to make the entire science enterprise more productive.
http://news.uga.edu/releases/article/uga-researchers-receive-nsf-funding-to-conduct-math-and-malware-research/
http://www.terry.uga.edu/news/releases/uga-helps-create-public-private-coalition-to-boost-georgias-energy-efficien
[4] http://hortonworks.com/
Many of the elements are already in place at UGA to support data-intensive research. For example, the Georgia Advanced Computing Resource Center
(GACRC) (http://gacrc.uga.edu/), the CUDA Teaching Center (http://teachingcuda.uga.edu), and the CUDA Research Center (http://cuda.uga.edu) already offer resources and some training on the use of high-performance computing; the UGA Libraries offer services to help researchers write data management plans; and there are a variety of campus-wide and national resources that researchers can use for their data generation, analysis, and management needs. However, there is currently no clearinghouse or coordinating organization to which any researcher on campus can turn in order to best incorporate data-intensive approaches into their research. To meet these needs, we plan to a) form a campus-wide committee comprised of faculty, administrators, information technology (IT) staff, librarians, and students; b) survey the campus to gain a better understanding of data management needs; c) visit with individuals who are working with data that are large, long-term, and/or structurally complex to learn about their needs in more detail; and d) identify the campus and national resources that are available for big data users. We will
then develop a set of web resources that can serve as both a central resource for the campus and a starting point for further efforts to enhance UGA's cyberinfrastructure capabilities. Finally, we will continue to conduct the popular university-wide big data events under the auspices of the center to aid in communicating, coordinating, and disseminating activity around data-intensive research. 56 Intellectual Merit: Across disciplines, researchers are working to meet the opportunity of big data by incorporating data-intensive approaches into their research. These disciplines span a wide range of study areas, including, but not limited to, the physical sciences, mathematics and computer science, the social sciences, engineering, the audiovisual arts, and the medical and biological sciences.
These disciplines are focused on using data to advance research in widely varying application domains, from internet-based research and scientific computing to medical records management and energy utility management. The goal of the center will be to coordinate across these disciplines and to provide an effective enterprise for moving data-intensive research forward in the university while avoiding duplication of effort. 57 Broader Impact: This project will involve faculty, students, librarians, and technology staff. The goal is to build this consulting center to serve the UGA campus and then extend it to other institutions. As resources become
available, the center's reach can be extended across the University System of Georgia, which includes a number of minority-serving institutions, and beyond the state of Georgia. In addition to the research component necessary to compile and organize information on big data resources, significant effort will be devoted to education, training, outreach, and the curation of up-to-date information relevant to all members of a university community. The center will sponsor big data events and outreach efforts to both promote and enable data-intensive science across disciplines. 58 Data curation is the management of data throughout its lifecycle, from creation and
initial storage to the time when it is archived for posterity or becomes obsolete and is deleted. The main purpose of data curation is to ensure that data is reliably retrievable for future research purposes or reuse. 59 From Wikipedia, the free encyclopedia An actuator is a component of a machine that is responsible for moving or controlling a mechanism or system. An actuator requires a control signal and a source of energy. The control signal is relatively low energy and may be electric voltage or current, pneumatic or hydraulic pressure, or even
human power. The supplied main energy source may be electric current, hydraulic fluid pressure, or pneumatic pressure. When the control signal is received, the actuator responds by converting the energy into mechanical motion. An actuator is the mechanism by which a control system acts upon an environment. The control system can be simple (a fixed mechanical or electronic system), software-based (e.g. a printer driver, robot control system), a human, or any other input. 60 61
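The relationship between a low-energy control signal and the main energy source can be sketched in a few lines of code. The class and constants below are purely illustrative (there is no standard actuator API); the point is simply that a small control input selects how much of the main energy source is converted into mechanical output.

```python
# Illustrative sketch: an actuator converts a low-energy control signal
# into mechanical output drawn from a main energy source.

class Actuator:
    def __init__(self, max_force_newtons: float):
        # Limit imposed by the main energy source (hypothetical value).
        self.max_force_newtons = max_force_newtons

    def apply(self, control_signal: float) -> float:
        """Map a control signal in [0.0, 1.0] to an output force in newtons."""
        clamped = max(0.0, min(1.0, control_signal))
        return clamped * self.max_force_newtons

valve = Actuator(max_force_newtons=200.0)
print(valve.apply(0.5))   # half-strength signal -> 100.0 N
print(valve.apply(1.5))   # out-of-range signal is clamped -> 200.0 N
```

Note how the control signal itself carries almost no energy; it only gates the conversion of the supplied energy source into motion.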
Cyber-Physical Systems (http://cyberphysicalsystems.org/) Cyber-Physical Systems (CPS) are integrations of computation, networking, and physical processes. Embedded computers and networks monitor and control the physical processes, with feedback loops where physical processes affect computations and vice versa. The economic and societal potential of such systems is vastly greater than what has been realized, and major investments are being made worldwide to develop the technology. The technology builds on the older (but still very young) discipline of embedded systems: computers and software embedded in devices whose principal mission is not computation, such as cars, toys, medical devices, and scientific instruments. CPS integrates the dynamics of the physical processes with those of the software and networking, providing abstractions and modeling, design, and analysis techniques for the integrated whole.
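The feedback loop at the heart of a CPS can be made concrete with a toy simulation. This is a hedged sketch, not a real control system: a simulated room stands in for the physical process, a simple on/off thermostat stands in for the embedded computation, and all constants are invented for illustration.

```python
# Toy CPS feedback loop: the physical state (temperature) drives the
# computation (thermostat decision), and the computation in turn changes
# the physical state. All constants are illustrative.

def simulate(setpoint: float = 20.0, steps: int = 50) -> float:
    temp = 15.0  # physical state: room temperature in degrees C
    for _ in range(steps):
        heater_on = temp < setpoint        # computation reads the sensor
        temp += 0.5 if heater_on else -0.2  # physics responds to the actuator
    return temp

final_temp = simulate()
# The loop settles into a small oscillation around the setpoint.
```

Even this toy example shows the defining CPS property: neither the physical dynamics nor the software can be analyzed in isolation, because each continuously alters the other.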
62 63 Beacons are small, often inexpensive devices that enable more accurate location within a narrow range than GPS, cell-tower triangulation, or Wi-Fi proximity. Beacons transmit small amounts of data via Bluetooth Low Energy (BLE) over ranges of up to about 50 meters, and as a result are often used for indoor location technology, although they can be used outdoors as well. As an example, when a customer is in a store, a beacon at that location can communicate with the store's app on the customer's phone
to display special offers or additional information for specific products or services the company is currently offering. 64
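One common computation behind beacon-based proximity is estimating distance from received signal strength. The sketch below uses the widely used log-distance path-loss model; the calibration value tx_power (the beacon's advertised RSSI at 1 meter) and the environmental factor n are assumptions here, and real deployments calibrate both per site.

```python
# Estimate distance to a BLE beacon from RSSI using the log-distance
# path-loss model: distance = 10 ** ((tx_power - rssi) / (10 * n)).
# tx_power is the beacon's advertised RSSI at 1 m; n (~2 in free space,
# higher indoors) models environmental attenuation. Values are illustrative.

def estimate_distance_m(rssi: int, tx_power: int = -59, n: float = 2.0) -> float:
    return 10 ** ((tx_power - rssi) / (10 * n))

print(estimate_distance_m(-59))   # at the calibration point: 1.0 m
print(estimate_distance_m(-79))   # 20 dB weaker signal with n = 2.0: 10.0 m
```

Because RSSI is noisy indoors, apps typically smooth readings over time and report coarse zones (immediate, near, far) rather than exact distances.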