DEVELOPMENT OF A DISTRIBUTED BIG DATA FUSION ARCHITECTURE FOR MACHINE-TO-MACHINE COMMUNICATION
Chapter One:
Introduction
1.1 Background of the Study
Data fusion is a recognised formal framework for the alliance of data originating from many sources; it denotes the procedures and tools by which such data are brought together (Akita, 2002; Wald, 1998).
The method focuses on the synergy provided by the many sources. The overarching goal is to improve the training data for a specific learning problem by combining data from related problems, so that learning can draw on numerous sources.
The approach is particularly beneficial when the information contained in any single source is only partially relevant to the problem at hand; in such cases it is advantageous to extract complementary information from a variety of data sources.
Data fusion is also valuable when the individual data sources available for a specific problem are incomplete and noisy. A guiding principle of data fusion is to enhance classification accuracy, which increases the likelihood of obtaining useful information (Wald, 1998).
Wald (1998) defines data fusion as a formal framework for integrating distinct data obtained from multiple sources, with the sole goal of acquiring higher-quality, more relevant information.
The primary motivations for developing data fusion are the growing demand for relevant information, the need for sound techniques for managing data efficiently, and requirements of reliability and performance.
Data fusion has applications in research, engineering, biometrics, finance, governance, remote sensing and geographic data, and wireless sensor networks, among others. Low-level, intermediate-level, and high-level data fusion are the established methods for merging information from several sources.
Esteban et al. (2005) observe that this categorisation depends on the stage at which fusion occurs. Low-level fusion, also known as raw data fusion, combines raw data from numerous sources into new raw data that is expected to be more informative, whereas intermediate-level (feature) fusion combines features derived from distinct sources (Esteban et al., 2005).
To achieve intermediate-level fusion, distinctive features must be extracted from each raw data source and aggregated into a unified feature vector; this level can therefore be described as fusion based on object-dependent feature extraction.
High-level fusion, by contrast, is decision-driven: it merges the results of many algorithms to reach a final fused judgement.
Numerous approaches to decision-level fusion exist in the literature, including statistical methods, majority voting, and ensemble methods; a minimal majority-voting sketch is given after Table 1.1. Each level of fusion, however, carries trade-offs, and the most important design decision is selecting the fusion level best suited to the data supplied. The reported trade-offs are summarised in the table below (Esteban et al., 2005).
Table 1.1: Trade-offs for the Various Levels of Data Fusion (Esteban et al., 2005)

Parameter                  Data fusion               Feature fusion        Decision fusion
Bandwidth                  Potentially very large    Medium                Small
Amount of information      No loss                   Medium, some loss     Some loss
Operational complexity     High                      Medium                Low
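To make decision-level fusion concrete, the following is a minimal sketch of majority voting over the decisions of several independent classifiers. It is illustrative only: the class, method, and label names are hypothetical, and the three input decisions stand in for the outputs of separate models or sensors.

    // Minimal decision-level fusion by majority voting (illustrative sketch;
    // all names and labels here are hypothetical).
    import java.util.HashMap;
    import java.util.Map;

    public class MajorityVoteFusion {

        // Fuse individual decisions into a single judgement by majority vote.
        static String fuse(String[] decisions) {
            Map<String, Integer> counts = new HashMap<>();
            for (String d : decisions) {
                counts.merge(d, 1, Integer::sum);   // tally each label
            }
            // Return the label with the highest tally.
            return counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .get().getKey();
        }

        public static void main(String[] args) {
            // Decisions from three hypothetical classifiers for one observation.
            String[] decisions = { "intrusion", "normal", "intrusion" };
            System.out.println("Fused decision: " + fuse(decisions)); // intrusion
        }
    }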
Several data fusion designs have been created and implemented. Multisensor systems collect information from multiple sources using methods such as Bayesian inference, ensemble learning, Dempster-Shafer evidence theory, fuzzy logic, and neural networks (Alsheikh, 2014).
Big Data Fusion (BDF) learns and updates data models as information is processed, for both static and real-time data. The models are updated at each level of processing, and any identified relations or patterns are applied to the fusion task almost immediately (Wheeler, 2014).
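As a hedged illustration of this kind of incremental updating, the sketch below uses Weka's updateable Naive Bayes classifier to refine a model instance by instance as new data arrives. The file name is hypothetical, and Naive Bayes merely stands in for whatever fusion model a node maintains.

    import java.io.File;
    import weka.classifiers.bayes.NaiveBayesUpdateable;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;

    public class IncrementalFusionUpdate {
        public static void main(String[] args) throws Exception {
            // Stream an ARFF file instance by instance (file name is hypothetical).
            ArffLoader loader = new ArffLoader();
            loader.setFile(new File("sensor_stream.arff"));
            Instances structure = loader.getStructure();
            structure.setClassIndex(structure.numAttributes() - 1);

            // Build the model on the header only, then update it as data arrives,
            // mirroring how a fusion node could refine its model in near real time.
            NaiveBayesUpdateable model = new NaiveBayesUpdateable();
            model.buildClassifier(structure);

            Instance current;
            while ((current = loader.getNextInstance(structure)) != null) {
                model.updateClassifier(current);  // incorporate the new observation
            }
            System.out.println(model);
        }
    }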
In a conventional cloud deployment, data must first be preloaded into the data centre before a user can run an application over it; such architectures rely on the parallelism of federated systems and servers communicating to complete a given task (Abhishek & Benjamin, 2012; Calheiros et al., 2011).
The demand to optimise network resources economically, together with the technical cost of remotely managing data and IT infrastructure, motivated the adoption of cloud computing techniques.
BDF has been used in cloud computing architecture, as documented in various publications (Abhishek & Benjamin, 2012; Calheiros et al., 2011; Calheiros, 2011). This process encompasses big data management, processing, storage, analytics, and fusion.
Cloud computing service providers such as Microsoft, Facebook, Google, and Amazon (Evans, 2011) manage massive amounts of data from billions of networked devices, leading to the emergence of big data, Internet of Things, and M2M (Sneps-Sneppe & Namiot, 2012; Chen & Lin, 2014; Evans, 2011).
1.2 Statement of Problem
The popular centralised approach to big data fusion relies on the Hadoop MapReduce architecture, which suffers from high latency and heavy demands on connection bandwidth, making it computationally cost-ineffective.
To address these shortcomings of the centralised design, it is necessary to create a distributed architecture that processes and fuses massive data closer to its source.
1.3 Aims and Objectives
The aim of this study is to develop a distributed big data fusion architecture for machine-to-machine (M2M) communication utilising ensemble learning.
The research objectives are as follows:
1) To create a centralised big data fusion architecture.
2) To create a distributed big data fusion architecture based on the fog computing technique with ensemble learning.
3) To assess the performance of the created architectures using accuracy and latency as performance indicators, through:
I. a comparison of AdaBoost's performance in the centralised and distributed data fusion architectures; and
II. a comparison of AdaBoost in the distributed architecture versus ensemble AdaBoost in the distributed data fusion architecture (a hedged evaluation sketch follows this list).
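As a minimal sketch of how objective 3 might be carried out in Weka, the code below trains AdaBoost.M1 with decision stumps, takes accuracy from 10-fold cross-validation, and approximates latency with wall-clock timing. The dataset file name is hypothetical, and the class is assumed to be the last attribute.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.AdaBoostM1;
    import weka.classifiers.trees.DecisionStump;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AdaBoostEvaluation {
        public static void main(String[] args) throws Exception {
            // Load the dataset (file name is hypothetical); class is the last attribute.
            Instances data = DataSource.read("fusion_dataset.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // AdaBoost.M1 with decision stumps as the weak learner.
            AdaBoostM1 booster = new AdaBoostM1();
            booster.setClassifier(new DecisionStump());

            // Accuracy from 10-fold cross-validation; latency from wall-clock time.
            long start = System.nanoTime();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(booster, data, 10, new Random(1));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.printf("Accuracy: %.2f%%, latency: %d ms%n",
                    eval.pctCorrect(), elapsedMs);
        }
    }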
1.4 Significance of the Research
This study demonstrates that data can be fused in a distributed fashion at the network edge using fog computing techniques, and it illustrates the use of the Waikato Environment for Knowledge Analysis (WEKA) as a tool for data fusion with ensemble methods.
1.5 Scope of the Study
This project uses the WEKA data mining suite to create a distributed data fusion architecture over one terabyte of spatial image data.