Abstract
With computing facilities now widely available, a huge amount of data exists in electronic form. This data must be processed to discover new facts and knowledge, but working with huge datasets is challenging because real-world data is generally incomplete, inconsistent, and prone to errors or outliers. More than 80% of the data is unstructured or semi-structured. Data preprocessing prepares such data for analysis and has become an essential step in data mining. It takes 80% of the total effort of any data mining project and directly affects the quality of the mining results. Selecting the right preprocessing technique and tool helps speed up the data mining process. This paper discusses different preprocessing techniques, surveys and compares the tools available for text preprocessing, and outlines the challenges faced, such as the knowledge of a language's sentence structure needed for tokenization, the difficulty of constructing domain-specific stop-word lists, and over-stemming and under-stemming.
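As a rough illustration of the three text preprocessing steps named above (tokenization, stop-word removal, stemming), the following minimal Python sketch may help. It is not taken from the paper: the stop-word list, the suffix list, and the function names are all hypothetical, and the tokenizer assumes English text with ASCII letters. The stemmer is deliberately crude so that over-stemming and under-stemming become visible.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; real lists are
# much longer and often need to be tailored to the domain.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three letters.

    This is an assumed toy rule set, not an established stemming algorithm.
    """
    for suffix in ("ing", "ity", "es", "s", "al"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("The mining of data is challenging"))
    # Over-stemming: two unrelated words collapse to the same stem.
    print(naive_stem("university"), naive_stem("universal"))
    # Under-stemming: two related forms keep different stems.
    print(naive_stem("data"), naive_stem("datum"))
```

Even in this toy form, the sketch shows why the challenges listed in the abstract arise: the tokenizer silently drops digits and non-English characters, the stop-word list is arbitrary, and simple suffix rules both merge unrelated words and fail to merge related ones.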