An Analysis of Crop Yield Prediction by Data Mining Techniques Using Optimal Parameters

VARA LAKSHMI G, Associate Professor, Dept. of CSE, Aurora's Technological and Research Institute

ABSTRACT: Agriculture plays a vital role in the Indian economy, which makes the agricultural sector an emerging area for research. As the sector moves towards digitalization, the government and researchers are increasingly focused on improving crop yield using recent technologies such as Big Data analytics and IoT, and on identifying optimal parameters for prediction. We use different data mining techniques to extract patterns that help predict future crop production.

Here we observe that the type of cultivation, such as Traditional cultivation, SRI cultivation and God Rice cultivation, is an important parameter and should be considered as one of the optimal parameters for yield prediction.

KEYWORDS: Big Data, IoT, Chameleon, Random Forest, Support Vector Machine, Regression

I. INTRODUCTION

Many countries are applying recent technologies such as Big Data and IoT to digitalize agricultural data, and are applying advanced data mining techniques to the resulting datasets. In India, the reduced production of food material and the higher cost of food products are mostly due to a deficiency of primary nutrients in agricultural soil. At present, soil fertility is on a decreasing trend due to the use of fertilizers, pesticides and insecticides, salinity, unscientific cultivation and urbanization. A lot of research is being done with the help of bioinformatics, biotechnology and data analytics. We apply technology to increase production while reducing the impact on soil fertility, air pollution, water pollution and the health of people who consume the food products.

II. TECHNIQUES

Data Mining Techniques for Yield Prediction

· Clustering is an unsupervised learning technique. Clustering techniques can be categorized into partitioning methods, hierarchical methods (with two approaches, agglomerative and divisive), density-based methods, grid-based methods and model-based clustering methods.
· Classification techniques for finding knowledge include rule-based classifiers, Bayesian networks, nearest neighbour, support vector machine, decision tree, artificial neural network, rough sets, fuzzy logic and genetic algorithms.

III. LITERATURE STUDY

Paper [1] focuses on finding optimal parameters to maximize crop production using data mining techniques. The parameters considered are year, state (Karnataka, 28 districts), district, crop (cotton, groundnut, jowar, rice and wheat), season (kharif, rabi, summer), area (in hectares), production (in tonnes), average temperature (°C), average rainfall (mm), soil, pH value, major fertilizers, nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), minimum rainfall and minimum temperature. DBSCAN is used to cluster districts that have similar temperature, rainfall and soil type. PAM and CLARA are used to cluster the districts that have maximum crop production. The authors compared PAM, CLARA and DBSCAN using external quality metrics such as purity, homogeneity, completeness, V-measure, Rand index, precision, recall and F-measure. To predict the annual crop yield they used multiple linear regression.

In [2], the authors collected agricultural data from several sources such as des.kar.nic.in and raitamaitra.kar.nic.in.

The dataset covers rice production from 2005 to 2013. They considered 30 districts in Karnataka state, with 1200 rows of data and 18 parameters. The input dataset consists of 9 years of data with the following parameters: year, state (Karnataka, 30 districts), district, crop (rice), area (in hectares), production (in tonnes), yield, average rainfall (mm), soil, canals, wells, water (cusec), nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), actual rainfall, zone and insecticides. They used the following data mining techniques: Chameleon, a two-phase algorithm, to derive the best soil for rice and ways to improve soil fertility; random forest, to derive what yield can be expected for the available water and fertilizers; multiple regression, to estimate what yield can be expected for a selected set of parameters and how the yield can be maximized as parameters increase; and logistic regression, to summarize through different plots how yield is affected by parameters such as water, nitrogen, phosphorus and potassium.

In [3], the authors predicted crop production using Multiple Linear Regression (MLR) and a density-based clustering technique, and the results obtained were verified and analyzed. The data was collected from 1955 to 2009 for East Godavari district and considers 8 parameters: year, rainfall, area of sowing, yield, fertilizers (nitrogen, potassium and phosphorus) and production.
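Multiple linear regression appears in several of the studies above, so a minimal sketch may help make it concrete. The code below fits yield = b0 + b1*x1 + ... + bk*xk by solving the normal equations with Gauss-Jordan elimination. The function names (fit_mlr, predict) and the sample values are illustrative assumptions; they are not taken from any of the surveyed papers or their datasets.

```python
def fit_mlr(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (A^T A) b = A^T y, where A is the
    design matrix with a leading column of ones for the intercept.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(design[0])
    # Augmented matrix [A^T A | A^T y].
    aug = []
    for i in range(n):
        lhs = [sum(r[i] * r[j] for r in design) for j in range(n)]
        rhs = sum(r[i] * t for r, t in zip(design, targets))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(coeffs, row):
    """Apply the fitted coefficients to one row of predictors."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Made-up illustrative data: (rainfall in mm, nitrogen in kg/ha) -> yield.
rows = [(800, 60), (950, 80), (700, 50), (1100, 90), (1000, 70)]
yields = [3.2, 3.7, 2.9, 4.1, 3.7]
coeffs = fit_mlr(rows, yields)
estimate = predict(coeffs, (900, 75))
```

In practice the predictors would be parameters such as rainfall, area of sowing and fertilizer quantities, and a numerical library would normally replace the hand-written solver.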

In [4], a framework was proposed in which data in the form of pictures captured with smartphones can be sent to the bank. The Agricultural bank contains the necessary tools to analyze the data, and within a short period the farmer gets a solution to problems concerning pesticide usage, seed usage, crop diagnosis, temperature and climate, loans and rainfall.

In [5], an architecture named the crop yield prediction model was proposed. It consists of an input module, which contains crop name, land area, soil type, soil pH, pest details, weather, water level and seed type, and a feature selection module, which selects a subset of attributes from the crop details. The crop yield prediction model is used to predict plant growth and plant diseases. This presents new research possibilities for applying new classification methodologies to the problem of yield prediction.

The effect of temperature and rainfall on the agricultural production of rice has been studied previously. Bangladesh offers several varieties of rice in different cropping seasons [6]. The authors took temperature and rainfall and performed regression analysis. Temperature plays a vital role in crop production. The data was taken from the "Bangladesh Agricultural Research Council (BARC)" for the past 20 years with 7 attributes: rainfall, maximum and minimum temperature, sunlight, wind speed, humidity and cloud coverage. During pre-processing, the complete dataset was divided into phases of 3 months' duration (March to June, July to October, November to February). This pre-processing was done for each rice variety. For each phase, the average of every attribute was taken and associated with it.
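The pre-processing step described for [6] can be sketched as a simple group-and-average over monthly records. This is a minimal illustration, assuming that each record carries a month number and that the three named ranges include both end months. The record layout, the phase labels and the function names are hypothetical, and the sketch ignores the fact that the November to February phase spans two calendar years.

```python
from collections import defaultdict

# Phase boundaries as named in the text, assumed inclusive of both ends.
PHASES = {
    "Mar-Jun": (3, 4, 5, 6),
    "Jul-Oct": (7, 8, 9, 10),
    "Nov-Feb": (11, 12, 1, 2),
}


def phase_of(month):
    """Return the phase label for a month number (1 to 12)."""
    for name, months in PHASES.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")


def phase_averages(records, attributes):
    """Average each attribute over all records that fall in the same phase."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for rec in records:
        key = phase_of(rec["month"])
        counts[key] += 1
        for attr in attributes:
            sums[key][attr] += rec[attr]
    return {
        key: {attr: sums[key][attr] / counts[key] for attr in attributes}
        for key in counts
    }


# Made-up monthly records for illustration only.
records = [
    {"month": 3, "rainfall": 40.0, "max_temp": 31.0},
    {"month": 4, "rainfall": 60.0, "max_temp": 33.0},
    {"month": 7, "rainfall": 300.0, "max_temp": 32.0},
    {"month": 12, "rainfall": 10.0, "max_temp": 25.0},
    {"month": 1, "rainfall": 20.0, "max_temp": 24.0},
]
averages = phase_averages(records, ["rainfall", "max_temp"])
```

Running the same function once per rice variety would mirror the per-variety pre-processing described above.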

In [7], the data mining techniques used are the support vector machine (SVM) and the artificial neural network (ANN). SVM is a black-box technique used for classification and prediction, and is described as combining the concepts of regression and clustering. The data used in this paper was taken from ISRIC-World Soil Information. The dataset has more than 50 attributes, out of which the authors selected 10 attributes, as follows: depth, pH, organic carbon, available nitrogen, available phosphorus, available potassium, porosity and water holding capacity. With ANN, the highest performance achieved was 55 percent with 7 hidden nodes. SVM was applied with three different kernels: polynomial, radial basis and hyperbolic tangent. SVM achieved much better results, with 74% using the radial basis kernel.
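The three kernels named for [7] differ only in how they score the similarity of two feature vectors. The sketch below writes them out in their commonly used forms. The parameter names (gamma, coef0, degree) and their default values are assumptions that follow common convention; the settings actually used in [7] are not stated here.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def tanh_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)


# Two made-up soil samples, e.g. (pH, organic carbon), for illustration only.
sample_a = [6.5, 0.8]
sample_b = [7.1, 0.5]
similarities = {
    "polynomial": polynomial_kernel(sample_a, sample_b, degree=2),
    "radial basis": rbf_kernel(sample_a, sample_b, gamma=0.5),
    "hyperbolic tangent": tanh_kernel(sample_a, sample_b, gamma=0.01),
}
```

The radial basis kernel depends only on the distance between samples, which is one reason it is often a reasonable default when features have been scaled to comparable ranges.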

In [8], the authors designed an algorithm, the crop selection method (CSM), to select a sequence of crops over the season to achieve the net yield rate of crops. The inputs are the name of the crop, sowing period, harvesting period, growing days or plantation days, and the predicted yield rate, which is influenced by different parameters. The method gives high performance and accuracy when the predicted values are accurate.

In [9], the C4.5 decision tree algorithm is used to build a Rice Disease Classification (RDC) based on symptoms. The experiment is done on Indian rice disease data. The C4.5 decision tree is used to automatically acquire knowledge from empirical data on Indian rice diseases. The advantage of C4.5 is that it is interpretable. The C4.5 algorithm can effectively build a tree with high predictive power and gives more accurate results on the test dataset.

In [10], machine learning techniques are used for crop disease classification and prediction. Several machine learning techniques are studied, such as the C4.5 decision tree algorithm, the support vector algorithm and the artificial neural network, to develop agricultural applications.
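C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows only that criterion on categorical data. The symptom names, disease labels and rows are hypothetical, and a full C4.5 implementation also handles continuous attributes, missing values and pruning, none of which is shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    total = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = -sum(
        len(g) / total * math.log2(len(g) / total) for g in groups.values()
    )
    # An attribute with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical symptom records, for illustration only.
rows = [
    {"leaf_spots": "yes", "stunted": "yes", "disease": "blast"},
    {"leaf_spots": "yes", "stunted": "no", "disease": "blast"},
    {"leaf_spots": "no", "stunted": "yes", "disease": "healthy"},
    {"leaf_spots": "no", "stunted": "no", "disease": "healthy"},
]
root = best_attribute(rows, ["leaf_spots", "stunted"], "disease")
```

Because each split is a readable test on a single symptom, the resulting tree can be inspected directly, which is the interpretability advantage noted for [9].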