
Delhi Technological University



Machine Learning Lab



EXPERIMENT : 3

AIM : To implement the Decision Tree ID3 (Iterative Dichotomiser 3) algorithm.



THEORY : A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy); a leaf node (e.g., Play) represents a classification or decision.

Entropy is a measure of the amount of uncertainty in the dataset. For a binary class with values a and b it is defined as:

Entropy(S) = - p(a)*log2(p(a)) - p(b)*log2(p(b))

Information gain IG(A) is the difference in entropy from before to after the dataset S is split on an attribute A:

IG(S, A) = Entropy(S) - sum over each value v of A of (|Sv|/|S|) * Entropy(Sv)

In other words, it measures how much of the uncertainty in S is removed by splitting S on attribute A. At each node, ID3 selects the attribute with the highest information gain.
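As a quick check of the entropy formula, the short sketch below computes the entropy of an assumed example set with 9 positive and 5 negative examples (the class distribution of the classic Play Tennis data, used here purely as an illustration):

import numpy as np

# Assumed example: 9 positive ("Yes") and 5 negative ("No") examples.
p_yes, p_no = 9 / 14, 5 / 14

entropy = -p_yes * np.log2(p_yes) - p_no * np.log2(p_no)
print(round(entropy, 3))   # approximately 0.940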



ALGORITHM

ID3 (Examples, Target_Attribute, Attributes)
    Create a root node Root for the tree
    If all examples are positive, return the single-node tree Root with label = +
    If all examples are negative, return the single-node tree Root with label = -
    If the list of predicting attributes is empty, return the single-node tree Root
        with label = most common value of the target attribute in the examples
    Otherwise begin
        A ← the attribute that best classifies the examples
        Decision tree attribute for Root = A
        For each possible value vi of A:
            Add a new tree branch below Root, corresponding to the test A = vi
            Let Examples(vi) be the subset of examples that have the value vi for A
            If Examples(vi) is empty:
                Below this new branch add a leaf node with label = most common
                target value in the examples
            Else:
                Below this new branch add the subtree
                ID3 (Examples(vi), Target_Attribute, Attributes – {A})
    End
    Return Root



CODE

import numpy as np
import pandas as pd

eps = np.finfo(float).eps   # tiny constant to avoid taking log2 of zero


def find_entropy(df):
    # The last column is treated as the target variable, to keep the code generic.
    Class = df.columns[-1]
    entropy = 0
    values = df[Class].unique()
    for value in values:
        fraction = df[Class].value_counts()[value] / len(df[Class])
        entropy += -fraction * np.log2(fraction)
    return entropy



def find_entropy_attribute(df, attribute):
    # The last column is treated as the target variable, to keep the code generic.
    Class = df.columns[-1]
    target_variables = df[Class].unique()   # e.g. 'Yes' and 'No'
    variables = df[attribute].unique()      # distinct values of the attribute, e.g. 'Hot', 'Mild', 'Cool'
    entropy2 = 0
    for variable in variables:
        den = len(df[df[attribute] == variable])            # size of the subset with this value
        entropy = 0
        for target_variable in target_variables:
            num = len(df[(df[attribute] == variable) & (df[Class] == target_variable)])
            fraction = num / den
            entropy += -fraction * np.log2(fraction + eps)  # eps guards against log2(0)
        fraction2 = den / len(df)
        entropy2 += fraction2 * entropy                     # weight by subset size
    return abs(entropy2)
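To make the weighted attribute entropy concrete, here is a rough worked example using the class counts of the classic Play Tennis data for the Outlook attribute (assumed for illustration only, since the report's own dataset is not reproduced here: Sunny 2 yes / 3 no, Overcast 4 yes / 0 no, Rainy 3 yes / 2 no, 14 rows in total):

import numpy as np

def H(pos, neg):
    # Binary entropy of a subset with `pos` positive and `neg` negative examples.
    total = pos + neg
    probs = [p / total for p in (pos, neg) if p > 0]
    return -sum(p * np.log2(p) for p in probs)

weighted = (5/14) * H(2, 3) + (4/14) * H(4, 0) + (5/14) * H(3, 2)
gain = H(9, 5) - weighted
print(round(weighted, 3), round(gain, 3))   # roughly 0.694 and 0.247

This is the quantity find_entropy_attribute returns for Outlook, and find_entropy(df) minus it is the corresponding information gain.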



def find_winner(df):
    # Return the attribute with the highest information gain.
    IG = []
    for key in df.columns[:-1]:
        IG.append(find_entropy(df) - find_entropy_attribute(df, key))
    return df.columns[:-1][np.argmax(IG)]



def get_subtable(df, node, value):
    # Rows of df where attribute `node` takes the given value, with a fresh index.
    return df[df[node] == value].reset_index(drop=True)



def buildTree(df, tree=None):
    # The last column is treated as the target variable, to keep the code generic.
    Class = df.columns[-1]

    # Get the attribute with maximum information gain.
    node = find_winner(df)

    # Get the distinct values of that attribute, e.g. if Salary is the node then
    # Low, Med and High are its values.
    attValue = np.unique(df[node])

    # Create an empty dictionary to hold the tree.
    if tree is None:
        tree = {}
        tree[node] = {}

    # Construct the tree by calling this function recursively on each subset,
    # stopping as soon as a subset is pure.
    for value in attValue:
        subtable = get_subtable(df, node, value)
        clValue, counts = np.unique(subtable[Class], return_counts=True)

        if len(counts) == 1:                         # checking purity of the subset
            tree[node][value] = clValue[0]
        else:
            tree[node][value] = buildTree(subtable)  # calling the function recursively

    return tree
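A minimal driver along the following lines could be used to build and inspect the tree, assuming the functions defined above. The weather DataFrame is an assumed example dataset (the report's own data is not reproduced here), and predict is a hypothetical helper added only to show how the nested-dictionary tree would be consumed:

import pprint
import pandas as pd

# Assumed example dataset (classic Play Tennis / weather data).
data = {
    'Outlook':     ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
                    'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
                    'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity':    ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                    'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy':       ['False', 'True', 'False', 'False', 'False', 'True', 'True',
                    'False', 'False', 'False', 'True', 'True', 'False', 'True'],
    'Play':        ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                    'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
}
df = pd.DataFrame(data)

tree = buildTree(df)
pprint.pprint(tree)   # nested dict, e.g. {'Outlook': {'Overcast': 'Yes', ...}}

def predict(instance, tree):
    # Hypothetical helper: walk the nested dictionary until a leaf label is reached.
    node = next(iter(tree))                 # attribute tested at this node
    subtree = tree[node][instance[node]]    # branch matching the instance's value
    return predict(instance, subtree) if isinstance(subtree, dict) else subtree

print(predict({'Outlook': 'Sunny', 'Temperature': 'Mild',
               'Humidity': 'Normal', 'Windy': 'False'}, tree))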



OUTPUT



DISCUSSION
A decision tree is a map of the possible outcomes of a series of related choices. It allows an individual or organization to weigh possible actions against one another based on their costs, probabilities, and benefits. Decision trees can be used either to drive informal discussion or to map out an algorithm that predicts the best choice mathematically.

Using decision trees in machine learning has several advantages:
• The cost of using the tree to predict data is low, growing only logarithmically with the number of data points used to train it.
• Works for both categorical and numerical data.
• Can model problems with multiple outputs.

But they also have a few disadvantages:
• When dealing with categorical data with multiple levels, the information gain is biased in favor of the attributes with the most levels.
• Calculations can become complex when dealing with uncertainty and many linked outcomes.
• Conjunctions between nodes are limited to AND, whereas decision graphs allow for nodes linked by OR.



FINDINGS AND LEARNINGS
Advantages of ID3:
• Understandable prediction rules are created from the training data.
• Builds the tree quickly.
• Builds a short tree.
• Only needs to test enough attributes until all data is classified.



Disadvantages of ID3:
• Data may be over-fitted or over-classified if a small sample is tested.
• Only one attribute at a time is tested for making a decision.
• Has difficulty classifying continuous attributes.