梦朦胧6620
Classification is one of the important tasks of knowledge acquisition in data mining; the core problems to solve are the construction of the classification model and the implementation of the classification algorithm. This paper takes C4.5, a representative decision tree classification method, as an example to introduce decision tree classification in data mining and to study its construction and application.

Keywords: C4.5, decision tree, data mining

1 Introduction

Building a decision tree is an effective way to generate a classifier from data. The decision tree is one of the most widely used ways of expressing logic: from a set of unordered, rule-free examples it infers classification rules expressed in decision tree form. Decision tree classification works in a top-down recursive way: attribute values are compared at the internal nodes of the tree, the branch to the next node is chosen according to the attribute value, and a conclusion is reached at a leaf node. Each path from the root to a leaf corresponds to one conjunctive rule, so the whole tree corresponds to a set of rules in disjunctive form.

2 Basic concepts of decision trees

Decision trees are divided into two types: classification trees, built for discrete variables, and regression trees, built for continuous variables. General data mining tools let the user choose the split conditions and pruning rules, as well as control parameters (minimum node size, maximum tree depth, and so on) that limit the decision tree. Viewed as a tree, the root node is the entire data set space, and each internal node is a test on a single variable that divides the data space into two or more blocks. Each leaf node assigns its records to a single class.
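The top-down traversal described above (compare an attribute at each internal node, follow the matching branch, conclude at a leaf) can be sketched in a few lines of Python. The weather-style tree, attribute names, and sample record below are invented for illustration, not taken from the article:

```python
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
# The tree and record are invented example data.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("wind", {"strong": "no", "weak": "yes"}),
})

def classify(node, record, path=()):
    """Walk from the root to a leaf, testing one attribute per internal
    node; the tests visited along the way form one conjunctive rule."""
    if not isinstance(node, tuple):          # leaf: return class and rule path
        return node, path
    attr, branches = node
    value = record[attr]                     # compare the record's attribute value
    return classify(branches[value], record, path + ((attr, value),))

label, rule = classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"})
print(label)   # class found at the leaf
print(rule)    # the conjunctive rule for this path: outlook=sunny AND humidity=normal
```

Note how the returned `path` makes the "one path = one conjunctive rule" correspondence concrete: collecting the paths to all leaves yields the tree's rule set in disjunctive form.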
The purpose of a decision tree is to give every record in the training set a good classification. We must determine which attribute (field) is the best indicator for classification. The usual practice is to go through all the attribute fields, quantify the quality of splitting on each one, and compute the best split; different algorithms use different criteria to score candidate split attributes. This step is then repeated until the records within every leaf node all belong to the same class, at which point the tree is complete. The purpose of building the decision tree is to find the relationship between the attributes and the class and to use it to predict the class of future records whose class is unknown; a decision tree with this predictive function is called a decision tree classifier. In classification analysis the decision tree is the most popular model, mainly because it presents the mining results in a very easy-to-read graphical (tree-structured) form, well suited to supporting business decision-making. In 1986, J. Ross Quinlan published a paper entitled "Induction of Decision Trees" in the journal Machine Learning, introducing the ID3 algorithm. Building on ID3, he later extended and improved it and put forward the more advanced C4.5 algorithm. This article discusses how to apply the C4.5 decision tree algorithm to classify customers.

3 The decision tree generation process

A decision tree is a flow-chart-like tree structure in which each internal node denotes a test on an attribute, each branch represents one outcome of the test, and each leaf node represents a class or a class distribution. (1) Collect customer information and merge the data to form a unified customer information data source.
(2) Pre-process the data source: remove attributes irrelevant to the decision and attributes with too many branches, generalize continuous-valued attributes, fill in vacant attribute values, and form the decision tree training set. (3) Train on the training set: compute the information gain and gain ratio of each attribute, and choose as the splitting attribute of the current node the one with the largest gain ratio among the attributes whose information gain is not below the average over all attributes; build one branch for each possible value of that attribute. Apply this process recursively to the subset of samples contained in each child node, until all records in a subset agree on the splitting attribute or no attributes remain for splitting, which yields the initial decision tree. (4) Prune the initial decision tree. A post-pruning algorithm is mainly used on the generated initial tree; the pruning uses a pessimistic estimate to compensate for the optimistic bias introduced when the tree was generated. (5) Extract classification rules from the resulting decision tree: create one rule for each path from the root to a leaf, forming a rule set. The rule set is shown to the user, who filters it and records the rules worth considering in a rule base. (6) When a new customer registers with the company, the system uses the rules derived from the decision tree to analyze the new customer's data and predict which class of behavior the customer belongs to, thereby supporting decisions about the company's customer marketing strategy.
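The gain-ratio criterion used in step (3) can be sketched directly from its definitions: information gain is the drop in entropy after splitting, and the gain ratio divides that by the split information. The toy training set below (columns, values, and class labels) is invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum p * log2(p) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_and_ratio(rows, labels, attr_index):
    """Information gain and gain ratio of splitting on one attribute."""
    n = len(labels)
    partitions = {}                       # attribute value -> labels in that branch
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder    # entropy removed by the split
    split_info = -sum((len(p) / n) * math.log2(len(p) / n)
                      for p in partitions.values())
    return gain, (gain / split_info if split_info else 0.0)

# Invented toy data: columns = (outlook, wind); target = play yes/no
rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "weak")]
labels = ["no", "no", "yes", "yes"]
print(gain_and_ratio(rows, labels, 0))  # -> (1.0, 1.0): outlook separates the classes perfectly
```

C4.5's pre-filter (only consider attributes whose gain is at least average) guards against the gain ratio favoring attributes with very small split information.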
只爱小火锅
When the underlying decision tree is very complex and a clear solution is hard to obtain from the basic decision tree, an enterprise should use simulation to evaluate the decision. A complex decision tree may have thousands of possible paths from the first stage to the last. Transition probabilities are used to generate probability-weighted random paths through the decision tree. For each path, the staged decisions and payoffs are evaluated. Paths are generated in such a way that the probability of producing a path in the simulation equals the probability of that path in the decision tree. After generating many paths and evaluating the payoff in each case, the payoffs obtained during the simulation serve as representative payoffs of the decision tree. The present value is then found by averaging the payoffs obtained across the simulation runs and discounting them back to the present. Simulation is a good way to evaluate decisions when the path itself does not depend on the decisions; in other words, the transition probabilities from one period to the next do not depend on the decisions made in earlier periods. It can also take into account real-world constraints and complex decision rules. Moreover, simulations can easily handle different forms of uncertainty, even cases in which the uncertainties of different factors are correlated. Simulation models require higher setup and operating costs at the start than decision tree tools. However, their main advantage is that they can provide high-quality evaluation of complex situations. ----------------------------------------- This is Google-translated >_.< The grammar has lots of errors, but can you get the gist? Is your topic pattern recognition???
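The procedure above (generate probability-weighted random paths, evaluate the terminal payoff of each, then average and discount) can be sketched as follows. The transition probabilities, payoffs, and discount factor here are all invented for illustration:

```python
import random

# Invented example: node -> list of (probability, next node or terminal payoff)
transitions = {
    "start": [(0.6, "good"), (0.4, "bad")],
    "good":  [(0.7, 200.0), (0.3, 120.0)],
    "bad":   [(0.5, 80.0), (0.5, 20.0)],
}
DISCOUNT = 0.9  # per-stage discount factor (assumed)

def simulate_path(rng):
    """Follow one probability-weighted random path to a terminal payoff,
    discounted back over the number of stages traversed."""
    node, stages = "start", 0
    while node in transitions:
        r, acc = rng.random(), 0.0
        for prob, nxt in transitions[node]:   # sample a branch by its probability
            acc += prob
            if r <= acc:
                node = nxt
                break
        stages += 1
    return node * DISCOUNT ** stages          # node is now a terminal payoff

def present_value(n_paths=100_000, seed=0):
    """Average the discounted payoffs over many simulated paths."""
    rng = random.Random(seed)
    return sum(simulate_path(rng) for _ in range(n_paths)) / n_paths

print(present_value())  # approximates the exact expected value, 101.736
```

Because every branch probability here is independent of earlier decisions, this is exactly the path-independent setting in which the post says simulation works well.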
假小肥仔
Application of the decision tree algorithm in a college students' mental health testing system. Youdao translation result: "The decision tree algorithm in the application of the college students' mental health testing system"
inesthreebears
tree UK [tri:] US [tri] n. tree; timber; tree diagram; family tree. vt. to chase up a tree; to drive into a difficult position; to insert a shoe tree (into a shoe). [Example] I planted those apple trees. [Other forms] third-person singular: trees; plural: trees; present participle: treeing; past tense: treed; past participle: treed
吃货独依
Coarticulation means that a sound is changed by the influence of the sounds adjacent to it: in terms of the articulation mechanism, the vocal organs can only change their characteristics gradually when moving from one sound to the next, so the spectrum of the following sound differs from its spectrum under other conditions. A phone is therefore divided into several sub-phonetic units. For example, in the word "three", the first part of a phone is related to the preceding phone, the middle part is stable, and the last part is related to the following phone.

Taking context into account, a senone's context dependency is much more complex than simple left-and-right context; it is a complex function that can be defined by a decision tree or in other ways. (Context-dependent modeling for English usually takes the phone as the basic unit. Since some phones influence the phones after them in similar ways, model parameters can be shared by clustering the phone decoding states; the result of the clustering is called a senone. A decision tree is used to implement an efficient mapping from triphones to senones, by answering a series of questions about the categories of the preceding and following sounds: vowel/consonant, voiced/unvoiced, and so on.)

phones -> subword units = syllables. A syllable is a fairly stable entity: when speech speeds up, phones often change, but syllables do not.
subwords -> words. Of all possible combinations, only a few are meaningful (are existing words).
words + fillers (breathing, um, uh, coughing, etc.) = utterance.
The waveform is split at the silences into multiple utterances, and each utterance is recognized. The waveform is cut into 10 ms frames, and a 39-dimensional feature vector is extracted from each frame.
senone -> GMM: feature vectors are matched against the model. The acoustic properties of each senone include context-independent properties (the most likely feature vectors for each phone) and context-dependent properties (senones built according to context).
The dictionary gives the mapping from words to phones; machine learning algorithms can also be used to learn complex functions that perform this mapping.
The language model is used to constrain the word search. It defines which words can follow the word already recognized (matching is a sequential process), so impossible words can be excluded during matching.
A lattice is a directed graph representing the different recognition results. In general it is hard to obtain the single best match for speech, so a lattice is a good format for storing intermediate recognition results.
An N-best list is a bit like a lattice, but less dense (it keeps fewer results than a lattice). (N-best search and multi-pass search: to exploit various knowledge sources during search, several passes are usually made. The first pass uses cheap knowledge sources (acoustic model, language model, and pronunciation dictionary) to produce a candidate list or word lattice; on that basis, a second-pass search using expensive knowledge sources (4th- or 5th-order N-grams, 4th-order or higher context-dependent models) finds the best path.)
The best path is a strictly ordered sequence of nodes obtained from the edges of the lattice.
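The framing step above (waveform -> 10 ms frames, one feature vector per frame) can be sketched as follows. The sample rate and the silent stand-in signal are assumptions; a real front end would use overlapping windows and extract 39-dimensional MFCC-plus-delta features rather than keep raw samples:

```python
# Assumed sample rate; common for speech recognition but not stated in the post.
SAMPLE_RATE = 16000                 # samples per second
FRAME_LEN = SAMPLE_RATE // 100      # 10 ms -> 160 samples per frame

# One second of silence as a stand-in waveform (invented input).
signal = [0.0] * SAMPLE_RATE

# Cut the waveform into consecutive, non-overlapping 10 ms frames.
frames = [signal[i:i + FRAME_LEN]
          for i in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)]

print(len(frames), len(frames[0]))  # 100 frames of 160 samples each
```

Each of these frames would then be turned into a feature vector and scored against the senones' GMMs during decoding.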