In this section, we briefly present the methodology that we used to construct the predictive models for the task of 30-day hospital readmission prediction using PCTs cite{blockeel1998top, vens2008decision, kocev2013tree}. We first present a general algorithm for constructing PCTs (for more details please refer to cite{blockeel1998top}). Next, we outline the specific PCTs to predict all of the concrete diseases simultaneously but ignore the hierarchical information (i.e., address the task of 30-day hospital readmission prediction as a multi-label classification task) cite{blockeel1998top, kocev2013tree}. Furthermore, we trained PCTs to predict all of the diagnoses simultaneously and exploit the hierarchy information cite{vens2008decision, kocev2013tree}.subsection{Predictive Clustering Trees}The PCT framework views a decision tree as a hierarchy of clusters. The top-node corresponds to one cluster containing all examples from dataset.

Further, top-node is recursively partitioned into smaller sub-clusters using standard top-down induction of decision trees algorithm. Method for partitiong of dataset into smaller sub-clusters can be viewed as a heuristic which reduce the variance of the examples in sub-clusters making sub-cluster more homegeneius. In other words, the cluster homogeneity is maximized by maximizing the variance reduction. Since examples in one sub-cluster are homegeneious it is expected to achieve better predictive performance. cite{kocev2013tree}The PCTs algorithm, therefore, uses the variance function, for partitioning of dataset, and the prototype function, that computes a label for each leaf, as parameters that can be instantiated for a given learning task.

In this work, we instantiated PCTs for both multi-label classification cite{madjarov2012extensive, kocev2013tree} and hierarchical multi-label classification cite{vens2008decision}.Multi-label classification uses Gini index instantiation of PCTs, which is able to predict multiple binary targets simultaneously. Namely, the variance function for the PCTs for the task of multi-label classification is computed as the sum of the Gini indices of the target variables, i.e. $Var(E) = sum_{i = 1}^T Gini(E,Y_i)$. The prototype function returns a vector of probabilities that an instance is labeled with a given label (binary target variable) using Euclidean distance.

Finnaly, the labels are calculated by applying a threshold on vector of probabilities.For the task of hierarchical multi-label classification, the variance of a set of examples $E$ is defined as the average squared distance between each example’s label vector $(L_i)$ and the set’s mean label vector $(L_i): Var(E) = frac{1}{vert E vert} sum_{E_i in E} d(L_i, overline{L})^2$, where the label vector is defined as a vector with binary components. The $i$-th component of the vector is 1 if the example is labeled with the label $l_i$ and $0$ otherwise. In the hierarchical multi-label meaning, the similarity at higher levels of the hierarchy is more important than the similarity at lower levels. This is reflected by using a weighted Euclidean distance: $d(L_1, L_2)= sqrt{(sum_{(l = 1)}^{vert L vert} w(l_l) (L_{1,l} – L_{2,l})^2}$, where $L_{i,l}$ is the $i$-th component of the label vector $L_i$ of an instance $E_i$, $vert L vert$ is the size of the label vector, and the label weights $w(l)$ decrease with the depth of the label in the hierarchy ($w(l) = w_0 ? w(p(l))$, where $p(l)$ denotes the parent of label $l$ and $0 < w_0 < 1$. The prediction for an example that arrives at the leaf can be obtained by applying a user defined threshold $ au$ to the probability (label) vector, while preserving the hierarchy constraint (the predictions comply with the parent-child relationships from the hierarchy).

However, one might have question how the hierarchy was created.subsection{Hierarchies}Hierarchy employed in this paper is based on Clinical Classifications Software (CCS) cite{healthcare2010clinical} for the ICD-9-CM taxonomy of diagnoses. The CCS for ICD-9-CM is one in a family of databases and software tools developed by Healthcare Cost and Utilization Project cite{hcupnet2003utilization}, which is based on a Federal-State-Industry partnership and sponsored by Agency for Healthcare Research and Quality. CCS for ICD-9-CM is created to inform decision-making at the national, State, and community levels. It is developed as a tool for clustering patient diagnoses and procedures into a manageable number of clinically meaningful categories. CCS for ICD-9-CM used in this paper is multi-level CCS (hierarchical system) which groups single-level CCS categories into broader body systems or condition categories (e.

g., “Diseases of the Circulatory System”, “Mental Disorders”, and “Injury”). The multi-level system has four levels for diagnoses and three levels for procedures, which provide the opportunity to examine general groupings or to assess very specific conditions and procedures. Both, diagnosis and procedures are considered as concepts. Each concept is connected to another with an ‘is-a’ relationship. The CCS for ICD-9-CM hierarchy enables a traverse from top level concepts to bottom level concepts. It is noticeable that CCS for ICD-9-CM resembles a forest (defined by a majority of trees which are not connected). A fragment of this hierarchy is presented in Fig.

1, representing a generic concept name on the top and a corresponding CCS for ICD-9-CM code range associated with that concept up to a most specific concept on the bottom level.egin{figure}h! includegraphicsscale=0.41{figure1} caption{Excerpt of CCS hierarchy}end{figure}One can, instead of using the domain CCS hierarchy, try to derive a hierarchy from the sets of readmitted diagnosis and use this hierarchy in the learning and prediction phases in order to improve the predictive performance.

While building hierarchy over the output (label) space, there is only one constraint that we should take care of. The original multi-label classification task should be defined by the leaves of the label hierarchy. In particular, the labels from the original multi-label classification problem represent the leaves of the tree hierarchy, while the labels that represent the internal nodes of the tree hierarchy are so called meta-labels (that model the correlation among the original labels).

Creation of data-driven hierarchies for PCTs are explained in work cite{madjarov2014evaluation}. They consider flat label sets and construct label hierarchies from the label sets that appear in the annotations of the training data by using clustering approaches based on balanced k-means clustering cite{tsoumakas2008effective}, agglomerative clustering with single and complete linkage cite{manning2009information}, and clustering performed with PCTs. This is convenient because we reduce chance of overfitting. In this paper, for deriving the hierarchy of the (original) multi-label classification problem, we employ the balanced k-means and the agglomerative clustering approaches.