Faculty

Lab Director
Research Interests: Big Data, Data Mining, Mobile Systems, Healthcare Systems, Fuzzy Logic
(Collaborator, Isfahan University of Medical Sciences)
Research Interests: Nanotechnology in Drug Delivery, Bioinformatics
(Collaborator, Carleton University, Ottawa, Canada)
Research Interests: Biomedical Informatics, Bioinformatics, Pattern Classification, Patient Monitoring, Proteomics

PhD Students

Research Interests: Data Fusion, Spatial Data, Uncertainty
Start: Sep. 2012
Research Interests:
Start: Sep. 2015
Research Interests: Biomedical Informatics
Start: Sep. 2017

MSc Students

Research Interests: Big Data, Bioinformatics
Start: Sep. 2015
Research Interests: Big Data, Data Mining, Bioinformatics
Start: Sep. 2015
Research Interests: Big Data, Urban Data Mining
Start: Sep. 2016
Research Interests: Complex Networks, Bioinformatics, Spark, Big Data
Start: Sep. 2016
Mehdi Samani
Research Interests: Big Data, Complex Networks
Start: Sep. 2017
Samaneh Samiei
Research Interests: Bioinformatics, Machine Learning
Start: Sep. 2017

Alumni

  • (Supervisor: Dr. Hossein Moradi, Advisor: Dr. Ghadiri)
    Research Interests: Spatial Decision Support System, Environmental Impact Assessment of Road Networks
    Start: Sep. 2013
    Finish: Jan. 2018
    Thesis Title: Development of a Spatial Decision Support System (SDSS) for Quantitative Environmental Impact Assessment of Road Network
    Thesis Abstract: In the environmental impact assessment process, vulnerability assessment is a suitable tool for increasing the objectivity of results. Combined with the environmental impact assessment of a road network, it can make decision makers aware of possible ecological crises caused by the network. Because the environmental effects of roads extend beyond the local scale, the impact assessment of these linear structures must be carried out at multiple scales; the road ecological footprint, as a road impact index, allows assessment at the macro scale. In this thesis, environmental impact assessment is combined with vulnerability assessment and the ecological footprint to analyze a road network. Given the importance of the environmental effects of roads in mountainous areas, the road network of Lorestan province was selected for analysis. In such areas, roads are the main source of air pollution and noise disturbance, and the main receptors of these impacts are village residents; in addition, roads crossing natural areas alter the acoustic balance of nearby habitats. A modeling approach was used to quantify air and noise quality along the road network: the CALPUFF model was used to determine exposure to CO2, NO2, PM10 and SO2 from road traffic, and noise propagation due to road traffic was simulated with the CRTN model. The air pollution and noise pollution impacts of the road network were aggregated using the Spatial Impact Assessment Methodology (SIAM). Three vulnerability components, exposure, sensitivity and adaptive capacity, were used for the vulnerability assessment: the Infrastructure Fragmentation Index (IFI), a residential neighborhood index, road noise and fractal dimension quantified the exposure component; erosion and topographic position indices were used to assess sensitivity; and adaptive capacity was quantified from a connectivity index and dominance degree. The road ecological footprint was obtained by aggregating the road physical footprint and the road energy footprint. Finally, the results of the road impact assessment, the vulnerability assessment and the road ecological footprint were synthesized and analyzed. Building on this spatial and quantitative methodology, a Spatial Decision Support System (SDSS) was developed as a plugin for the QGIS software.
  • Research Interests: Bioinformatics, Evolutionary Computation
    Start: Sep. 2013
    Finish: Jun. 2016
    Thesis Title: Constructing reliable interaction networks based on fuzzy inference and detecting protein complexes
    Thesis Abstract: Bioinformatics is a research area in which computers, specialized software, and databases are exploited to solve biological problems, especially in molecular and cellular contexts. One of its working domains is protein complex detection: protein complexes are groups of proteins that collaborate to perform a particular function inside the living cell. Researchers in this field have focused on designing algorithms that detect protein complexes from the graph formed by the protein interaction network; one class of these methods operates on weighted graphs and another on unweighted graphs. Surveys of interaction networks show that about 50 percent of detected interactions are false positives. Based on these studies, a general improvement for protein interaction networks is to build the network from multiple biological data sources, determine a weight for each interaction, and thereby obtain a more reliable network. In this research, several biological data sources are used to build the network. Since every biological data source is prone to errors, fusing different sources using fuzzy logic allows us to detect errors and reduce their effect on the detection of protein complexes. Moreover, we propose a general protein complex detection algorithm that exploits the strong points of existing algorithms and hypotheses about real complexes. The proposed method was applied to the Gavin, Krogan, Collin, and DIP data sets, improving precision by 0.25, 0.23, 0.10, and 0.27 and F-measure by 0.12, 0.14, 0.04, and 0.18, respectively.
  • Research Interests: Big Data
    Start: Sep. 2014
    Finish: Aug. 2017
    Thesis Title: Improving the Scalability of Spatio-Textual Skyline Query on Big Data
    Thesis Abstract: In recent years, massive amounts of spatial data described by text (labels or sentences) have been generated by modern applications. Because such data are widely used in various applications, much research has focused on how to efficiently retrieve the desired data points. The skyline operator, which comes in several variants and is based on the concept of dominance, is one response to this challenge: a data point dominates another if it is no worse in all dimensions and strictly better in at least one dimension (a minimal sketch of this dominance test appears after this list). The spatio-textual skyline query uses the skyline operator to retrieve desired data points (spatially close and textually relevant to the query point) that are not dominated by other data points. The heavy computations required for this type of query, together with the big data phenomenon, make optimal and efficient answering of these queries a serious challenge, and most of the solutions proposed in this area suffer from a lack of scalability. In this research, we propose two approaches to deal with the scalability challenge. The first is an approximate algorithm that offers a compromise between accuracy and efficiency by pruning the search space. The second is an efficient distributed approach that uses the MapReduce programming model. Extensive evaluations confirm the high scalability of the proposed algorithms.
  • Research Interests: Data Mining, Big Data, Databases, Medical Text Mining, Machine Learning
    Start: Sep. 2014
    Finish: Jan. 2017
    Thesis Title: Single-document and multi-document concept-based biomedical text summarization
    Thesis Abstract: In recent decades, with the rapid increase in the volume of available textual information resources, automatic text summarization has become a useful tool for acquiring and managing the intended information. Using text summarization tools, clinicians and researchers in the biomedical domain can save the time and effort needed to manage numerous textual information resources. Various summarization methods using different approaches have been developed so far. Some available summarizers utilize term-based methods and generic criteria to measure the informativeness of sentences; given the characteristics of biomedical text, biomedical summarizers need to employ more effective measures. To address this issue, we propose a method that uses concept-level analysis of the text in combination with itemset mining to identify the main subtopics of the input text. In this method, the informativeness of each sentence is measured according to its meaning and the appearance of the main subtopics in the sentence. Some biomedical summarizers use the frequency of concepts extracted from the input text to select related sentences. To address the challenges of such methods, we propose another summarization method that utilizes concept-level analysis and Bayesian inference; this summarizer estimates the probability of selecting each sentence for the final summary by following the distribution of important concepts within the input document. We performed extensive experiments to evaluate the performance of these two methods for single-document and multi-document summarization. The results show that, compared with competitor methods, the two summarizers proposed in this thesis improve the performance of biomedical text summarization.
  • Somayeh Davari
    Research Interests:
    Start: Sep. 2012
    Finish: Jan. 2015
    Thesis Title: Implementation of Fuzzy Region Connection Calculus and its Application in Spatial Relationship of Diseases
    Thesis Abstract: Spatial data are important in today's applications, and their use is expanding every day. Spatial data include the location and properties of spatial objects such as points, lines and regions. Spatial relations constitute an important form of human understanding of space, and in this context the relationships between spatial objects, especially topological relations between regions, have attracted considerable attention. However, real-world spatial regions such as a lake or a forest have no exact boundaries and are fuzzy, so it is better to define the relationships between them as fuzzy relations. Fuzzy topological relations have applications in many contexts, including path tracking algorithms based on fuzzy relations, medical diagnosis from patient records, extracting topological relations from the Web, image interpretation, robot navigation and manipulation, brain MRI segmentation, soil science and many others. Several studies have addressed the modeling of fuzzy spatial topological relations, and progress has been made: methods for modeling fuzzy spatial regions and the fuzzy relationships between them have been proposed. However, given the huge amount of data stored in spatial databases, and the fact that existing spatial database systems are based on non-fuzzy relations, we require data processing methods that are based on fuzzy spatial relations. The fuzzy enrichment of relations in spatial databases can therefore improve data processing techniques and the decisions made from them, as well as improving the user interface compared with most current systems. In this thesis, a novel method is proposed for implementing fuzzy relations in spatial databases that is applicable to many applications; as an important application, these relations are used to analyze the spatial relationship between diseases. Additionally, a method based on fuzzy RCC relations is proposed for fuzzifying an important group of spatial queries, namely the skyline operator, which can be used in decision support, data visualization and spatial database applications. The proposed algorithms have been implemented and evaluated on real-world spatial datasets. The evaluation results show greater flexibility compared with existing methods, with appropriate speed and quality of results.
  • Amir Hossein Goudarzi
    Research Interests: Spatial Data, Open Source Software
    Start: Sep. 2012
    Finish: Sep. 2014

    Latest news: PhD Student at Iran University of Science and Technology, Tehran, Iran

    Thesis Title: Classification of spatial data in order to manage the development of urban regions using MOSES algorithm
    Thesis Abstract: Spatial data is one of the most important and sensitive elements of social, economic and political decision making today, which is why many needs, goals and organizational activities depend on the knowledge gained from spatial data, especially for strategic planning. Research in this field usually neglects the deep knowledge that can be mined from geographical databases and relies on purely statistical methods. Classification of urban regions can provide a comprehensive basis for land use and ultimately ground decision making in the deep knowledge mined from geographical databases. Due to the huge volume of data gathered in spatial databases, mining association rules and high-level knowledge representation matter a great deal. Many spatial data mining algorithms have been proposed specifically for maps and spatial data; nevertheless, few algorithms can manage geographical and non-geographical data using topological means, while many decision making problems, such as developing urban areas, require exactly this kind of perception and reasoning. Therefore, in this thesis an approach based on genetic programming, statistical modeling and knowledge representation is presented. To apply MOSES to mining rules that take fuzzy topological relations in spatial data into account, a hybrid architecture called GGEO, which benefits from fuzzy region connection calculus, is proposed and implemented. To overcome the problem of time-consuming topological relationship calculations, the method relies on data preprocessing. GGEO analyzes and learns from geographical and ordinary data simultaneously, uses topological distance parameters, and represents classification rules as a series of arithmetic-spatial formulas. The approach is resistant to noisy data, and all of its stages run in parallel, which increases speed. It can be used in various spatial data classification problems and offers an appropriate method for data analysis and economic policy making. To demonstrate the application of the mined knowledge to decision making problems in the urban planning domain, the method has been applied to a highway planning problem with a limited budget.
  • Research Interests: Data Mining, Machine Learning, Big Data, Privacy and Security
    Start: Sep. 2012
    Finish: Jul. 2014

    Latest news: PhD Student at Florida State University.

    Thesis Title: Security in location-based social networks
    Thesis Abstract: With technological developments, and especially the advent of smartphones with location sensors, location-based services are becoming more popular every day. The main attribute of these services is that they use the user's location information and supply responses based on it. These services are attractive because they are tied to the user's real life, but they can also be threatening if misused, so privacy and security are important issues. Many methods have been proposed to achieve this goal, and the most important principle shared by all of them is that information should be sent and revealed only to the parties, and only to the extent, needed for the service. Spatial and temporal cloaking, adding noise, and trusted anonymizer servers are among the best-known methods, but none of them achieves this goal completely and each has disadvantages. In this thesis, after a survey of previous methods, we propose a novel method based on fuzzy clustering and Bayesian inference that improves clustering precision; we show that this algorithm can handle noisy datasets and records clustered with low certainty, which lets us recover users' location information more accurately even when noise has been added, extract users' frequently visited places, and thereby identify users and their movement patterns. This shows that previous methods cannot be trusted: in particular, if an attacker gains access to the trusted server, he can obtain a great deal of information about users even if they use pseudonyms, so the trusted server itself is not trustworthy. We then propose a method based on separating queries from each other and from the users, using tickets instead of user IDs. This method employs cryptography and hash tables to distribute queries and responses irregularly, so the correlation between consecutive queries is eliminated and it becomes impossible for an intruder to identify users from a sequence of queries. The proposed methods have been tested on datasets such as the MSR location dataset, and experimental results show significant improvements for both methods.
  • Ilnaz Khodadadi
    (Supervisor: Dr. Seyed Rasoul Mousavi, Advisor: Dr. Ghadiri)
    Research Interests:
    Start: Sep. 2011
    Finish: Dec. 2013
    Thesis Title: Computational methods for haplotype reconstruction
    Thesis Abstract: Human genomes vary from each other at certain positions, and the single nucleotide polymorphism (SNP) is the most common type of these variations. SNPs are important in drug design and medical diagnostic applications, and they also offer the highest resolution for tracking disease genes. In diploid organisms such as humans, the chromosomes come in two copies (one inherited from the mother and one from the father), and the SNP sequence on each copy of a chromosome is called a haplotype. Current sequencing technologies can only provide fragments of at most several thousand base pairs and cannot tell which copy of the chromosome a fragment belongs to; hence, computational methods are used to reconstruct the two haplotypes from these fragments. Sequencing technologies also provide a quality value Q, an integer mapping of q, the probability that the corresponding base call is incorrect. The current state-of-the-art single-individual haplotyping algorithm uses Max-SAT. In this research, we propose a novel method that uses weighted Max-SAT and the WMLF model instead of Max-SAT and the MEC model, with the aim of using quality values to make the results more accurate. There are several models for the single-individual haplotyping problem; in this research it is shown that a probabilistic model and the MEC model are equivalent under reasonable approximations, which, despite some criticisms of the MEC model, supports its rationality. We also propose a novel metric, the Weighted Reconstruction Rate. To evaluate the proposed algorithm, we compared it with two other methods using both MEC and the proposed metric. The results of this comparison on real data corresponding to NA12878 show slight improvements; it should be noted that the accuracy of all methods is above 90%.
  • Bagher Saberi
    Research Interests: Spatial Data, Data Quality
    Start: Sep. 2011
    Finish: Jan. 2014

    Latest news: Network Admin at Iran Blood Transfusion Organization

    Thesis Title: Spatial Data Quality Assessment: A Sample-Based Approach
    Thesis Abstract: Spatial data is playing an emerging role in new technologies such as web and mobile mapping and Geographic Information Systems (GIS). Important decisions in political, social and many other aspects of modern human life are being made using location data, and decision makers in many countries exploit spatial databases to collect information, analyze it and plan for the future. In fact, not every spatial database is suitable for this type of application: inaccuracy, imprecision and other deficiencies are present in location data just as in any other type of data, and may undermine the credibility of any action taken based on unrefined information. We therefore need a method for evaluating the quality of spatial data and separating usable data from misleading data that leads to weak decisions. On the other hand, spatial databases are usually huge, so working with this type of data hurts efficiency. To improve efficiency, we need a method for shrinking the volume of data; sampling is one such method, but its negative effects on data quality are inevitable. In this thesis we show and assess the change in spatial data quality that results from sampling. We used this approach to evaluate the quality of sampled spatial data related to mobile user trajectories in China, available in a well-known spatial database. The results show that sample-based control of data quality increases query performance significantly without losing too much accuracy. Based on these results, some future improvements are pointed out that will help to process location-based queries faster than before and make more accurate location-based decisions in limited time.
  • Amin Beiranvand
    Research Interests: Linked Data, Federated Queries
    Start: Sep. 2011
    Finish: Jan. 2014

    Latest news: Java Developer at Informatics Services Corporation

    Thesis Title: Adaptive Processing of Federated Queries over Linked Data based on Tuple Routing
    Thesis Abstract: Recent achievements in linked data implementations and the increasing number of datasets available on the web as linked data have given rise to the need for, and the tendency toward, processing federated queries over these datasets. Because linked data is distributed across the web, methods that process federated queries in a distributed manner are more attractive to users and have gained more popularity. In distributed processing of federated queries, we need methods and procedures to execute the query in an optimal manner. Most existing methods perform the optimization task based on statistical information, whereas the query processor does not have precise statistics about the data sources' properties, since the sources are autonomous. When precise statistics are not available, the probability of wrong estimates greatly increases and may lead to inefficient query execution at runtime. Another problem of existing methods is that, during the optimization phase, they assume the runtime conditions of query execution are stable, while the environment in which federated queries are executed over linked data is dynamic and unpredictable. Given these two problems, there is great potential for applying query processing techniques in an adaptive manner. In this thesis, an adaptive method based on the concept of tuple routing is proposed for processing federated queries over linked data. The proposed method can execute the query effectively without any prior statistical information. It can change the query execution plan at runtime so that fewer intermediate results are produced, and it can also adapt the execution plan to the new situation if unpredicted network latencies arise. Evaluation of our method by running real queries over well-known linked datasets shows very good results, especially for complex queries.
  • Research Interests: Data Mining, Big Data
    Start: Sep. 2015
    Finish: Jan. 2018
    Thesis Title: Graph-based Biomedical Text Summarization
    Thesis Abstract: Today, the volume of biomedical text available to physicians and researchers in different forms, including scientific articles and Electronic Health Records, is growing explosively. In order to stay aware of up-to-date knowledge, become familiar with modern tools and achievements, and deliver proper patient care, physicians and clinicians require efficient access to patients' records and scientific articles. Researchers likewise need to manage a substantial volume of biomedical literature in order to generate new hypotheses and ideas. Studying and skimming a large number of Electronic Health Records, biomedical articles, and scientific texts is one of the challenges that physicians and researchers in this domain face. The use of data mining techniques and text summarization systems is a practical solution for saving time and providing easy access to all of this information. A variety of methods exist for domain-independent text summarization, using statistical, machine learning, optimization, clustering, and graph-based approaches, as well as for biomedical text summarization, employing concept extraction, machine learning, graph-based, and other approaches. One of the most important disadvantages of general text summarization approaches is their poorer performance compared with domain-specific approaches, which stems from the complex concepts and characteristics of biomedical literature. Among domain-specific methods, graph-based ones perform well because they take advantage of a graph structure to represent the text and do not require training data. One of their weak points is that they do not take into account the different aspects of text components and relations during graph construction, which can lead to less coverage as well as high redundancy in the final summary. The purpose of this research is to propose a graph-based biomedical text summarization system that addresses this weakness. To this end, we propose a summarization system that represents the text using concept-based analysis and an itemset mining technique, and then identifies the main topics of the text using graph clustering. In this way, the summarizer extracts those parts of the original text that sufficiently represent its gist and presents them as the system-generated summary. The innovations of this summarizer include the use of domain-specific knowledge and itemset mining in graph construction, as well as clustering based on itemsets. Extensive experiments have been carried out to assess the performance of the proposed summarization system against other methods. The results reveal that exploiting concept extraction and itemset mining in graph construction, together with discovering the main topics using clustering, improves the performance of graph-based biomedical text summarization systems.
  • Research Interests: Bioinformatics, Drug Repositioning, Complex Networks, Big Data
    Start: Feb. 2012
    Finish: Jan. 2018
    Thesis Title: An integrative approach for network-based drug repositioning
    Thesis Abstract: Drug development is still a time-consuming and costly process, and the breadth of available data in this area makes it even more complicated. Using existing drugs for diseases they were not originally developed to treat (drug repositioning) provides a new approach to developing drugs at lower cost, faster, and more safely. The therapeutic effect of a drug usually occurs through its targets in the cell, so it is necessary to consider the relationships among the three concepts of drug, target, and disease together. Although drug repositioning has recently attracted many researchers and various computational methods have been proposed, their effectiveness remains a significant challenge that prevents them from being widely accepted. Most computational methods for drug repositioning are machine learning methods based on the similarity between drugs and the similarity between drug targets. In addition to making limited use of biological data as inputs, the outputs of these methods are also not well suited to research in the biological and medical sciences. Furthermore, the general approach of most existing drug repositioning methods is supervised learning, which requires both positive and negative training instances, and so far there has been no satisfactory way to provide negative instances in this area. Other causes of the inefficiency of existing methods are the lack of attention to the local and global structure of the relations among drugs, drug targets and diseases, and the lack of a proper model for exploiting the topological structure of these relationships. Our proposed method focuses on resolving these shortcomings and can accurately predict simple and complex relationships between drugs, drug targets and diseases. Since biological networks typically provide a suitable model for the relationships between different biological concepts, our primary approach is to analyze graphs and complex networks when studying drugs and their therapeutic effects. Given the nature of the available data, the use of semi-supervised learning methods is crucial. We therefore developed a label propagation method for predicting drug-target, drug-disease and disease-target interactions (Heter-LP), which integrates various data sources at different levels; the predicted interactions are the most prominent of the millions of candidate relationships suggested to researchers for further investigation. The main advantages of Heter-LP are the effective integration of input data, the elimination of the need for negative samples, and the combined use of local and global features. Statistical analysis has confirmed the effectiveness of the proposed method. The main steps of this research are as follows. The first step is the construction of a heterogeneous network as a data modeling task, in which the data are collected and prepared. The second step is predicting potential interactions: we present a new label propagation algorithm for heterogeneous networks, which consists of two parts, a mapping step and an iterative step that determines the final labels of all network vertices (an illustrative label propagation sketch appears after this list). Finally, for evaluation, we calculated AUC and AUPR with 10-fold cross-validation and compared the results with the best available methods for label propagation in heterogeneous networks and drug repositioning. A series of experimental evaluations and several specific case studies are also presented. The AUC and AUPR results for Heter-LP were much higher than the average of the best available methods, and in the cases where Heter-LP was weaker than some methods the difference was minimal. In fact, in the experimental analysis Heter-LP outperformed the other methods and was the only algorithm that correctly predicted many important items.
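
The spatio-textual skyline thesis above defines dominance precisely. As a hedged illustration only (not the thesis implementation), the following short Python sketch filters a set of points so that only non-dominated ones remain; the tuple layout and the assumption that smaller values are better in every dimension are illustrative choices, not details taken from the thesis.

    # Minimal skyline sketch (illustrative): each point is a tuple of numeric
    # attributes where smaller is better in every dimension, e.g.
    # (spatial distance to the query, textual irrelevance to the query).

    def dominates(p, q):
        """p dominates q: no worse in all dimensions, strictly better in at least one."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def skyline(points):
        """Return the points not dominated by any other point (naive O(n^2) scan)."""
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    # (1.0, 0.2) dominates (2.0, 0.5), so the latter is pruned from the skyline.
    print(skyline([(1.0, 0.2), (2.0, 0.5), (0.5, 0.9)]))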
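
Similarly, the drug repositioning thesis above relies on label propagation over a network. The sketch below is a generic, assumption-laden illustration of iterative label propagation on a single row-normalized similarity matrix; the toy matrix, the seed labels, and the mixing parameter alpha are hypothetical, and the code does not reproduce the Heter-LP algorithm itself.

    import numpy as np

    def propagate_labels(similarity, seed_labels, alpha=0.8, tol=1e-6, max_iter=1000):
        """Iterate f <- alpha * S_norm @ f + (1 - alpha) * y until convergence."""
        s = np.asarray(similarity, dtype=float)
        s_norm = s / s.sum(axis=1, keepdims=True)          # row-normalize similarities
        y = np.asarray(seed_labels, dtype=float)
        f = y.copy()
        for _ in range(max_iter):
            f_next = alpha * s_norm @ f + (1 - alpha) * y  # spread labels, keep seed signal
            if np.linalg.norm(f_next - f) < tol:
                return f_next
            f = f_next
        return f

    # Toy network of four nodes; node 0 is the only seed (labeled) node.
    sim = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
    print(propagate_labels(sim, [1, 0, 0, 0]))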

BSc Students (Alumni)

  • Research Interests: Supercomputing, Parallel Processing, Machine Learning, Big Data
    Start: Sep. 2011
    Finish: Sep. 2016

    Latest news: PhD Student at Florida State University.

  • Research Interests: Cloud Computing
    Start: Sep. 2011
    Finish: Sep. 2015

    Latest news: Javascript Developer at Mahan Airlines

  • Research Interests: Cloud Computing, Networking, Database, Artificial Intelligence
    Start: Sep. 2011
    Finish: Sep. 2015

    Latest news: Master's student at Iran University of Science and Technology

  • Mahbod Milanizadeh
    Research Interests: Android Software Architecture
    Start: Sep. 2011
    Finish: Sep. 2015
  • Elaheh Ebrahimi
    Research Interests:
    Start: Sep. 2011
    Finish:
  • Fahime Berenjkoub
    Research Interests:
    Start:
    Finish:
  • Research Interests: Big Data, Data Mining, Machine Learning, Parallel Processing
    Start: Sep. 2012
    Finish: Sep. 2016
  • Sepehr Bayat
    Research Interests: Machine Learning, Speech Processing
    Start: Sep. 2012
    Finish: Sep. 2016

    Latest news: MSc student at Carleton University

    Research Interests: Social Computing, Information Visualization, Machine Learning, Human-Computer Interaction, Complex Networks
    Start: Sep. 2012
    Finish: Jul. 2016

    Latest news: Graduate Research Assistant in Data Science at University of Toronto

  • A Calendar Application on Android
    Research Interests:
    Start: Sep. 2010
    Finish: Sep. 2014

    Latest news: MSc Student at Sharif University of Technology
