- Similarity
search in time series data sets
- Similarity search on time-series data sets is of growing
importance in data mining. With the increasing amount of time-series data in many applications, from financial to scientific, it is important to study methods for retrieving similarity patterns efficiently and in a user-friendly way for business decision making. The thesis
proposes methods of efficient retrieval of all objects in the
time-series database with a shape similar to a search template. The
search template can be either a shape or a sequence of data. Two
search modules, subsequence search and whole sequence search, are
designed and implemented. We study a set of linear transformations
that can be used as the basis for similarity queries on time-series
data, and design an innovative representation technique which
abstracts the notion of shape so that the user can interactively query and explore multi-level similarity patterns. The wavelet analysis
technique and the OLAP technique used in knowledge discovery and data
warehousing are applied in our system. The retrieval technique we
propose is efficient and robust in the presence of noise, and can
handle several different notions of similarity including changes in
scale and shift.
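A minimal sketch of the shift- and scale-invariant matching idea (not the thesis's actual implementation): z-normalize the query and every candidate window so that changes in amplitude scale and vertical offset drop out, then rank windows by Euclidean distance. The abstract's wavelet representation could serve as a compressed stand-in for the raw windows.

```python
import numpy as np

def znorm(x):
    """Normalize to zero mean, unit variance; handles flat segments."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def subsequence_search(series, query, k=5):
    """Return start indices of the k windows most similar to the query,
    invariant to amplitude scaling and vertical shift."""
    m = len(query)
    q = znorm(np.asarray(query, dtype=float))
    dists = []
    for i in range(len(series) - m + 1):
        w = znorm(np.asarray(series[i:i + m], dtype=float))
        dists.append((np.linalg.norm(w - q), i))
    return [i for _, i in sorted(dists)[:k]]
```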
- An
interactive system for modeling the human perception of resemblance in
patterns
- Our concern is the modeling of the brain's most fundamental
cognitive activity, that of 'perception', from which all other
cognitive activities of the brain emerge. A computer-based, highly
interactive system of empirical knowledge acquisition has been
developed for modeling human perception of resemblance. The modeling
offers a framework for formulating formal description schemata, which
may be used in real-time extraction of experiential knowledge,
embedded in 1-D patterns of sensory data. We present our modeling
approach, consisting of segmentation of the sensory patterns and the
use of concatenated fuzzy polynomials for the representation of the
vagueness of human perception. We have developed an interactive modeling methodology, implemented the algorithms, and, during the research and development effort, tested the highly interactive modeling procedures both in parts and as a whole. We have also illustrated the application of perception models, developed by our interactive modeling procedures, to experiential, i.e. directly perception-based, recognition. The validity of the developed
modeling approach and methodology has been shown with an interactive
empirical knowledge acquisition case, involving a human expert in
meteorology.
- Fuzzy-based
two-dimensional string matching for image retrieval
- This paper describes a method in which a query is made to retrieve
specific images from an image database based upon spatial similarity
to a query pattern. The database images and query patterns are
represented as encoded symbolic pictures. The encoding scheme uses
two-dimensional strings to depict symbolic projections of object
centroids. The objects are represented as object class possibility
vectors, whereas object classes are considered fuzzy sets. The search
engine returns an ordered set of matches, based upon
application-specific quality functions, thresholds, and spatial
criteria. A local quality function and a local matching threshold control object-to-object matching. A global quality function and a global matching threshold facilitate weighted partial subpicture matching. The spatial criteria for matching are defined in a function tuned to the specific application. Potential applications
for the proposed method include map image database querying and
on-line image catalogs.
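The 2D-string encoding can be illustrated with a small sketch (illustrative only; the paper's scheme additionally carries fuzzy object-class possibility vectors): project object centroids onto the two axes, encode the orderings as strings, and reduce matching to subsequence tests.

```python
def two_d_strings(objects):
    """Encode symbolic projections of object centroids as 2D strings.

    objects: list of (label, x, y) tuples.
    Returns (u, v): labels ordered along the x axis and along the y axis.
    """
    u = [lbl for lbl, x, y in sorted(objects, key=lambda o: o[1])]
    v = [lbl for lbl, x, y in sorted(objects, key=lambda o: o[2])]
    return u, v

def is_subsequence(q, s):
    """True if string q occurs in s with the same relative order."""
    it = iter(s)
    return all(sym in it for sym in q)

img = two_d_strings([("tree", 1, 4), ("house", 3, 2), ("car", 5, 1)])
qry = two_d_strings([("tree", 0, 2), ("car", 2, 1)])
print(all(is_subsequence(q, s) for q, s in zip(qry, img)))   # True
```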
- Temporal
extensions to the relational data model (database)
- Database designers have devoted considerable attention in the past
few years to developing database systems which incorporate the
temporal dimension. The relational data model is a popular model for
standard snapshot database design. The popularity of the relational
data model has made it the focus of considerable effort in the
development of techniques to integrate the temporal dimension into
database systems. We have developed a new set-valued temporal logic.
This temporal logic provided the foundation for the development of
both a temporal relational calculus query language and a temporal
relational algebra query language. Both query languages are efficient,
elegant, and can be incorporated in most temporal relational data
models. To provide a more complete integration of the temporal domain,
we have extended the standard view mechanism to the temporal
relational data model. We have developed definitions for three types
of temporal views. We have developed an algorithm for handling the
maintenance of the view relations as the underlying base relation
experiences updates as well as an algorithm for reflecting view level
updates back into the base relation. Support for hypothetical temporal
relations has also been developed. Database systems which model
real-world situations should incorporate reality as closely as
possible. Many situations involve an imprecise knowledge of when
events have happened or will happen. To integrate this temporal
imprecision into the temporal relational data model, we have developed a fuzzy temporal logic which handles temporal intervals in
which the temporal values are not well defined. The temporal fuzzy
logic is a generalized temporal logic. The non-fuzzy temporal logic is
shown to be a special case of the fuzzy logic in which all time points
are well-defined. Our fuzzy temporal logic can provide a framework for
temporal database systems which more accurately model the real world.
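As an illustration of how a fuzzy temporal interval generalizes a crisp one (a sketch under our own assumptions, not the thesis's formal logic), a trapezoidal possibility distribution over time points reduces to an ordinary interval when its ramps vanish:

```python
def fuzzy_interval(a, b, c, d):
    """Trapezoidal possibility distribution over time points:
    fully possible on [b, c], ramping up on [a, b] and down on [c, d].
    A crisp (well-defined) interval is the special case a == b, c == d."""
    def mu(t):
        if t < a or t > d:
            return 0.0
        if b <= t <= c:
            return 1.0
        if t < b:
            return (t - a) / (b - a)
        return (d - t) / (d - c)
    return mu

# "the event happened roughly between t=10 and t=20"
mu = fuzzy_interval(8, 10, 20, 23)
print(mu(15), mu(9), mu(25))   # 1.0, 0.5, 0.0
```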
- Analysis
methodologies for integrated and enhanced problem-solving (expert
systems, conceptual modelling)
- As knowledge acquisition (KA) remains a bottleneck in the
development of knowledge-based systems, methodologies and techniques
are needed in both manual and automatic KA directions. This thesis
produces results in three areas related to KA: (1) conceptual
modelling of real-time expert systems; (2) concept learning from
examples; and (3) case-based reasoning. In the area of conceptual
modelling, we present a new methodology, ORTES, that analyses the
acquired knowledge for building a real-time expert system and
represents the analysed knowledge in an object-oriented formalism.
ORTES tackles problems in both KA and object-oriented analysis (OOA).
Most OOA methods are data-driven modelling approaches, in which analysis is mainly based on identifying and decomposing objects in the real world, but which ignore the issue of systematically specifying the system functionality. On the other hand, KA approaches usually center around modelling problem-solving strategies, but lack support for effectively connecting the system's functional and data components.
ORTES is proposed to overcome these problems by providing guidelines
for both object and task decomposition and by representing a system in
terms of objects and their relationships. To support task
decomposition, ORTES provides a generic task structure for real-time
monitoring and control. To support object decomposition, ORTES
supplies a classification scheme for identifying and organising the
objects involved in a real-time control system. Methods for specifying
objects and their relationships in an object-oriented context are
also provided. To illustrate the modelling method, we present an
application of ORTES to conceptual modelling of an expert system for
monitoring and control of a water supply system. Another way to
overcome the knowledge acquisition bottleneck is to conduct automatic
KA using machine learning. We present a new inductive learning system,
ELEM2, that generates rules based on attribute-value pair selection
and incorporates several new ideas to improve the predictive accuracy
of induced rules. A heuristic function that represents the degree of
relevance of an attribute-value pair is provided to evaluate
attribute-value pairs. A rule quality measure used for post-pruning the generated rules is proposed, based on a discussion of a number of alternatives. Case-based reasoning (CBR) is another
problem-solving and learning method that solves a new problem by
recalling and reusing specific knowledge obtained from past
experience. Due to the complementary properties of CBR and rule
induction, integration of the two techniques appears advantageous. We
propose a new integrated method, ELEM2-CBR, that makes use of a hybrid
representation of rules and cases to solve both classification and
numeric prediction problems. (Abstract shortened by UMI.)
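A toy sketch of attribute-value-pair selection in the spirit of ELEM2; the relevance heuristic below is an illustrative stand-in, not the published function:

```python
def pair_relevance(rows, labels, target):
    """Score attribute-value pairs for rule induction by an illustrative
    relevance heuristic: P(pair | target class) minus P(pair | rest).
    (ELEM2 defines its own published function; this merely stands in.)"""
    pos = [r for r, y in zip(rows, labels) if y == target]
    neg = [r for r, y in zip(rows, labels) if y != target]
    scores = {}
    for attr in rows[0]:
        for val in {r[attr] for r in rows}:
            p = sum(r[attr] == val for r in pos) / max(len(pos), 1)
            n = sum(r[attr] == val for r in neg) / max(len(neg), 1)
            scores[(attr, val)] = p - n
    return sorted(scores.items(), key=lambda kv: -kv[1])

rows = [{"sky": "sunny", "wind": "weak"},
        {"sky": "rainy", "wind": "strong"},
        {"sky": "sunny", "wind": "strong"}]
print(pair_relevance(rows, ["play", "stay", "play"], "play")[0])
# (('sky', 'sunny'), 1.0): the most class-discriminating pair
```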
- Automatic
generation and reduction of the semi-fuzzy knowledge base in symbolic
processing and numerical calculation (fuzzy sets)
- Typical fuzzy expert systems can only model human behavior at the rule-base level; they cannot themselves create comprehensible rules, which are difficult to acquire from experts. There is also a lack of a logical dimension-reduction method for reducing an existing rule base generated by experts or by analytical modelling. We have proposed an
inductive learning method with semantic intervals (SI) sufficiently
approximated from normal convex fuzzy sets for generation (Zhao et al
1992) as well as reduction (Turksen and Zhao 1992) of the semi-fuzzy
knowledge bases by using input-output data collected from objective
processes. The validity of the approximation above is proved by the
criterion of uncertainty compromise in approximation to adjacent fuzzy
sets. The semi-fuzzy knowledge base consists of two main parts, i.e.,
a data base with the triangular semi-fuzzy sets (TSFSs) derived from
the SI and a rule base containing the rule sets with the TSFSs. The SI
plays a key role in symbolic processing for inductive learning. To explore the validation and verification of this automatic knowledge acquisition scheme, an equivalence between inductive learning with
SI and a valid pseudo-Boolean logic simplification is proved. Based on
the equivalence, the reliability, implementability and learnability
are analyzed and acknowledged for the automatic generation and
reduction of the rules with the TSFSs. The TSFSs support the functional numerical calculations of an inference engine. The interval-valued compositional rule of inference (Turksen 1989) is extended as an adequate inference engine on the TSFSs to handle both linguistic and numerical values. The advantage of introducing the SI with the
associated TSFS (the SI-TSFS pair) is to integrate symbolic processing
and numerical calculations. The reduced semi-fuzzy knowledge base is
generated through the SI-TSFS pair to overcome the difficulty of the
fuzzy logic simplification. Originally this difficulty exists in the
conventional fuzzy qualitative modelling technique. Furthermore, the
derivation of the SI-TSFS is consistent with the separation theorem
(Zadeh 1965). In practical applications even when the condition for
the equivalence is not satisfied, the proposed scheme can still
provide the semi-fuzzy knowledge base with better testing results in
both the classification and inference of a singleton numerical value.
The proposed method has been shown to be successful in the modelling
of continuous and discrete complex processes such as chemical vinylon
synthesis, a repair parts service center, search and rescue
satellite-aided tracking (SARSAT), human operation of a chemical plant
and stock market activities.
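A small sketch of the kind of triangular semi-fuzzy set involved (our own illustrative construction, not the papers' exact derivation from semantic intervals):

```python
def triangular(a, m, b):
    """Triangular fuzzy set with support [a, b] and peak at m.
    A semantic interval [a, b] with midpoint m = (a + b) / 2 can be
    approximated by such a set (an illustrative stand-in for the
    TSFS construction)."""
    def mu(x):
        if x <= a or x >= b:
            return 0.0
        return (x - a) / (m - a) if x <= m else (b - x) / (b - m)
    return mu

low = triangular(0.0, 0.25, 0.5)   # e.g. "low" over a normalized range
print(low(0.25), low(0.4))          # 1.0, 0.4
```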
- A
comparative study of different methods of predicting time series
- This thesis presents a comparative study of different methods for predicting future values of time series data and implements them to predict currency exchange rates. The thesis focuses mainly
on two approaches in predicting a time series. One of them is the
traditional statistical approach which involves building models based
on certain assumptions and then applying them to do the predictions.
The models considered in this thesis are multiple regression,
exponential smoothing, double exponential smoothing, the Box-Jenkins method, and Winters' method. The second approach uses the concepts of trained neural nets and pattern recognition; it involves designing a neural network and training it using different learning methods. The learning algorithms used in the current work include the backpropagation method, recurrent-net learning, adaptively trained neural nets, and fuzzy learning methods. In addition to these,
some methods for forecasting a chaotic time series and fractional
differencing are also mentioned in the thesis. In order to compare the
performances of different techniques of forecasting the future values
of a time series, experiments were conducted using the exchange rates
of different currencies with respect to the US dollar. These exchange
rates exhibit a lot of randomness in their behaviour and hence it was
very challenging to predict their future values. Different prediction
zones were selected to conduct the experiments, and an analysis of the results is presented towards the end of the thesis.
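Of the statistical methods listed, double exponential smoothing has a particularly compact form; a minimal sketch (the standard Holt formulation, with illustrative smoothing constants that would in practice be tuned on held-out data):

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Double (Holt) exponential smoothing: maintain a smoothed level
    and trend, then extrapolate linearly `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level + horizon * trend

rates = [1.30, 1.31, 1.29, 1.33, 1.35, 1.34]   # toy exchange-rate series
print(holt_forecast(rates, horizon=1))
```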
- Fuzzy
associative conceptual knowledge for supporting query formulation (fuzzy
knowledge)
- Dealing with currently existing information bases requires the
human to adapt himself to the terminology and organization of the
data. Users who are not totally familiar with the specific information
base they are interacting with need the help of an intermediary to access the data. This thesis describes a knowledge-based system which helps the user of an object-oriented database to use the terminology required by the database when formulating a query. This is done by semantic term set enlargement, which maps the concept
given by the user into a set of similar concepts. This mapping takes
the semantics of the concepts into consideration and therefore
requires a model of the concepts of the database domain. A major
objective of our work is to model no more knowledge than necessary for
our task, thus keeping the knowledge model simple, and reducing the
effort of knowledge acquisition. For the conceptual knowledge model we
propose a semantic net where concepts are related by association and
generalization relationships. These relationships are fuzzy, i.e.
they are of varying strength. Because we do not assume the
completeness of the concept net, relationships which have not been
explicitly specified are inferred from the existing relationships by
utilizing their transitivity. For the construction of a knowledge
base we have implemented a direct-manipulation editor. Fuzzy
relationships can be specified by positioning concepts relative to
each other, without giving numerical values for the strength of the
relationships. Semantic term set enlargement can be used in other
application areas, too. We discuss how it can be applied to support
the integration of heterogeneous databases and the access of full text
databases.
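The transitive inference over fuzzy relationships can be sketched as a max-min transitive closure; this is one standard reading of "utilizing their transitivity", not necessarily the thesis's exact rule:

```python
import numpy as np

def maxmin_closure(R, tol=1e-9):
    """Infer unspecified fuzzy relationship strengths by max-min
    transitivity: iterate R <- max(R, R o R) until it stabilizes.
    R is an n x n matrix of strengths in [0, 1]."""
    R = np.asarray(R, dtype=float)
    while True:
        # comp[i, k] = max over j of min(R[i, j], R[j, k])
        comp = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
        nxt = np.maximum(R, comp)
        if np.allclose(nxt, R, atol=tol):
            return nxt
        R = nxt

# car--vehicle 0.9, vehicle--truck 0.8, car--truck unspecified (0)
R = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.8], [0.0, 0.8, 1.0]]
print(maxmin_closure(R)[0][2])   # inferred: min(0.9, 0.8) = 0.8
```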
- Fuzzy
knowledge acquisition using cognitive learning
- This thesis deals with the concepts of the nature of machine
knowledge and the different ways of knowledge acquisition and
representation. Through the process of machine learning, the underlying principles involved in designing an expert system are discussed. A design of a system which learns using class descriptions and certainty factors is described and simulated. Following this, fuzzy systems, including fuzzy logic and fuzzy sets, are discussed along with Bayes' probability theory. A system has been designed
that uses the concepts of fuzzy values and probability that could be
used in a backward chaining system in which probability formulae are
applied to the rules of a knowledge base.
- Identification
of pre-query fuzzy search rules and data mining techniques for
integrated decision support frameworks
- Information is more than a by-product of the daily operation of an
enterprise. Information systems are now being utilized for
transforming operational information into knowledge intended for
decision making support. This new breed of information system is
referred to as an analysis-based decision-oriented processing system.
A recent literature survey concluded that companies desiring to
successfully implement these new systems require an architected
framework that incorporates analysis of decision scenarios in
addition to establishing information requirements. This research
focuses on the study of how decision scenarios can be enriched by
incorporating appropriate information technology and artificial
intelligence techniques within modern enterprises. An original
architected decision support framework is proposed with the intention
of complementing the Zachman Framework for information systems
development. This research presents an original methodology for
developing a priori selection rules for applying predefined and ad hoc
queries to large datasets.
- An
implemented framework for the construction of hybrid intelligent
forecasting systems
- This thesis presents an implemented architectural framework for
construction of hybrid intelligent forecasters for utility demand
prediction. The framework has been implemented as the Intelligent
Forecasters Construction Set (IFCS) which supports the intelligent
techniques of fuzzy logic, artificial neural networks, knowledge-based
and case-based reasoning. This tool provides a rapid application
development (RAD) environment for constructing forecasting
applications. IFCS is also a hybrid-programming tool, which allows
developers to implement forecasters by means of object-oriented visual
programming, knowledge-based programming and procedural programming.
IFCS was implemented on the real-time expert system shell G2 with the G2 Diagnostic Assistant (GDA) and NeurOn-Line (NOL)
modules. Rules, procedures and flow diagrams are organized into a
hierarchy of workspaces. The modularity of IFCS allows subsequent
addition of other modules of intelligent techniques. A chief benefit
of IFCS is that it allows developers to concentrate on problem solving
and conceptual modeling instead of dealing with complicated
programming tasks. It also expedites implementation of forecasters.
The framework and the IFCS tool were tested on two problem domains.
The first application is to predict daily power load of the City of
Regina. The second application is to forecast consumer demand on the
water distribution system of the City of Regina. The data of each
problem was separated into several classes, then a neural network
module was applied to model each of them. The results from this
approach were compared to those from a linear regression (LR) and a
case based reasoning (CBR) program. The forecasting results and
performance comparisons among the forecasters will be discussed. (G2, GDA and NeurOn-Line are trademarks of Gensym Corp., U.S.A.)
- Knowledge-based
image retrieval using spatial and temporal constructs (query processing)
- A knowledge-based approach is introduced for retrieving images by
content using spatial and temporal constructs. It supports the
answering of conceptual image queries involving similar-to predicates,
spatial semantic operators, and references to conceptual terms, as
well as temporal, evolutionary, and stream constructs. Interested
objects in the images are represented by contours segmented from
images. Image content such as shapes and spatial relationships are
derived from object contours according to domain-specific image
knowledge. Sequences of image objects are represented as streams for
retrieving image (sequences) based on their temporal change. A
three-layered model is proposed for integrating image representations,
extracted image features, and image semantics. With such a model,
images can be retrieved based on the features and content specified in
the queries. A knowledge-based spatial temporal query language (KSTL)
is also presented to express and process image queries with
conceptual, spatial, temporal, evolutionary, and stream constructs.
The implementation of KSTL via extending ODMG's object-oriented query
language OQL (Cat94) is also presented. The knowledge-based query
processing is based on a query relaxation technique. The image
features are classified by an automatic clustering algorithm and
represented by Type Abstraction Hierarchies (TAHs) for knowledge-based
query processing. Since the features selected for TAH generation are
based on context and user profile, and the TAHs can be generated
automatically by a clustering algorithm from the feature database, the
proposed image retrieval approach is scalable and context-sensitive.
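A toy illustration of TAH-style query relaxation (the feature, ranges, and data below are invented; the real system builds its TAHs by clustering a feature database):

```python
def relaxed_search(hierarchy, items, key):
    """Toy query relaxation over a type abstraction hierarchy (TAH):
    `hierarchy` lists (lo, hi) feature ranges from most specific to most
    general.  Try the tightest range first; if nothing matches, relax
    to the next broader range."""
    for lo, hi in hierarchy:
        hits = [it for it in items if lo <= key(it) <= hi]
        if hits:
            return hits, (lo, hi)
    return [], None

# images indexed by an elongation feature; query: "roughly round"
images = [("coin", 1.05), ("egg", 1.4), ("pencil", 6.0)]
hits, used = relaxed_search([(0.95, 1.02), (0.9, 1.5)], images,
                            key=lambda it: it[1])
print(hits, used)   # relaxed to (0.9, 1.5): coin and egg
```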
- Knowledge
discovery with medical databases: a case-based reasoning approach
- Medical informatics projects are accumulating enormous numbers of
clinical cases in hospital information systems. Efficient extraction
of clinically useful knowledge patterns from these clinical databases
to improve health care quality is a challenging research topic.
Though the progress in Knowledge Discovery in Databases (KDD) provides
a basis for medical data mining development, the characteristics of
the medical practice require a unique medical knowledge exploration process. In patient care, physicians utilize knowledge extracted from
basic principles and cases they have experienced. In medical education
and practice, the Case-Based, or problem-based learning and reasoning
approaches are widely used. Integrating Case-Based Reasoning (CBR)
principles in Knowledge Discovery with Medical Databases (KDMD)
development is intuitive. The hypothesis of this research is that by
combining the CBR paradigms, KDD principles, and clinicians'
expertise, the knowledge patterns extracted from clinical databases can be utilized to improve health care quality. A KDMD working model is
proposed to test the hypothesis. Three basic phases: goal and data
discovery, knowledge exploration, and knowledge refinement are
introduced. In this working model, clinicians can express their
concerns and preferences to guide knowledge exploration from the data.
When applying the derived knowledge patterns in clinical work,
clinicians can further justify the decision support information and
then refine the scope of the knowledge with the help of CBR paradigms.
To achieve this objective, a KDMD support system called MIKE (Medical
Interactive Knowledge Explorer) has been developed. The knowledge
exploration examples in this research manifest how the system learned
from both clinicians' expertise and evidence in the data. Tests using
breast cancer data show that the expert-guided decision tree construction strategy combined with case similarity assessment
outperformed pure inductive learning methods. An application of the
working model on coronary artery disease verified the functional
proficiency of MIKE. The plot of the learning curve after each
training session demonstrates the incremental knowledge discovered.
Using clinical data on difficult airway prediction, MIKE yields 58%
sensitivity, compared to the current rule-based airway risk alert algorithm (36% sensitivity) and other airway evaluation methods,
such as the Mallampati test and the Wilson Risk-Sum (<50%
sensitivity). The improvement from this trial demonstrated that the
working model is capable of increasing the predictability of difficult
airways versus anesthesiologists' rule-based methods. Furthermore, the
medical knowledge discovery working model should be applicable to many
different data- and experience-rich fields.
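For reference, the sensitivity figures quoted above follow the standard definition; a minimal sketch with toy labels:

```python
def sensitivity(y_true, y_pred):
    """Sensitivity (recall on the positive class): of the truly
    difficult airways, what fraction did the method flag?
    Computed as TP / (TP + FN)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

# toy labels: 1 = difficult airway
print(sensitivity([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))   # 2/3
```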
- A
multicriteria data retrieval model: an application of multiattribute
preference model to data retrieval
- This dissertation proposes a new data retrieval model as an
alternative to exact matching. While exact matching is an effective
data retrieval model, it is based on fairly strict assumptions and
limits our capabilities in data retrieval. A new category of data
retrieval, multi-criteria data retrieval, is defined to include
many-valued queries (which require partitioning of data entities into more than two, possibly infinite, subsets) and multi-derived data (which are derived by non-homogeneous multiple rules). A metric-based
preference model is proposed as a referential model for multi-criteria
data retrieval. The model is based on the idea that we human beings
prefer outcomes close to an ideal alternative (the 'positive anchor')
and far removed from the worst imaginable alternative (the 'negative
anchor'). A 'relative distance metric' is proposed to operationalize
the concept of closeness in matching. Many-valued and multi-derived
data retrieval queries are formalized within the framework of the
metric-based preference model. Query interpretation is defined as
measuring the relative distances of data entities from the (positive
and the negative) anchors. The viability of the proposed data
retrieval model is proved by analyzing its logical properties and by
evaluating its performance against the current data retrieval models
for both exact matching and non-exact matching. The multi-criteria
data retrieval model is proved to satisfy the De Morgan logic and
therefore has the same query interpretation values as the exact-match
data retrieval model for the conventional data retrieval queries. With
regard to many-valued query interpretation, the proposed relative
distance metric is proved to better represent a user's actual
preferences for data entities than the current fuzzy metric or the Euclidean distance metric. With regard to retrieval of multi-derived
data, the proposed model is proved to result in fewer errors than
current exact matching. These findings show that, both at the logical
level and at the performance level, the proposed multi-criteria data
retrieval model retains all the desirable features for data retrieval.
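One plausible formalization of the relative distance metric (an assumption for illustration; the dissertation's exact definition may differ): score each entity by its distance to the negative anchor relative to its total distance to both anchors, so 1.0 means "at the ideal" and 0.0 means "at the worst imaginable alternative".

```python
import math

def relative_distance(entity, positive, negative):
    """Illustrative relative distance: closeness to the positive anchor
    expressed relative to both anchors, in [0, 1]."""
    d_pos = math.dist(entity, positive)
    d_neg = math.dist(entity, negative)
    return d_neg / (d_pos + d_neg)

# query "cheap and fast": ideal (price=0, delay=0), worst (100, 30)
for item in [(20, 5), (80, 25)]:
    print(item, round(relative_distance(item, (0, 0), (100, 30)), 2))
# (20, 5) scores ~0.8; (80, 25) scores ~0.2
```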
- Multi-model
fuzzy control for nonlinear systems (distributed control, decision
making)
- Fuzzy logic technology has emerged as a promising tool for dealing
with control and decision making problems in complex systems. Fuzzy
control provides an effective algorithm which can convert heuristic
knowledge and experience of human experts into the form of linguistic
fuzzy control rules. However, there is still a lack of a systematic
control design procedure and general theoretical analysis, mainly due
to the explicit model-free nature of the methodology and its nonlinear
nature. This thesis is concerned with a multi-model fuzzy control
approach for nonlinear systems. It investigates the basic
architecture, the modeling, the stability analysis, and the control
design methodology of fuzzy model based control using the
Takagi-Sugeno fuzzy model as seen from the control engineering
perspective. The model framework is based on an operating region
decomposition of the nonlinear system which is modeled with a simple
local linear model at each operating region. The local linear models
are aggregated together using fuzzy membership functions with a
smooth interpolation technique. This framework also supports the
development of a hybrid model with combined qualitative and
quantitative knowledge. The thesis also presents the conception and
formation of a fuzzy supervisory control architecture with
hierarchical multi-level structure which allows the introduction of
high level qualitative linguistic expert control strategies into the
quantitative low level compensator loop. Our work extends the early
work by other researchers in a way that allows the techniques of
modern control theory to be applied directly to analyse and design
fuzzy controllers for complex nonlinear systems. It makes it possible
to define a new tool in nonlinear control that incorporates the
advantages of knowledge-based and linear control theory methods. The
developed methodology also provides a formalised framework or
theoretical foundation for the ad hoc multi-model local control
approach commonly used by control engineers in industry, and it also
has the advantage of being relatively easily implemented on a modern
distributed control system. A case study of fuzzy model based control
for a steam generation drum-boiler power plant is given to illustrate
the potential of this multi-model fuzzy control methodology.
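A minimal sketch of the Takagi-Sugeno idea described above: local linear models, one per operating region, blended by normalized fuzzy membership weights (the membership shapes and local gains here are illustrative):

```python
import numpy as np

def gaussian_mf(center, width):
    """Gaussian membership function for one operating region."""
    return lambda x: np.exp(-0.5 * ((x - center) / width) ** 2)

def ts_model(x, rules):
    """Takagi-Sugeno blending: each rule is (membership fn, (a, b)) with
    a local linear model y = a*x + b; the output is the membership-
    weighted average of the local outputs (smooth interpolation)."""
    w = np.array([mu(x) for mu, _ in rules])
    y = np.array([a * x + b for _, (a, b) in rules])
    return float(np.dot(w, y) / w.sum())

rules = [(gaussian_mf(0.0, 1.0), (1.0, 0.0)),   # near the origin: y ~ x
         (gaussian_mf(5.0, 1.0), (0.2, 4.0))]   # high region: flatter
print(ts_model(2.5, rules))   # blends the two local models
```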
- Question-driven
information retrieval systems (knowledge based, natural language
processing, embedded systems)
- An approach is presented to building question-driven information
retrieval systems to answer natural language questions from
collections of free-text question-answer pairs. The question-answering
task is conceptualized as the retrieval of answers to questions
similar to a submitted question. Similarity decisions are made by
combining numerical techniques of information retrieval with scalable,
knowledge-based approaches of natural language processing. Answer retrieval is based on the identification of terms' content-bearing
capacities from the sequential structure of free text and the
recognition of critical semantic relations among terms through a
general-purpose semantic network. An approach is outlined for
embedding question-driven information retrieval systems into
information sources such as organizations. A question-driven
information retrieval system is embedded in a source when it has some
knowledge of the source's structure and relies on it to answer
questions submitted to the source. Feedback from the source is
solicited and utilized after retrieval failures. New answers produced
by the source are indexed for reuse under the questions that initiated
their production. These ideas are implemented in two question-driven
information retrieval systems, FAQ Finder and the Chicago Information
Exchange (CIE). FAQ Finder answers questions from a collection of
Usenet files of frequently asked questions. CIE is embedded into the
University of Chicago's Computer Science Department to answer
questions on certain topics of computer science.
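A toy sketch of the retrieval idea: score stored question-answer pairs by term overlap after expanding terms through a small semantic table (a stand-in for FAQ Finder's combination of statistical scoring with a general-purpose semantic network):

```python
def expand(terms, synonyms):
    """Expand question terms through a toy semantic (synonym) table."""
    out = set(terms)
    for t in terms:
        out |= synonyms.get(t, set())
    return out

def best_answer(question, qa_pairs, synonyms):
    """Rank stored question-answer pairs by Jaccard overlap between
    the expanded term sets of the new and stored questions."""
    q = expand(set(question.lower().split()), synonyms)
    def score(pair):
        stored = expand(set(pair[0].lower().split()), synonyms)
        return len(q & stored) / len(q | stored)
    return max(qa_pairs, key=score)

synonyms = {"laptop": {"notebook", "computer"}}
faq = [("how do I reset my notebook", "Hold the power button 10s."),
       ("how do I enroll in a course", "Use the registrar portal.")]
print(best_answer("reset laptop", faq, synonyms)[1])
```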
- Representative
classification of protein structures (sequence similarity)
- This thesis deals with the classification of protein structures,
and, especially, the representativity of such classifications.
Naturally occurring proteins exhibit similarity at various levels of
their structure. The observed three-dimensional structural similarity
of proteins is partly due to sequence similarity. In this work, we
show that sequence similarity can be used as a basis for structural
classification. In numerous applications of computational molecular
biology, a representative characterization of the vast protein
structural space is desired. The basic solution is representative
selection. In this work, this problem is introduced and formalized in
a molecular biological context, and a generic clustering based method
for representative selection is developed. The method is applied to
structures in the Protein Data Bank. The resulting individual
representatives as well as the structural families formed are
exhaustively classified. Instead of an individual prototype, a more
substantial characterization of the families is needed for reliable
abstraction of the common features to support more advanced uses of
the information such as inductive inference. Two aspects of
representativity are identified here: comprehensiveness, or coverage,
and typicality, or non-redundancy. Based on these ideas, we develop
the theory and algorithms of representative classification. The
methods are shown correct and efficient, and their usability is
demonstrated by empirical testing. Finally, applications of representative classifications are discussed. In particular, the significance of
representative learning sets for structural prediction methods is
evaluated. The performance of composition and sequence based methods
is studied. A fuzzy interpretation of the secondary structural
classification of proteins is suggested for better reliability.
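A greedy sketch of representative selection capturing the two stated aspects, coverage and non-redundancy (a simplification of the thesis's clustering-based method; the distance and threshold are illustrative):

```python
def select_representatives(items, dist, radius):
    """Greedy representative selection: every item ends up within
    `radius` of some representative (coverage), while representatives
    stay more than `radius` apart (non-redundancy)."""
    reps = []
    for it in items:
        if all(dist(it, r) > radius for r in reps):
            reps.append(it)
    return reps

# toy 1-D "sequence distance": absolute difference of a feature value
vals = [0.1, 0.15, 0.9, 1.0, 0.12, 2.5]
print(select_representatives(vals, lambda a, b: abs(a - b), 0.3))
# -> [0.1, 0.9, 2.5]
```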
- The
retrieval expert model of information retrieval
- The purpose of an information retrieval system is to meet
information needs. People who are expert at meeting information needs
go about satisfying them much differently and, in general, more
successfully than automated systems. The model that forms the basis
for this dissertation is a descriptive model of how these experts
satisfy information needs. This model can be used prescriptively in
the design of an information retrieval system whose performance is
similar to that of a human expert.
- Retrieving
justifiably relevant cases from a case base using validation models
(case based reasoning, knowledge acquisition)
- Case-based reasoning (CBR) consists of two phases: case retrieval,
and case reasoning. The goal of case retrieval is to extract from a
memory of cases the items most appropriate to a particular problem. An
effective retriever must have both high recall and precision, and
perform each retrieval operation quickly. One way to achieve the
necessary speed is for a retriever to extract cases using the low
level features, termed surface features, that characterize them. These
features can be acquired inexpensively, but their information content
is low. By adding domain-specific surface feature knowledge to these
retrievers, the recall can be improved but the precision worsens. For
example, to achieve 100% recall on two databases with 200 and 355 real-world cases, 22 and 68 cases respectively were retrieved through surface feature retrieval. On average, only 4.5 and 4 of the retrieved cases respectively were relevant. In this dissertation I
present validated retrieval, a method for retrieving cases that are justifiably relevant to a new problem, and a system, called scSTAIN,
that implements this method. Validated retrieval improves the
precision, maintains the recall of surface feature-based retrieval,
and justifies the relevance of each retrieved case by augmenting
surface feature-based retrieval with a second processing step called
validation. Applying validation to parts of the same two databases
improved the precision by reducing the cases to 4.5 and 4 respectively
of the total number of cases in the databases while maintaining a
recall of 100%. The knowledge used during validation is organized in a
knowledge structure called the validation model. The validation model
is acquired through a methodology which utilizes the contents of the
cases. Two case-based expert systems were implemented around scSTAIN,
and were subsequently used to evaluate the performance of validated
retrieval and the effectiveness of the knowledge acquisition
methodology. The evaluation of these expert systems showed that their
development costs are six times smaller than the corresponding costs
of the rule-based expert systems, while their development time is four
times smaller.
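The two-stage structure of validated retrieval can be sketched as follows; the surface-match and validation predicates are illustrative stand-ins for scSTAIN's validation models:

```python
def validated_retrieval(problem, case_base, surface_match, validate):
    """Two-stage retrieval: a cheap surface-feature pass pulls in
    candidates (high recall, low precision); a validation pass keeps
    only cases it can justify, recording the justification."""
    candidates = [c for c in case_base if surface_match(problem, c)]
    validated = []
    for c in candidates:
        ok, why = validate(problem, c)
        if ok:
            validated.append((c, why))   # keep the justification
    return validated

# toy domain: match on a shared symptom, validate on a deeper cause
cases = [{"symptom": "fever", "cause": "infection"},
         {"symptom": "fever", "cause": "sunstroke"}]
hits = validated_retrieval(
    {"symptom": "fever", "context": "indoors"}, cases,
    surface_match=lambda p, c: p["symptom"] == c["symptom"],
    validate=lambda p, c: (c["cause"] != "sunstroke" or
                           p["context"] == "outdoors",
                           f"cause {c['cause']} consistent with context"))
print([why for _, why in hits])
```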
- Time-based
clustering and its application to determining a signal's motivation:
deterministic chaos or random disturbance (chaos)
- The theory and applications of deterministic chaos have received a
great deal of attention during the last decade, with several new and
valuable approaches introduced that can be used to obtain a clearer
understanding of the origins of such signals and the nature of the
systems responsible for their presence. Mutual information theory, for
example, a concept introduced by A. Fraser (Physical Review A, 1986),
can be used to address the choice of an optimal embedding time step in
order to avoid oversampling experimental data. For the most part,
however, current tools for the analysis of apparently chaotic signals
lack in their ability to adequately address the significance of time
evolution within their methodology. This dissertation introduces a new
method for probing whether a signal has a deterministic or purely
random origin. The approach employs a time-dependent clustering
quantizer (TBC) to transform the original waveform data into a symbol
train, which can then be analyzed for excluded symbol combinations. A
hypothesis test is used to bound the likelihood of randomness of a
complex time series, using Markov chains to calculate the probability of missing and existing symbol combinations. Finally, J. Theiler's technique of surrogate data (Physica D, 1992) is employed to
strengthen these quantitative results. It is shown that the new TBC
quantizer unifies the concepts of mutual information theory with
attractor reconstruction time-embedding, as a means of obtaining
dynamically optimal signal coarsening. Future chaotic system research
and directions for applications of the TBC method include possible new
attractor reconstructions with a generalization of the underlying
time-dependent clustering method quantizer, development of
cluster-based models for complex dynamical systems such as weather
and communication phenomena, as well as the fundamental problem of
controlling the behavior of systems subject to chaotic behavior.
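A rough sketch of the symbol-train analysis, with equal-width binning standing in for the dissertation's time-based clustering quantizer: quantize the waveform, then look for symbol combinations that never occur, which a purely random source of the same length would tend to visit.

```python
from itertools import product

def symbolize(series, n_bins=4):
    """Quantize a waveform into a symbol train by equal-width binning
    (a crude stand-in for the time-based clustering quantizer)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def excluded_pairs(symbols, n_bins=4, n=2):
    """Length-n symbol combinations that never occur: deterministic
    dynamics tend to exclude combinations a random source would visit."""
    seen = {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}
    return [c for c in product(range(n_bins), repeat=n) if c not in seen]

series = [0.2]
for _ in range(500):                  # chaotic logistic map, r = 4
    series.append(4 * series[-1] * (1 - series[-1]))
print(excluded_pairs(symbolize(series)))   # e.g. pairs like (0, 3)
```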
- Theory and
design of a hybrid pattern recognition system (sigmoidal theory, fuzzy
sets)
- Pattern recognition methods can be divided into four different
categories: statistical or probabilistic, structural, possibilistic or
fuzzy, and neural methods. A formal analysis shows that there is a
computational complexity versus representational power trade-off
between probabilistic and possibilistic or fuzzy set measures, in
general. Furthermore, sigmoidal theory shows that fuzzy set membership
can be represented effectively by sigmoidal functions. Those results
and the formalization of sigmoidal functions and subsequently
multi-sigmoidal functions and neural networks led to the development
of a hybrid pattern recognition system called tFPR. tFPR is a hybrid
fuzzy, neural, and structural pattern recognition system that uses
fuzzy sets to represent multi-variate pattern classes that can be
either static or dynamic depending on time or some other parameter
space. Given a set of input data and a pattern class specification,
tFPR estimates the degree of membership of the data in the fuzzy set
that corresponds to the current pattern class. The input data may be
a number of time-dependent signals whose past values may influence the
evaluation of the pattern class. The membership functions of the fuzzy
sets that represent pattern classes are modeled in three different
ways. Fuzzy sets with membership functions modeled through sigmoidal
functions would be used for simple pattern classes that can be
described concisely by a fuzzy set expression. A structural pattern
recognition method coupled with fuzzy components would be used whenever the pattern class under question would depend on some parameter space (such as time). Finally, multi-sigmoidal neural networks would be used to model the membership
function of a fuzzy set representation for a pattern class whenever it
would be difficult to obtain a formal definition of that function.
Although efficiency is a very important consideration in tFPR, the
main issues are knowledge acquisition and knowledge representation
(in terms of pattern class descriptions). tFPR has been embedded in
the BB1 blackboard architecture but it can also run as a stand-alone
system. It is currently being applied in a system for medical monitoring.
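A small sketch of sigmoidal fuzzy membership as described above; the band-shaped class built from two sigmoids is our own illustrative multi-sigmoidal example, not tFPR's actual class definitions:

```python
import math

def sigmoid(x, center, slope):
    """Sigmoidal membership: rises from 0 to 1 around `center`;
    `slope` controls how crisp the transition is."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def band(x, low, high, slope=10.0):
    """Product of a rising and a falling sigmoid: high membership only
    inside [low, high], illustrating how sigmoids can encode fuzzy
    set membership for a simple pattern class."""
    return sigmoid(x, low, slope) * (1.0 - sigmoid(x, high, slope))

# membership of a vital-sign reading in a fuzzy "normal range" class
for reading in (36.0, 37.0, 39.5):
    print(reading, round(band(reading, 36.2, 37.8), 3))
```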