@halpinOptimalMatchingAnalysis2010
Optimal Matching Analysis and Life-Course Data: The Importance of Duration
(2010) - Brendan Halpin
Journal: Sociological Methods & Research
Link:: http://journals.sagepub.com/doi/10.1177/0049124110363590
DOI:: 10.1177/0049124110363590
Links::
Tags:: #paper #Methods #OptimalMatchingAnalysis
Cite Key:: [@halpinOptimalMatchingAnalysis2010]
Abstract
The optimal matching (OM) algorithm is widely used for sequence analysis in sociology. It has a natural interpretation for discrete-time sequences but is also widely used for life-history data, which are continuous in time. Life-history data are arguably better dealt with in terms of episodes rather than as strings of time-unit observations, and in this article, the author examines whether the OM algorithm is unsuitable for such sequences. A modified version of the algorithm is proposed, weighting OM’s elementary operations inversely with episode length. In the general case, the modified algorithm produces pairwise distances much lower than the standard algorithm, the more the sequences are composed of long spells in the same state. However, where all the sequences in a data set consist of few long spells, and there is low variability in the number of spells, the modified algorithm generates an overall pattern of distances that is not very different from standard OM.
Notes
“The optimal matching (OM) algorithm is widely used for sequence analysis in sociology” (Halpin, 2010, p. 365)
“However, where all the sequences in a data set consist of few long spells, and there is low variability in the number of spells, the modified algorithm generates an overall pattern of distances that is not very different from standard OM.” (Halpin, 2010, p. 365)
“OM has been broadly criticized as having had little success in applications (e.g., Levine 2000) and rather more acutely as being sociologically meaningless (Wu 2000)” (Halpin, 2010, p. 366)
“we see a real advantage in the ability to access sequence information holistically, if not to the degree of overthrowing ‘general linear reality’ (Abbott 1988). That is to say, there are many genuinely effective but not paradigm-shifting applications of OM in the sociological literature.” (Halpin, 2010, p. 366)
“In particular, it allows researchers to apprehend the overall structure of complicated longitudinal data, and it gives a holistic perspective that can help put the spell-focused hazard rate model, or the period-by-period transition-focused model, into context” (Halpin, 2010, p. 367)
“For the low-transition data set, the OMv distances between all pairs of sequence are on average only about one fifth of the OM distances, but the correlation is very high, at .963.” (Halpin, 2010, p. 374)
“Variability in the number of spells is also likely to be important: If all sequences tend to have similar numbers of spells of more or less similar length, pair comparisons will tend to be more often of like with like than if the data consist of sequences of very different numbers of spells.” (Halpin, 2010, p. 375)
“Nonetheless, it is clear that compared with truly discrete sequences of similar length, these sequences have quite low entropy, and as a result, it seems that the modified distance measure does not produce dramatically different results.” (Halpin, 2010, pp. 376-377)
“While pairwise distances are the direct product of sequence analysis, the work does not usually stop there. Most typically, the pairwise distance matrices are used to generate empirical typologies, data-driven classifications of the sequences. To explore how much modifying OM affects the outcome, I present a cluster analysis of the BHPS maternal labor history data.” (Halpin, 2010, p. 377)
“An eight-cluster solution is chosen on the informal grounds that it represents a manageable number of distinct clusters” (Halpin, 2010, p. 378)
“The duration-sensitive algorithm clearly produces different results, with far lower costs for pairs where one or both sequences have long runs of the same value. For data sets with high variability in number of spells per sequence, this produces a very different set of pairwise distances than does conventional OM.” (Halpin, 2010, p. 383)
“While OM can readily be defended as meaningful for naturally discrete sequences, it does not naturally fit with episode data, and it is blind to the distinction between, say, deleting all of a one-month episode and deleting a month from a six-month episode. The OMv algorithm, however, provides a means of calculating distances that reduces the scale of this problem, by weighting the deletion cost inversely with the length of the sequence” (Halpin, 2010, p. 385)
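Sketch (my own, not from the paper): the one-month versus six-month example in the quote above can be reproduced with a small edit-distance routine. The function names, the default costs (indel 1, substitution 2), and especially the weighting of each element by 1/sqrt(spell length) are assumptions for illustration; the notes above do not record the exact OMv cost formula, so this should not be read as Halpin's implementation.

```python
from itertools import groupby
from math import sqrt


def spell_lengths(seq):
    """Return, for each position, the length of the spell (run) it belongs to."""
    lengths = []
    for _, run in groupby(seq):
        n = len(list(run))
        lengths.extend([n] * n)
    return lengths


def om_distance(a, b, indel=1.0, sub=2.0, duration_weighted=False):
    """Edit distance with insertions, deletions, and substitutions.

    duration_weighted=False gives plain OM with constant costs.
    duration_weighted=True scales each element's cost by
    1/sqrt(spell length): an ASSUMED weighting, chosen only to
    illustrate 'cost inversely related to episode length'.
    """
    if duration_weighted:
        wa = [1 / sqrt(n) for n in spell_lengths(a)]
        wb = [1 / sqrt(n) for n in spell_lengths(b)]
    else:
        wa = [1.0] * len(a)
        wb = [1.0] * len(b)

    # d[i][j] = cheapest cost of turning a[:i] into b[:j]
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + indel * wa[i - 1]
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + indel * wb[j - 1]

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                s = 0.0
            else:
                # Assumption: a substitution is weighted by the mean of the two weights.
                s = sub * (wa[i - 1] + wb[j - 1]) / 2
            d[i][j] = min(
                d[i - 1][j] + indel * wa[i - 1],  # delete a[i-1]
                d[i][j - 1] + indel * wb[j - 1],  # insert b[j-1]
                d[i - 1][j - 1] + s,              # match or substitute
            )
    return d[-1][-1]


base = "AAAAAAB"           # six months in state A, then one month in state B
drop_episode = "AAAAAA"    # the whole one-month B episode removed
trim_episode = "AAAAAB"    # one month trimmed from the six-month A episode

for other in (drop_episode, trim_episode):
    print(other,
          om_distance(base, other),
          round(om_distance(base, other, duration_weighted=True), 3))
```

Under plain OM both comparisons cost the same single deletion, which is the blindness the quote describes. Under the assumed weighting, trimming a month from the long spell is cheaper than removing the whole short episode.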