Marriage histories are a valuable data source for investigating nuptiality. While researchers typically acknowledge the problems associated with their use, it is unknown to what extent these problems occur and how marriage analyses are affected.
This paper seeks to investigate the quality of marriage histories by measuring levels of misreporting, examining the characteristics associated with misreporting, and assessing whether misreporting biases marriage indicators.
Using data from the Malawi Longitudinal Study of Families and Health (MLSFH), I compare marriage histories reported by the same respondents at two different points in time. I investigate whether respondents consistently report their spouses (by name), status of marriage, and dates of marriage. I use multivariate regression models to investigate the characteristics associated with misreporting. Finally, I examine whether misreporting marriages and marriage dates affects marriage indicators.
Results indicate that 28.3% of men and 17.9% of women omitted at least one marriage in one of the survey waves. Multivariate regression models show that misreporting is not random: marriage, individual, interviewer, and survey characteristics are associated with marriage omission and marriage date inconsistencies. Misreporting also affects marriage indicators.
This is the first study of its kind to examine the reliability of marriage histories collected in the context of Sub-Saharan Africa. Although marriage histories are frequently used to study marriage dynamics, until now no knowledge has existed on the degree of misreporting. Misreporting in marriage histories is shown to be non-negligible and could potentially affect analyses.
Retrospective marriage histories collected in surveys are a valuable source of information on nuptiality. They usually contain information on respondents’ reported marriages, including marriage dates and how unions ended. Researchers have previously used survey-based marriage histories to calculate the probabilities of divorce and remarriage, examine the sociodemographic factors associated with these events, and identify respondents who divorced and/or remarried between survey waves (Amoateng and Heaton 1989; Anglewicz and Reniers 2014; Boileau et al. 2009; Brandon 1990; Fedor, Kohler, and Behrman 2015; Gage-Brandon 1992; Grant and Yeatman 2014; Hampshire and Randall 2000; Locoh and Thiriat 1995; Reniers 2003, 2008; Reniers and Tfaily 2008; Takyi and Gyimah 2007; Tilson and Larsen 2000). Despite their value, retrospective marriage histories, like other forms of survey data, may be incomplete or contain incorrect information. While researchers typically acknowledge problems with retrospective marriage histories, such as respondents omitting unsuccessful or short unions and misreporting dates (Boileau et al. 2009; Reniers 2008), it is unknown to what extent these problems occur and, more importantly, how marriage analyses are affected. Ideally, the validity of marriage histories would be measured by comparing them against public records: however, this is not feasible in many parts of Africa because civil marriages are not the norm (Enel, Pison and Lefebvre 1994; van de Walle and Meekers 1994).
An alternative method is to measure their reliability by comparing marriage histories reported by the same respondent at two or more points in time. Using data from the Malawi Longitudinal Study of Families and Health (MLSFH), this study investigates whether respondents consistently report their spouses (by name), status of marriage, and dates of marriage across two survey waves. This study also investigates the characteristics associated with marriage omission and marriage date inconsistencies and examines whether misreporting biases marriage indicators. Results indicate that a considerable amount of misreporting exists and that misreporting does not appear to be random. Several marriage, individual, survey, and interviewer characteristics are associated with misreporting and marriage indicators are shown to be affected by misreporting.
Two types of misreporting are common in surveys where respondents are asked to provide autobiographical information. The first type of misreporting relates to the reporting of the event itself. A large body of literature has shown that some respondents underreport and/or overreport events such as unemployment, migration, births, pregnancy, cohabitation, and sexual behavior (Courgeau 1992; Dare and Cleland 1994; Hayford and Morgan 2008; Hertrich 1998; Mathiowetz and Duncan 1988; Ratcliffe et al. 2002; Smith and Thomas 2003). Overreporting or underreporting events can lead to calculated rates, such as birth rates and divorce rates, being over- or underestimated. It can also result in biased population-level indicators. Furthermore, regression estimates can be biased if individuals are incorrectly coded as having experienced certain events, such as sexual debut.
The second type of misreporting occurs when respondents misreport event dates such as migration, marriage, and divorce, as well as the ages at which events occur, including age at first sex or marriage (Auriat 1993; Hertrich 1998; Mitchell 2010; Smith and Thomas 2003; Wringe et al. 2009; Zaba et al. 2009). Misreporting event dates can affect calculated rates by simultaneously increasing and decreasing the number of events occurring in two adjacent time periods, leading to both over- and underestimates of rates during a particular time period. By changing the temporal ordering of events, misreporting event dates can also affect analyses attempting to assign causality. Lastly, misreporting event dates can lead to the misrepresentation of trends, such as age at first sex or marriage.
The survey response model proposes a framework for understanding how respondents answer survey questions that can help us diagnose misreporting (Sudman, Bradburn and Schwarz 1996; Tourangeau, Rips and Rasinski 2000). The ideal respondent provides accurate and complete answers by carefully and comprehensively following the four steps outlined in this model: comprehension, retrieval, judgment, and response formatting. Misreporting occurs when the respondent cannot or does not fully carry out each step of the process.
Comprehension refers to the ability of respondents to understand the question in the same manner as the researcher who designed the question intended. In Sub-Saharan Africa a variety of marriage forms exist, including free unions, consensual unions, customary marriages, and religious and civil marriages (Arnaldo 2004; Budlender, Chobokoane and Simelane 2004). Although surveys and censuses typically categorize all of these unions as marriages (van de Walle 1968), respondents may not define marriages in the same manner. Consequently, miscomprehending the question could result in respondents underreporting marriages.
Next, the respondent retrieves the relevant information to answer this question from his or her memory. If the respondent has experienced several events of a similar nature, such as marriage, he or she may have difficulty retrieving information for a particular event (Auriat 1991; Crowder 1976; Dykema and Schaeffer 2000; Gillund and Shiffrin 1984; Mathiowetz and Duncan 1988; Thompson et al. 1996). As a result, the respondent may misreport marriage dates. Once the respondent has retrieved the necessary information, the respondent judges whether the retrieved information answers the interview question. If it does not fulfill the objective of the question, then the retrieval stage is repeated. Alternatively, the respondent may decide that it is not worth the effort to repeat the retrieval step, resulting in marriages being misreported.
Finally, the respondent evaluates the retrieved information and decides on the appropriate response format. If the respondent finds that the response would be embarrassing or portray the respondent in an unfavorable light, then he or she might edit the information in order to produce a socially desirable response (Tourangeau and Yan 2007). For example, a respondent may be embarrassed to admit that he or she has been divorced and, as a result, not report marriages that ended in divorce.
In Malawi, marriage is nearly universal, women marry young, polygamy is not uncommon, and payment of bridewealth is typically practiced among patrilineal ethnic groups. Marriage also frequently ends in divorce: approximately half of all rural women will have experienced a divorce at some point in their lives (Reniers 2003). There is, however, considerable variation in marriage and divorce patterns by region (Table 1). The northern region, where the Tumbuka are the dominant ethnic group, is largely patrilineal with mostly virilocal residence after marriage (i.e., the couple lives with or near the husband’s family). Marriage in this region tends to be more formal, and bridewealth payments, though not substantial, are part of the marriage process. Despite being predominantly Christian, the north has the highest rates of polygyny: approximately 41% of women enter into polygynous first marriages. Also, women residing in the north have the lowest probability of divorce for first marriage: approximately 14% and 40% of first marriages end in divorce after 5 and 25 years, respectively. Though these are the lowest divorce rates in Malawi, they are high relative to other African countries (Amoateng and Heaton 1989; Clark and Brauner-Otto 2015; Isiugo-Abanihe 1998; Locoh and Thiriat 1995; Ratcliffe et al. 2002). The southern region, where the Yao are the dominant ethnic group, is primarily matrilineal with mostly uxorilocal residence (i.e., the couple lives with or near the wife’s family) after marriage. Because the south is primarily matrilineal, marriages are not formalized through the payment of bridewealth, as they are in the north. As a result, marriages tend to be more casual and informal, and frequently end in divorce (Kaler 2001). The south, historically known for its lack of marital stability (Kaler 2001; Mitchell 1956; Tew 1950), has the highest probability of divorce for women, with 33% and 65% of first marriages ending in divorce after 5 and 25 years, respectively.
Despite being predominantly Muslim, the south has the lowest rates of polygyny: approximately 23% of women enter into polygynous first marriages. The central region, where the Chewa are the dominant ethnic group, observes a mixture of patrilineal and matrilineal kinship structures, and residence can be either virilocal or uxorilocal after marriage. Statistics for the central region lie between those of the north and south.
Characteristics of Malawian marriages can affect the comprehension, retrieval, and judgment steps of the survey response process and lead to misreporting. Some respondents may not comprehend marriage-related questions in the manner the researcher who designed the survey intended. Many marriages, especially in the south, can be characterized as casual and informal, and are often short-lived. Though respondents may have considered them to be marriages at the time they were together, their perceptions of whether these unions constitute marriages may change over time, resulting in some marriage omission. High rates of marital instability as well as polygamy (among men) can also affect the ability of respondents to retrieve and judge details of marriages from their memory. Due to the saliency of first marriages as an important milestone, second and higher order marriages are more likely to be misreported. Because many individuals have experienced more than one marriage, there is a greater chance that they will confuse marriage details such as start and end dates. For the same reason, polygamous men may have greater difficulty than monogamous men in keeping track of and remembering details of specific marriages.
Long-standing interest in how the survey response process affects data quality (e.g., Neter and Waksberg 1964) has culminated in a large body of literature examining the characteristics associated with misreporting. Marriage, individual, survey, and interviewer characteristics lead to marriage omission and marriage date inconsistencies when the four steps of the survey response process are not carried out fully and completely (Table 2).
Marriage characteristics can affect the first three steps of the survey response process. Where marriages are casual and informal, respondents may not know whether they should report a union as a marriage. The lack of a wedding event makes it less clear when a marriage started. Longer duration states tend to be more memorable than those of a shorter duration (Auriat 1991; Cannell, Miller and Oksenberg 1981; Smith and Thomas 2003). Similarly, salient events, defined as events that induce emotions at the time of the event or mark a turning point or transition in one’s life, are more likely to be remembered than those of lesser importance (Mathiowetz and Duncan 1988; Neisser and Winograd 1995; Sudman et al. 1996; Thompson et al. 1996). As in many cultures, first marriages tend to be salient events, marking an important milestone in a person’s life: thus they are more likely to be remembered than later marriages.
Events taking place further in the past are less likely to be remembered (Cannell et al. 1981; Mathiowetz and Duncan 1988; Thompson et al. 1996). Though time may appear to be the primary factor leading to recall error, it is often the experience of multiple events of a similar nature that interferes with the ability of respondents to retrieve details of a particular event (Auriat 1991; Crowder 1976; Dykema and Schaeffer 2000; Gillund and Shiffrin 1984; Mathiowetz and Duncan 1988; Thompson et al. 1996).
Feelings surrounding an event could also affect the recall of events, as pleasant events are more likely to be remembered than unpleasant events (Schwarz et al. 1994; Thompson et al. 1996). Marriages ending in divorce or widowhood are therefore more likely to produce errors in marriage start dates than marriages ongoing at the time of interview. Because divorce is a process as well as an event, there is ambiguity as to when the marriage ended: respondents may report the date of initial separation rather than the date of divorce. By contrast, the death of a spouse is a clearly defined event and would generate more accurate reports than divorce or separation.
The following hypotheses can be offered for effects of marriage characteristics on marriage misreporting:
Sociodemographic characteristics can affect the cognitive abilities of respondents to retrieve and judge relevant information from their memory. Older respondents have been shown to have a greater tendency to misreport events than younger respondents (Borrini et al. 1989; Castro 2012; Dykema and Schaeffer 2000). Older respondents, by virtue of having lived longer, may have experienced several events of a similar nature, making it difficult to retrieve the particulars of a specific event. Furthermore, aging can negatively affect cognitive processes and lead to poorer memory recall (Glisky 2007). Although the relationship between gender and event misreporting has been shown to vary, women have generally been shown to be better at reporting events and providing more accurate dates than men (Grysman and Hudson 2013; Poulain, Riandey and Firdion 1992; Schwarz et al. 1994; Thompson et al. 1996). In addition, evidence suggests that more-educated respondents are better at recalling events as well as details surrounding these events (Auriat 1991; Castro 2012; Mitchell 2010; Peters 1988; Smith and Thomas 2003). Schooling may increase a set of skills related to the ability to recall information. On the other hand, marriage is a relatively rare event and should therefore pose less difficulty in recall.
I hypothesize that the following individual characteristics influence marriage misreporting:
Survey characteristics can reflect difficulties that respondents experience in the retrieval and judgment steps of the survey response process. Some respondents may become fatigued and deliberately underreport events as a way to shorten the interview (Murphy 2009). Misreporting occurs because fatigued respondents skip the retrieval and/or judgment steps of the survey response process. On the other hand, longer survey times may reflect respondents spending more time thinking about survey questions, resulting in a more thorough retrieval and judgment process. Respondent cooperation could also be associated with misreporting. In a study of married couples in Detroit, Michigan, respondents rated as more difficult underreported events more often than respondents who made an effort to remember events (Kessler and Wethington 1991). Uncooperative respondents may forgo the retrieval and judgment steps of the survey response process. I propose that the following survey characteristics predict marriage misreporting:
The interviewer plays an important role in the survey response process, serving as the conduit through which the attitudes, experiences, and perceptions of the respondent are transmitted and processed into data. Interviewer quality could affect how well respondents complete the first three steps of the survey response process. During the first step, interviewer quality can affect whether respondents comprehend the question as the researcher intended. For instance, higher quality interviewers may be better at explaining questions and/or providing clarification to respondents. During the second and third steps, higher quality interviewers can provide retrieval cues to help respondents retrieve and judge the relevant information from their memory. Interviewers can also affect responses during the final stage, response formatting. Outward characteristics of the interviewer such as gender or non-measurable characteristics such as the demeanor of interviewers could influence misreporting. For example, some interviewers may be better at creating a rapport with respondents and making them feel comfortable when answering questions. Prior studies have shown an inconsistent relationship between interviewer’s gender and survey responses. A study in Nepal found that female respondents were more likely to underreport current pregnancies to male interviewers (Axinn 1991): however, in Nigeria the interviewer’s gender did not matter for responses to sensitive questions about family planning (Becker, Feyisetan and Makinwa-Adebusoye 1995).
The present study defines higher quality interviewers as being male, ever-married, and having prior interviewing experience. Based on knowledge of the interviewer selection process, I assume that female interviewers are, on average, of lower quality than male interviewers. The research team set lower cutoff scores for selecting female interviewers than male interviewers in order to fulfill gender quotas. I classify ever-married interviewers as being of higher quality because they are more likely to establish a rapport with respondents and encourage respondents to open up about past and current marriages. Interviewers with prior interviewing experience likely are better skilled at probing for responses than those with no prior experience. I hypothesize that the following interviewer characteristics predict misreporting in marriage histories:
This study uses data from the Malawi Longitudinal Study of Families and Health (MLSFH), formerly known as the Malawi Diffusion and Ideational Change Project (MDICP). The MLSFH is a panel survey that interviewed ever-married men and women in three rural districts of Malawi: Rumphi (northern), Mchinji (central), and Balaka (southern). The first wave of data collection occurred in 1998 and interviewed 1,541 ever-married women, ages 15–49, and 1,065 of their husbands. Since 1998 five additional rounds of data collection have taken place (2001, 2004, 2006, 2008, 2010). See Kohler et al. (2014) for more details on the study sample and data collection procedures in the MLSFH. The quality of data collected in the MLSFH has been the subject of investigation in a number of studies: however, none of these studies directly examined the quality of marriage histories (Appendix 1).
The present study uses data from the 2006 and 2010 waves of the MLSFH. These waves were chosen because of the nature of the marriage histories collected and the availability of data on the interviewers. Seventy-four percent of the respondents interviewed in 2006 were re-interviewed in 2010. In general, refusal to participate in the survey was relatively rare: fewer than 5% of respondents who were successfully contacted refused to participate (Kohler et al. 2014). Furthermore, the principal reason for not being re-interviewed was migration out of the survey area (Anglewicz et al. 2009), primarily due to marital instability (Anglewicz 2012). Thus, this study will likely underestimate misreporting in marriage histories.
My potential sample consists of 2,014 respondents who provided marriage history data in both survey waves. It is important to note that the survey waves were organized to be independent: interviewers did not have information from previous waves when collecting data in the current wave. Respondents were asked to report the names of their current and past spouses, up to a maximum of ten spouses, beginning with the first spouse and ending with the current/most recent spouse. The MLSFH did not define marriage in this survey: rather, respondents themselves determined whether a past union constituted a marriage. This likely resulted in the inclusion of both formal and informal marriages. For each reported spouse, respondents were to answer a series of questions including the year the marriage began and whether they were still married to the spouse. If the marriage had ended, they were to report the year it ended and the main reason why it ended.
Data collection procedures in the 2006 and 2010 MLSFH differed in two ways. In 2006 three survey teams, ‘family listing’, ‘main survey’, and ‘biomarker collection’, interviewed respondents. Three separate visits were required to complete all sections of the survey. In 2010 biomarker collection did not occur and the family listing and main survey questionnaires were combined into a single questionnaire, resulting in only one visit. Consequently, respondents answered the marriage questions only after a substantial portion of a single, longer interview had elapsed, increasing the likelihood of survey fatigue. In 2010 the MLSFH introduced a system of incentives to the survey teams for the first time. If a survey team completed a minimum number of interviews per day, then all members of the team (supervisors, interviewers, and driver) received a financial bonus. This system could have motivated some interviewers to rush through interviews to increase their team’s chances of receiving a bonus.
A dataset of reconstructed marriage histories (RMH) was created using data from the 2006 and 2010 waves of the MLSFH. Only marriages that began before the 2006 survey and were reported in either 2006 or 2010 were included in this dataset. To create the RMH, I first matched marriages across surveys for all respondents who reported marriage histories in both survey waves. Because names tend to be spelled differently across survey waves, mostly due to the interpretation of the interviewer, marriages were visually matched on a case-by-case basis. Spouse name was the primary criterion used to verify that a marriage listed in 2006 corresponded to a marriage listed in 2010. With few exceptions, spouse names were similar enough to match without difficulty. I also used marriage dates to verify matches. If a marriage began before the 2006 wave and was not reported in both 2006 and 2010, then it was coded as an “unmatched” marriage. If a marriage began before the 2006 wave and was reported in both survey waves, then it was coded as a “matched” marriage. I dropped 74 respondents for whom I could not match any marriages. That is, these respondents did not report any of the same spouses in 2010 as in 2006, raising suspicions as to whether the MLSFH interviewed the same respondent in both waves.
If reports of marriage histories were consistent across surveys, they were included in a reconstructed marriage history (RMH). If a marriage was reported in only one survey, it was also included in the RMH. If a marriage was reported in both surveys (i.e., the same spouse was listed by name in both surveys), but dates or other characteristics were reported inconsistently, then information provided in the earlier survey was used, if reported by the respondent. This choice reflects two considerations: reports become less reliable the further back in time the reported events took place (Sudman et al. 1996), and the marriage in question would have occurred closer in time to the earlier survey. If a respondent reported “don’t know”, then data from the later survey were used (if this information was reported). The reconstructed marriage histories produced the following indicators for each marriage: marriage order (first, second, third, etc.), year marriage began, status of marriage at interview (still married, separated/divorced, widowed), and year marriage ended.
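The reconciliation rule for a characteristic reported in both waves can be written as a small function. This is a minimal sketch, not the author's code: the sentinel `DONT_KNOW` and the function names `usable` and `reconcile` are hypothetical, because the text does not say how “don’t know” answers were stored.

```python
from typing import Optional, Union

# HYPOTHETICAL sentinel: the text does not say how "don't know" was coded.
DONT_KNOW = "don't know"

Value = Union[int, str, None]


def usable(value: Value) -> bool:
    """A report is usable if it was given and is not "don't know"."""
    return value is not None and value != DONT_KNOW


def reconcile(value_2006: Value, value_2010: Value) -> Optional[Value]:
    """Pick the value kept in the reconstructed marriage history (RMH).

    Consistent reports agree, so returning the 2006 value covers them too.
    For inconsistent reports the earlier (2006) report wins. The later (2010)
    report is used only when 2006 is missing or "don't know".
    """
    if usable(value_2006):
        return value_2006
    if usable(value_2010):
        return value_2010
    return None


# Usage: start year of one spouse, as reported in each wave (toy values).
print(reconcile(1994, 1996))       # inconsistent: 2006 report kept
print(reconcile(DONT_KNOW, 1996))  # "don't know" in 2006: 2010 report used
print(reconcile(None, 2001))       # reported only in 2010
```

A marriage reported in only one wave falls through to whichever report exists, which mirrors the rule that such marriages are also included in the RMH.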
The unit of analysis for this study is a marriage. The present study focuses on three types of reporting error: marriage omission (as measured by match status), start date inconsistency, and end date inconsistency. Match status coding was determined by whether a marriage was reported in only one survey year or both years: marriages that were reported in both survey waves are coded as “matched” marriages. Marriages that occurred before 2006 and were reported in only one survey year are coded as “unmatched” (or omitted) marriages. Marriages that had ended by 2010 were divided into two groups: matched-terminated and unmatched-terminated. Because respondents should be reporting continuous marriages (married to the same spouse in 2006 and 2010) in both waves, I did not expect to observe unmatched marriages among continuous marriages. Thus all continuous marriages are considered to be “matched” marriages. Match status is coded in the following manner: unmatched-terminated (omitted), matched-terminated, and matched-continuous.
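The three-category match status can be expressed as a function of whether the marriage appears in each wave and whether it was continuous. This is a minimal sketch under the assumption that each marriage already carries these three flags; the function name `match_status` and its boolean inputs are hypothetical.

```python
def match_status(reported_2006: bool, reported_2010: bool, continuous: bool) -> str:
    """Code match status for a marriage that began before the 2006 survey.

    continuous: married to the same spouse in both 2006 and 2010. Such
    marriages are treated as matched by construction.
    """
    if continuous:
        return "matched-continuous"
    if reported_2006 and reported_2010:
        return "matched-terminated"
    if reported_2006 or reported_2010:
        return "unmatched-terminated"  # omitted from one of the two waves
    raise ValueError("marriage must be reported in at least one wave")


print(match_status(True, False, continuous=False))  # unmatched-terminated
print(match_status(True, True, continuous=False))   # matched-terminated
print(match_status(True, True, continuous=True))    # matched-continuous
```

Checking `continuous` first encodes the assumption stated in the text that continuous marriages are never coded as unmatched.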
For the sub-sample of marriages that were reported in both survey waves, I constructed a variable measuring start date inconsistency. If a respondent reported different years for the marriage start date in 2006 and 2010, then the variable is coded as being inconsistently reported. If the same year is reported in both survey waves, then the variable is coded as being consistently reported. The same process was used to code end date inconsistency, except the sub-sample was further limited to marriages that ended before the 2006 survey.
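The date-inconsistency variables compare the year reported in each wave, with end dates coded only for marriages that ended before 2006. This is a minimal sketch: the function names are hypothetical, and leaving an item uncoded (`None`) when a year is missing in either wave is an assumption, because the text does not say how missing years were handled.

```python
from typing import Optional


def date_inconsistent(year_2006: Optional[int], year_2010: Optional[int]) -> Optional[bool]:
    """True if the two waves give different years, False if the same year.

    ASSUMPTION: a missing year in either wave leaves the item uncoded (None).
    """
    if year_2006 is None or year_2010 is None:
        return None
    return year_2006 != year_2010


def end_date_inconsistent(end_2006: Optional[int], end_2010: Optional[int],
                          ended_before_2006: bool) -> Optional[bool]:
    """End-date coding is restricted to marriages that ended before 2006."""
    if not ended_before_2006:
        return None  # outside the end-date sub-sample
    return date_inconsistent(end_2006, end_2010)


print(date_inconsistent(1992, 1992))             # False: consistent
print(date_inconsistent(1992, 1994))             # True: inconsistent
print(end_date_inconsistent(None, 2008, False))  # None: not in sub-sample
```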
In Table 3 I present descriptive statistics of the matching process. Only marriages that began before 2006 are included. A greater number of marriages are reported in 2006 than in 2010 and match rates indicate that the majority of unmatched marriages are reported in 2006 but not in 2010. Whereas 92.3% of men’s marriages and 94.8% of women’s marriages reported in 2010 are also reported in 2006, only 82.5% of men’s marriages and 89.1% of women’s marriages reported in 2006 are reported in 2010. In total, 1,468 men’s marriages and 1,718 women’s marriages were reported in at least one survey wave. Of these, approximately one in five marriages are unmatched. Since it is unknown whether respondents reported all of their marriages in 2006 and 2010, these numbers mark the lower bound of the true number of marriages. In terms of individual-level statistics, 28.3% of men and 17.9% of women omitted at least one marriage from one of the survey waves. Among respondents married multiple times, around 50% omitted one or more marriages. Of respondents omitting at least one marriage, approximately 20% failed to report two or more marriages in either 2006 or 2010. Roughly 60% of respondents who reported at least one marriage in both survey waves inconsistently reported marriage start and end dates.
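The match rates above use different denominators: the marriages reported in 2010, the marriages reported in 2006, and the marriages reported in at least one wave. This is a minimal sketch, assuming each matched marriage has been given a common identifier in both waves; the identifiers, function names, and toy data are hypothetical and do not reproduce the MLSFH figures.

```python
def match_rates(ids_2006: set, ids_2010: set) -> dict:
    """Marriage-level match rates (percent) from marriage IDs in each wave."""
    both = ids_2006 & ids_2010
    union = ids_2006 | ids_2010
    return {
        "pct_2010_also_in_2006": 100 * len(both) / len(ids_2010),
        "pct_2006_also_in_2010": 100 * len(both) / len(ids_2006),
        "pct_unmatched": 100 * (len(union) - len(both)) / len(union),
    }


def pct_respondents_omitting(marriages_by_respondent: dict) -> float:
    """Percent of respondents with at least one marriage missing from a wave.

    marriages_by_respondent maps a respondent ID to a list of
    (reported_2006, reported_2010) flags, one pair per marriage.
    """
    omitted = sum(
        any(not (r06 and r10) for r06, r10 in flags)
        for flags in marriages_by_respondent.values()
    )
    return 100 * omitted / len(marriages_by_respondent)


# Toy data (not MLSFH figures): marriages labelled by hypothetical IDs.
rates = match_rates({"a1", "a2", "b1", "c1", "c2"}, {"a1", "a2", "b1", "c1"})
print(rates)
toy_respondents = {
    "a": [(True, True), (True, True)],
    "b": [(True, True)],
    "c": [(True, True), (True, False)],  # second marriage omitted in 2010
}
print(pct_respondents_omitting(toy_respondents))
```

In the toy data every 2010 marriage also appears in 2006, but one 2006 marriage is missing in 2010. This reproduces, in miniature, the asymmetry described in the text.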
Four potential sources of reporting error were observed in the data: marriage, individual, survey, and interviewer characteristics.
Region of residence is coded as central, southern, and northern. Marriage order is categorized as first, second, and third or higher: only 5% of all marriages were of order three or higher. Time since marriage began was calculated by subtracting marriage start dates from 2006. Marriages were classified as short (lasting five years or less) or long (more than five years). Among current marriages (continuously current in 2006 and 2010), marriages were classified as short if they began after 2000. Status of marriage (married, divorced, widowed) was coded according to the reconstructed status of marriage in 2010 and is only used in analyses of report date inconsistencies. “Entered into a polygamous marriage” is coded among men only and captures men who married a woman while still married to another woman. This variable can only be coded as “1” for second and higher order marriages.
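The duration and marriage-order codings can be sketched as below. This is illustrative only: the function names are hypothetical, and measuring the duration of a terminated marriage as end year minus start year is an assumption, because the text states the five-year threshold but not how duration was computed for marriages that had ended. The rule for current marriages (short if begun after 2000) follows the text.

```python
from typing import Optional

SURVEY_YEAR = 2006


def time_since_start(start_year: int) -> int:
    """Years elapsed between the marriage start and the 2006 survey."""
    return SURVEY_YEAR - start_year


def duration_class(start_year: int, end_year: Optional[int], current: bool) -> str:
    """Short (five years or less) vs long (more than five years)."""
    if current:
        # Stated rule for marriages current in both 2006 and 2010.
        return "short" if start_year > 2000 else "long"
    if end_year is None:
        raise ValueError("a terminated marriage needs an end year")
    # ASSUMPTION: duration of a terminated marriage = end year - start year.
    return "short" if end_year - start_year <= 5 else "long"


def order_category(order: int) -> str:
    """First, second, or third or higher."""
    return {1: "first", 2: "second"}.get(order, "third or higher")


print(time_since_start(1990))                       # 16
print(duration_class(1990, 1995, current=False))    # short
print(duration_class(1998, None, current=True))     # long
print(order_category(4))                            # third or higher
```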
Age is measured as a continuous variable and all models also include a term for age squared to allow for a non-linear relationship between age and misreporting. Educational attainment is coded as none, some primary, completed primary, and secondary. Age and educational attainment are taken from the 2006 survey. The inconsistency score, coded continuously from 0 to 3, measures the number of items (educational attainment, number of children ever born, and number of lifetime sexual partners) for which respondents provided inconsistent responses in the 2006 and 2010 survey waves. Inconsistency score is included in models as a potential mediator between individual characteristics and misreporting.
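The text does not define what counts as an inconsistent response. The sketch below assumes that, because all three items are cumulative (educational attainment coded as an ordered integer, children ever born, lifetime sexual partners), a response is inconsistent when the 2010 value is lower than the 2006 value. The item keys and the function name are hypothetical.

```python
# HYPOTHETICAL item keys; education coded 0 = none, 1 = some primary,
# 2 = completed primary, 3 = secondary.
ITEMS = ("education", "children_ever_born", "lifetime_partners")


def inconsistency_score(resp_2006: dict, resp_2010: dict) -> int:
    """Count items (0 to 3) reported inconsistently across waves.

    ASSUMPTION: cumulative items cannot decline between waves, so a lower
    2010 value than 2006 value is counted as an inconsistency.
    """
    return sum(resp_2010[item] < resp_2006[item] for item in ITEMS)


r06 = {"education": 2, "children_ever_born": 4, "lifetime_partners": 3}
r10 = {"education": 1, "children_ever_born": 5, "lifetime_partners": 2}
print(inconsistency_score(r06, r10))  # 2: education and partners declined
```

An increase in children ever born is not flagged, since births between 2006 and 2010 make it a legitimate change under this assumption.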
Length of survey time, only available in 2010, is coded into three categories: short, middle, and long. Short refers to the 25% shortest survey times, middle refers to the middle 50% of survey times, and long refers to the 25% longest survey times. In 2006 and 2010, interviewers reported on the respondent’s degree of cooperation. Because very few interviewers reported a “bad” degree of cooperation, I combined “bad” and “average” responses into the same category. The other categories are coded as “good” and “very good”.
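The survey-time categories correspond to cuts at the first and third quartiles, and the cooperation recode pools two response levels. This is a minimal sketch using Python's `statistics.quantiles` with its default method. Assigning ties at a cut point to the lower category is an assumption, and the interview lengths are toy values.

```python
import statistics

# Stated recode: "bad" and "average" are pooled because "bad" was rare.
COOPERATION = {
    "bad": "bad/average",
    "average": "bad/average",
    "good": "good",
    "very good": "very good",
}


def survey_time_categories(minutes):
    """Short = shortest 25%, middle = middle 50%, long = longest 25%."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(minutes, n=4)

    def classify(t):
        if t <= q1:  # ASSUMPTION: ties at Q1 go to "short"
            return "short"
        if t > q3:
            return "long"
        return "middle"

    return [classify(t) for t in minutes]


toy_minutes = [55, 40, 90, 70, 65, 120, 80, 60]  # toy interview lengths
print(list(zip(toy_minutes, survey_time_categories(toy_minutes))))
print(COOPERATION["bad"])
```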
At the end of data collection, interviewers answered questions about their background and work history. This information was merged with the survey responses. While the 2010 interviewer data are, for the most part, complete, a significant proportion of the 2006 interviewer data was found to be missing: 30.4% of respondents in the analytic sample lack 2006 interviewer data. This problem is not random and disproportionately affects respondents in the central region, where 47.7% have missing data. As a result, only 2010 interviewer characteristics were included in regression analyses. These characteristics include gender, ever-married, and prior interviewing experience.
In Table 4, I present the number of marriages men and women reported in marriage histories in 2006 and 2010. For reports to appear consistent, the number of reported marriages should remain constant or increase over time. The left side of the table corresponds to the reported number of marriages in 2006 and the top row corresponds to the same figure in 2010. Shaded areas denote declines in the reported number of marriages. Approximately 17% and 10% of men and women, respectively, reported fewer marriages in 2010.
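The cross-tabulation and the share of respondents reporting fewer marriages in 2010 can be computed from one (2006 count, 2010 count) pair per respondent. This is a minimal sketch with hypothetical function names and toy data.

```python
from collections import Counter


def count_crosstab(pairs):
    """Counter keyed by (marriages reported in 2006, marriages reported in 2010)."""
    return Counter(pairs)


def pct_fewer_in_2010(pairs):
    """Percent of respondents whose 2010 count is below their 2006 count,
    i.e., the cells that signal an inconsistent decline."""
    return 100 * sum(n10 < n06 for n06, n10 in pairs) / len(pairs)


toy = [(1, 1), (2, 1), (1, 2), (3, 3), (2, 2)]  # toy data, one pair per respondent
table = count_crosstab(toy)
for n06 in sorted({a for a, _ in toy}):
    # Row = 2006 count; columns = 2010 counts 1, 2, 3.
    print(n06, [table[(n06, n10)] for n10 in range(1, 4)])
print(pct_fewer_in_2010(toy))
```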
Two other types of misreporting of marriages may occur: 1) an increase in the number of marriages even though a new marriage did not occur in the inter-survey period and 2) the same number of marriages reported even though a new marriage (i.e., a different spouse) took place between survey waves. To provide information on these processes, I turn to an analysis of marriages rather than respondents. I examine whether respondents consistently report status of marriage, marriage start dates, and marriage end dates across survey waves. Minimal discrepancy exists on marriage status: fewer than 2% of men’s marriages and 4% of women’s marriages had discrepancies. Approximately half of all marriages had discrepancies in marriage dates (Figures 1 and 2). Mean discrepancies in marriage dates are approximately the same for men and women, 1.3 to 1.4 years (not shown).
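The date-discrepancy summaries can be sketched as follows. The text does not say whether the mean discrepancy is taken over all matched marriages or only over those with a discrepancy, so this sketch reports both. Measuring discrepancy as the absolute difference in years is also an assumption. The function name and toy data are hypothetical.

```python
def date_discrepancy_summary(pairs):
    """pairs: (year reported in 2006, year reported in 2010), one per matched marriage."""
    diffs = [abs(y06 - y10) for y06, y10 in pairs]  # ASSUMPTION: absolute years
    discrepant = [d for d in diffs if d > 0]
    return {
        "pct_discrepant": 100 * len(discrepant) / len(diffs),
        "mean_abs_all": sum(diffs) / len(diffs),
        "mean_abs_discrepant": sum(discrepant) / len(discrepant) if discrepant else 0.0,
    }


toy = [(1990, 1990), (1995, 1996), (2000, 1998), (1985, 1985)]  # toy start years
print(date_discrepancy_summary(toy))
```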
I use multinomial logistic regression to model match status, which has three categories: unmatched-terminated, matched-terminated, and matched-continuous. This method permits the inclusion of both continuous and terminated marriages in regression models.6 Keeping matched-continuous marriages in the analysis allows for the full range of covariation among marriage, individual, survey, and interviewer indicators.7 Because this analysis is concerned with the characteristics associated with unmatched marriages, matched-terminated is chosen as the base outcome. All independent variables except status of marriage are included in the regression models. Status of marriage is excluded because a continuing marriage perfectly predicts the matched-continuous outcome. For men, I estimated two models: Model 1 includes all variables except polygamous marriage, and Model 2 includes all variables except marriage order. Because of collinearity, the two measures are not included in the same model: by definition, a man’s polygamous marriage is a second- or higher-order marriage. Standard errors are adjusted for clustering at the individual level because some individuals contribute multiple marriages to the regression analyses.
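The outcome coding and the two specifications for men can be sketched in Python. This is a hypothetical illustration: the function and variable names are invented, the treatment of marriages that ended between waves follows footnote 6, and the rule for continuing marriages omitted in 2006 follows footnote 4. Estimation itself (a multinomial logit with standard errors clustered on the respondent) would be carried out in a statistical package and is not shown.

```python
BASE_OUTCOME = "matched-terminated"  # reference category of the multinomial model

def match_status(in_2006, in_2010, ended_by_2010):
    """Three-category outcome for a marriage that began before 2006.

    in_2006 / in_2010: whether the marriage appears in that wave's history.
    ended_by_2010: whether it had ended by the 2010 interview; marriages
    that ended between waves count as terminated (footnote 6).
    """
    if not (in_2006 or in_2010):
        raise ValueError("marriage not reported in either wave")
    if not (in_2006 and in_2010):
        # Reported in only one wave. Continuing polygamous marriages
        # omitted in 2006 are also placed here (footnote 4).
        return "unmatched-terminated"
    return "matched-terminated" if ended_by_2010 else "matched-continuous"

def men_model_terms(terms, model):
    """Drop the collinear measure for men's models.

    Model 1 omits polygamous marriage; Model 2 omits marriage order.
    Variable names are hypothetical.
    """
    drop = {1: "polygamous", 2: "marriage_order"}[model]
    return [t for t in terms if t != drop]
```

For example, `match_status(False, True, False)` returns `"unmatched-terminated"`, which is the footnote 4 case of a current polygamous spouse not reported in 2006.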
Table 5 presents multinomial logistic regression results contrasting unmatched-terminated marriages (marriages omitted from one of the two survey waves) with matched-terminated marriages. Because of space constraints and their limited theoretical relevance, I do not present results contrasting matched-continuous with matched-terminated marriages (available upon request). Almost all marriage characteristics are associated with men’s failure to report a terminated marriage at both interviews. Marriages of order three or higher and those of short duration are more likely to be omitted, consistent with Hypotheses 2 and 3. Men’s polygamous marriages are also more likely to be unmatched (Hypothesis 5). Among women, only short duration is associated with omission: short marriages are less likely to be reported at both interviews, consistent with Hypothesis 3.
Older women are more likely to omit marriages (Hypothesis 8). The negative age-squared term indicates that the relationship is non-linear: the likelihood of omission increases with age at a decreasing rate and eventually plateaus at around age 70. Age is not associated with omitted marriages among men. Furthermore, the inconsistency score is positively associated with marriage omission for both men and women.
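The plateau follows from the quadratic age specification. Writing η for the log-odds of an unmatched-terminated rather than a matched-terminated marriage, and x for the remaining covariates, the slope with respect to age reaches zero at the turning point age*. With β₁ > 0 and β₂ < 0 the effect of age is positive but shrinking, so the reported plateau near age 70 corresponds to age* ≈ 70. The notation is illustrative; the estimated coefficients are those in Table 5 and are not reproduced here.

```latex
\eta(\text{age}) = \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \mathbf{x}'\boldsymbol{\gamma},
\qquad
\frac{\partial \eta}{\partial\,\text{age}} = \beta_1 + 2\beta_2\,\text{age} = 0
\;\Longrightarrow\;
\text{age}^{*} = -\frac{\beta_1}{2\beta_2}
```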
None of the survey characteristics were found to be associated with unmatched marriages for either men or women. Prior interviewing experience is negatively associated with marriage omission among men, consistent with Hypothesis 15.
Logistic regression is used to examine the characteristics associated with inconsistent reporting of marriage start and end dates. Because these analyses are restricted to marriages reported in both survey waves, the respondents whose marriages are included may be selected for better reporting. Separate logistic regression models are estimated for each outcome. All characteristics of the marriage, the respondent, the interview, and the interviewer are included in models of start date inconsistency. As before, I estimated two separate models for male respondents, one with marriage order and the other with polygamous marriage. For end date inconsistency, only matched-terminated marriages are included and, because of the substantial decline in sample size, men’s and women’s marriages are pooled.8 All independent variables are included except polygamous marriage, which does not apply to women. In all models I adjust standard errors for clustering at the individual level because some individuals contribute multiple marriages.
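The construction of the analytic samples for the two date models can be sketched as follows. The record layout and names are hypothetical. One assumption is mine: a matched-terminated marriage enters the end-date model only if an end date was reported in both waves, so marriages that ended between 2006 and 2010 (which have no 2006 end date) drop out. The respondent identifier is carried along because it defines the clusters for the adjusted standard errors.

```python
def inconsistent(d_2006, d_2010):
    """1 if the two waves report different dates, else 0."""
    return int(d_2006 != d_2010)

def date_model_samples(marriages):
    """Build (cluster_id, outcome) samples for the start- and end-date models.

    marriages: list of dicts (hypothetical layout) with 'respondent',
    'sex' ('men' or 'women'), 'matched', and 2006/2010 start and end
    dates (None if not reported). Start-date samples are kept separate
    by sex; the end-date sample pools men and women.
    """
    start = {"men": [], "women": []}
    end_pooled = []
    for m in marriages:
        if not m["matched"]:
            continue  # dates can only be compared if reported in both waves
        start[m["sex"]].append(
            (m["respondent"], inconsistent(m["start_2006"], m["start_2010"])))
        if m["end_2006"] is not None and m["end_2010"] is not None:
            end_pooled.append(
                (m["respondent"], inconsistent(m["end_2006"], m["end_2010"])))
    return start, end_pooled

# Toy records; respondent 1 contributes two marriages (one cluster).
ms = [
    {"respondent": 1, "sex": "men", "matched": True,
     "start_2006": 1990, "start_2010": 1991, "end_2006": 1995, "end_2010": 1995},
    {"respondent": 1, "sex": "men", "matched": True,
     "start_2006": 1998, "start_2010": 1998, "end_2006": None, "end_2010": None},
    {"respondent": 2, "sex": "women", "matched": False,
     "start_2006": 1985, "start_2010": None, "end_2006": 1987, "end_2010": None},
    {"respondent": 3, "sex": "women", "matched": True,
     "start_2006": 2001, "start_2010": 2001, "end_2006": 2003, "end_2010": 2004},
]
start, end_pooled = date_model_samples(ms)
```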
Table 6 presents logistic regression results for marriage start and end date inconsistencies. Marriages in the southern region are significantly more likely than those in the central region to have inconsistently reported start dates (Hypothesis 1). Start dates of higher-order marriages, specifically second marriages, are more likely to be inconsistently reported, as hypothesized (Hypothesis 2). Although the coefficient for third- or higher-order marriages is large and positive, it is not statistically significant, possibly because of low statistical power resulting from the small number of marriages in this category. When marriage order is not in the model (Model 2), time since the marriage began is negatively associated with inconsistency for men, but the coefficient is small and not significant once marriage order is taken into account (Model 1). Women, but not men, are more likely to report inconsistent start dates for short marriages, providing mixed support for Hypothesis 3. Widowhood increases inconsistency in reporting start dates, consistent with Hypothesis 6. Women are also more likely to report inconsistent start dates when the marriage ended in divorce.
As hypothesized (Hypothesis 10), educational attainment is negatively associated with inconsistent reporting of marriage start dates. Inconsistency scores also predict inconsistent reporting of marriage start dates among women but not men.
Among men but not women, shorter interviews are associated with more inconsistent reporting of start dates (Hypothesis 11b). Women whom the interviewer rated as less cooperative were more likely to report start dates inconsistently (Hypothesis 12), but no such difference was found for men. Prior interviewing experience is associated with greater consistency in reporting start dates, but only for men (Hypothesis 15).
The last column in Table 6 presents results for marriage end date inconsistencies for the pooled sample. Short marriages and greater time since the marriage began are associated with more end date inconsistencies, supporting Hypotheses 3 and 4. Inconsistencies are less likely for marriages that ended in widowhood than for those that ended in divorce (Hypothesis 7). Older respondents are more likely to report marriage end dates inconsistently (Hypothesis 8). Education is negatively associated with inconsistent end dates, consistent with Hypothesis 10. None of the survey or interviewer characteristics are significantly associated with inconsistencies in marriage end dates.
Finally, I investigate whether misreporting marriages or marriage dates affects marriage indicators by comparing indicators calculated from the 2006 MLSFH with those calculated from reconstructed marriage histories (RMH). The results indicate that marriage indicators are affected by misreporting: mean age at first marriage is lower, and the numbers of times married and divorced are higher, when calculated from the RMH data. For example, 52.5% of men reported being married once in the 2006 MLSFH compared with 48.8% in the RMH, a difference of almost four percentage points. Although the two data sources differ by less than one percentage point in the percentage of men married twice, a larger discrepancy exists in the percentage married three or more times. Similar findings are observed for women, but the differences are smaller.
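The indicator comparison reduces to computing the distribution of times married from each source and taking percentage-point differences. The sketch below uses toy data. The `reconstruct` helper is a hypothetical simplification of an RMH, assumed here to be the union of spouses named in either wave for marriages that began before 2006; it is not a description of the exact reconstruction procedure used in this study.

```python
from collections import Counter

def normalize(name):
    """Case- and whitespace-insensitive form of a spouse name."""
    return " ".join(name.lower().split())

def reconstruct(spouses_2006, spouses_2010_pre2006):
    """Hypothetical RMH for one respondent: spouses named in either wave,
    restricted to marriages that began before 2006."""
    return ({normalize(s) for s in spouses_2006}
            | {normalize(s) for s in spouses_2010_pre2006})

def times_married_pct(counts):
    """Percent of ever-married respondents married 1, 2, and 3+ times."""
    cats = Counter(min(c, 3) for c in counts if c > 0)
    n = sum(cats.values())
    return {k: 100 * cats[k] / n for k in (1, 2, 3)}

def pp_difference(a, b):
    """Percentage-point difference a - b for each category."""
    return {k: a[k] - b[k] for k in a}

# Toy counts of marriages per respondent; not MLSFH data.
counts_2006 = [1, 1, 2, 3]
counts_rmh = [1, 2, 2, 4]
diff = pp_difference(times_married_pct(counts_2006), times_married_pct(counts_rmh))
```

A positive difference in the "married once" category, as here, is the pattern reported for men: the single-wave data overstate the share married only once.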
The present study examines the reliability of marriage histories collected as part of the Malawi Longitudinal Study of Families and Health. This study demonstrates that a substantial proportion of marriages are underreported and that marriage dates are often reported inconsistently. Regression analyses indicate that misreporting is not random. Several marriage, individual, survey, and interviewer characteristics are significantly associated with misreporting. Marriage indicators are also affected by misreporting.
The present study uses the survey response model as a framework for understanding why respondents omit marriages and misreport marriage dates. Misreporting occurs when respondents fail to carry out one or more steps of the survey response model (comprehension, retrieval, judgment, and response formatting). Marriages are omitted when respondents do not understand the question as the researcher intended, have trouble retrieving and judging the relevant information from memory, and/or feel the need to edit their responses. By contrast, misreporting of marriage dates occurs when respondents fail to carry out the retrieval and judgment steps completely and accurately. Marriage, individual, survey, and interviewer characteristics affect misreporting by influencing how well respondents carry out the four steps of the survey response model.
Of the two types of misreporting, marriage omission has more serious implications for marriage analyses. As shown in previous studies, respondents are more likely to omit short or unsuccessful marriages than long and continuing ones (Boileau et al. 2009; Reniers 2008). Given that some respondents reported fewer marriages in 2010 than in 2006, some may have retrospectively altered their perceptions of previous unions, coming to view them as relationships rather than marriages. In many parts of Sub-Saharan Africa, marriage is perceived as a process composed of multiple stages, including the exchange of gifts, initiation of sexual relations, provision of bridewealth, and birth of the first child (Meekers 1992; van de Walle 1993). These stages differ greatly across and within countries (Dekker and Hoogeveen 2002). Whether a given union is perceived as a marriage may change over time, especially for unions that have ended (van de Walle 1993), and such reclassification may occur more frequently when bridewealth has not been fully paid or no child has been born.
The ‘fuzziness’ of marriage points to the need for interviewers to provide respondents with context-appropriate definitions of marriage so that comprehension, the first step of the survey response model, is carried out optimally. Many African societies recognize a variety of marriage forms, including free unions, consensual unions, customary marriages, and religious and civil marriages (Arnaldo 2004; Budlender et al. 2004). In the case of Malawi, marriage could be defined as any union perceived at any point during its duration as a marriage by the couple and members of the community, even if traditional or formal ceremonies were not completed. Such a definition would include unions that ended before full payment of bridewealth (where this is part of local custom) and unions that did not produce any children.
The tedious and cognitively demanding nature of survey participation (Krosnick 1991; Tourangeau et al. 2000) could make it difficult for some respondents to carry out the retrieval and judgment steps optimally. Respondents with higher inconsistency scores were more likely to omit marriages and to report marriage dates inconsistently, suggesting that some respondents are better able or more willing than others to complete the retrieval and judgment steps of the survey response process. Rather than exert substantial cognitive effort to provide accurate and complete responses, some respondents may opt for merely satisfactory responses, a behavior known as “satisficing” (Krosnick 1991). Other respondents may have learned to condition their responses to the survey, a phenomenon known as panel conditioning (Halpern-Manners, Warren and Torche 2014; Warren and Halpern-Manners 2012). Since the MLSFH began, the survey has become longer and more complex. The MLSFH has added modules asking respondents to list sexual partners, household members, and individuals providing actual and potential transfers, as well as network questions about individuals with whom they have discussed HIV/AIDS. In each of these modules, respondents answer a series of questions about each individual listed. Furthermore, in the 2004, 2006, and 2008 waves, multiple survey teams visited respondents, increasing the time spent being interviewed. Given the time required to participate in the MLSFH, some respondents, especially those interviewed in multiple waves, may conclude that there is little or no benefit to providing accurate responses. I assessed the possibility of panel conditioning by comparing the percentage of respondents who provided inconsistent reports of the number of times married across survey waves (Appendix Table A1).
The results show that the percentage of respondents with inconsistent responses increased over time, which is consistent with panel conditioning.
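The panel-conditioning check can be sketched as a per-wave percentage of respondents whose reported number of marriages fell. Following the note to Appendix Table A1, a report is inconsistent when a higher number was reported in an earlier survey than in a later one. This sketch assumes, as a simplification, that each wave is compared with the respondent’s previous interview; comparing against the earliest wave instead would be a one-line change. The data layout is hypothetical.

```python
from collections import defaultdict

def inconsistency_by_wave(histories):
    """Percent of respondents reporting fewer marriages than at their
    previous interview, by survey wave.

    histories: dict respondent_id -> {wave_year: number of marriages}.
    Only respondents interviewed in an earlier wave are eligible in a wave.
    """
    flagged = defaultdict(int)
    eligible = defaultdict(int)
    for reports in histories.values():
        waves = sorted(reports)
        for prev, cur in zip(waves, waves[1:]):
            eligible[cur] += 1
            if reports[prev] > reports[cur]:
                flagged[cur] += 1
    return {w: 100 * flagged[w] / eligible[w] for w in sorted(eligible)}

# Toy histories; not MLSFH data.
histories = {
    "a": {2004: 1, 2006: 1, 2008: 0},
    "b": {2006: 2, 2008: 1, 2010: 1},
    "c": {2004: 1, 2006: 2},
    "d": {2008: 2, 2010: 1},
}
pct = inconsistency_by_wave(histories)
```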
To increase the likelihood that respondents complete the retrieval and judgment steps optimally, survey designers should implement strategies that increase respondent motivation and reduce cognitive burden. For example, instructing interviewers to encourage respondents to remember events and to praise them when they do has been shown to improve the quality of the data collected (Cannell et al. 1981). Interviewers can also stress the importance of respondents to the data collection and research process. If respondents realize their potential influence, they may be more motivated to provide complete and accurate responses. In surveys that collect multiple kinds of histories (e.g., fertility, sexual partners, marriage), survey designers should consider using event history calendars to collect these data. This format of data collection is less repetitive and makes it more difficult for respondents to intentionally omit events. Event history calendars have been used to capture major life events across various domains, including marriage, births, deaths, migration, employment, schooling, contraceptive use, and sexual relationships, and have been implemented in a wide range of contexts, including urban Kenya, Nepal, and the United States (Axinn, Pearce and Ghimire 1999; Freedman et al. 1988; Luke, Clark and Zulu 2011). Studies have shown that event history calendars yield more accurate reporting of events than conventional standardized survey instruments (Belli et al. 2007; Caspi et al. 1996; Glasner and van der Vaart 2009) and are useful for gathering event history information in populations that do not use calendar time, such as some ethnic groups in Nepal (Axinn, Pearce and Ghimire 1999).
Interviewer characteristics have the greatest potential to affect all four steps of the survey response process. Prior interviewing experience, a proxy for interviewer quality, reduced the likelihood of both marriage omission and inconsistent reporting of marriage start dates among men, but not among women. Interviewers with prior experience likely have better interviewing skills, such as probing for responses, which could increase the likelihood of obtaining complete and accurate responses. Future surveys should therefore focus on strengthening interviewing skills, especially those of first-time interviewers, by improving the quality of interviewer training.
A side-by-side comparison of marriage indicators calculated from the 2006 MLSFH and the RMH indicates that the differences are small. The implications of misreporting likely depend on how researchers use marriage history data. Misreporting has the potential to affect regression analyses: for example, a study examining the relationship between ever being divorced and HIV status could reach erroneous conclusions if marriages, and hence divorces, are underreported. At the population level, however, misreporting may not affect marriage indicators to a considerable degree, depending on the size and direction of individual misreports and their distribution in the population. This finding is consistent with previous studies demonstrating that inconsistent reporting of age at first sex and at first marriage does not always bias population-level indicators (Cremin et al. 2009; Wringe et al. 2009; Zaba et al. 2009).
This study has several limitations. First, the process of matching marriages across survey waves may contain some error. As described in the Data section, spouse name was the primary criterion used to identify matches. Because respondents do not always report the same names across surveys (Adams et al. 2013), some marriages may not have been matched, resulting in overestimates of underreporting. Second, this study can observe only a subset of the marriages that are misreported in marriage histories, specifically marriages that began before 2006 and were reported in only one of the two survey waves. Some marriages may be consistently underreported, a type of misreporting that this study cannot capture. This type of misreporting could also bias marriage indicators. In all likelihood, the true numbers of marriages and divorces and the proportions ever divorced and ever widowed are higher than reported. The effect on age at first marriage, however, depends on whether first marriages are consistently underreported. If they are, the true age at first marriage is likely lower. Consistent underreporting of higher-order marriages would not affect age at first marriage.
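The first limitation can be made concrete by comparing exact name matching with a more tolerant comparison. The sketch below uses Python’s `difflib.SequenceMatcher` similarity ratio; the 0.8 threshold and the names are hypothetical, and this is not the matching procedure used in this study. A tolerant rule recovers spelling variants that exact matching counts as omissions, at the cost of possibly pairing different spouses who have similar names.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Case- and whitespace-insensitive form of a spouse name."""
    return " ".join(name.lower().split())

def unmatched(wave_a, wave_b, threshold=None):
    """Return names in wave_a with no counterpart in wave_b.

    threshold=None: exact match on normalized names.
    Otherwise: a name counts as matched if its SequenceMatcher ratio with
    some name in wave_b is at least threshold (hypothetical cutoff).
    """
    b = [normalize(n) for n in wave_b]
    out = []
    for name in wave_a:
        a = normalize(name)
        if threshold is None:
            ok = a in b
        else:
            ok = any(SequenceMatcher(None, a, x).ratio() >= threshold for x in b)
        if not ok:
            out.append(name)
    return out

names_2006 = ["Chisomo Banda", "Esther Phiri"]
names_2010 = ["Chisom Banda", "Esther Phiri"]
exact = unmatched(names_2006, names_2010)
tolerant = unmatched(names_2006, names_2010, threshold=0.8)
```

Under exact matching the spelling variant is counted as an omitted marriage; under the tolerant rule it is matched.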
This is the first study of its kind to examine the reliability of marriage histories collected in the context of Sub-Saharan Africa. Although marriage histories are frequently used to study marriage dynamics, no prior evidence exists on the degree of misreporting in marriage histories. How relevant these findings are beyond the MLSFH is not known. Marriage in Malawi is less formal and less stable than in many parts of Sub-Saharan Africa (Clark and Brauner-Otto 2015). Approximately 50% of women will experience a divorce at some point in their lives (Reniers 2003). In areas where marriage is more formal and stable, changing perceptions of marriage would probably still result in the omission of some marriages, but the proportion of marriages omitted would likely be lower because marital instability would be less common. Likewise, problems in determining when marriages begin and end would probably still lead to misreporting of dates, although at lower levels, because respondents would have experienced fewer marriages and would thus have fewer dates to recall. Finally, the results of this study could apply to other types of retrospectively collected data, such as sexual partner histories and cohabitation histories. In both cases, respondents may be more likely to omit short-lived relationships and misreport the start and end dates of relationships.
The author would like to thank Susan Watkins, Hans-Peter Kohler, Jere Behrman, Sarah Hayford, and Véronique Hertrich for their helpful comments. The research was completed while the author was a graduate student at the University of Pennsylvania and was supported by grants from the National Institutes of Health (5-R01-HD-053781-05, 2-T32-HD-007242-31).
Bignami-Van Assche (2003) took advantage of an error that occurred during data collection in the 2001 wave of the MLSFH, whereby some respondents were interviewed twice, to evaluate the consistency of responses among this group of respondents. The author found that 60% to 80% of responses were consistently reported and that the distributions and means of selected variables did not differ significantly between the initial interview and the re-interview. Anglewicz et al. (2009) reached similar conclusions using data from the 2004 and 2006 waves and also investigated attrition and sample representativeness. While re-interviewed respondents differed from respondents lost to follow-up on a number of characteristics, attrition did not significantly affect the results of multivariate analyses. The MLSFH has also acknowledged difficulties in locating and interviewing the correct respondents in follow-up waves (Adams et al. 2013). Most respondents who were not re-interviewed had died, moved away, or refused. In other cases, the MLSFH survey team encountered ‘imposters’: individuals in the community pretending to be respondents, usually to satisfy their own curiosity or to receive the incentives given for participation. This problem is especially severe in the south, where population density is higher and houses are closer together, making it easier to pose as a respondent. Aware of this problem, the MLSFH has attempted to remove ‘imposters’ from its dataset.
Note: Inconsistent reports of number of times married refers to instances where a higher number of marriages were reported in the earlier survey than in the later survey. Significance levels are relative to the earliest survey wave in which the respondent participated.
2Because most separations are soon followed by divorce, I combined divorced and separated into the same category. From this point forward, I refer to this category as “divorced”.
3Among matched-terminated marriages, 26.4% ended between 2006 and 2010. Among unmatched-terminated marriages, 6.8% ended between 2006 and 2010.
4There are, however, four cases in which unmatched marriages were found among continuous marriages. In all four cases, respondents were in polygamous marriages and did not report one of their current spouses in 2006. I deduced that they were married to these spouses in 2006 because in 2010 they reported being married to them and reported that these marriages began before 2006. Although these marriages are not terminated, I included them in the category “unmatched-terminated”.
5I removed outliers from the graphs because they obscure the overall presentation of the data. For marriage start dates, I defined outliers as observations where the absolute difference exceeds 10 years; outliers make up 4.6% of men’s matched marriages and 6.1% of women’s matched marriages. For marriage end dates, I removed observations where the absolute difference exceeds 15 years; outliers make up 3.3% and 4.5% of men’s and women’s matched-terminated marriages, respectively.
6Marriages that ended between 2006 and 2010 are treated as terminated marriages.
7I obtained similar results when I restricted the sample to terminated marriages and estimated a logistic regression model.
8I tested interactions between gender and all other sources of reporting inconsistency. Only one statistically significant interaction was observed (not shown). Men were significantly more likely to inconsistently report the date of widowerhood than women were to report the date of widowhood.