Lots of research indicates that in many countries there is more gender equality in cities than in rural areas. Alice Evans has a great recent article exploring this in the context of Cambodia. Does this mean that women from rural households are more likely to want to move to cities than their male counterparts? It seems probable. But cities have also often been seen as spaces of danger, and urban life stereotyped as corrupting. Perhaps such depictions, biased and problematic as they are, obscure the attractiveness of cities for rural women?
To know whether women have been more optimistic than men about moving to a city, we can’t just count the number of men and women in villages and cities, since not everyone who wants to move to a city actually does move to one.
I looked at a survey from Taiwan in 1970. It turns out that women from farming households were more likely to say they would prefer to live in a city than men in the same households. 57% of the farming-household men said they would prefer to stay in the countryside, versus only 46% of the women. Not a massive difference, but significant.
But do these numbers understate the difference? More educated people also might prefer living in the cities, because they would more likely get a job in which they could use their higher level of education. Families often devoted more resources to their sons’ education, so it’s likely that the men in the farming-households were more highly educated. If we controlled for this, could there be an even bigger gender difference?
A logistic regression with education and gender as independent variables confirms that women in farming households were significantly less likely than men to want to stay in the countryside. But it suggests that education made no detectable difference: none of the education categories has a statistically significant coefficient. Why might that be? When we look at the reasons given for wanting to live in a city, it’s striking that relatively few people mentioned economic motivations. ‘Convenience’ and ‘entertainment’ were the two most popular reasons.
Note on the data: This survey was overseen by Wolfgang Grichting. Grichting was a committed Catholic from Switzerland. He had a PhD from Michigan. Many of the questions in the survey he carried out with local research assistants in Taiwan in 1970 asked about respondents’ moral values and religious belief and practice. Around 2000 people were surveyed.
In the future, I will use this data to look at social segregation between Mainlanders and Taiwanese, a topic that comes up in two of the modules I teach.
Tutorial on using this data in R to get these results
I’ve tried to write this for beginners, with my own students in mind, but if you’ve never used R before you’ll probably need a basic introduction to what R is first. Feel free to contact me if you have questions or comments.
If you are copy-pasting commands from below, you might have to re-type the quotation marks in R, since the quotation mark in the default WordPress font doesn’t seem to be recognized as a quotation mark by R.
Importing data
First, you’ll need to get the data from the ICPSR archive.
Some ICPSR data is available for anyone to download. Other datasets, including the one I used for this post, require you to belong to a subscribing university. If you are not at a university or your university doesn’t subscribe, your country might have a data archive that can order the data for you for free. This is the case with the UK Data Archive. If you still can’t get it, you’ll have to ask someone for help. Feel free to contact me.
Once you’ve got the data, you’ll need to import it into R. The data is in ASCII format. This comes as a set of two files: a txt file with the actual data, and a ‘setup’ file with the file extension sps, which allows some software to turn it into something you can work with. To read this in R you need to install the asciiSetupReader package.
Install the package to read ASCII data in R:
install.packages("asciiSetupReader")
Add the package to R’s working library for your R session:
library(asciiSetupReader)
Import the data into R, replacing what is in the quotation marks with the file-paths of the txt file and the sps file on your computer.
tw <- spss_ascii_reader("/Users/josephlawson/tw.txt", "/Users/josephlawson/tw.sps")
Note: ‘tw’ is just an arbitrary name that I gave to this dataframe. You could call it anything.
You can check that it worked using
colnames(tw)
This will give you the headings of all the columns (i.e. the names of the variables in the dataset).
At this point you should read through the codebook that came with the data. This explains what all these column headers mean, as well as how the data is coded. It also includes an important introduction to how the data was collected.
Subsetting data
Since we’re interested in rural people, it is useful to make a new dataframe, with all the people who answered that the head of their household was any of the occupations coded between 400 and 430 in the variable called R_HSBND_S_OCCUPATION_3_D (occupation of respondent or their husband). This range includes several types of farmer. You could also work with a broader range of households and include fishers and lumberjacks. But sometimes fishing people can live in relatively large settlements. I did this with narrowly and more inclusively defined subsets, and it didn’t make much difference to the results. You also might like to see whether there is a general gender difference in perceptions of cities vs the countryside, inclusive of all people in the dataset.
To make the new dataframe called ‘rural’ with just the people with occupations coded 400-430, use this command:
rural <- subset(tw, R_HSBND_S_OCCUPATION_3_D >= 400 & R_HSBND_S_OCCUPATION_3_D <= 430)
Tip: with the ‘subset’ command, you can replace ‘&’ with ‘|’ if you want ‘or’ instead of ‘and’ linking the two conditions.
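To see the difference between ‘&’ and ‘|’ without touching the real data, here is a small sketch on a made-up dataframe. The names ‘toy’ and ‘occupation’ are invented for illustration and are not part of the survey.

```r
# Toy data frame; 'occupation' is a made-up stand-in for the real occupation codes
toy <- data.frame(id = 1:6, occupation = c(100, 400, 415, 430, 500, 610))

# 'and': keep rows where both conditions hold (codes from 400 to 430)
farmers <- subset(toy, occupation >= 400 & occupation <= 430)

# 'or': keep rows where at least one condition holds (codes below 400 or above 430)
others <- subset(toy, occupation < 400 | occupation > 430)

farmers$id  # rows 2, 3 and 4
others$id   # rows 1, 5 and 6
```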
Summarising variables
Before we start analysing this, we have to tell R that the variable ‘XREASON_PREFER_CITY_1’ (whether or not someone would prefer to live in a city and if so why) is categorical data (not data on a numerical scale). Use the following code:
rural$XREASON_PREFER_CITY_1 <- as.factor(rural$XREASON_PREFER_CITY_1)
Now we can summarise the responses of all the people in the farming households to the question of if/why they would prefer to live in a city:
summary(rural$XREASON_PREFER_CITY_1)
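If it isn’t obvious why the conversion to a factor matters, this small sketch with made-up answer codes shows what ‘summary’ does before and after. The vector ‘codes’ is invented for illustration.

```r
codes <- c(0, 0, 1, 3, 3, 3)        # made-up answer codes

summary(codes)                      # treated as numbers: minimum, median, mean, ...
code_counts <- summary(as.factor(codes))
code_counts                         # treated as categories: a count for each code
```

The mean of a set of answer codes is meaningless, whereas the counts per code are exactly what we want here.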
Tabulating data
Next we can create a table with gender cross-referenced with reasons for preferring (or not) cities:
table <- table(rural$R_S_SEX, rural$XREASON_PREFER_CITY_1)
This has created a new object in R that I’ve called ‘table’. To see it just write ‘table’ in the command line. I did it this way so that I could do the next step.
It is useful to have the proportions of men and women giving each answer, rather than the raw numbers:
prop.table(table,1)
Note that the ‘1’ tells R to calculate proportions within each row, i.e. within each category of the first variable (gender), so that each row adds up to 1. Using ‘2’ instead would give proportions within each column (each answer).
This is what gives me the 57%-46% figures I referred to above; 57% of the men’s answers are coded ‘0’, i.e. they would prefer to stay in the countryside.
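Here is a small sketch of how the second argument of ‘prop.table’ changes the result, using an invented dataframe (‘toy’, ‘sex’ and ‘answer’ are made-up names, and the numbers have nothing to do with the survey).

```r
# Made-up data: 5 men and 3 women, each either wanting to stay or preferring a city
toy <- data.frame(
  sex    = c(rep("M", 5), rep("F", 3)),
  answer = c("stay", "stay", "stay", "city", "city", "stay", "city", "city")
)

counts <- table(toy$sex, toy$answer)

by_row <- prop.table(counts, 1)  # proportions within each sex; rows sum to 1
by_col <- prop.table(counts, 2)  # proportions within each answer; columns sum to 1

by_row["M", "stay"]  # share of men who want to stay: 3 out of 5
by_col["M", "stay"]  # share of 'stay' answers that came from men: 3 out of 4
```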
I have included the steps for the logistic regression below, but if you don’t already know what logistic regression is, you will need a separate introduction to it, which I will write in a future post.
Data wrangling in preparation for logistic regression (re-coding data, transforming miscoded or ambiguous data into NAs)
The target variable in the regression is ‘PREFER_LIVE_CITY_COUNTRY’.
We have to tell R to treat it as a categorical variable:
rural$PREFER_LIVE_CITY_COUNTRY <- as.factor(rural$PREFER_LIVE_CITY_COUNTRY)
But as we can see from the summary of this, even though this should work as the target variable for a logistic regression (it’s a categorical variable; almost all of the people either said they would prefer to live in the city, or they said they would prefer to live in the countryside), it won’t work because of an error in the coding and a very small number of other answers.
summary(rural$PREFER_LIVE_CITY_COUNTRY)
0 1 2 5 NA’s
1 246 7 258 1
To start with, according to the code book, there shouldn’t be a 0. It doesn’t mean anything and should be an NA.
The other numbers mean the following:
1 = would prefer to live in a city
2 = “depends”
5 = would prefer to live in the countryside.
“Depends” is probably a meaningful answer, but not helpful for us here, so I’m going to exclude those observations too.
To replace something with NA, we can use the naniar package.
install.packages("naniar")
library(naniar)
farmers2 <- replace_with_na(rural, replace = list(PREFER_LIVE_CITY_COUNTRY = c(0, 2)))
This has created a new dataframe called ‘farmers2’, which has the 0s and 2s in the specified variable replaced with NAs.
Then we need to recode ‘5’ as ‘1’ and ‘1’ as ‘0’, to make the logistic regression work.
install.packages("dplyr")
library(dplyr)
farmers2$PREFER_LIVE_CITY_COUNTRY <- recode(farmers2$PREFER_LIVE_CITY_COUNTRY, "1" = "0", "5" = "1")
Which gives us
summary(farmers2$PREFER_LIVE_CITY_COUNTRY)
0 2 1 NA’s
246 0 258 9
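If you would rather not install extra packages, the same two steps (turning unwanted codes into NAs and relabelling the rest) can be sketched in base R. This is an alternative to the naniar and dplyr commands above, not what I used for the post, and the vector ‘answers’ is made up for illustration.

```r
# Made-up answers using the codes described above (0 = coding error, 2 = "depends")
answers <- factor(c(1, 5, 5, 2, 0, 1, 5, NA))

# Keep only codes 1 and 5. Any code not listed in 'levels' (here 0 and 2) becomes NA.
# Code 1 (city) is relabelled "0" and code 5 (countryside) is relabelled "1".
recoded <- factor(answers, levels = c("1", "5"), labels = c("0", "1"))

summary(recoded)
```

One difference from the output above is that the unused level ‘2’ disappears, because only the two listed levels are kept.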
And one more recoding of the gender variable:
farmers2$R_S_SEX <- recode(farmers2$R_S_SEX, "1" = "0", "5" = "1")
Then tell R that the education variable is categorical
farmers2$R_S_EDUCATION <- as.factor(farmers2$R_S_EDUCATION)
At this point, have a look at a summary of the R_S_EDUCATION data. It’s not necessary in this case, but sometimes you will have to use the “relevel” command to set a new reference category, if the default would be a category that only occurs a few times, or is a category that we wouldn’t be very interested in.
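As a sketch of what “relevel” does, here it is on a made-up factor. The labels are invented for illustration; the real education variable uses the numeric codes from the codebook.

```r
# Made-up schooling categories
schooling <- factor(c("none", "1-3", "4-6", "4-6", "none", "4-6"))

levels(schooling)  # the first level listed is the one R uses as the reference category

# Make "none" the reference category; the other levels keep their order
schooling <- relevel(schooling, ref = "none")

levels(schooling)
```

In a regression, each remaining category is then compared against “none”.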
And we are ready for the regression:
model <- glm(PREFER_LIVE_CITY_COUNTRY ~ R_S_SEX + R_S_EDUCATION, data=farmers2, family=binomial(link=logit))
To see the results, use
summary(model)
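The coefficients that ‘summary’ prints are on the log-odds scale. A minimal sketch of turning them into odds ratios by hand, using simulated data (the dataframe ‘toy’, its columns and the numbers 0.3 and 0.5 are all invented, so the output says nothing about Taiwan):

```r
set.seed(1)                                    # make the simulated data reproducible
n <- 200
toy <- data.frame(female = rbinom(n, 1, 0.5))  # made-up 0/1 predictor

# Simulate a 0/1 outcome whose log-odds are 0.3 for men and 0.3 - 0.5 for women
toy$stay <- rbinom(n, 1, plogis(0.3 - 0.5 * toy$female))

toy_model <- glm(stay ~ female, data = toy, family = binomial(link = "logit"))

coef(toy_model)                  # coefficients on the log-odds scale
odds_ratios <- exp(coef(toy_model))
odds_ratios                      # the same coefficients as odds ratios
exp(confint.default(toy_model))  # Wald 95% confidence intervals for the odds ratios
```

The Wald intervals from ‘confint.default’ are only one way of computing confidence intervals, so they may differ slightly from those a table-making package reports.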
The table below is produced with the sjPlot package (if you haven’t installed it yet, first run install.packages("sjPlot")):
library(sjPlot)
library(sjmisc)
library(sjlabelled)
tab_model(model)
| Outcome: prefers to live in the countryside (1) rather than in a city (0) | | | |
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 1.33 | 0.94 – 1.88 | 0.106 |
| Female | 0.65 | 0.44 – 0.95 | 0.027 |
| 1-3 years’ schooling (reference = no schooling) | 0.77 | 0.35 – 1.66 | 0.508 |
| 4-6 years’ schooling | 0.89 | 0.59 – 1.35 | 0.598 |
| 7-9 years’ schooling | 1.53 | 0.60 – 4.25 | 0.387 |
| 10-12 years’ schooling | 0.56 | 0.11 – 2.66 | 0.466 |
| “Old style” education (private tutorship) | 1.56 | 0.47 – 6.07 | 0.481 |
| Observations | 504 | | |
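To connect the odds ratios back to percentages, here is a rough worked example. It assumes, following the recoding step above, that the outcome is coded 1 for preferring the countryside, so the intercept is the odds of preferring the countryside for a man with no schooling (the reference categories). The variable names are just for illustration.

```r
odds_men  <- 1.33                    # intercept from the table
or_female <- 0.65                    # odds ratio for 'Female' from the table

odds_women <- odds_men * or_female   # odds for a woman with no schooling

# Convert odds to probabilities: p = odds / (1 + odds)
p_men   <- odds_men / (1 + odds_men)
p_women <- odds_women / (1 + odds_women)

round(c(men = p_men, women = p_women), 2)  # men 0.57, women 0.46
```

These are predictions for people with no schooling only, but they land close to the raw 57% and 46% from the cross-tabulation, which fits with education making little difference.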