How to Convert String Variables With Non-Numeric Values to Numeric Variables in Stata
We can convert string variables with non-numeric values to numeric variables in Stata using the encode or egen commands.
Many survey questionnaires use a Likert or Likert-like scale, e.g.:
- Strongly Agree
- Agree
- Neutral
- Disagree
- Strongly Disagree
or
- Always
- Usually
- About Half the Time
- Seldom
- Never
Below is another example of non-numeric values in a variable:
- A
- B
- C
- D
- E
When analyzing data, it is often desirable to have numeric values (e.g., 0, 1, 2, 3, 4 or 1, 2, 3, 4, 5) instead of non-numeric ones, because most of Stata's statistical commands work only with numeric variables. Stata stores such non-numeric values as “strings,” and variables that hold them are called “string variables.”
In Stata, there are a few ways of converting string variables (with non-numeric values) to numeric variables (with numeric values). The most common is probably the encode command:
. encode oldvar, generate(newvar)
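where oldvar is the name of the existing string variable and newvar is the name of the new numeric variable. encode also attaches value labels to the new variable, so each code still displays as its original text. Note that, by default, encode assigns the codes 1, 2, 3, ... in sorted (alphabetical) order of the string values. For the agreement scale above, that makes “Agree” 1 and “Strongly Disagree” 5, which does not follow the order of the scale. To control the coding, define a value label first and pass it to encode with the label() option.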
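A minimal sketch of the label() approach for the agreement scale above, assuming a hypothetical string variable q1 and a hypothetical value label named agreelbl. It enters a few example responses with input, defines the label with the codes in scale order, and then encodes with it. The noextend option makes encode stop with an error rather than silently add new codes for an unexpected spelling.

```stata
* Sketch only: q1, q1_num and agreelbl are hypothetical names.
clear
input str17 q1
"Strongly Agree"
"Agree"
"Neutral"
"Disagree"
"Strongly Disagree"
"Agree"
end

* Define the value label first, with codes in the order of the scale,
* so encode uses these codes instead of alphabetical ones.
label define agreelbl 1 "Strongly Disagree" 2 "Disagree" 3 "Neutral" 4 "Agree" 5 "Strongly Agree"

* label() reuses the existing mapping; noextend raises an error if q1
* contains a value (e.g., a typo) that agreelbl does not cover.
encode q1, generate(q1_num) label(agreelbl) noextend

* Check the result: first with labels, then with the raw codes.
tabulate q1_num
tabulate q1_num, nolabel
```

For 0 to 4 coding instead, define agreelbl with the codes 0 through 4; encode takes its codes from the existing label.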
Another way of doing the same thing is with the egen command:
. egen newvar = group(oldvar)
The new variable will have numeric values (1, 2, 3, ... in sorted order of oldvar) but no value labels.
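The sketch below applies egen's group() function to letter grades like the A to E example above, using the hypothetical variable names grade, grade_num and grade_lbl. It also shows group()'s label option, which attaches value labels built from the original strings; without that option, the result is an unlabeled numeric variable.

```stata
* Sketch only: grade, grade_num and grade_lbl are hypothetical names.
clear
input str1 grade
"C"
"A"
"E"
"B"
"A"
"D"
end

* group() numbers the distinct values 1, 2, 3, ... in sorted order,
* so A = 1, B = 2, ..., E = 5; no value labels are attached.
egen grade_num = group(grade)

* Same numbering, but with value labels taken from the strings.
egen grade_lbl = group(grade), label

list grade grade_num grade_lbl
```

Because group() numbers values in sorted order, it suits codes that already sort meaningfully, such as A to E; for a Likert item, the codes would follow alphabetical order rather than the scale.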
Dear Dr. Andy,
When I was searching for how to convert string variables into numeric variables in Stata, I found your document. It was really helpful. Thank you so much for sharing your knowledge with others.
Warm Regards,
Shantha
@Shantha – You’re welcome. I’m glad I could help! 🙂
Respected sir,
I have a non-numeric code in a variable in Stata, and I want to convert that non-numeric code (in that particular variable) into a numeric value. Please tell me how I can solve this problem.