The “*” character matches any number of characters (including 0). So the varlist “child*bl” matches any variable whose name starts with “child”, has 0 or more characters of any kind, and ends with the letters “bl”. Examples that fit this pattern include “child1bl”, “childxyzbl”, and “childbl”. Examples that do not fit this pattern include “child1bl2” and “1childbl”.
Another wildcard character is “?”, which matches exactly one character. So examples that fit the varlist “child?bl” include “child1bl” and “childxbl”. Examples that do not fit the varlist “child?bl” include “childbl” and “child123bl”.
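To check the two wildcards yourself, the sketch below builds a throwaway one-observation dataset and expands each pattern with unab. The variable names are hypothetical and simply mirror the examples above; “xchildbl” stands in for “1childbl” because a Stata variable name cannot begin with a digit.

```stata
* Toy dataset: hypothetical variable names mirroring the examples above
clear
set obs 1
foreach v in child1bl childxyzbl childbl child1bl2 xchildbl {
    generate `v' = 1
}

* Expand each pattern into the list of variable names it matches
unab star_match : child*bl
unab qmark_match : child?bl

display "child*bl matches: `star_match'"
display "child?bl matches: `qmark_match'"
```

Only the names that start with “child” and end with “bl” should appear in the first list, and only those with exactly one character in between should appear in the second.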
Since all of the test score variables are listed in order on the dataset, you can specify the following varlist in the loop:
foreach var of varlist total_bl-english_diff_ely3{
Alternatively you could specify the following:
foreach var of varlist total* hindi* math* english*{
There are 28 variables that fit that pattern. You could count them manually, or the following commands will count them for you:
unab varcount: total_bl-english_diff_ely3
disp wordcount("`varcount'")
To compare the values of two variables you can use a count command:
count if `var' != `var'_miss0
You could run this command for each of the 28 variables, or you could run a loop to run this command 28 times. If you include a counter in the loop then Stata will keep a running count of how many times this command returns a non-zero value.
local num_change 0
foreach var of varlist total_bl-english_diff_ely3 {
    count if `var' != `var'_miss0
    if `r(N)' != 0 {
        local num_change = `num_change' + 1
    }
}
disp `num_change'
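The same counter pattern can be tried on a small made-up dataset before running it on the real one. This is only a sketch: the variables x, y, x_miss0, and y_miss0 are hypothetical stand-ins for a test score and its missing-replaced-with-0 copy.

```stata
* Toy data (hypothetical names): x has one missing value that became 0 in x_miss0
clear
input x y x_miss0 y_miss0
1 2 1 2
. 3 0 3
end

local num_change 0
foreach var of varlist x y {
    * Count observations where the original and the _miss0 copy differ
    quietly count if `var' != `var'_miss0
    if r(N) != 0 {
        local num_change = `num_change' + 1
    }
}
display `num_change'
```

Here only x differs from its _miss0 copy, so the counter is incremented once.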
Running this code will show that values changed for 24 of the 28 variables.
To calculate the average test score for each student, run the following command:
gen total_avg = (total_bl_miss0 + total_ely1_miss0 + total_ely2_miss0 + total_ely3_miss0)/4
Alternatively you could use an ‘egen’ command:
egen total_avg = rowmean(total*miss0)
These two commands produce the same values only when no observation has a missing value in any of the variables. If an observation has a missing value for one of the variables, the first command sets total_avg to missing for that observation, whereas the egen command ignores the missing value and takes the average of the non-missing values for that observation. Because the _miss0 variables have had missing values replaced with 0, both commands give the same result here.
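The difference is easy to see on a two-row example. The sketch below uses hypothetical variables score1 and score2, where the second observation is missing score2.

```stata
* Toy data (hypothetical names): second observation has a missing score2
clear
input score1 score2
10 20
10 .
end

* Arithmetic with a missing value yields missing
generate avg_gen = (score1 + score2) / 2

* rowmean() averages only the non-missing values in each row
egen avg_egen = rowmean(score1 score2)

list
```

In the second row, avg_gen is missing while avg_egen equals 10, the mean of the single non-missing value.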
After creating total_avg, find the average value using the summarize command:
sum total_avg
The following code creates binary variables for each age:
sum child_age_bl
forval y = `r(min)'/`r(max)' {
    gen child_age_bl_`y' = (child_age_bl == `y') if !missing(child_age_bl)
}
Note that I refer to the minimum and maximum values from the summarize command programmatically, rather than hardcode the minimum and maximum ages.
You can either count the number of new variables manually or programmatically, like this:
unab varcount: child_age_bl_*
disp wordcount("`varcount'")
If you run the command “regress total_ely1 child_age_bl_*”, you’ll see that the largest coefficient is on child_age_bl_4, and thus (surprisingly) 4-year-olds have the highest average Year 1 endline test score. You’ll notice that Stata needs to omit one variable from the regression. The average score for the omitted age group is given by the coefficient on the constant (“_cons” in the regression output), and each remaining coefficient is the difference between that age group’s average and the omitted group’s average. If you want Stata to omit the constant instead, then you could run the following command:
regress total_ely1 child_age_bl_*, nocons
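The coefficients in this regression are the same as in the previous regression, except that the constant value of 8 has been added to each of them. The variable that was previously omitted now has a coefficient equal to that constant, so each coefficient is simply the average score for that age group.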
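The relationship between the two regressions can be checked on a tiny made-up dataset. In this sketch, y, g1, and g2 are hypothetical: g1 and g2 are binary indicators for two groups whose average outcomes are 2 and 6.

```stata
* Toy data (hypothetical names): group 1 has mean 2, group 2 has mean 6
clear
input y g1 g2
1 1 0
3 1 0
5 0 1
7 0 1
end

* With a constant, Stata omits one indicator because of collinearity
regress y g1 g2

* Without a constant, each coefficient is that group's mean outcome
regress y g1 g2, noconstant
```

In the second regression the coefficients on g1 and g2 equal the two group means, which matches adding the first regression's constant to each of its coefficients.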
1. Which variables would be included in the following varlist: child*bl
2. How many variables are included in your loop?
3. How many variables had at least 1 value change in this loop?
4. Create a new variable total_avg that is a student’s average score across the 4 rounds of testing (total_bl, total_ely1, total_ely2, and total_ely3), using your new variables that replaced missing with 0. What is the average total_avg across the dataset? Round to the nearest 0.1.
5. How many new variables were created in the last step?
6. Regress total_ely1 on the set of binary variables that you created (use the regress command). Which age group has the highest average test scores at the end of Year 1?