baby names

Howard Mahler posted 19 Years Ago (#5894)
www.ssa.gov/OACT/babynames/ This site has interesting information on the names of babies born in the U.S., by year of birth. The time series for a particular name might be the basis for an interesting project.
Howard Mahler
ugh posted 19 Years Ago (#5911)
Thanks for the idea, Dr. Mahler! I might look into that.
NEAS posted 19 Years Ago (#6002)
Some candidates have asked about this web site: "What sort of ARIMA models might we expect?" Here are some ideas.

If a person does something remarkable and is widely praised by the media, some couples may name their babies after this person. This is a stochastic event. Suppose the name Rebecca is given to 1% of babies, on average. If a person named Rebecca does something remarkable in January 20X5 and is praised by news anchors during 20X5, perhaps 2% of babies born in 20X6 will be named Rebecca. Nothing in the past history of baby names suggests a higher percentage of Rebeccas in 20X6.

We consider the effects on future years. One possibility: in 20X7, some news about the remarkable Rebecca lingers, and 1.1% of babies are named Rebecca; in 20X8, 1.0% of babies are named Rebecca, as before 20X6. This process has a memory of one period. It is a moving average process, MA(1), with θ₁ = −0.100: writing Y(t) = μ + ε(t) − θ₁ε(t−1), the shock of 1 percentage point in 20X6 adds −θ₁ × 1% = 0.1% in 20X7 and nothing afterwards.

Another possibility: in 20X7, couples still like the name Rebecca, and 1.5% of babies are named Rebecca; in 20X8, 1.25%; in 20X9, 1.125%. This is an autoregressive process, AR(1), with φ₁ = 0.5: each year the excess over the 1% baseline is half of the previous year's excess (1, 0.5, 0.25, 0.125 percentage points).

A student project may ask: "Does an increase in the popularity of a particular name die out quickly, as in a moving average process, or slowly, as in an autoregressive process?"

Jacob: I looked at this web site, and the percentages for different names show short-term and long-term trends. The percentage of babies named Rebecca may be 1.0% in 20X5, 1.2% in 20X6, 1.4% in 20X7, and so forth. Which process is this?
Rachel: This process has a trend, so it is not stationary. Take first differences to remove the trend, and analyze the time series of the first differences.

Jacob: The trends are not constant. The incidence of a name may increase for 10 years, then remain level for 15 years, then decrease for 10 years.
Rachel: You may analyze this two ways.
First, the first differences may form an autoregressive process with slow decay. The expected first difference may be zero. If a stochastic event makes the first difference positive, it may decay back to zero over the next ten years. If another stochastic event makes the first difference negative in the seventh year, it may stay negative for a while, decaying slowly back to zero.

Second, if the process differs across the two or three periods, you may split the time series into those periods and model each one separately.

Jacob: Do we use all the names on the web site?
Rachel: You are not writing a thesis on baby names. Select one, two, or three names; you will find it easier to select names that show different patterns. Model the time series for each name with an ARIMA process, and explain the statistical tests you perform. You may also select a name that shows different patterns in two or more periods and model each period with its own ARIMA process. If one name particularly intrigues you, examine it in more detail: graph the series, its first differences, and its second differences, then use correlograms and statistical tests to select the best ARIMA process.

Jacob: Do we have to explain why a particular process makes sense?
Rachel: The student project is a statistical project. You demonstrate that you can model a time series with an ARIMA process. You may add comments about why a process makes sense, but that is not required.

Jacob: What non-statistical items should we be careful about?
Rachel: Baby names have large random effects, so we do not expect perfect fits from an ARIMA process. High-order processes, such as ARIMA(4,1,6) instead of ARIMA(1,1,0), are generally a poor choice.

Jacob: How can we tell which ARIMA process is the better choice?
Rachel: Suppose the more complex process fits better over an interval of 40 years. Fit both processes over the first 35 years and forecast the next 5 years.
Very often, the more complex model has the better in-sample fit but the worse out-of-sample forecast. If so, we choose the more parsimonious model with the better forecast.

Jacob: Do we expect the same ARIMA process to apply to all names?
Rachel: Most statisticians would say no. In any case, your goal is not to determine the proper ARIMA process for baby names. Your student project shows that you understand how to use the statistical techniques, and you can show this with a study of one or two baby names; you don't have to use them all.

Attachment: Baby names time series student project.pdf
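The two Rebecca scenarios in the NEAS post, and the first-differencing step suggested for a trending series, can be reproduced in a few lines. This is a hypothetical sketch, not part of the project materials: it assumes a 1.0% baseline share, a single shock of +1 percentage point in 20X6 with no later shocks, and the MA sign convention Y(t) = μ + ε(t) − θ₁ε(t−1). The function names ma1_path, ar1_path, and first_differences are made up for this illustration.

```python
# Hypothetical sketch of the Rebecca example. Shares are in percent; mu is the
# long-run share of babies named Rebecca, and one shock hits in the first
# simulated year (20X6). Function names are illustrative, not from any library.


def ma1_path(mu, shock, theta1, years):
    """Shares under an MA(1) process Y(t) = mu + e(t) - theta1 * e(t-1),
    with e = shock in the first year and e = 0 afterwards."""
    path = []
    prev_e = 0.0
    for t in range(years):
        e = shock if t == 0 else 0.0
        path.append(mu + e - theta1 * prev_e)
        prev_e = e
    return path


def ar1_path(mu, shock, phi1, years):
    """Shares under an AR(1) process Y(t) - mu = phi1 * (Y(t-1) - mu) + e(t),
    with e = shock in the first year and e = 0 afterwards."""
    path = []
    excess = 0.0  # deviation of the share from its long-run mean mu
    for t in range(years):
        e = shock if t == 0 else 0.0
        excess = phi1 * excess + e
        path.append(mu + excess)
    return path


def first_differences(series):
    """Return Y(t) - Y(t-1) for each consecutive pair; removes a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


if __name__ == "__main__":
    # 20X6..20X9 under each scenario, rounded for display.
    print([round(x, 3) for x in ma1_path(1.0, 1.0, -0.1, 4)])  # [2.0, 1.1, 1.0, 1.0]
    print([round(x, 3) for x in ar1_path(1.0, 1.0, 0.5, 4)])   # [2.0, 1.5, 1.25, 1.125]
    # Jacob's trending series: 1.0%, 1.2%, 1.4%, ...
    print([round(d, 3) for d in first_differences([1.0, 1.2, 1.4, 1.6])])  # [0.2, 0.2, 0.2]
```

With θ₁ = −0.1 the shock is gone after one year, while with φ₁ = 0.5 the excess halves each year and only approaches zero; this is the quick-versus-slow contrast the project question asks about. On a real series the same contrast appears in the correlogram: the autocorrelations of an MA(1) cut off after lag 1, while those of an AR(1) decay geometrically.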
joeorez posted 18 Years Ago (#6647)
A few thoughts on the baby names series:

The Social Security database is not perfect (what is?). The site admits that it omits people who never applied for an SS card, which includes people who worked before Social Security became effective, people who died before applying for an SS card, and illegal immigrants. The illegal immigrant item is interesting: people will sometimes use a phony SS number to get work, pay SS taxes, and then not be eligible for SS benefits; but if they never applied for an SS card, they are not in the SS name database.

Some interesting insights on baby names are found in Levitt and Dubner's "Freakonomics", chapter 6.

I found that many series are U-shaped. I also found a few series in which a name was temporarily made more popular by a famous person (Franklin for FDR, Marilyn for Marilyn Monroe, Shirley for Shirley Temple), but these names were also popular before those people became famous.

The Hispanic share of the total US population is increasing, and consequently Hispanic names are becoming more popular. I believe this affects the popularity of non-Hispanic names.

[NEAS: Good points]

Thanks,
Joe Orez