
Problem

Clearly define the target population for the study. Be sure that you indicate what the units are in the target population.

In this study you are asked to analyse the following variates: health, covid, vaccine, media, first.tweet, retweets, time.of.day, likes, username, is.retweet. For each variate specify its type and give an attribute in the target population for this variate.

Example:

The variate retweets is a ___ type of variate. An attribute of interest for this variate is ___.

The Problem step includes the motivating questions for the study which are stated in terms of the attributes of the target population.

Motivating Questions

(a) In the target population what proportion of tweets contain a particular keyword (health, covid or vaccine)?

(b) Suppose that 50 tweets are drawn at random from the target population. What is the probability that at least half of these tweets will contain the keyword?

(c) In terms of media items, is there a difference between first tweets of the day and tweets that are not first tweets of the day in the target population?

(d) What is the mean number of retweets in the target population?

(e) Does the Exponential model fit a transformation of the variate retweets in the target population?

(f) Is the distribution of the number of retweets in the target population different depending on whether the tweet contains a particular keyword or not?

(g) In the target population is the mean number of likes received by tweets which are tweeted during the time period 9:00-12:00 different compared to the mean number of likes received by tweets which are tweeted during the time period 12:00-15:00?

(h) Is there a difference among the provincial health agencies with respect to how often they use information retweeted from other accounts versus their own original tweets?

Specify the type of problem (descriptive, causative, predictive) for each of these motivating questions by completing the following sentence and including it in your report:

Motivating questions (),…,() are descriptive problems, (),…,() are causative problems, and (),…,() are predictive problems.

where you should place the relevant letters in the parentheses (). Note that this list contains at least two different types of problems.

Note: Because your manager gave you this list of motivating questions, it is not necessary to include these as part of your report. You may reference them with the letters (a), (b), etc.

Plan

What type of study is this and why?

Clearly define the study population or process for this study.

Describe at least two possible sources of study error, including justification for why you believe they are sources of study error. Be sure to reference attributes when discussing possible sources of study error.

Describe the sampling protocol for this study in as much detail as possible. Be sure to indicate the sample size for your dataset.

Clearly identify one possible source of measurement error.

Data

In 1-2 sentences, discuss any concerns or issues that you have in obtaining your sample (dataset).

Analysis

Complete the analyses given below as well as any other analysis you deem necessary. You should provide as much explanation as you think necessary for your manager to understand your report. Be sure to write in complete sentences. Since this is a report, your analyses should not contain any detailed mathematical derivations or calculations. All tables and plots require titles/labels as appropriate.

Note: The numbering below is only used to help with instructions for uploading the parts to Crowdmark for ease of marking.

(1) Your dataset contains three keyword variates (covid, health, and vaccine) that indicate whether or not each tweet contains a keyword.

You should decide which of these keywords you think is most important for the purposes of this study. You should then analyze the corresponding variate as follows:

Why did you choose this word?

Provide an estimate of the proportion of tweets in the study population that contain your chosen keyword. What model and method have you used to obtain this estimate? Explain why the model you used is reasonable.

Provide a 15% likelihood interval estimate for the proportion of tweets in the study population that contain your chosen keyword.
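As a rough illustration of how a likelihood interval can be computed, the sketch below assumes a Binomial(n, θ) model, where y of the n sampled tweets contain the keyword. The interval is the set of θ values whose relative likelihood R(θ) = L(θ)/L(θ̂) is at least 0.15, and the endpoints are found by bisection. The counts and the function names are hypothetical and are not taken from any dataset.

```python
import math

def log_relative_likelihood(theta, y, n):
    """log R(theta) for a Binomial(n, theta) model with y successes."""
    theta_hat = y / n
    return (y * math.log(theta / theta_hat)
            + (n - y) * math.log((1 - theta) / (1 - theta_hat)))

def likelihood_interval(y, n, p=0.15, tol=1e-10):
    """Endpoints of {theta : R(theta) >= p}, found by bisection.

    Assumes 0 < y < n so that the MLE y/n lies strictly inside (0, 1).
    """
    theta_hat = y / n
    target = math.log(p)

    def solve(inside, outside):
        # 'inside' satisfies log R >= target, 'outside' does not.
        while abs(inside - outside) > tol:
            mid = (inside + outside) / 2
            if log_relative_likelihood(mid, y, n) >= target:
                inside = mid
            else:
                outside = mid
        return inside

    lower = solve(theta_hat, 1e-12)
    upper = solve(theta_hat, 1 - 1e-12)
    return lower, upper

# Hypothetical counts, for illustration only.
y, n = 120, 400
theta_hat = y / n
lower, upper = likelihood_interval(y, n)
print(f"MLE = {theta_hat:.4f}")
print(f"15% likelihood interval = [{lower:.4f}, {upper:.4f}]")
```

Bisection works here because the Binomial log relative likelihood has a single peak at θ̂, so it crosses log(0.15) exactly once on each side of the maximum.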

Suppose 50 tweets are chosen at random from the study population. Give an estimate of the probability that at least half of these tweets will contain your chosen keyword. Explain clearly how you obtained this estimate.
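One possible way to obtain such an estimate is to plug an estimate of the proportion into a Binomial(50, θ) model and sum the upper tail, since "at least half" of 50 tweets means 25 or more. The sketch below assumes the 50 tweets are drawn independently; the value of `theta_hat` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, theta):
    """P(X >= k) for X ~ Binomial(n, theta)."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x)
               for x in range(k, n + 1))

# Hypothetical estimate of the proportion, for illustration only.
theta_hat = 0.30
estimate = prob_at_least(25, 50, theta_hat)
print(f"Estimated P(at least 25 of 50 contain the keyword) = {estimate:.6f}")
```

This plug-in estimate treats θ̂ as if it were the true proportion, so it does not reflect the uncertainty in θ̂ itself.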

(2) Your dataset contains the variates media and first.tweet.

Complete the following table which summarizes the number of media items used for first tweets of the day and tweets which are not first tweets of the day.

What is the sample mode for the variate media for first tweets of the day? What is the sample mode for the variate media for tweets which are not first tweets of the day?

What is the sample mean for the variate media for first tweets of the day? What is the sample mean for the variate media for tweets which are not first tweets of the day?

Propose a model which could be used for modeling the variate media for first tweets of the day and justify your choice. What is/are the unknown parameter(s) in this model? Give the maximum likelihood estimate(s) for the unknown parameter(s).

(3) Your dataset contains the variate retweets. For this analysis you will need to create the transformed variate retweets.log = log(retweets + 1).

Give the five-number summary and sample skewness for retweets.log. For ease of marking please put these in a table. Values may be rounded to two decimal places.
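A minimal sketch of these summaries follows, using only the Python standard library. Two conventions are assumptions here and may differ from the ones your course uses: quartiles are computed by linear interpolation between order statistics (the "inclusive" method), and skewness is the third central moment divided by the 3/2 power of the second, both with divisor n. The retweet counts are made up for illustration.

```python
import math
import statistics

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def sample_skewness(values):
    """Third central moment over the 3/2 power of the second (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical retweet counts, for illustration only.
retweets = [0, 3, 1, 12, 0, 7, 45, 2, 5, 1]
retweets_log = [math.log(r + 1) for r in retweets]  # natural log of (retweets + 1)

labels = ["min", "Q1", "median", "Q3", "max"]
for label, value in zip(labels, five_number_summary(retweets_log)):
    print(f"{label:>8}: {value:.2f}")
print(f"skewness: {sample_skewness(retweets_log):.2f}")
```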

Assume that retweets.log follows an Exponential(θ) distribution. Provide the maximum likelihood estimate of θ. Indicate how the parameter θ relates to the attribute of interest in the study population.
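For reference, the maximum likelihood estimate can be derived as follows, assuming the Exponential distribution is parametrized by its mean θ (if your course parametrizes it by a rate instead, the estimate is the reciprocal). Here y_1, ..., y_n denote the observed values of retweets.log.

```latex
\begin{align*}
L(\theta) &= \prod_{i=1}^{n} \frac{1}{\theta}\, e^{-y_i/\theta}
           = \theta^{-n} \exp\!\left(-\frac{1}{\theta}\sum_{i=1}^{n} y_i\right),
           \qquad \theta > 0, \\
\ell(\theta) &= -n \log\theta - \frac{1}{\theta}\sum_{i=1}^{n} y_i, \\
\ell'(\theta) &= -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n} y_i = 0
  \quad\Longrightarrow\quad
  \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.
\end{align*}
```

Under this parametrization the estimate is simply the sample mean of retweets.log.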

Provide the plot of the relative frequency histogram of retweets.log with a superimposed Exponential(θ̂) probability density function.

(4) For this analysis you will be examining the variate retweets.log for the keyword you chose in part (1) of the Analysis step.

Provide the five-number summary for the number of retweets for tweets containing your chosen keyword. For ease of marking please put these in a table. Values may be rounded to two decimal places.

Provide the five-number summary for the number of retweets for tweets that do not contain your chosen keyword. For ease of marking please put these in a table. Values may be rounded to two decimal places.

Assume that retweets.log follows an Exponential(θ_0) distribution for tweets that do not contain your chosen keyword. Assume that retweets.log follows an Exponential(θ_1) distribution for tweets that do contain your chosen keyword. Provide the maximum likelihood estimates of θ_0 and θ_1.

Provide approximate 95% confidence intervals for θ_0 and θ_1 based on the appropriate asymptotic Gaussian pivotal quantity.
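A sketch of one such interval is given below. It assumes the Exponential distribution is parametrized by its mean, so the maximum likelihood estimate is the sample mean and the approximate interval is the estimate plus or minus z times the estimate divided by the square root of n, where z is the standard Gaussian quantile. The data values and the function name are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def exponential_mle_ci(values, level=0.95):
    """MLE and approximate CI for the mean of an Exponential model.

    Uses the asymptotic Gaussian approximation with the standard error
    estimated by theta_hat / sqrt(n).
    """
    n = len(values)
    theta_hat = fmean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * theta_hat / math.sqrt(n)
    return theta_hat, theta_hat - half_width, theta_hat + half_width

# Hypothetical values of the log-transformed variate, split by keyword.
without_keyword = [0.0, 0.69, 1.10, 1.39, 0.69, 2.08, 0.0, 1.61]
with_keyword = [1.10, 2.56, 1.79, 3.04, 0.69, 2.20, 1.95, 2.71]

for label, group in [("no keyword", without_keyword), ("keyword", with_keyword)]:
    est, lo, hi = exponential_mle_ci(group)
    print(f"{label:>10}: estimate = {est:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The approximation relies on a large sample, so with small groups the interval should be read with caution.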

Provide a side-by-side boxplot of retweets.log for tweets not containing your word and tweets that contain your keyword.

(5) Your dataset contains the variate likes and time.of.day. For this analysis you will need to create a new variate from the time.of.day variate which is the time of day converted to hours. Use a 24 hour clock.
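The conversion might look like the sketch below, which assumes time.of.day is stored as a text string of the form "HH:MM" or "HH:MM:SS" on a 24 hour clock; the actual storage format in your dataset may differ. Whether the boundary time 12:00 belongs to the first or the second period is also an assumption: here each period includes its start time and excludes its end time.

```python
def time_to_hours(time_of_day):
    """Convert 'HH:MM' or 'HH:MM:SS' (24 hour clock) to decimal hours."""
    parts = [float(p) for p in time_of_day.split(":")]
    parts += [0.0] * (3 - len(parts))  # pad missing minutes/seconds with zero
    hours, minutes, seconds = parts
    return hours + minutes / 60 + seconds / 3600

def in_period(hour, start, end):
    """True if start <= hour < end (half-open interval, an assumption)."""
    return start <= hour < end

# Hypothetical (time.of.day, likes) pairs, for illustration only.
tweets = [("09:30:00", 14), ("11:59:59", 3), ("12:00:00", 27), ("14:15", 8)]

likes_am = [likes for t, likes in tweets if in_period(time_to_hours(t), 9, 12)]
likes_pm = [likes for t, likes in tweets if in_period(time_to_hours(t), 12, 15)]
print("9:00-12:00 likes: ", likes_am)
print("12:00-15:00 likes:", likes_pm)
```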

Provide the five-number summary for the number of likes received by tweets which were tweeted during the time period 9:00-12:00. For ease of marking please put these in a table. Values may be rounded to two decimal places.

Provide the five-number summary for the number of likes received by tweets which were tweeted during the time period 12:00-15:00. For ease of marking please put these in a table. Values may be rounded to two decimal places.

Assume a G(μ_am, σ_am) model for the number of likes received by tweets which were tweeted during the time period 9:00-12:00. Assume a G(μ_pm, σ_pm) model for the number of likes received by tweets which were tweeted during the time period 12:00-15:00.

Note: a Gaussian model may not be appropriate for these data. However, your manager has insisted you use a Gaussian model, so you should answer these questions assuming one is suitable.

Give the maximum likelihood estimates of μ_am and μ_pm. Give a 95% confidence interval for both of these parameters.

Give the maximum likelihood estimates of σ_am and σ_pm. Give a 90% confidence interval for both of these parameters.
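For a Gaussian model with mean μ and standard deviation σ, the standard estimates and exact confidence intervals take the form below. This is a summary under the usual assumption of n independent observations y_1, ..., y_n from a single Gaussian distribution, with s denoting the sample standard deviation (divisor n - 1); the symbols c, a and b are introduced here for illustration.

```latex
\begin{align*}
\hat{\mu} &= \bar{y}, \qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^{2}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^{2}}, \\[4pt]
\text{100}p\%\text{ interval for } \mu &:\quad
\bar{y} \pm c\,\frac{s}{\sqrt{n}},
\qquad P(T \le c) = \frac{1+p}{2}, \quad T \sim t(n-1), \\[4pt]
\text{100}p\%\text{ interval for } \sigma &:\quad
\left[\sqrt{\frac{(n-1)s^{2}}{b}},\ \sqrt{\frac{(n-1)s^{2}}{a}}\right],
\qquad P(W \le a) = \frac{1-p}{2}, \quad P(W \le b) = \frac{1+p}{2}, \quad W \sim \chi^{2}(n-1).
\end{align*}
```

Note that the maximum likelihood estimate of σ uses divisor n, while the intervals are built from s, which uses divisor n - 1.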

You have been asked to examine how often different provincial health agencies use information retweeted from other accounts compared to their own original tweets. Your analysis should include a proposed model, a discussion of whether the model is reasonable, and estimates (including interval estimates) of any relevant parameters.
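One possible starting point, offered only as a sketch, is to treat the number of retweets for each agency as Binomial(n, θ) and to estimate θ separately for each username, with an approximate interval based on the asymptotic Gaussian approximation. The usernames and rows below are hypothetical, is.retweet is assumed to be stored as a true/false value, and whether a Binomial model is reasonable (for example, whether tweets from one account are independent) is something the analysis itself would need to discuss.

```python
import math
from collections import defaultdict
from statistics import NormalDist

def retweet_proportions(rows, level=0.95):
    """Estimate the proportion of retweets for each username.

    rows: iterable of (username, is_retweet) pairs, is_retweet being a bool.
    Returns {username: (estimate, lower, upper)} using the approximation
    estimate +/- z * sqrt(estimate * (1 - estimate) / n), clipped to [0, 1].
    """
    counts = defaultdict(lambda: [0, 0])  # username -> [retweets, total tweets]
    for username, is_retweet in rows:
        counts[username][0] += int(bool(is_retweet))
        counts[username][1] += 1

    z = NormalDist().inv_cdf(0.5 + level / 2)
    result = {}
    for username, (y, n) in counts.items():
        p = y / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        result[username] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return result

# Hypothetical rows, for illustration only.
rows = [("agency_a", True), ("agency_a", False), ("agency_a", True),
        ("agency_a", False), ("agency_b", False), ("agency_b", False),
        ("agency_b", True), ("agency_b", False)]

for username, (p, lo, hi) in retweet_proportions(rows).items():
    print(f"{username}: estimate = {p:.2f}, approx. 95% CI = [{lo:.2f}, {hi:.2f}]")
```

With few tweets per agency, or estimates near 0 or 1, this approximate interval is unreliable and a likelihood interval may be a better choice.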

Conclusion

The conclusions you make should be based on the Analysis step and should address the motivating questions (a)-(h) in the Problem step.

Conclusions can only be made in relation to the study population.