Antecedents of Student Retention: A Predictive Modelling Approach

Today, university and college rankings are major determinants of students' enrollment decisions. Universities that improve on certain aspects are found to retain a majority of their freshman students regardless of tier status. These aspects determine student retention, which eventually helps a university attain a higher tier status. This study evaluates the aspects that primarily influence the future enrollment decision and thereby the retention of the student. The evaluated aspects are intrinsic factors that support the academic tenure of a current student. They mainly revolve around peer effectiveness, academic progress, and academic advising, of which academic advising is a variable directly controlled by the university administration; the other two variables are controlled by the administration indirectly. In the initial stage, principal component analysis is used to identify the factors, followed by multiple regression analysis to determine the influence of each factor on the likelihood of retention. To validate the findings, logistic regression analysis is used to verify that the same factors influence the actual retention of the student.


INTRODUCTION
Higher education institutes differ from other industries: as non-profit organizations, they depend heavily on government and public funding. The services that higher education institutes provide are intangible and are centered on knowledge and student learning (Adelman, 1999). Student enrollment has been the main source of revenue for the institutes beyond government and public funding and donations (Light, ing for the past few years is freshman retention. It is evident that freshmen often prefer to transfer to other institutes for the sophomore year (Asera, 1998). University ranking, reputation, eliteness, and return on investment are the factors openly known to support such decisions (Mangold, Bean, Adams, Schwab, & Lynch, 2003).
Factors can be intrinsic or extrinsic (Noel, Levitz, & Saluri, 1985). This research study investigates intrinsic factors, that is, factors whose source is the institute itself. Institutes prefer to allocate their budget and invest their capital in resources and activities that can help them increase their enrollment (Bean, 1980, 1983). This study focuses on the influence and effects of intrinsic factors on freshman retention. It is important to locate and investigate the internal factors that lead to a better student experience (Belgrade & LoRe, 2003; Pascarella & Terenzini, 2005). Some factors also lead to better student learning, while others contribute to tacit knowledge. Overall, while the positives are on the higher side, the negatives that lead to dissatisfaction, denial, and turnover require attention (Bean, 1985; Roach, 1997).
This study focuses solely on the factors that institutes control, so that institutes can better understand how their decisions influence internal, institute-based activities. The study addresses many factors that are determined solely by institute policies and that can lead to increased retention. It is timely to investigate the controllable factors that are subject to institutional policy changes (Braxton, 2000; Schnell & Doetkott, 2003). Hence, this study contributes to the higher education literature on retention and enrollment. The findings from the study can lead to improved freshman retention, better satisfaction, and better student learning outcomes.

The intrinsic factors of higher education
This research is a methodological design science study that develops a mathematical model to predict an outcome. The study contributes to the literature on design science methods by exploring predictive models that can best fit the data. It examines the antecedents of student retention considering multiple predictive models and confines itself to multiple linear regression for its predictive superiority over other models. Further, it verifies the validity of this model using a logistic regression model whose outcome variable is verified student retention data. The research question under consideration is "Is multiple linear regression the best-fitting predictive model for student retention?"
The institutional intrinsic factors of higher education can be defined as the factors that are solely controlled by the institute administration and its policies (Braxton & McClendon, 2002). The institute must implement these activities on the assumption that they will improve a student's progress (Schnell, Seashore, & Doetkott, 2003). Intrinsic factors can also be considered antecedents of student retention if they positively influence the students' learning process and leave them satisfied with the outcome (Braxton, Hirschy, & McClendon, 2004). For this study, 8 factors were considered as potential influencers of retention. Those 8 factors were identified as faculty effectiveness, college costs, academic progress, peer effectiveness, academic advising, academic course load, and academic success.
Retention was measured as the likelihood to continue, which was used as the outcome variable. Faculty effectiveness is defined as the overall quality of instruction delivered through any medium (Cabrera, Castaneda, Nora, & Hengstler, 1992). It is taken for granted that, with instruction, some amount of knowledge is imparted from the faculty to the students. It is important to understand the role of faculty in leading a student to remain at the institute (Cabrera, Nora, & Castaneda, 1993). College cost is one of the most important factors, and it fluctuates over time (Tinto, 1987, 1997). It is completely controlled by the university administration and directly influenced by the running budget (Congos & Schoeps, 2003). It can act as a pivotal turning point when it comes to choosing an affordable option over others.
Academic progress is a factor guided by multiple aspects. Research has shown that academic progress on either side of the scale can lead to a transfer (Carter, 1998; Tucker, 1999). Students performing better may prefer to move up to higher-ranked universities, while those doing poorly might prefer universities where they can complete the program with grades just above the threshold (Covington, 2000; Crissey, 1997). Academic course load is a factor that supports knowledge growth but at the same time may increase stress (Crokston, 1994; Dervarics & Roach, 2000). Studies have reported that excessive course load and unplanned course additions can lead to student turnover in institutes (Fidler, 1991). Academic course load directly affects academic progress and is directly influenced by academic advising (Gilley, 1992).
Academic advising is defined as the structured and guided instruction needed to put forward a plan of study (Glass, 1997). Academic advising leads to better structuring of courses that addresses individual needs and skill sets (Graham & Diamond, 1999). It gives students an idea of the course structure and helps them build an e-portfolio around it. Academic course load is directly affected by advising (Hensen & Shelley, 2003; Hurd, 2000). Prompt advising can lead to a better plan, which in turn leads to a feasible course load (Hurte, 2002). A student needs the right advice to build a structured plan of study. Finally, peer effectiveness is an important factor (Ishitani & DesJardins, 2002). Fellow students, classmates, colleagues, roommates, and others are very influential when it comes to decision making at the institutional level (Karp & Logue, 2002). Understanding the influence of peer effectiveness can help in developing a better strategy to influence and retain most freshmen.

Concurrent Validity
The concurrent validity model is a framework used in predictive modeling to validate a predictive model, such as a linear regression model, against a second outcome model (Kern, Fagely, & Miller, 1998). The two models differ in their outcome variable: the first model is used to understand the influence of the factors on the predicted likelihood of the outcome, while the second model validates the factors using the outcome itself. The second model is run once the outcome has occurred. This approach helps to verify whether the same factors significantly affect the outcome variable.
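One way to write down the two models that this framework compares is sketched below. The notation is an assumption introduced here for illustration and does not appear in the study: $F_{ij}$ denotes the score of student $i$ on factor $j$, $L_i$ the self-reported likelihood to continue, and $R_i \in \{0, 1\}$ the retention outcome observed later.

```latex
% Model 1 (predictive): multiple linear regression on the stated likelihood
L_i = \beta_0 + \sum_{j=1}^{p} \beta_j F_{ij} + \varepsilon_i

% Model 2 (validating): logistic regression on the observed outcome
\log \frac{\Pr(R_i = 1)}{1 - \Pr(R_i = 1)} = \gamma_0 + \sum_{j=1}^{p} \gamma_j F_{ij}
```

Under this reading, the validation step asks whether the factors with a significant $\beta_j$ in the first model also have a significant $\gamma_j$ in the second.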

Data Collection
Data was collected using an online survey distributed among 2000 students at the university. The sampling criteria were set to include only freshman and sophomore students. The survey had a total of 30 items (questions) representing 9 constructs. The nine constructs were identified through the literature review, and the items were tested using exploratory factor analysis (EFA) to verify whether they contribute to a particular construct. The survey had a screening section after the instructions. Screening the respondents before the main survey is very important for obtaining the right responses for the research.
A total of 750 respondents completed the survey. The survey was divided into nine sections, each comprising a construct to be measured. The number of items in each section varied accordingly. The survey was programmed so that no question could be skipped: each question had to be answered before moving on to the next. Overall, the response rate was 37.5% (750 of 2000), and all respondents completed every question in the survey.
The data was collected in 3 batches after sending 15 invitation and reminder emails over 15 weeks to the panel. The data was cleaned and re-coded wherever it was necessary. Further, the data was checked for abnormalities and patterns. The data was thoroughly scrutinized before being further analyzed.

METHODOLOGY
The research was divided into 3 studies. Study 1 was used to investigate the number of factors responsible for the likelihood of future enrollment of a student. The first step in investigating this phenomenon was to break down 32 variables into a few selected factors using principal component analysis. Then, in Study 2, the influence of each factor on the dependent variable was determined using multiple regression. The sole purpose of this analysis was to segregate and evaluate the significant factors contributing to the dependent variable. Study 3 was used to validate the factors by testing them on the actual retention of freshman students. Because retention is a categorical variable and serves as the dependent variable, logistic regression stands out as the method suited to this condition, keeping in mind the objective of the study: to evaluate the influence of each factor on freshman retention. Discriminant analysis is another method that can serve the same function and is also used to group factors into a particular function. Functions in the discriminant analysis represent the two different groups associated with the dependent variable. The main purpose of the discriminant analysis is to segregate the factors into different groups, while logistic regression analysis helps to understand the influence of each factor on the actual retention.
Overall, the results of the studies are compared so that the factors explored in Study 2 can be validated and cross-checked against Study 3. Study 1 helps us to understand the factors responsible for the likelihood of retention; improving these factors would help a university secure high retention before students transfer or drop out. These factors also indirectly affect the tier ranking of the university, where mandatory criteria like GPA are strongly affected by peer effectiveness and academic advising. Study 3 inspects the effect of each of these factors on the actual retention of students and clarifies the hypothesized relationship between retention and the inspected factors. Overall, the methodology tries to clarify the importance of each factor in freshman retention.
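The dimension-reduction step that opens this methodology can be illustrated with a minimal sketch. The study does not say which software or settings were used, so everything here is an assumption: the function names are hypothetical, the responses are made-up toy data, the analysis runs on the covariance matrix, and only the first principal component is extracted (by power iteration) rather than a full factor solution.

```python
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of the columns (survey items) in `rows`."""
    n = len(rows)
    means = [mean(col) for col in zip(*rows)]
    centered = [[value - m for value, m in zip(row, means)] for row in rows]
    p = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(cov, iterations=200):
    """Leading eigenvalue and unit eigenvector of `cov` by power iteration."""
    p = len(cov)
    vec = [1.0 / p ** 0.5] * p  # unit-length starting vector
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in product) ** 0.5
        vec = [v / norm for v in product]
    # The Rayleigh quotient of the converged unit vector is the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, vec


# Hypothetical 5-point responses: 5 respondents x 3 items of one construct.
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 3], [1, 2, 1]]
cov = covariance_matrix(responses)
eigenvalue, loadings = first_principal_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"share of variance on the first component: {explained:.3f}")
```

Items that load strongly on the same component are candidates for a single factor; in practice a statistics package would extract all components and apply a rotation, which this sketch leaves out.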

Study 1: Dimension Reduction using Exploratory Factor Analysis
Study 1 was used to identify and determine the influential factors affecting the likelihood of future enrollment, not actual retention. Results of the principal component analysis indicate that 47 metric variables are reduced to 9 factors. Of these 9 factors, 8 were found to pass a reliability test with a Cronbach's alpha value over 0.8. These 8 factors were named as follows: faculty effectiveness, college costs, academic progress, peer effectiveness, major, academic advising, likely to continue, academic course load, and academic success.
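The reliability check can be made concrete with a short sketch of Cronbach's alpha, computed as k / (k - 1) * (1 - sum of item variances / variance of the summed scale). The function name and the toy responses are hypothetical and are not the study's data; the 0.8 cut-off is the one stated above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one construct.

    `rows` holds one list of item scores per respondent.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point responses: 5 respondents x 3 items of one construct.
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 3], [1, 2, 1]]
alpha = cronbach_alpha(responses)
passes = alpha > 0.8  # threshold used in Study 1
print(f"alpha = {alpha:.3f}, passes reliability test: {passes}")
```

A construct whose items move together yields an alpha close to 1, while unrelated items push it toward 0.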

Study 2: Multiple Linear Regression Equation Model
Study 2 was used to evaluate the significance of each factor. All factors were entered simultaneously in the regression analysis of the dependent variable, likelihood of enrollment. The results of the regression analysis indicate that three factors have a significant effect on the dependent variable, influencing the likelihood of enrollment either positively or negatively. From Figure 2 we can conclude that peer effectiveness, academic progress, and academic advising affect the likelihood of continuing at the university for the next semester. Peer effectiveness and academic progress negatively affect the likelihood to continue, while academic advising positively affects it. Considering the beta coefficients, academic advising (+1.312) has a stronger positive effect than the negative effects of peer effectiveness (-0.461) and academic progress (-0.518). Thus, this study overall supports hypothesis 1.
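A minimal sketch of simultaneous-entry multiple regression follows, fitted by solving the normal equations in pure Python. It is not the study's analysis: the helper names are hypothetical, the factor scores are invented, and the outcome is generated without noise from the three coefficients quoted above plus an arbitrary intercept of 3.0, purely to show that the fit recovers known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(X, y):
    """Return [intercept, b1, ..., bp] minimising the squared error."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic factor scores per student: (advising, peer, progress).
factors = [(1, 2, 3), (2, 1, 4), (3, 5, 1), (4, 2, 2), (5, 4, 5), (2, 3, 3)]
true_coef = [3.0, 1.312, -0.461, -0.518]  # intercept is an assumption
likelihood = [
    true_coef[0] + sum(b * f for b, f in zip(true_coef[1:], row))
    for row in factors
]
estimated = fit_ols(factors, likelihood)
print([round(c, 3) for c in estimated])
```

With real survey data the fit would not be exact, and significance tests on each coefficient would be needed to decide which factors matter.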

Study 3: Logistic Regression Equation Model
Study 3 was used to externally validate the results of Study 2. As a part of concurrent validity, students enrolled for the sophomore year are treated as retained, and these data are used to further test the three factors (peer effectiveness, academic progress, and academic advising) for their significant influence on the enrollment of students. Because the dependent variable was categorical with only two outcomes, yes or no, logistic regression was used to evaluate and validate the influence of each factor on the dependent variable. From Figure 3 it is clear that all three factors are significant at the 1% significance level.
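For a yes/no outcome, the logistic model can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the function names are hypothetical, the data are invented (a single advising score per student), and the coefficients are found by plain gradient ascent on the mean log-likelihood instead of the iterative solver a statistics package would use.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, learning_rate=0.1, iterations=5000):
    """Fit [intercept, b1, ..., bp] by gradient ascent on the log-likelihood."""
    rows = [[1.0] + list(r) for r in X]
    coef = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iterations):
        gradient = [0.0] * len(coef)
        for row, target in zip(rows, y):
            error = target - sigmoid(sum(c * v for c, v in zip(coef, row)))
            for j, v in enumerate(row):
                gradient[j] += error * v / n
        coef = [c + learning_rate * g for c, g in zip(coef, gradient)]
    return coef


def predict_probability(coef, features):
    """Predicted probability of retention for one student."""
    return sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], features)))


# Hypothetical data: one advising score per student, and whether the
# student re-enrolled for the sophomore year (1) or not (0).
advising_score = [[1], [2], [3], [4], [5], [6]]
retained = [0, 0, 1, 0, 1, 1]
coef = fit_logistic(advising_score, retained)
print("slope for advising:", round(coef[1], 3))
```

A positive slope means higher scores on the factor raise the predicted probability of retention; validating Study 2 would additionally require a significance test on each coefficient.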

DISCUSSION
This study makes several contributions to the field of higher education. Student retention in higher education has not been predictively investigated before. The use of a concurrent validity model with regression analysis as a predictive instrument answers the call of many researchers for improvements in student retention in higher education.
Higher education institutes are considered to be non-profit organizations. Our findings can lead to better academic advising, which can further lead to increased academic progress. A positive influence of peer effects on student retention suggests that mentor programs and group therapy can improve student confidence in others. All of this contributes to developing a better student retention strategy, so that the university can focus on a better allocation of funds.
Peer effectiveness should be encouraged in the academic structure, due to its numerous benefits, as shown in this study. Higher education institutes currently face increased public criticism and challenges such as budget cuts, enrollment decline, and faculty turnover. One main takeaway from this research is that higher education institutes should adopt a better academic advising approach.

IJCRR 11 (11), 21906−21913 (2020)

LIMITATIONS AND FUTURE RESEARCH
This study has a few limitations, as is common with any study using empirical analysis. The data was collected from a single source (panel) from an institute ranked and accredited at a particular level. There is a chance that response bias affected the causal inference. Future research can use longitudinal data collected at regular intervals over time, which would enable researchers to correct for the effect of response bias on a causal relationship.