
Improving student success using predictive models and data
visualisations

Alfred Essa* and Hanan Ayad

Desire2Learn Inc, Kitchener, Canada

(Received 12 March 2012; final version revised 13 June 2012)

The need to educate a competitive workforce is a global problem. In the
US, for example, despite billions of dollars spent to improve the educational
system, approximately 35% of students never finish high school. The drop rate
among some demographic groups is as high as 50–60%. At the college level in the
US only 30% of students graduate from 2-year colleges in 3 years or less and
approximately 50% graduate from 4-year colleges in 5 years or less. A basic
challenge in delivering global education, therefore, is improving student success.
By student success we mean improving retention, completion and graduation
rates. In this paper we describe a Student Success System (S3) that provides
a holistic, analytical view of student academic progress (see Note 1). The core of S3 is a
flexible predictive modelling engine that uses machine intelligence and statistical
techniques to identify at-risk students pre-emptively. S3 also provides a set of
advanced data visualisations for reaching diagnostic insights and a case management
tool for managing interventions. S3’s open modular architecture will also
allow integration and plug-ins with both open and proprietary software. Powered
by learning analytics, S3 is intended as an end-to-end solution for identifying
at-risk students, understanding why they are at risk, designing interventions to
mitigate that risk and finally closing the feedback loop by tracking the efficacy of
the applied intervention.

Keywords: predictive models, data visualisation, student performance,
risk analytics

1. Introduction

The need to educate a competitive workforce is a global problem. In the US, for
example, despite billions of dollars spent to improve the educational system,
approximately 35% of students never finish high school. The drop rate among some
demographic groups is as high as 50–60%. At the college level in the US only 30% of
students graduate from 2-year colleges in 3 years or less and approximately 50%
graduate from 4-year colleges in 5 years or less (Bill & Melinda Gates Foundation 2010).
A basic challenge in delivering global education, therefore, is improving student
success. By student success we mean improving retention, completion and graduation
rates. In this paper we describe a Student Success System (S3) that provides a holistic,
analytical view of student academic progress (see Note 1). The core of S3 is a flexible predictive
modelling engine that uses machine intelligence and statistical techniques to identify
at-risk students pre-emptively. S3 also provides a set of advanced data visualisations

*Corresponding author.

Research in Learning Technology

Supplement: ALT-C 2012 Conference Proceedings


ISBN 978-91-977071-4-5 (print), 978-91-977071-5-2 (online)

2012 Association for Learning Technology. © A. Essa and H. Ayad. This is an Open Access article distributed
under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license
(http://creativecommons.org/licenses/by-sa/3.0/) permitting all non-commercial use, distribution, and reproduction in any
medium, provided the original work is properly cited. http://dx.doi.org/10.3402/rlt.v20i0.19191


for reaching diagnostic insights and a case management tool for managing
interventions. S3’s open modular architecture will also allow integration and plug-ins
with both open and proprietary software. Powered by learning analytics, S3 is
intended as an end-to-end solution for identifying at-risk students, understanding why
they are at risk, designing interventions to mitigate that risk and finally closing the
feedback loop by tracking the efficacy of the applied intervention.

2. Related work

The Student Success System (S3) draws and builds upon work in risk analytics in
education and health care. In this section we begin by describing how predictive
modelling has been applied in health care and education. We also describe
methodological limitations to current risk modelling approaches (e.g. the Signals Project
at Purdue University) in learning analytics. Current approaches to building predictive
models for identifying at-risk students are stymied by two serious limitations. First,
the predictive models are one-off and, therefore, cannot be extended easily from one
context to another. We cannot simply assume that a predictive model developed for a
particular course at a particular institution is valid for other courses. Can we devise
a flexible and scalable methodology for generating predictive models that can
accommodate the considerable variability in learning contexts across different
courses and different institutions? Secondly, current modelling approaches, even if
they generate valid predictions, tend to be black boxes from the standpoint of
practitioners. The mere generation of a risk signal (e.g. green, yellow, red) does not
convey enough information for designing meaningful personalised interventions. The
design of S3, both from an application and research perspective, is intended to
overcome these limitations.

2.1. Risk analytics in health care

The use of risk analytics in health care provides an instructive example of how
segmentation strategies and statistical models can lead to substantial cost savings and
Return on Investment (ROI). Risk analytics is also the first step in designing
personalised interventions and therefore optimizing the quality of health care delivery.

The starting point of predictive models in health care delivery is a well-known
phenomenon: a small percentage of subscribers in health care plans account for
a disproportionately large percentage of health care costs. Typically, 20% of the
population accounts for 80% of the costs. The middle 60% of the population
accounts for another 15% of the costs. Finally, the remaining 20% of the population
accounts for only 5% of the costs.
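
To make the arithmetic behind this concentration concrete, the following minimal Python sketch divides each segment's share of cost by its share of population, using only the three percentages quoted above. The segment labels and the function name are illustrative assumptions and do not come from the paper.

```python
# Percentages quoted in the text: (label, % of population, % of cost).
# The labels are hypothetical names chosen for this sketch.
SEGMENTS = [
    ("high-cost", 20, 80),
    ("middle", 60, 15),
    ("low-cost", 20, 5),
]


def relative_cost_per_member(segments):
    """Return each segment's per-member cost relative to the plan-wide average.

    A value of 4.0 means a member of that segment costs four times the
    average member; a value of 0.25 means a quarter of the average.
    """
    return {name: cost / population for name, population, cost in segments}


ratios = relative_cost_per_member(SEGMENTS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the average cost per member")
```

Under these figures a member of the top segment costs 4 times the plan-wide average and 16 times as much as a member of either of the other two segments, which is the motivation for segmenting the population before choosing an intervention.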

Predictive models gained popularity in health care delivery initially as a strategy
for managing costs. Since then, analytics has also been used to deliver personalised care,
thereby improving the quality of health outcomes. Subscribers are first segmented
into risk pools using predictive analytics. Then each pool is managed using a different
intervention strategy.

• Risk Pool 1: the chronically ill, who need personalised and well-integrated care
services

• Risk Pool 2: the newly diagnosed, who have an immediate need for disease-specific
information and timely and cost-effective options


Student Profile Screen: The Student Profile Screen is the primary screen
associated with each student in S3. It is intended to provide an at-a-glance view of
a student’s profile and risk factors.

Course Screen: By clicking on Math in the Student Profile Screen we pull up Eric
Cooper’s performance charts and predictions for Mathematics. An explanation of the
visualisations is provided in Section 3.2.

Notes Screen: The Notes Screen provides a running case history of the
student’s interactions with various advisors, faculty and counselors. It can be
regarded as a case management tool. We can imagine that scalable versions of S3 would
integrate with an enterprise CRM tool to provide deeper case management
functionality.


Referral Screen: The Referral Screen lists all relevant referral options at the
institution. In addition, a communication pathway (e.g. email) is provided from
within the screen rather than having to step outside the context.

In summary, the basic screens of S3 provide a synoptic view of a student’s
academic progress and the ability through single-click interactions to isolate areas of
risk or potential risk. Once we have seen that a student is projected to be at-risk, what
kind of insights can we derive from the data and patterns as a basis for designing
an intervention strategy? A key feature of S3 is the set of data visualisations.

3.2. S3 Visualisations

As the user of S3 navigates through the various success indicators, the underlying
models and data are presented in an intuitive and interpretable manner, going from
one level of aggregation to another. Furthermore, at the course level we present
dynamic and interactive charts that allow the user of S3 to interact with the data
and to explore and understand its patterns and characteristics. Some sample
visualisations in S3 are displayed below:

Risk Quadrant. At a course level each point represents a student in the class.
The top right quadrant contains all students who are on-track and not at-risk. The
bottom right quadrant contains students who are academically at-risk, meaning that
they are projected to receive a D or F in the course. The bottom left quadrant
contains students who are likely to withdraw or drop out. Finally, the top left
quadrant contains students who are under-engaged, meaning that the students are
projected to succeed in the course but their pattern of under-engagement might be
a cause of concern for other reasons.
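
The quadrant logic can be sketched as a simple two-threshold rule. This is a minimal illustration and not the S3 implementation: the text above does not name the chart axes, so the sketch assumes that the horizontal axis measures engagement and the vertical axis measures projected success, both scaled to [0, 1]. The 0.5 cut-offs and the function name are placeholders.

```python
def risk_quadrant(engagement, success, engagement_cut=0.5, success_cut=0.5):
    """Map a student to one of the four quadrants described in the text.

    engagement, success: scores in [0, 1] on hypothetical scales.
    engagement_cut, success_cut: assumed thresholds, not values from S3.
    """
    engaged = engagement >= engagement_cut
    on_track = success >= success_cut
    if engaged and on_track:
        return "on-track"                 # top right
    if engaged and not on_track:
        return "academically at-risk"     # bottom right: projected D or F
    if not engaged and not on_track:
        return "withdrawal/dropout risk"  # bottom left
    return "under-engaged"                # top left: projected to succeed


# Invented example points, one per quadrant.
for point in [(0.9, 0.8), (0.8, 0.2), (0.1, 0.1), (0.2, 0.9)]:
    print(point, "->", risk_quadrant(*point))
```

Keeping the two dimensions separate, rather than collapsing them into a single traffic-light signal, is what lets the chart distinguish a student who is engaged but struggling from one who is disengaged.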


In general, ensuring that a predictive modelling algorithm matches the properties
of the data is crucial in providing meaningful results that meet the needs of the
particular application scenario. One way in which the impact of this algorithm-to-application
match can be alleviated is by using ensembles of predictive models, where
a variety of models (either different types of models or different instantiations of the
same model) are pooled before a final prediction is made. Intuitively, ensembles allow
the different needs of a difficult problem to be handled by models suited to those
particular needs. Mathematically, classifier ensembles provide an extra degree of
freedom in the classical bias/variance trade-off, allowing solutions that would be
difficult (if not impossible) to reach with only a single model (Oza and Tumer 2008).
Stacking, data fusion, adaptive boosting and related ensemble techniques have
successfully been applied in many fields to boost prediction accuracy beyond the level
obtained by any single model (Polikar 2006). S3 represents a particular instance of
the ensemble paradigm. It employs aspects of data fusion to build base models for
different learning domains. Furthermore, the system utilises a stacked generalisation
strategy. A best-fit meta-model takes as input predictors the output of the base
models and optimally combines them into an aggregated predictor, referred to as a
success indicator/index. In this type of stacked generalisation, optimisation is
typically achieved by applying the Expectation Maximization (EM) algorithm.

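
A stacked generalisation of this kind can be sketched in a few lines of Python. The sketch below is a simplified illustration under stated assumptions, not the S3 engine: the base-model outputs and labels are invented toy data, the meta-model is assumed to be a logistic regression, and it is fitted by plain gradient descent rather than by the EM algorithm mentioned above. In practice the base-model outputs fed to the meta-model would normally be out-of-sample predictions, so that the meta-model does not simply reward overfitted base models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_model(base_outputs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-model on base-model outputs.

    base_outputs: one row per student; each row holds the base models'
                  predicted success probabilities (toy values here).
    labels:       1 if the student succeeded, 0 otherwise.
    Uses batch gradient descent on the logistic loss (an assumption of
    this sketch; the text mentions EM). Returns (weights, bias).
    """
    n = len(base_outputs)
    n_models = len(base_outputs[0])
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_outputs, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def success_index(row, weights, bias):
    """Combine one student's base-model outputs into a single index in (0, 1)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Invented outputs of two hypothetical base models for six students.
base_outputs = [
    [0.9, 0.8], [0.8, 0.7], [0.7, 0.9],
    [0.3, 0.2], [0.2, 0.4], [0.1, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_model(base_outputs, labels)
print("index for strong base outputs:", success_index([0.9, 0.9], weights, bias))
print("index for weak base outputs:  ", success_index([0.1, 0.1], weights, bias))
```

The learned weights play the role of the combination rule: a base model whose output tracks the outcome closely receives a larger weight in the aggregated success index.
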
Large volumes of data arising from learner-produced data trails, ubiquitous learning and
networks of social interactions are giving rise to the new research area of learning
analytics. These diverse and abundant sources of learner data are not sufficiently
analysed via a single best-fit predictive model, as in the Course Signals system.
Instead, the discovery and blending of multiple models to effectively express and
manage complex and diverse patterns of the e-learning process is required.

The idea is that data from each learning modality, context or level of aggregation
across the institution can be used to train base predictive models, whose output can
then be combined to form an overall success or risk-level prediction. Applications in
which data from different sources with different input variables are combined to
make a more informed decision are generally referred to as data fusion applications.


Hence, the data fusion model is useful for building individual predictive models
that are well suited for sub-domains of an application. In the context of S3 these
models correspond to each data tracking domain and represent different aspects of
the learning process. That is, each model is designed for a particular domain of
learning behaviour. An initial set of domains is defined as: Attendance, Completion,
Participation and Social Learning.
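
One way to picture how per-domain models support diagnosis is to keep each domain's contribution visible when the outputs are combined. The following sketch assumes that each of the four domains yields a score in [0, 1] and blends them with a weighted average. The scores, the equal weights and the function names are hypothetical; they stand in for whatever combination a fitted meta-model would produce.

```python
DOMAINS = ("Attendance", "Completion", "Participation", "Social Learning")


def combine_domains(domain_scores, weights):
    """Blend per-domain scores into one index and report each domain's share.

    domain_scores: dict of domain -> base-model score in [0, 1] (invented).
    weights:       dict of domain -> non-negative weight (hypothetical).
    Returns (index, contributions); the contributions sum to the index.
    """
    total = sum(weights[d] for d in DOMAINS)
    contributions = {d: weights[d] * domain_scores[d] / total for d in DOMAINS}
    return sum(contributions.values()), contributions


def weakest_domain(domain_scores):
    """Return the domain with the lowest base-model score."""
    return min(DOMAINS, key=lambda d: domain_scores[d])


scores = {
    "Attendance": 0.9,
    "Completion": 0.4,
    "Participation": 0.7,
    "Social Learning": 0.6,
}
equal_weights = {d: 1.0 for d in DOMAINS}

index, parts = combine_domains(scores, equal_weights)
print(f"success index: {index:.2f}")
print("lowest-scoring domain:", weakest_domain(scores))
```

Because each term corresponds to a named learning domain, a low index can be traced back to the domain responsible, which a single undifferentiated risk score does not allow.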

6. Conclusion

In this paper we have outlined a holistic ensemble-based analytical system for
tracking student academic success. The core idea of S3 synthesises several strands
of risk analytics: the use of predictive models and segmentation to identify
academically at-risk students, the creation of data visualisations for reaching diagnostic
insights and the application of a case-based approach for managing interventions.

There are several fundamental limitations in current approaches to building
predictive models in learning analytics. The first limitation concerns the ability to generalise
across different learning contexts: how can we build predictive models that generalise
across different courses, different institutions, different pedagogical models, different
teaching styles and different learning designs? A second limitation concerns the ability to
interpret the results of a prediction for the purpose of decision and action: how can a
non-technical practitioner (e.g. an advisor) design meaningful interventions on
behalf of an individual learner when the underlying mechanism of prediction is either
a black box or obscure?

S3 applies an ensemble method for predictive modelling using a strategy of
decomposition. The units of decomposition have the added property that they are
semantically significant in a learning context. Decomposition provides a flexible
mechanism for building predictive models for application in multiple contexts.
Decomposition into semantic units provides an added bonus, namely the ability to
extend our predictions towards reaching diagnostic insights and designing
personalised interventions.

Note

1. S3 is in development by Desire2Learn Inc. A beta version of the software will be
demonstrated at the ALT-C conference. The production version will be available in
January 2013.

References

Bill & Melinda Gates Foundation. (2010) Next generation learning (pdf, 8 pages). Technical
report, Bill & Melinda Gates Foundation, Seattle, USA.

Campbell, J. P., DeBlois, P. B. & Oblinger, D. G. (2010) ‘Academic analytics: a new tool for a
new era’, Educause Review, vol. 42, no. 4, pp. 40–57.

Gilfus Education Group. (2012) Academic analytics – New eLearning diagnostics, [online]
Available at: http://www.gilfuseducationgroup.com/academic-analytics-new-elearning-diagnostics

Macfadyen, L. P. & Dawson, S. (2010) ‘Mining LMS data to develop an “early warning system”
for educators: a proof of concept’, Computers & Education, vol. 54, pp. 588–599.

Oza, N. C. & Tumer, K. (2008) ‘Classifier ensembles: select real-world applications’,
Information Fusion, vol. 9, no. 1, pp. 4–20.

Polikar, R. (2006) ‘Ensemble based systems in decision making’, IEEE Circuits and Systems
Magazine, Third Quarter, pp. 21–45.
