Author: Fatih Karabiber
Ph.D. in Computer Engineering, Data Scientist

Binary Variable

You should already know:

Basic Python

A binary variable is a categorical variable that can take only one of two values. It is usually represented as a Boolean (True or False) or as an integer (0 or 1), where $0$ typically indicates that the attribute is absent and $1$ indicates that it is present.

Some examples of binary variables, i.e. attributes, are:

Smoking is a binary variable with only two possible values: yes or no
A medical test has two possible outcomes: positive or negative
Gender is traditionally described as male or female
Health status can be defined as diseased or healthy
Company types may have two values: private or public
E-mails can be assigned to two categories: spam or not spam
Credit card transactions can be fraudulent or legitimate
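
Two-category attributes like the ones listed above are often stored as 0/1 integers. The snippet below is a minimal sketch of that encoding. The `smoking_answers` list and the `encoding` dictionary are made-up names and values for illustration, not data from a real survey.

```python
# Hypothetical survey answers (illustrative values only)
smoking_answers = ["yes", "no", "no", "yes", "no"]

# Map each category to an integer: 1 = attribute present, 0 = absent
encoding = {"yes": 1, "no": 0}
smoking_binary = [encoding[answer] for answer in smoking_answers]

print(smoking_binary)  # [1, 0, 0, 1, 0]
```

Which category gets the value 1 is a convention. It is usually the category whose presence is of interest, such as "smoker" or "spam".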

In some applications, it may be useful to construct a binary variable from other types of data. If you can reduce a non-binary attribute to exactly two categories, you have a binary variable. For example, the numerical variable age can be divided into two groups: 'less than 30' and '30 or older'.
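
The age split described above can be sketched in plain Python with a comparison against the threshold. The `ages` list is a made-up example, and the cutoff of 30 is simply the one used in the text.

```python
# Hypothetical ages (illustrative values only)
ages = [22, 30, 45, 29, 61]

# True when the age is 30 or older, False when it is less than 30
is_30_or_older = [age >= 30 for age in ages]

print(is_30_or_older)  # [False, True, True, False, True]
```

Whether the boundary value itself (here 30) falls in the upper or lower group depends on using `>=` or `>`, so it is worth stating the rule explicitly.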

Datasets used in machine learning applications often contain binary variables. Applications such as medical diagnosis, spam analysis, facial recognition, and financial fraud detection commonly involve them.

Binary Variables in Python

In Python, a binary variable is naturally represented by the Boolean data type (bool), whose only two values are $True$ and $False$.

# Boolean data type
x = True
y = False
print(type(x), type(y))

Out:

<class 'bool'> <class 'bool'>
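
The Boolean and integer representations are closely related in Python: bool is a subclass of int, so $True$ and $False$ behave like 1 and 0 in arithmetic. A short sketch follows, with a made-up `flags` list.

```python
# bool is a subclass of int in Python
print(issubclass(bool, int))   # True
print(int(True), int(False))   # 1 0

# Hypothetical flags: summing booleans counts the True values
flags = [True, False, True, True]
print(sum(flags))              # 3
```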

Additionally, the bool() function converts any object to a Boolean value. It returns $True$ for every value except the following:

Empty objects (list, tuple, string, dictionary)
Numeric zeros (0, 0.0, 0j)
None
False itself

print("Boolean value of an empty list is", bool([]))
print("Boolean value of zero is", bool(0))
print("Boolean value of number 10 is", bool(10))
print("Boolean value of an empty string is", bool(''))
print("Boolean value of a string is", bool('string'))

Out:

Boolean value of an empty list is False
Boolean value of zero is False
Boolean value of number 10 is True
Boolean value of an empty string is False
Boolean value of a string is True
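
The same rule extends to other empty containers and to user-defined classes. The sketch below assumes a made-up `Basket` class. For such a class, bool() falls back to `__len__` when `__bool__` is not defined.

```python
# Other empty containers are also falsy
print(bool(set()), bool({}), bool(range(0)))  # False False False

# A hypothetical container class: bool() falls back to __len__
# when a class does not define __bool__
class Basket:
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

print(bool(Basket([])))         # False
print(bool(Basket(["apple"])))  # True
```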

In a dataset

The birthwt dataset, 'Risk Factors Associated with Low Infant Birth Weight', comes from the R package MASS and can be loaded through the statsmodels library. It is used here to observe binary variables in real data.

import statsmodels.api as sm
dataset1 = sm.datasets.get_rdataset(dataname='birthwt', package='MASS')
df1 = dataset1.data

df1.head()

Out:

	age	lwt	race	smoke	ui	ftv	bwt
85	19	182	2	0	1	0	2523
86	33	155	3	0	0	3	2551
87	20	105	1	1	0	1	2557
88	21	108	1	1	1	2	2594
89	18	107	1	1	1	0	2600

The dataset description, available through dataset1.__doc__, lists the following variables:

low : an indicator of whether the birth weight is less than 2.5kg
age : mother’s age in years
lwt : mother’s weight in pounds at last menstrual period
race : mother’s race (1 = white, 2 = black, 3 = other)
smoke : smoking status during pregnancy
ptl : number of previous premature labours
ht : history of hypertension
ui : presence of uterine irritability
ftv : number of physician visits during the first trimester
bwt : birth weight in grams

According to the dataset description, the low, smoke, ht, and ui attributes are binary variables. In pandas, the value_counts() method returns the count of each unique value in a column.

# find counts of the variables
df1['smoke'].value_counts()

Out:

0    115
1     74
Name: smoke, dtype: int64
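
Binary columns can also be identified and summarized programmatically. The sketch below uses a small made-up DataFrame in place of birthwt so that it runs without a download. The column names mirror the dataset, but the values are illustrative only.

```python
import pandas as pd

# Hypothetical data with birthwt-style columns (values are made up)
df = pd.DataFrame({
    "smoke": [0, 1, 0, 0, 1, 0],
    "ui":    [1, 0, 0, 0, 1, 0],
    "race":  [2, 3, 1, 1, 1, 2],
    "age":   [19, 33, 20, 21, 18, 25],
})

# Columns whose values are all 0 or 1 are treated as binary
binary_cols = [col for col in df.columns if df[col].isin([0, 1]).all()]
print(binary_cols)  # ['smoke', 'ui']

# For a 0/1 column, the mean equals the proportion of ones
print(df["smoke"].mean())

# value_counts(normalize=True) reports the share of each value
print(df["smoke"].value_counts(normalize=True))
```

This check only recognizes binary variables coded as 0/1. A two-category column stored as text, such as 'yes'/'no', would need a test like `df[col].nunique() == 2` instead.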

In the following example, the numerical variable age is converted to a binary variable that is True when the mother is older than 30 and False otherwise.

# convert a numerical variable to a binary variable
# the comparison already returns a boolean Series, so no astype() call is needed
df1['new_age'] = df1['age'] > 30

print('Type of the new variable:\n', type(df1['new_age'].iloc[0]), '\n')
print('Value Counts of the new variable:\n', df1['new_age'].value_counts())

Out:

Type of the new variable:
 <class 'numpy.bool_'> 

Value Counts of the new variable:
 False    169
True      20
Name: new_age, dtype: int64
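
When a 0/1 integer column is preferred over True/False, the boolean result can be cast with astype(int). The sketch below uses a made-up DataFrame, and the column names `over_30` and `over_30_int` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical ages (illustrative values only)
df = pd.DataFrame({"age": [19, 33, 20, 31, 45]})

# Boolean indicator: True when age is greater than 30
df["over_30"] = df["age"] > 30

# Same indicator stored as integers (1 = True, 0 = False)
df["over_30_int"] = df["over_30"].astype(int)

print(df["over_30_int"].tolist())  # [0, 1, 0, 1, 1]
```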

Meet the Authors

Fatih Karabiber Ph.D. in Computer Engineering, Data Scientist

Associate Professor of Computer Engineering. Author/co-author of over 30 journal publications. Instructor of graduate/undergraduate courses. Supervisor of Graduate thesis. Consultant to IT Companies.
