On tap today is numpy.arange(). It’s a function in the NumPy library for Python that returns an array of evenly spaced values within a specified interval. It is similar to Python’s built-in range() function, but it returns a NumPy array rather than a range object and also accepts non-integer arguments.

The basic syntax for numpy.arange() is:


   numpy.arange([start,] stop[, step,], dtype=None)

numpy.arange() takes these arguments:

  • start (optional) is the starting value of the sequence. The default is 0.
  • stop is the end value of the sequence (exclusive). This parameter is required.
  • step (optional) is the step size between each value in the sequence. The default is 1.
  • dtype (optional) is the data type of the returned array. The default is None, which means NumPy will choose a data type based on the inputs.
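
As a minimal sketch of how these parameters interact (the variable names are illustrative only and do not come from the original examples):

```python
import numpy as np

# Only stop is given: start defaults to 0 and step to 1
a = np.arange(4)                      # [0 1 2 3]

# start and stop: the stop value itself is excluded
b = np.arange(2, 6)                   # [2 3 4 5]

# A negative step counts down; stop is still excluded
c = np.arange(10, 0, -3)              # [10 7 4 1]

# dtype overrides the type NumPy would otherwise infer from the inputs
d = np.arange(3, dtype=np.float64)    # [0. 1. 2.]

print(a, b, c, d)
print(d.dtype)                        # float64
```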

Here is an example of using numpy.arange() to create an array of values from 0 to 10 (exclusive) with a step size of 2:


   import numpy as np

   arr = np.arange(0, 10, 2)

   print(arr) # Output: [0 2 4 6 8]

You can also omit the start parameter to start the sequence at 0:


   arr = np.arange(5)

   print(arr) # Output: [0 1 2 3 4]

Here are two examples of using np.arange() in a data science application:

Example 1: Generating an array of dates


   import numpy as np
   import pandas as pd

   # Generate an array of dates for the month of January, 2023
   dates = pd.date_range(start='2023-01-01', end='2023-01-31')

   # Build an array of zero-based day indices, one per date
   days = np.arange(len(dates))

   print(dates)
   print(days)

Explanation of code:

  • We import the NumPy and Pandas libraries, as we will be using both in this example.
  • We use the pd.date_range() function to generate an array of dates for the month of January, 2023. Here we pass two of its parameters: start (the first date in the range) and end (the last date in the range, which is included).
  • We then use the len() function to get the length of the dates array, which represents the number of days in January 2023.
  • We use np.arange() to generate an array of integers from 0 up to, but not including, the length of the dates array, which gives one index for each day in January.
  • Finally, we print both the dates and days arrays.
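
np.arange() can also generate the dates themselves when it is given a datetime64 dtype. The following is a small sketch of that alternative, not part of the original example; the name np_dates is made up for illustration:

```python
import numpy as np

# The stop value ("2023-02") is excluded, so this covers all of January 2023
np_dates = np.arange("2023-01", "2023-02", dtype="datetime64[D]")

# Zero-based day indices, one per date, as in Example 1
days = np.arange(len(np_dates))

print(np_dates[0], np_dates[-1])      # 2023-01-01 2023-01-31
print(days[0], days[-1])              # 0 30
```

This avoids the pandas dependency when only a plain array of consecutive days is needed; pd.date_range() remains the more flexible choice for other frequencies.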

The output of the code for Example 1 would be:


   DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
   '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', 
   '2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', 
   '2023-01-15', '2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19',
   '2023-01-20', '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24',
   '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28', '2023-01-29',
   '2023-01-30', '2023-01-31'],
   dtype='datetime64[ns]', freq='D')
   [0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
    16 17 18 19 20 21 22 23 24 25 26 27 28 29 30]

Example 2: Generating an array of random numbers


   import numpy as np
   import matplotlib.pyplot as plt

   # Generate an array of 100 random numbers between 0 and 1
   rand_nums = np.random.rand(100)

   # Generate an array of bin edges for a histogram
   bins = np.arange(0, 1.1, 0.1)

   # Plot a histogram of the random numbers using the bins
   plt.hist(rand_nums, bins=bins)
   plt.show()

Explanation of code:

  • We import the NumPy library.
  • We use the np.random.rand() function to generate an array of 100 random numbers drawn uniformly from the interval [0, 1). Here we pass a single argument, which is the length of the array.
  • We use np.arange() to generate an array of bin edges for a histogram. The edges are spaced 0.1 units apart and run from 0 to 1; the stop value is set to 1.1 so that the final edge, 1.0, is included.
  • We import the pyplot module from the matplotlib library and use it to plot a histogram of the random numbers using the bins generated by np.arange(). We then use plt.show() to display the plot.
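
To inspect the binning numerically rather than visually, the same bin edges can be passed to np.histogram(). This is only a sketch: the seeded generator and the names counts and edges are assumptions added for repeatability and are not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, only for repeatability
rand_nums = rng.random(100)           # 100 floats in [0, 1)

bins = np.arange(0, 1.1, 0.1)         # bin edges, as in Example 2

# counts[i] is the number of values falling between edges[i] and edges[i + 1]
counts, edges = np.histogram(rand_nums, bins=bins)

print(counts)
print(counts.sum())                   # every value lands in some bin
```

Because a float step is subject to rounding error, np.linspace(0, 1, 11) is a common alternative when the exact number of edges matters.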

The output of the code for Example 2 would be a histogram of the 100 random numbers, with bins of width 0.1 spanning 0 to 1.

I hope these examples are helpful in understanding how np.arange() can be used in a data science application!

There are many books on Python programming that discuss NumPy and its arange() function. Here are two popular ones that you might find helpful:

  1. “Numerical Python” by Robert Johansson: This book covers scientific computing in Python with NumPy, SciPy, and Matplotlib, and its NumPy material includes the arange() function.
  2. “Learning NumPy Array” by Ivan Idris: This book is a beginner’s guide to NumPy and covers the basics of the library, including its arange() function.

Both of these books should provide a detailed explanation of NumPy’s arange() function and how it can be used in data science applications.

Here are two quotes from “Numerical Python” by Robert Johansson that discuss numpy.arange() and offer insights into its usage:

  1. “The numpy.arange() function is similar to the built-in range() function but returns an array instead of a list and accepts float arguments as well as integers.” (page 30)

This quote highlights the key differences between numpy.arange() and the built-in range() function in Python. It also emphasizes the flexibility of numpy.arange(), which allows for both integer and float arguments.
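
The difference in accepted argument types can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

# range() accepts only integers and produces a lazy range object
ints = list(range(0, 6, 2))           # [0, 2, 4]

try:
    range(0, 1, 0.25)
    float_range_ok = True
except TypeError:
    float_range_ok = False            # a float step is rejected

# np.arange() accepts floats and returns an ndarray
floats = np.arange(0, 1, 0.25)        # [0.   0.25 0.5  0.75]

print(ints, float_range_ok, floats)
```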

  2. “One of the main advantages of numpy.arange() is that it can be used to create a sequence of values that can be used as an index for an array. This is particularly useful when the data is not available in a contiguous sequence of elements in memory.” (page 31)

This quote emphasizes the usefulness of numpy.arange() in creating sequences of values that can be used as indices for arrays. It also highlights how numpy.arange() can be used to work with data that is not stored in a contiguous sequence in memory.

Here are two quotes from “Learning NumPy Array” by Ivan Idris that discuss numpy.arange() and offer insights into its usage:

  1. “The numpy.arange() function is one of the most commonly used functions for creating a range of numbers in NumPy. It returns an array with evenly spaced values within a specified range.” (page 19)

This quote provides a concise description of the numpy.arange() function, emphasizing its usefulness in creating an array with evenly spaced values within a specified range.

  2. “The numpy.arange() function can be used to create an index array for a given data set. This is useful when working with large data sets that may not have consecutive indices.” (page 47)

This quote highlights one of the key applications of numpy.arange(), which is to create index arrays for large data sets. It also emphasizes the usefulness of numpy.arange() in situations where the indices of a data set may not be consecutive.
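
As a rough illustration of the index-array idea (the data values and variable names below are made up for this sketch):

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # made-up sample values

# An integer array built with arange selects positions 0, 2, and 4
every_other = np.arange(0, len(data), 2)
print(data[every_other])              # [10. 30. 50.]

# A full index array can be shuffled to reorder the data while keeping
# track of each element's original position
idx = np.arange(len(data))
rng = np.random.default_rng(0)        # arbitrary seed
rng.shuffle(idx)
print(idx, data[idx])
```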
