02. Handling Data & Files

Updated on February 7, 2025
In the previous tutorial, we learned how to enter different types of data into the Command Window, which types of data exist, and how to use simple functions. But what happens when a dataset is too large to type into the Command Window by hand? In this second tutorial, we focus on ways of handling bigger data and take the next step in your programming journey.

Learning Goals #

  • Using scripts to organize your programming efforts.
  • Opening bigger files such as .CSV files.
  • Using data stored as Tables.

1. Scripts #

1.1. Creating and writing a script #

Now that we can perform simple interactions with the MATLAB interface, we can start learning more about programming. Most code that you write can be organised in scripts, which are files containing a sequence of commands. To avoid losing your written code, it is important to create a script file right away. Figure 1 shows some important options in the MATLAB interface.
Figure 1. MATLAB interface.
First press New, then Script (A), and a new script will open with the name “untitled”. To save this new script, press Save (B). The File Explorer will open. You can save the file anywhere, but we recommend saving it in one dedicated folder and making sure this folder is visible in the Current Folder part of the MATLAB window. This gives you easy access to all the files you will create during these tutorials.
EXERCISE 1
  1. Create a new script file and save it in a location that you want to use for your MATLAB calculations.
  2. Navigate in the Current Folder window to the location where you saved your new script file. You should now see it in your Current Folder window.
After creating a new script file, we also need to be able to run it. We do this by clicking Run (C) at the top of the window. Pressing this button executes the whole script (called “running” a script in the programming community). It is also possible to run only part of a script. This becomes important once we write longer scripts and you need to troubleshoot or try out specific parts of your code. To run a single section, click Run Section (D). How to make sections is discussed in Section 1.3.
EXERCISE 2
  1. Create a script which will sum two variables and save this summed value in a third variable. Hint: take a look at exercise 1 from tutorial 1.
  2. Now press Run to run this code. What happens? Hint: did you try using the semicolon “;”?
  3. Now try out the other buttons on the top.
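
A minimal sketch of what such a script could look like is given below. The variable names (a, b, total) and the values are made up for illustration; your own solution may look different.

```matlab
% Sum two variables and store the result in a third variable
a = 4;              % semicolon: the result is not printed in the Command Window
b = 6;              % semicolon: also suppressed
total = a + b       % no semicolon: MATLAB prints "total = 10" when the script runs
```

Adding a semicolon to the last line would still store the value in the Workspace, but nothing would be shown in the Command Window.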

1.2. Managing the Command Window and Workspace #

In the previous tutorial, we wrote quite a lot of code in the Command Window. This is a nice way to quickly calculate something, but before you know it you’ll get lost in the variables accumulating in the Workspace. An additional problem is that these variables are not safely stored: they are lost when you close MATLAB.

There is a way to clear both the Workspace and the Command Window, but remember: once you clear either one, its contents cannot be retrieved.

  • Typing “clear” into the Command Window will clear the Workspace.
  • Typing “clc” into the Command Window will clear the Command Window.
You can also save the current workspace by using the save function.
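
The commands above can be combined as in the following sketch. The file name my_workspace.mat and the variables are hypothetical; the file is written to the Current Folder.

```matlab
% Create two example variables (made-up values)
a = 5;
b = [1 2 3];

% Save all Workspace variables to a MAT-file (hypothetical file name)
save('my_workspace.mat');

% Clear the Workspace and the Command Window
clear
clc

% Restore the saved variables from the MAT-file
load('my_workspace.mat');
```

After the load command, a and b appear in the Workspace again, which shows that save is a way to keep variables that clear would otherwise remove for good.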

1.3. Structuring a script #

As a script grows, it can become quite cluttered. You may understand the contents now, but this is often much more challenging when you revisit the script months later. One way to improve your script is to add some organization. For instance, annotations (notes, also called comments) can be added to explain the code and make it understandable for others. Notes can also be used to disable unused code so that it is not executed when you run the script.
USE THE HELP AND DOC FUNCTIONS

Remember that in the previous lesson we covered the help and doc functions. From that point on, we assume you will look up new functions as we introduce them.

A note is made by adding “%” at the beginning of a line; everything after the percent sign on that line is ignored when the script runs.

				
%% Section --> this will create a section

% Header --> this is used as headers and to add comments about the code

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% And if you want to be fancy, you can make separations like this ^
				
			
Furthermore, we can add structure with sections and headers. A section is created by placing two percent signs at the beginning of a line. Optionally (but recommended), you can add some text after them explaining what this part of the script does. You can also add notes and headers by typing one percent sign followed by some text.
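
As an illustration, a small script that uses sections and notes could look like the sketch below. The variable names and values are made up; the point is only how the percent signs are used.

```matlab
%% Define input values
a = 2;
b = 3;

%% Calculate the sum
% total = a - b;    % earlier attempt, turned into a note so it does not run
total = a + b;      % a note can also follow code on the same line
```

With the cursor placed in the second section, Run Section executes only the lines under “Calculate the sum”.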

2. Matrices, Loading Data and Tables #

In the previous tutorial, we entered data by simply typing it into the Command Window, but this is not practical for larger datasets such as those from LC-MS. Let’s look at the scale we are talking about. Suppose that our LC method has an analysis time of 60 minutes, or 3600 s, and that the mass spectrometer (MS) records 10 spectra per second. This means that we have 3600*10 = 36,000 points in time.

Figure 2. Representation of the number of datapoints in an LC-MS file according to the specifications in the text.

If the MS scans the range from 60 to 800 m/z, binned at 0.1 m/z, then we are looking at (800-60)/0.1 = 7,400 data points per spectrum. In other words, 36,000*7,400 = 266,400,000 data points!

This is a lot of data! And this is not even high-resolution MS or a 2D separation.
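
The calculation above can be reproduced in a short script. The variable names are hypothetical, and the memory estimate in the last line is an addition that assumes every data point is stored as a double (8 bytes), which is MATLAB's default numeric type.

```matlab
%% Number of points in time
analysis_time_s = 60 * 60;                          % 60 minutes = 3600 s
scans_per_s     = 10;                               % spectra recorded per second
n_time_points   = analysis_time_s * scans_per_s;    % 36000

%% Number of data points per spectrum
mz_min      = 60;
mz_max      = 800;
bin_width   = 0.1;
n_mz_points = round((mz_max - mz_min) / bin_width); % 7400 (round guards against rounding error)

%% Total number of data points
n_total = n_time_points * n_mz_points;              % 266400000

%% Rough memory estimate, assuming 8 bytes per value
memory_GB = n_total * 8 / 1e9;                      % about 2.13 GB
```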

2.1. Loading files (.CSV) #

One of the most common ways to get this kind of dataset into MATLAB is by using a CSV file (comma-separated values). Such a file can be imported as a table. A table provides structure to the data, making it easier to read when your data includes different types of values in different columns. Note that all data within a column should be of the same data type (see Section 2.3).

EXERCISE 3

During Tutorial 1, we entered the data by manually typing in the numbers. Now we want to practice a smarter way to get data into MATLAB.

  1. First, download the CSV file: CEC concentrations.csv. Make sure to save it in the current folder.
  2. Now try and load in the data. Hint: use the readtable() function. Use the help or doc function (e.g. “doc readtable”) if you are stuck.
  3. Do you see any difference between a table and a matrix?
				
% Open the concentrations file
file_name   = 'CEC concentrations.csv';
T           = readtable(file_name);
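
If loading fails, the file is often simply not in the Current Folder. The sketch below, which is an addition and not part of the exercise, first checks that the file exists and then shows a few ways to inspect the loaded table. It assumes the file name used above and a MATLAB version that provides isfile.

```matlab
file_name = 'CEC concentrations.csv';

if isfile(file_name)
    T = readtable(file_name);
    size(T)                        % number of rows and columns
    T.Properties.VariableNames     % column names taken from the header row
    head(T, 5)                     % first five rows of the table
else
    % The file was not found: report where MATLAB was looking
    warning('File "%s" was not found in the Current Folder: %s', file_name, pwd);
end
```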
				
			

2.2. What are tables? #

When data contains a mixture of variable types (e.g. strings and floats), loading the data as a table can be very useful. The table can then be used for fast indexing.

EXERCISE 4

In the previous exercise, we imported the CEC concentrations CSV file. There are multiple ways of indexing a table.

Try the examples below on the table. What is the difference between the two types of indexing?

				
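% Indexing 1: parentheses ()
idx_1   = T(:,'Name');     % all rows of the column 'Name'
idx_2   = T(3,'Name');     % row 3 of the column 'Name'
idx_3   = T(5,2);          % row 5 of the second column


% Indexing 2: curly braces {}
idx_4   = T{:,'Name'};     % all rows of the column 'Name'
idx_5   = T{3,'Name'};     % row 3 of the column 'Name'
idx_6   = T{5,2};          % row 5 of the second column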
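
To check your answer to Exercise 4 without the CSV file, the two types of indexing can be compared on a small table built by hand. This is a sketch: the table T_demo, its column names and its values are made up for illustration.

```matlab
% Small made-up table; the names and values are illustrative only
Name          = ["Compound A"; "Compound B"; "Compound C"];
Concentration = [1.2; 0.4; 3.1];
T_demo        = table(Name, Concentration);

% Parentheses keep the result wrapped in a (smaller) table
sub_table = T_demo(2, 'Concentration');
class(sub_table)                     % 'table'

% Curly braces extract the contents of the table
value = T_demo{2, 'Concentration'};
class(value)                         % 'double'

% The extracted number behaves like any other numeric variable
doubled = value * 2;

% Dot notation also extracts a whole column by name
all_conc = T_demo.Concentration;     % 3-by-1 column of doubles
```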
				
			
Figure 3. Snapshot of the CEC concentration data, with the indexes highlighted.

2.3. Data structures #

The table we have imported contains variables of different types: we see both words and numbers.

From the perspective of MATLAB, there are many different types of variables. The most important ones are described in the table below. You can always check the type of a variable by using the function class() with the variable of interest as input.

Table 1. Overview of common variable types in MATLAB.
| Type    | Explanation                                                                        | Example                                |
|---------|------------------------------------------------------------------------------------|----------------------------------------|
| double  | General-purpose numeric data                                                       | 1789, 7.23, Inf, -Inf, NaN             |
|         | In other programming languages these values can be split into integers and floats | Int: 1789; Float: 7.23, Inf, -Inf, NaN |
| string  | Flexible text data, noted with "…"                                                 | "CAS", "13423", "hello world"          |
| char    | Fixed-size text data or legacy code, noted with '…'                                | 'CAS', '13523', 'hello world'          |
| logical | Boolean type containing false and true                                             | true, false, true & false              |
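
The types in Table 1 can be checked with class(), as in the following sketch. The variable names are hypothetical and the values are taken from the examples in the table.

```matlab
% One example value for each type in Table 1
n = 1789;               % numbers are stored as double by default, even whole numbers
s = "hello world";      % double quotes create a string
c = 'hello world';      % single quotes create a character array (char)
b = true;               % logical value

% class() returns the type of each variable as text
type_n = class(n);      % 'double'
type_s = class(s);      % 'string'
type_c = class(c);      % 'char'
type_b = class(b);      % 'logical'
```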

2.4. Loading matrices #

Sometimes the data you receive is not stored as a table, but as a plain numeric matrix. In our next example, we have obtained a chromatogram with the first column containing the time axis and the second column representing the signal intensity. For such cases, it can be convenient to use the readmatrix() function. Unlike readtable, it returns a plain numeric matrix without column names.

				
% Load a chromatogram (column 1: time axis, column 2: signal intensity)
file_name   = 'Chromatogram.csv';
X           = readmatrix(file_name);
				
			
Both readtable and readmatrix work very similarly and can also be used for other file types, such as Excel (.xlsx) and text (.txt) files. As the files that you want to load become larger, it also becomes more relevant to evaluate the speed at which files can be loaded with different functions.
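
The sketch below compares the two functions on the same file and times each call with tic and toc. To keep it self-contained, it first writes a small synthetic chromatogram to a temporary CSV file; the file name, the variable names and the peak shape are all made up, and the measured times will differ per computer.

```matlab
% Build a small synthetic chromatogram: time axis (s) and signal intensity
time_s    = (0:0.1:1)';
signal    = exp(-((time_s - 0.5) / 0.1).^2);              % made-up Gaussian peak
demo_file = fullfile(tempdir, 'demo_chromatogram.csv');   % hypothetical file name
writematrix([time_s, signal], demo_file);

% Load the same file in two ways and time each call
tic; X_demo   = readmatrix(demo_file); t_matrix = toc;
tic; Tbl_demo = readtable(demo_file);  t_table  = toc;

class(X_demo)       % 'double': a plain numeric matrix
class(Tbl_demo)     % 'table': the same numbers, wrapped in a table

% Split the matrix into its two columns
time_axis = X_demo(:, 1);
intensity = X_demo(:, 2);

fprintf('readmatrix: %.4f s, readtable: %.4f s\n', t_matrix, t_table);

% Remove the temporary file again
delete(demo_file);
```

For a file this small the difference in loading time is negligible; the same pattern can be reused on your own, larger files.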
EXERCISE 5

Download the following file (Chromatogram, .CSV) and load it with the readmatrix function. Do you notice a difference when inspecting the loaded data in the Workspace? What about the CSV files themselves? Can you use readmatrix to load the data from Exercise 3? What about using readtable on the chromatogram?

As we progress with this tutorial and with the Chemometrics & Statistics course, we’ll see different ways to retrieve and represent the information stored in the data that we load.

Concluding remarks #

We can now load data into MATLAB and have learned more about the different formats in which data can be loaded. In the next tutorial, we’ll focus on the flow of executing scripts and on plotting the data that we load.

For more information also look at the MATLAB website. Here you can find even more information about how to work with everything from matrices to different functions, and so much more.

https://nl.mathworks.com/products/matlab.html
