Instructions for data archiving
1 Prepare data material for archiving
The data to be archived should include only the data files and variables used in the project's analysis and in the figures and tables published in the report, working paper or article.
1.1 Data material solely from IFAU's databases
Data retrieved solely from IFAU’s research databases (IFAU-I, -S, or -U) can be submitted for archiving in either of the following formats:
- Original format (Stata, SAS, GAUSS, R). A separate data description is not required (ensure that variables are documented with variable labels or in the program code, as needed).
- Fixed-width text file, following the instructions under heading 1.2 below.
1.2 Data material not solely from IFAU's databases
Data not retrieved solely from IFAU's research databases (IFAU-I, -S, or -U) must be saved as a fixed-width text file. Fixed-width means that the first character of variable n (n = 1, 2, …, N) appears at the same position on every row of the text file. When a variable's values differ in length across rows (observations), they are padded with blank spaces to the variable's maximum length, so that the variables form straight columns. Variables should not be separated by delimiters such as commas, semicolons, or tabs (even if a tab-separated file appears aligned on screen, it is still a delimited file, with the tab as delimiter). Historically, ASCII encoding has been required; today, Unicode (UTF-8) is standard in most software and better suited for long-term preservation.
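The padding rule above can be illustrated with a minimal Python sketch. It is not one of IFAU's export tools, and the function names to_fixed_width and write_fixed_width and the sample data are hypothetical. Each variable's width is taken as the length of its longest value. Shorter values are padded on the right with blanks, so each variable starts at the same position on every row and no delimiter is used. Note that str.ljust counts characters, not bytes. In UTF-8, letters such as å, ä and ö take two bytes each, so byte positions and character positions differ on rows containing such letters.

```python
import os
import tempfile


def to_fixed_width(rows):
    """Return one fixed-width line per row; rows are equal-length lists of strings."""
    n_vars = len(rows[0])
    # Width of each variable = length of its longest value across all rows.
    widths = [max(len(row[i]) for row in rows) for i in range(n_vars)]
    # ljust pads each value with blanks on the right up to the variable's width;
    # values are concatenated directly, with no delimiter between variables.
    return ["".join(value.ljust(w) for value, w in zip(row, widths)) for row in rows]


def write_fixed_width(path, rows):
    """Write the fixed-width lines to a UTF-8 text file, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in to_fixed_width(rows):
            f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical example data: identifier, municipality, year.
    data = [
        ["1", "Uppsala", "2019"],
        ["22", "Lund", "2020"],
        ["333", "Ystad", "2021"],
    ]
    for line in to_fixed_width(data):
        print(line)
    # Example output location (a temporary folder); replace with your own path.
    write_fixed_width(os.path.join(tempfile.gettempdir(), "example_fixed.txt"), data)
```

With the sample data, the widths are 3, 7 and 4. The three variables therefore start at positions 1, 4 and 11 on every row, for example "22 Lund   2020".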
Via Stat Transfer
Data processed in Stata, SAS, GAUSS or Excel can be converted to a fixed-width text file using Stat Transfer:
- Under Input file, choose the software that was used to produce the data file.
- Use Browse to select the file to be transferred.
- Under Output file, choose ASCII – fixed format (S/T Schema).
- Use Browse to choose the location (folder) where the ASCII file is to be saved.
Stat Transfer automatically creates a description file with a list of variables and their positions. If the data file contains labels describing the contents of each variable, these labels also appear in the generated list of variables.
If you have many files to transfer to ASCII – fixed format, you can run a batch transfer as follows:
- Start the program st.exe, found in the Stat Transfer installation folder, e.g. C:\Program Files (x86)\StatTransfer8\st.exe
- Change to the folder containing your data files, for example:
  cd P:\yyyy\nn\data\analysis_data
- Type copy, followed by * and the file extension of the original data files (e.g. *.sas7bdat for SAS files; '*' specifies that all files of that type in the folder should be transferred), followed by *.sts (for ASCII – fixed format (S/T Schema)). For example:
  copy *.sas7bdat *.sts /O-
Adding /O- tells Stat Transfer not to optimize data types during the transfer.
Via statistical software
Stata
A Stata program (.ado file), dta2fixedpos, has been created for exporting Stata .dta files to fixed-width text files. The program also generates a data description file containing information about the variables in the dataset, including variable labels and value labels (if either exists in the .dta file).
A companion program, fixedpos2dta, imports a text file created with dta2fixedpos back into Stata’s .dta format. The import program reads the generated data description file, so that file should not be edited after dta2fixedpos has created it.
The programs can be installed directly in Stata[1] using the command:
- net install dta2fixedpos, all from(https://data.ifau.se/static/stata)
or (for a description with option to install):
- net from https://data.ifau.se/static/stata
The program is documented in a Stata help file (type help dta2fixedpos in Stata). To export a Stata .dta file:
- Open the file in Stata: use "P:\yyyy\nnn\analysis_data.dta"
- Run the command:
dta2fixedpos, saveas("P:\yyyy\nnn\analysis_data.raw")
The program optimizes display formats by data type to avoid loss of information; for example, date/time format variables are exported as text according to ISO 8601 ('YYYY-mm-dd' for dates and 'YYYY-mm-ddThh:mm:ss.sss' for timestamps).
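For reference, the ISO 8601 text forms mentioned above can be produced with Python's standard datetime module, as in the sketch below. It only illustrates the target formats and is not how dta2fixedpos is implemented. The helper names iso_date and iso_timestamp are hypothetical. These formats suit fixed-width files because every value has the same length: 10 characters for dates and 23 for timestamps with milliseconds.

```python
from datetime import date, datetime


def iso_date(d):
    """Format a date as 'YYYY-mm-dd'."""
    return d.isoformat()


def iso_timestamp(ts):
    """Format a naive datetime as 'YYYY-mm-ddThh:mm:ss.sss'.

    timespec="milliseconds" truncates (does not round) the fractional seconds.
    A timezone-aware datetime would get a UTC offset appended, so naive values
    are assumed here.
    """
    return ts.isoformat(timespec="milliseconds")


if __name__ == "__main__":
    print(iso_date(date(2025, 3, 7)))
    print(iso_timestamp(datetime(2025, 3, 7, 14, 5, 9, 123456)))
```

The example prints 2025-03-07 and 2025-03-07T14:05:09.123.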
SAS
A SAS macro for exporting a dataset to a fixed-width text file is available at: https://data.ifau.se/static/sas/macro_export_fixed_width_with_desc.sas
A data description file is automatically generated.
/* Load the macro first (example path: where the downloaded .sas file is saved) */
%include "P:\yyyy\nnn\macro_export_fixed_width_with_desc.sas";

/* Macro export_fixed_width_with_desc() requires the following parameters:
   - data:    dataset to be exported (libname.dataset, e.g. work.analysis_data)
   - outtxt:  path and file name for the exported text file
   - outdesc: path and file name for the generated data description file
*/
%export_fixed_width_with_desc(
    data=work.analysis_data,
    outtxt=P:\yyyy\nnn\data\analysis_data.txt,
    outdesc=P:\yyyy\nnn\data\descr_analysis_data.txt
);
R
An R function for exporting a dataset to a fixed-width text file can be installed in R using:
- # install.packages("devtools")
- devtools::install_github("adrianadermon/writefwf")
- library(writefwf)
- write_fwf(analysis_data, "P:/yyyy/nnn/analysis_data")
A data description file is automatically generated.
Python
A Python function for exporting a dataset to a fixed-width text file is available at:
https://data.ifau.se/static/python/func_export_fixed_width_with_desc.py
A data description file is automatically generated.
# Load the function, e.g. with the downloaded file saved in the working directory
from func_export_fixed_width_with_desc import export_fixed_width_with_desc

# export_fixed_width_with_desc() requires the following parameters:
#   df:      the dataset/data frame to be exported (the function expects a pandas DataFrame)
#   outtxt:  path and file name for the exported text file
#   outdesc: path and file name for the generated data description file
export_fixed_width_with_desc(
    df=analysis_data,
    outtxt="P:/yyyy/nnn/data/analysis_data.txt",
    outdesc="P:/yyyy/nnn/data/descr_analysis_data.txt",
)
GAUSS
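/* Read a GAUSS data set and write it to a text file with fixed column widths */
indata = "name of data file";   /* name of the GAUSS data set */
open f1 = ^indata;              /* ^ uses the contents of indata as the file name */
x = readr(f1, rowsf(f1));       /* read all rows into the matrix x */
f1 = close(f1);
format /rd 16,8;                /* right-justified, field width 16, 8 decimals; adjust as needed */
outwidth 256;                   /* widen output lines to avoid wrapping long rows */
output file = ut.txt reset;
print x;
output off;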
Description of the data material
A data description file should also be created (text file with line breaks). Name the file postbeskr.txt.
(This file is generated automatically when exporting from Stata with dta2fixedpos, from SAS with the macro export_fixed_width_with_desc(), from R with the function write_fwf(), or from Python with the function export_fixed_width_with_desc(); see the instructions for each software above.)
The data description file must include variable names, variable type (numeric, text, date, etc.), start position for each variable, exact column width, and a description of the variable content along with any other relevant information needed to explain the contents of the data file.
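Because there are no delimiters, the start positions follow directly from the column widths: each variable starts immediately after the previous one ends. The sketch below assumes the widths are known. It computes 1-based start positions and formats one description line per variable. The function names, the column layout of the description and the sample variables are hypothetical. The required contents are those listed above, and the files generated by the export tools may be laid out differently.

```python
def layout(variables):
    """Add 1-based start positions to (name, type, width, description) tuples."""
    result = []
    start = 1
    for name, vtype, width, text in variables:
        result.append((name, vtype, start, width, text))
        # The next variable begins immediately after this one (no delimiters).
        start += width
    return result


def description_lines(variables):
    """Return human-readable description lines, one per variable, with a header."""
    lines = [f"{'name':<10}{'type':<10}{'start':>6}{'width':>6}  description"]
    for name, vtype, start, width, text in layout(variables):
        lines.append(f"{name:<10}{vtype:<10}{start:>6}{width:>6}  {text}")
    return lines


if __name__ == "__main__":
    # Hypothetical variables matching a three-column fixed-width file.
    variables = [
        ("id", "numeric", 3, "Person identifier"),
        ("town", "text", 7, "Municipality of residence"),
        ("year", "numeric", 4, "Calendar year"),
    ]
    for line in description_lines(variables):
        print(line)
```

Widths 3, 7 and 4 give start positions 1, 4 and 11. The resulting lines could then be saved, with line breaks, as postbeskr.txt.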
2 Documenting and submitting data material for archiving
NB! This applies in all cases, regardless of the source of the data material.
Information
Create a text file containing information about the origin/source of the data material (e.g., surveys, AF Händel, Statistics Sweden’s LISA database…), the population covered, the software used for the analysis, and any other information you consider necessary. If the code uses external packages or functions that are not included in the standard installation of the software, these must be listed in the documentation.
For example, the following commands list installed packages together with their versions:
- In R: installed.packages()[, c("Package", "Version")]
- In Python (run from the command line): pip freeze > requirements.txt
Save the file as plain text with line breaks; for example, in MS Word, choose the 'Unformatted Text' ('Plain Text') file format and select the 'Insert line breaks' option.
Note: The data description and the information above on data sources etc. may be saved in the same file.
However, this does not apply if the data were exported with dta2fixedpos in Stata, since the corresponding import program, fixedpos2dta, reads the automatically generated data description file. That file should therefore not be modified manually after export. In this case, create a separate text file with the information on data sources etc.
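The generated description must stay untouched because an import program uses its start positions and widths to cut each row into variables. The minimal Python sketch below illustrates this idea. It is not fixedpos2dta, and the function name and the (name, start, width) description format are assumptions. If a start position or width were changed by even one character, every value read from that column would be shifted.

```python
def read_fixed_width(lines, spec):
    """Split each fixed-width line into a dict using (name, start, width) entries.

    start is 1-based, as in the data description file.
    """
    records = []
    for line in lines:
        record = {}
        for name, start, width in spec:
            # Convert the 1-based start position to a 0-based slice.
            raw = line[start - 1 : start - 1 + width]
            # Remove the blank padding added during export.
            record[name] = raw.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical description: id in columns 1-3, town in 4-10, year in 11-14.
    spec = [("id", 1, 3), ("town", 4, 7), ("year", 11, 4)]
    lines = ["1  Uppsala2019", "22 Lund   2020"]
    for rec in read_fixed_width(lines, spec):
        print(rec)
```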
Program files / Code files
Submit program files / code files for archiving (e.g. .do files in Stata, .sas files in SAS, .R files in R, .py files in Python, and .e or .sim files in GAUSS).
If the project's code files must be executed in a specific order, this should be clearly indicated – either in the text file containing information about the project's data, or by using a main script file that documents the project's code and executes it in the correct sequence.
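For a Python project, such a main script could look like the sketch below. The script names in SCRIPTS and the function run_in_order are hypothetical. Each script is run in turn with the same Python interpreter. Execution stops at the first script that fails, so later steps never run on incomplete output. In Stata or R, a master .do file, or an R script calling source() on each file, serves the same purpose.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical script names; list the project's own code files in execution order.
SCRIPTS = ["01_prepare_data.py", "02_analysis.py", "03_tables_figures.py"]


def run_in_order(scripts, folder="."):
    """Run each script with the current Python interpreter, in the given order."""
    for name in scripts:
        # Resolve to an absolute path so the script is found regardless of cwd.
        path = (Path(folder) / name).resolve()
        print(f"Running {path}")
        # check=True raises subprocess.CalledProcessError if a script exits with
        # a non-zero status, which stops the remaining scripts from running.
        subprocess.run([sys.executable, str(path)], check=True)
```

A main script (for example a hypothetical main.py) would define SCRIPTS and end with run_in_order(SCRIPTS).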
File/folder names
It is a good idea to replace spaces with underscores (_) and to avoid characters such as Åå, Ää and Öö in file and folder names.
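If many files need renaming, the recommendation above can be applied programmatically. The sketch below is one possible approach, and the function name safe_name is hypothetical. It replaces each run of whitespace with a single underscore and transliterates å and ä to a and ö to o, with the same mapping for upper case. This transliteration is an assumption; another convention (for example ä to ae) works equally well if it is applied consistently.

```python
import re

# Assumed transliteration of Swedish letters (one convention among several).
_SWEDISH = str.maketrans({"å": "a", "ä": "a", "ö": "o", "Å": "A", "Ä": "A", "Ö": "O"})


def safe_name(name):
    """Return a file/folder name without spaces or the letters å, ä, ö."""
    name = name.strip().translate(_SWEDISH)
    # Replace each run of whitespace with a single underscore.
    return re.sub(r"\s+", "_", name)


if __name__ == "__main__":
    print(safe_name("Ränta och lön 2025.do"))
```

The example prints Ranta_och_lon_2025.do. To rename a file on disk, the result could be passed to pathlib.Path.rename.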
Data submission
- Create a folder under I:\Forskningsprojekt och ärenden\Dataunderlag för arkivering\
- Name the folder "Dnr_nn_yyyy" (e.g. Dnr_1_2025).
- Transfer the data material, including code files and information files, to the folder and notify the person responsible for archiving at IFAU.
Those who do not have access to I:\ can either:
- create the folder to be transferred to the archive in the project folder on P:\ or, after coordination with IFAU’s database administrator, in another appropriate location, or
- use encrypted transfer via SFTP (contact the database administrator).
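A quick automated check of the submission folder can catch naming problems before the person responsible for archiving is notified. The sketch below assumes that the pattern Dnr_nn_yyyy means a case number followed by a four-digit year. The function name check_submission is hypothetical. It reports a folder name that does not follow the pattern, and any file or subfolder name that contains spaces or å, ä, ö.

```python
import re
from pathlib import Path

# Assumed reading of "Dnr_nn_yyyy": a case number and a four-digit year, e.g. Dnr_1_2025.
_DNR = re.compile(r"Dnr_\d+_\d{4}")
# Names containing whitespace or Swedish letters are flagged.
_UNSAFE = re.compile(r"[\såäöÅÄÖ]")


def check_submission(folder):
    """Return a list of naming problems in the folder (empty if none)."""
    folder = Path(folder).resolve()
    problems = []
    if _DNR.fullmatch(folder.name) is None:
        problems.append(f"folder name {folder.name!r} does not follow Dnr_nn_yyyy")
    # Walk all files and subfolders in a stable (sorted) order.
    for path in sorted(folder.rglob("*")):
        if _UNSAFE.search(path.name):
            problems.append(f"rename: {path.relative_to(folder)}")
    return problems
```

Calling check_submission on the prepared folder, for example a local copy named Dnr_1_2025, returns an empty list when no problems are found.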
[1] Within IFAU’s or NEK-UU’s network, which includes the computing servers.