Importing Excel in Stata

Title   Converting other format files into Stata dataset files
Author Nicholas J. Cox, Durham University, UK

1. A rule to remember
2. How to get information from Excel into Stata
3. Other methods for transferring information

3.1 Copy and paste
3.2 import delimited command
3.3 ODBC and odbc load

4. Copying a Stata graph into Excel or any other package
5. Common problems

5.1 Nonnumeric characters
5.2 Spaces
5.3 Cell formats
5.4 Variable names
5.5 Missing rows and columns
5.6 Leading zeros
5.7 Filename and folder

1. A rule to remember

Stata expects one matrix or table of data from one sheet,
with at most one line of text at the start defining the contents of
the columns.

2. How to get information from Excel into Stata

Stata can directly import data from Excel (both .xls and .xlsx) files.

Select File > Import > Excel Spreadsheet from Stata’s menus.

Also, see import excel for more
information on importing Excel spreadsheets directly into Stata.
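The menu route has a command equivalent. The sketch below is self-contained. It assumes Stata 12 or later, and it uses a hypothetical workbook, demo.xlsx, with two made-up rows. It first writes the workbook with export excel, then reads the sheet back with import excel, taking variable names from the first row.

```stata
* Self-contained sketch: demo.xlsx and its contents are hypothetical
clear
input id str5 name age
1 "Jeff"  27
2 "Annie" 28
end

* Write the data to a workbook, putting variable names in the first row
export excel using "demo.xlsx", sheet("people") firstrow(variables) replace

* Read the sheet back, taking variable names from the first row
import excel using "demo.xlsx", sheet("people") firstrow clear
describe
```

With your own file, only the import excel line is needed. Adjust the filename and the sheet() name to match your workbook.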

3. Other methods for transferring information

3.1 Copy and paste

Using your Windows or Mac computer,

  1. Start Excel.
  2. Enter data in rows and columns or read in a previously saved file.
  3. Highlight the data of interest, and then select Edit and click
    Copy.
  4. Start Stata and open the Data Editor (type edit at the Stata dot
    prompt).
  5. Paste data into editor by selecting Edit and clicking Paste.

3.2 import delimited command

The following section is based on material originally written by James Hardin,
University of South Carolina, and Ted Anagnoson, California State Los Angeles.

  1. Launch Excel and read in your Excel file.
  2. Save as a text file (tab delimited or comma delimited) by
    selecting File and clicking Save As.
    If the original filename is
    filename.xls, then save the file under the name
    filename.txt or filename.csv.
    (Use the Save as type list; specifying an extension such as
    .txt is not sufficient to produce a text file.)
  3. Quit Excel if you wish.
  4. Launch Stata if it is not already running. (If Stata is already running,
    then either save or clear your current data.)
  5. In Stata, type import delimited using filename.ext,
    where filename.ext is the name of the file that you
    just saved in Excel. Give the complete filename, including the
    extension.
  6. In Stata, type compress.
  7. Save the data as a Stata dataset using the save command.
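Steps 4 to 7 can be run from a do-file. The sketch below is a minimal example that assumes Stata 13 or later, which is where import delimited was introduced. So that it runs on its own, it first writes a tiny comma-delimited file under the hypothetical name filename.csv, standing in for the file saved from Excel.

```stata
* Stand-in for the file saved from Excel (hypothetical name and contents)
clear
tempname fh
file open `fh' using "filename.csv", write replace text
file write `fh' "id,score" _n
file write `fh' "1,57" _n
file write `fh' "2,68" _n
file close `fh'

* Step 5: read the text file; the first line supplies the variable names
import delimited using "filename.csv", clear
* Step 6: store each variable in the most economical type
compress
* Step 7: save as a Stata dataset
save "filename.dta", replace
```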

3.3 ODBC and odbc load

The following section is provided by Kevin Turner, StataCorp.

  1. You will have to download and install an Excel ODBC driver from
    Microsoft’s website to work with Excel files.
  2. Launch Stata.
  3. List the ODBC data sources that have been defined by
    Windows using the odbc list command.
  4. Click the DSN (data source name) listing provided by odbc list
    to query that DSN. odbc list will list a default entry called
    “Excel Files” that you can use to choose any Excel (*.xls) file to
    load via ODBC. You must select an Excel file every time you issue an
    odbc command using this DSN. You can also define your own DSN that
    always points to a specific Excel file. On Windows, you would define
    this special DSN via the Control Panel item called “Administrative
    Tools” and then select “Data Sources (ODBC)”. Microsoft provides
    more documentation on how to define your own data sources.
  5. Click the sheet/table corresponding to your data within the
    Excel file to describe the contents. You may need to
    issue the odbc query command with the
    dialog(complete) option if you selected an arbitrary
    Excel file in the previous list.
  6. If you are satisfied with the previous description of the
    sheet/table, you can click to load the described table.
  7. If all goes well, your data will load into Stata. There are,
    however, a few general reasons why loading Excel via ODBC
    may be problematic, and those are covered in
    section 5.

4. Copying a Stata graph into Excel or any other package

Once you have a suitable graph in Stata’s Graph window,

  1. Select Edit and click Copy Graph.
  2. Open or switch to Excel and move to where you want to paste the graph.
  3. Select Edit and click Paste.

These steps should also work in other packages that accept input in this
manner.

5. Common problems

The following section is from material by Ted Anagnoson, California State
Los Angeles; Dan Chandler, Trinidad, CA; Ronan Conroy, Royal College of
Surgeons, Dublin; David Moore, Hartman Group; Paul Wicks, South Manchester
University Hospitals Trust; Eric Wruck, Positive Economics; and Colin
Fischbacher, University of Edinburgh.

The problems described here arise mainly with text-based methods of importing data from Excel into Stata, such as copying and pasting or import delimited. import excel handles most of these issues.

5.1 Nonnumeric characters

One cell containing a nonnumeric character, such as a letter, within a
column of data is enough for Stata to make that variable a string variable.
It is often easiest to fix this in Excel. Within Stata,
suppose the problematic string variable is named foo.
Here are three alternative ways to identify the rogue observations:

        . tab foo if real(foo) == .
        . edit foo if real(foo) == .
        . list foo if real(foo) == .

If appropriate, they can be replaced by missing, and then the variable as a
whole can be converted to numeric by typing:

        . replace foo = "" if real(foo) == .
        . gen newfoo = real(foo)
        . drop foo 
        . rename newfoo foo

You could also use
destring:

        . destring foo, replace

destring includes an option for stripping
commas, dollar signs, percent signs, and other nonnumeric characters. It
also allows automatic conversion of percentage data.
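The destring options mentioned above can be illustrated with a short example. The sketch below uses hypothetical string variables, amount and share. It uses ignore() to strip a thousands separator, and percent to remove percent signs and divide by 100.

```stata
* Hypothetical string variables as they might arrive from Excel
clear
set obs 2
gen str8 amount = ""
replace amount = "1,200" in 1
replace amount = "950" in 2
gen str6 share = ""
replace share = "45%" in 1
replace share = "12.5%" in 2

* Strip the commas, then convert to numeric
destring amount, replace ignore(",")
* Remove the percent signs and divide by 100 (45% becomes .45)
destring share, replace percent
list
```

Without such options, destring refuses to convert a variable that still contains nonnumeric characters. That refusal is a useful safeguard against silently losing data.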

5.2 Spaces

What appear to be purely numeric data in Excel are often treated by
Stata as string variables because they include spaces. People may
inadvertently enter space characters in cells that are otherwise
empty. Although Excel strips leading and trailing spaces from numeric
entries, it does not trim spaces from character entries. One or more
space characters by themselves constitute a valid character entry and
are stored as such. Stata dutifully reads the entire column as a
string variable.

Excel has a search and replace capability that enables you to delete these
stray spaces, or you can use a text-processing program or a text editor on
the text file. You can also use the solution in section 5.1,
Nonnumeric characters.
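Stray spaces can also be removed inside Stata before converting the variable. The sketch below works on a hypothetical variable foo. It uses strtrim(), which assumes Stata 14 or later; in older versions, trim() does the same job. Cells that held only spaces become empty strings, which destring then converts to missing.

```stata
* Hypothetical column read as string because of stray spaces
clear
set obs 3
gen str4 foo = ""
replace foo = "12" in 1
replace foo = "  " in 2
replace foo = " 7 " in 3

* Remove leading and trailing blanks
replace foo = strtrim(foo)
* All remaining values are numeric or empty, so conversion succeeds
destring foo, replace
```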

5.3 Cell formats

Much of the formatting in Excel interferes with Stata’s ability to interpret
the data reasonably. Just before saving the data as a text file, make sure
that all formatting is turned off, at least temporarily. You can do this by
highlighting the entire spreadsheet, selecting Format, then
selecting Cells, and clicking General.

However, no solution solves all problems. Here is a cautionary tale. A text
file included address data. One column included house numbers, and a few
were in the form 4/2. Excel decided these few were dates and converted them
to 4th February. Setting all cells to a General format does not help
because it converts these unwanted dates to 5-digit Excel date codes. One
solution is to apply a Text format to the offending column when
offered the option during Excel’s text import process. But even this works
only if you have manageably few columns to look through and are aware of the
possibility of the problem.
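If such codes have already reached Stata, they can sometimes be turned back into the original text. The sketch below makes two assumptions. First, the workbook uses Excel's default 1900 date system, in which a Stata daily date equals the Excel code minus 21916. Second, the value 45326 (4 February 2024) is a hypothetical code that should have been the house number 4/2. It also assumes you already know which cells were converted.

```stata
clear
set obs 1
* Hypothetical Excel date code produced from the house number 4/2
gen long xlcode = 45326

* Convert the Excel code (1900 date system) to a Stata daily date
gen stata_date = xlcode - 21916
format stata_date %td

* Rebuild the day/month text that Excel misread as a date
gen str5 housenum = string(day(stata_date)) + "/" + string(month(stata_date))
list
```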

5.4 Variable names

Stata limits variable names to 32 characters and does not allow
any characters that it uses as operators or delimiters. Also,
variable names should start with a letter. People who are Excel
users first and Stata users second are often creative with the names
they apply to columns. Stata converts illegal column (field) names to labels
and makes a best guess at a sensible variable name. Stata’s best guess,
however, may not be as good as the name a user would choose knowing Stata’s
naming restrictions.

For example, Stata will make variable names using the first 32 characters
of the variable name and use the rest for a label. If the first 32
characters are not unique, subsequent occurrences will be called
var1, var2, etc., or v1, v2, etc. (If you paste
the data, the variable stub is var; if you use insheet, the
stub is v, so be careful writing do-files.)
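Generic names can be replaced with names you choose yourself, so later commands do not depend on whether the stub was var or v. The sketch below assumes that an import left two hypothetical variables, v1 and v2, with the original Excel headers kept as labels. The grouped rename syntax needs Stata 12 or later.

```stata
* Hypothetical result of an import: generic names, headers kept as labels
clear
set obs 2
gen v1 = _n
gen v2 = 10 * _n
label variable v1 "Respondent ID"
label variable v2 "Total income, 2nd quarter"

* Replace the generic names with short, legal Stata names
rename (v1 v2) (id income_q2)
describe
```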

5.5 Missing rows and columns

Stata completely ignores empty rows in a spreadsheet but not
completely empty columns. A completely empty column gets read
in as a variable with missing values for every observation. Of
course, no harm is done in either case, but spreadsheet users who wish
to add blank columns and/or rows to enhance legibility may wish to
note this difference.

It is best if the first row of data is complete with no missing data.
If necessary, add a dummy row with every value present, and then once in
Stata type

        . drop in 1

The missings command by Nicholas J. Cox, which allows variables or
observations that are all missing to be easily dropped, was published in
Stata Journal 15(4). Type search dm0085 for information on this command.
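If missings is not installed, built-in commands can do the same job. This is a swapped-in alternative, not the missings command itself. The sketch below uses hypothetical variables x, y, and z. It drops any variable that is missing in every observation, then drops any observation that is missing on every remaining variable.

```stata
* Hypothetical data: z is an empty column and the second row is empty
clear
input x y z
1 2 .
. . .
3 4 .
end

* Drop variables (columns) that are missing in every observation
foreach v of varlist _all {
    capture assert missing(`v')
    if !_rc drop `v'
}

* Drop observations (rows) that are missing on every remaining variable
* (add the strok option to rownonmiss() if some variables are strings)
unab allvars : _all
egen nonmiss = rownonmiss(`allvars')
drop if nonmiss == 0
drop nonmiss
```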

5.6 Leading zeros

With integer-like codes, such as ICD-9 codes or U.S. Social Security
numbers, that do not contain a dash, leading zeros will get dropped when
pasted into Stata from Excel. One solution is to flag
the variable as a string in the first line: add a nonnumeric character in Excel on that
line, and then remove it in Stata.

The missing leading zeros can also be replaced in a conversion to string
with one Stata command line; for example,

        . gen svar = string(nvar, "%012.0f")

The second argument on the right-hand side of this command is a format
specifying leading zeros on conversion of nvar to its string
equivalent. For more details on formats, see
format.
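For 9-digit codes such as U.S. Social Security numbers, the format width should be 9 rather than 12. The sketch below uses hypothetical values. It stores the numeric codes as long because a float cannot hold every 9-digit integer exactly.

```stata
clear
set obs 2
* Hypothetical 9-digit codes that lost their leading zeros
gen long nvar = 12345678 in 1
replace nvar = 1234 in 2

* Pad with leading zeros to 9 characters
gen str9 svar = string(nvar, "%09.0f")
list
```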

5.7 Filename and folder

Confirm the filename and location of the file you are trying to read.
Use Explorer or its equivalent to check.

For example, you may have inadvertently produced a file named
filename.txt.txt, or more generally, a name that ends with two
extensions, which may or may not be the same extension. This naming is
possible if you have an option checked in Windows Explorer under
View, Folder Options to hide file extensions for known or
registered file types. Manually rename the file, or use the correct
filename in your Stata command. You may also wish to uncheck this option to
avoid similar mistakes in the future.

Importing data from an Excel spreadsheet into Stata is a key skill for working with data quickly and efficiently. Knowing how to import Excel data with variable names in the first row lets you start managing and analysing the data in Stata straight away.

Before You Begin

Before you begin importing data from an Excel spreadsheet into Stata, note the following:

  • The data need to be arranged in a standard dataset form in the Excel spreadsheet, either stacked form or rectangular form, both of which are explained below.
  • The variable names should be in the first row.
  • Each column should contain only one type of data (all numbers or all text).
  • The variable names should follow Stata’s naming rules: they should start with a letter, contain only letters, digits, and underscores (no spaces or punctuation marks), and be at most 32 characters long.

What is Stacked Form?

Stacked form is a dataset form in which the cases are lined up in a single column (here, ID) and the variables that describe each case are placed in the columns next to it. For example:

ID  Name   Age  Gender
1   Jeff   27   Male
2   Annie  28   Female

What is Rectangular Form?

Rectangular form is a dataset form in which each case occupies one row and each variable describing the cases occupies one column. For example:

ID  Name   Age  Gender
1   Jeff   27   Male
2   Annie  28   Female

Step-by-Step Guide and Links

Follow these steps to learn how to import Excel data with variable names in the first row using Stata:

Open Stata.

Open the Excel file that contains the data you want to import.

Arrange the spreadsheet in either the stacked or the rectangular form.

Ensure the variable names are in the first row.

Check that each column contains only one type of data (all numbers or all text), and save the workbook.

In Stata, go to File > Import > Excel spreadsheet (*.xls;*.xlsx).

Click Browse, select the Excel file that contains your data, and click Open.

Choose the sheet that holds your data from the Worksheet drop-down list.

Check “Import first row as variable names”.

Click OK to import the data.

Once the data have been imported, review them (for example, with describe and browse) and make any necessary adjustments.

Save the data as a Stata dataset (.dta) by selecting File > Save as.

FAQ

What do I need to do before I start importing Excel data with variable names in the first row using Stata?

Arrange the data in a standard dataset form (stacked or rectangular) in the Excel spreadsheet and put the variable names in the first row. Make sure each column contains only one type of data. Also check that the variable names follow Stata’s naming rules: they should start with a letter, contain only letters, digits, and underscores, and be at most 32 characters long.


What type of files can be imported into Stata?

Stata can import many types of files, including plain text (.txt, .csv), Microsoft Excel (.xls, .xlsx), SPSS (.sav, .por), SAS (.sas7bdat), and dBase (.dbf) files. Stata’s own datasets (.dta) are loaded with the use command. Some of these import commands require a recent version of Stata.

Is it possible to import data from multiple files into Stata at once?

Not with a single command. The use command loads one dataset at a time and does not accept wildcards, so a command such as use c:/data/* will not work. Instead, read each file in turn with the appropriate command (use, import delimited, import excel, and so on). Then combine the pieces with append, which adds observations, or merge, which adds variables.
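One way to combine several files into one dataset is a loop. The sketch below assumes the files are CSV files named part*.csv in the working directory; these names are hypothetical, and the sketch creates two such files first so that it runs on its own. Each file is imported in turn and appended to a temporary file, which holds the combined data.

```stata
* Create two small hypothetical CSV files to combine
clear
tempname fh
forvalues i = 1/2 {
    file open `fh' using "part`i'.csv", write replace text
    file write `fh' "id,score" _n
    file write `fh' "`i',5`i'" _n
    file close `fh'
}

* Import each file and append it to the running combined dataset
tempfile combined
local files : dir "." files "part*.csv"
local first = 1
foreach f of local files {
    import delimited using "`f'", clear
    if !`first' append using "`combined'"
    save "`combined'", replace
    local first = 0
}
use "`combined'", clear
```

A temporary file keeps the intermediate results off your working directory. Replace import delimited with use or import excel if your files are in those formats.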

How can I view the data I’ve imported into Stata once the import is complete?

To view the data you’ve imported into Stata, use the browse command or open the Data Editor in browse mode by selecting Data > Data Editor > Data Editor (Browse). The describe and list commands are also useful for checking the result.

https://www.stata.com/help.cgi?import_excel

https://www.stata.com/help.cgi?dataset_formats

Stata Class Notes: Entering Data

cd                Change directory
dir or ls         Show files in current directory
insheet           Read ASCII (text) data created by a spreadsheet
import delimited  Read delimited text data, such as .csv files
infile            Read unformatted ASCII (text) data
infix             Read ASCII (text) data in fixed format
input             Enter data from keyboard
import excel      Import Excel .xls or .xlsx file
describe          Describe contents of data in memory or on disk
compress          Compress data in memory
save              Store the dataset currently in memory on disk in Stata data format
use               Load a Stata-format dataset
count             Show the number of observations
list              List values of variables
clear             Clear the entire dataset and everything else

2.0 Demonstration and explanation

A) Preparing the workspace

A1. Changing the working directory

We start by changing the working directory, which is the default directory (folder) from which Stata will read files and to which Stata will write files. We can read/write to a directory different from the working directory by specifying a full path name when reading/writing files. We use the cd command to change directories and then dir or ls to list the contents of the directory.

cd w:     /* note: directory and path may differ on your computer */
dir

A2. No dataset can be loaded while another dataset is in memory

In Stata, we can only have one dataset loaded in memory at a time. Before another dataset can be loaded, we must erase all data from memory using the clear command. We can also clear memory as we load another dataset by using the clear option on one of the data-loading commands (see below).

clear

B) Use import delimited to read in delimited data from other sources

B1. Comma-separated file with variable names

Our first data will come as a spreadsheet, often managed or created by programs such as Excel. For example, in
Excel, we can save data as a comma-separated-values (.csv) file, which is a text file with fields separated by commas. Here is how a .csv file might appear:

gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56
0,172,4,2,1,academic,47,52,57,53,61
0,113,4,2,1,academic,44,52,51,63,61
0,50,3,2,1,general,50,59,42,53,61
0,11,1,2,1,academic,34,46,45,39,36
0,84,4,2,1,general,63,57,54,,51
0,48,3,2,1,academic,57,55,52,50,51

The command import delimited can read text files in which the fields are separated by any character, such as spaces, commas, or tabs. The command reads the first line of the data file to identify automatically the character used as the separator (the separator can be specified explicitly with the delimiter option). Imagine we have a data file, hs0.csv, located in our current working directory. Here are the Stata commands to read these data. We use the describe command to check whether the input was successful.

import delimited using hs0.csv,  clear
describe

B2. Comma-separated file without variable names

If the first line of the data does not contain the variable names, we must supply the names to the import delimited command.
Let’s try to read such a file called hs0_noname.csv.

import delimited gender id race ses schtyp prgtype read write math science socst using hs0_noname.csv, clear
describe

B3. Delimited files in general

We can use the import delimited command to read text files where the fields are separated by any character, such as spaces or tabs. Here is a snapshot of the datafile, hs0.raw.

0	70	4	1	1	general		57	52	41	47	57
1	121	4	2	1	vocati		68	59	53	63	61
0	86	4	3	1	general		44	33	54	58	31
0	141	4	3	1	vocati		63	44	47	53	56
0	172	4	2	1	academic	47	52	57	53	61
0	113	4	2	1	academic	44	52	51	63	61

The columns are left-justified, suggesting that the file is tab-delimited. However, some columns (namely columns 6 and 7) may have 1 or 2 tabs between them — it can be hard to tell by visual inspection. We explicitly tell Stata that the delimiter is a tab in the datafile using the delimiter option, and use the suboption collapse to treat multiple tabs as one delimiter. This file has no variable names, so we must supply them again:

import delimited gender id race ses schtyp prgtype read write math science socst using hs0.raw, delimiter(tab, collapse) clear 

C) Use infix to read in fixed format files

Another format in which data can be stored is fixed format. It always
requires a codebook to specify which column(s) correspond to which variable. Here is a small
example of this type of data with a codebook. Notice how we make use of the
codebook in the infix command below. We will use the
schdat.fix data file.

        195  094951
        26386161941
        38780081841
        479700  870
        56878163690
        66487182960
        786  069  0
        88194193921
        98979090781
       107868180801
variable name   column number
id              1-2
a1              3-4
t1              5-6
gender          7
a2              8-9
t2              10-11
tgender         12

Below we use the infix command, where we specify variable names and the column numbers that their corresponding values occupy.

clear
infix id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12 using schdat.fix

D) Use import excel to read in Excel files

The import excel command was introduced in Stata 12. The file hsbdemo.xlsx holds the data in a worksheet named hsbdemo.

[Screenshot: the hsbdemo worksheet of hsbdemo.xlsx]

On the import excel command below, we specify the sheet where the data are located with the sheet() option and that the variable names are contained in the first row using the firstrow option.

import excel using hsbdemo.xlsx, sheet("hsbdemo") firstrow clear

E) Use input to enter data from the keyboard or a do-file

We can also use the do-file editor to input data. The do-file editor is used for
writing a sequence of commands and running them all at once. You can copy and paste
the following Stata syntax into the do-file editor and run it. You can also paste it directly into the Command window.

clear
input id female race ses str3 schtype prog read write math science socst
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
 18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
 50 0 2 2 pub 2 50 59 42 53 61
 51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
 57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end

After running the above program, we can issue the describe command to get a
general idea about the data set.

describe

F) The save command stores data as Stata data (.dta) files, and the use command loads Stata data files

We can save the data set we just created to disk by issuing the save command. This creates a .dta file when no extension is specified.

save hsb10

We can then load the data we just saved using the use command.

clear
use hsb10
use "W:hsb10", clear

G) The use command can load files over the internet

The use command can also be used to read a data file over the
internet, which we will do throughout this seminar.

use https://stats.idre.ucla.edu/stat/data/hs0, clear

3.0 For more information

  • Data Management Using Stata: A Practical Handbook
    • Chapter 2
  • Statistics with Stata 12
    • Chapter 2
  • Gentle Introduction to Stata, Revised Third Edition
    • Chapter 2
  • Data Analysis Using Stata, Third Edition
    • Chapter 11
  • An Introduction to Stata for Health Researchers, Third Edition
    • Chapter 6
  • Stata Learning Modules
    • A sample Stata session
    • Inputting raw data files into Stata
  • Frequently Asked Questions
    • How can I convert files among SAS, SPSS and Stata?
    • How can I input a dataset quickly?
    • How can I read Excel files in Stata? (Stata 12)
    • How can I read Stata 12 data files in Stata 11?
    • How do I read a data file that uses commas/tabs as delimiters?
    • How can I handle the No Room to Add Observations Error?


Excel® import and export

Stata for Windows, Mac, and Linux can directly import data from, and export data and results to, Microsoft Excel files. Both Excel .xls and .xlsx files are supported.

The import excel dialog includes a preview tool. You can use it to see the data in an Excel worksheet before importing them and to adjust the options that control how the data are imported.

import excel features

  • .xls and .xlsx support
  • import any worksheet from a workbook with multiple worksheets
  • import a custom cell range
  • treat first row of Excel data as Stata variable names
  • automatic conversion of Excel dates to Stata dates
  • automatic optimization of Stata storage types

export excel features

  • .xls and .xlsx support
  • replace an entire workbook
  • add a worksheet to an existing workbook
  • replace a single worksheet within an existing workbook
  • modify a subset of cells within an existing worksheet
  • save Stata variable names or variable labels to first row of worksheet
  • automatic conversion of Stata dates to Excel dates
  • export value labels or the underlying values
  • specify a custom missing-value code to use in worksheet

putexcel features

  • .xls and .xlsx support
  • export Stata returned results to a worksheet
  • export a Stata matrix in memory to a worksheet
  • export a table from a collection to a worksheet
  • export a custom numeric or string expression
  • insert a Stata graph or any PNG, JPEG, WMF, or TIFF file to a worksheet
  • create cell formulas in a worksheet
  • format cells in a worksheet
    • number formats
    • cell border style and color
    • horizontal and vertical alignment
    • fill patterns and foreground/background color
  • font formatting
    • font, font size, font color
    • bold, italic, strikeout, underline
    • subscripts
    • text wrapping, text indent, text rotation
  • replace an entire workbook
  • add a worksheet to an existing workbook
  • replace a single worksheet within an existing workbook
  • modify a subset of cells within an existing worksheet

For example, suppose an Excel worksheet has a State column and an Average Response column. We can do the following:

  • bold the column titles
  • add a solid black border below the column titles and on the right side of the State column
  • format the Average Response column as a percent
  • insert a Stata bar graph, bar1.png, into the worksheet
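The original commands for this example are not reproduced here. The following is only a hypothetical sketch of how such formatting might be done with putexcel (Stata 14 or later syntax). The workbook name results.xlsx, the sheet name, the two data rows, and all cell ranges are assumptions. The graph is inserted only if bar1.png already exists, for example after running graph bar and then graph export bar1.png.

```stata
* Hypothetical worksheet with State and Average Response columns
clear
input str10 state double avgresp
"Alabama" .52
"Alaska"  .47
end
label variable state "State"
label variable avgresp "Average Response"
export excel using "results.xlsx", sheet("Sheet1") firstrow(varlabels) replace

* Point putexcel at the existing worksheet
putexcel set "results.xlsx", sheet("Sheet1") modify

* Bold the column titles and put a border below them
putexcel A1:B1, bold border(bottom, medium, black)
* Border on the right side of the State column
putexcel A1:A3, border(right, medium, black)
* Format Average Response as a percent
putexcel B2:B3, nformat(percent)

* Insert the bar graph if it has already been exported
capture confirm file "bar1.png"
if !_rc putexcel D2 = picture("bar1.png")
```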

Excel is a registered trademark of Microsoft.

Источник

1 item has been added to your cart.

Stata/MP4 Annual License (download)

Stata: Data Analysis and Statistical Software

How do I get information from Excel into Stata?

Title Converting other format files into Stata dataset files
Author Nicholas J. Cox, Durham University, UK

1. A rule to remember
2. How to get information from Excel into Stata
3. Other methods for transferring information
3.1 Copy and paste
3.2 import delimited command
3.3 ODBC and odbc load
4. Copying a Stata graph into Excel or any other package
5. Common problems
5.1 Nonnumeric characters
5.2 Spaces
5.3 Cell formats
5.4 Variable names
5.5 Missing rows and columns
5.6 Leading zeros
5.7 Filename and folder

1. A rule to remember

Stata expects one matrix or table of data from one sheet, with at most one line of text at the start defining the contents of the columns.

2. How to get information from Excel into Stata

Stata can directly import data from Excel (both .xls and .xlsx) files.

Select File > Import > Excel Spreadsheet from Stata’s menus.

Also, see import excel for more information on importing Excel spreadsheets directly into Stata.

3. Other methods for transferring information

3.1 Copy and paste

Using your Windows or Mac computer,

  1. Start Excel.
  2. Enter data in rows and columns or read in a previously saved file.
  3. Highlight the data of interest, and then select Edit and click Copy.
  4. Start Stata and open the Data Editor (type edit at the Stata dot prompt).
  5. Paste data into editor by selecting Edit and clicking Paste.

3.2 import delimited command

The following section is based on material originally written by James Hardin, University of South Carolina, and Ted Anagnoson, California State Los Angeles.

  1. Launch Excel and read in your Excel file.
  2. Save as a text file (tab delimited or comma delimited) by selecting File and clicking Save As. If the original filename is filename.xls, then save the file under the name filename.txt or filename.csv. (Use the Save as type list—specifying an extension such as .txt is not sufficient to produce a text file.)
  3. Quit Excel if you wish.
  4. Launch Stata if it is not already running. (If Stata is already running, then either save or clear your current data.)
  5. In Stata, type import delimited usingfilename.ext, where filename.ext is the name of the file that you just saved in Excel. Give the complete filename, including the extension.
  6. In Stata, type compress.
  7. Save the data as a Stata dataset using the save command.

3.3 ODBC and odbc load

The following section is provided by Kevin Turner, StataCorp.

  1. You will have to download and install an Excel ODBC driver from Microsoft’s website to work with Excel files.
  2. Launch Stata.
  3. List the ODBC data sources that have been defined by Windows using the odbc list command.
  4. Click DSN (data source name) listing provided by odbc list to query that DSN. odbc list will then list a default entry called “Excel Files” that you can use to choose any Excel (*.xls) file to load via ODBC. You must select an Excel file every time you issue an odbc command using this DSN. You can also define your own DSN that always points to a specific Excel file. On Windows, you would define this special DSN via the Control Panel called “Administrative Tools”, and then select “Data Sources (ODBC)”. More documentation is available from Microsoft concerning how to define your own Data Sources.
  5. Click the sheet/table corresponding to your data within the Excel file to describe the contents. You may need to issue the odbc query command with the dialog(complete) option if you selected an arbitrary Excel file in the previous list.
  6. If you are satisfied with the previous description of the sheet/table, you can click to load the described table.
  7. If all goes well, your data will load into Stata. There are, however, a few general reasons why loading Excel via ODBC may be problematic, and those are covered in section 5.

4. Copying a Stata graph into Excel or any other package

Once you have a suitable graph in Stata’s Graph window,

  1. Select Edit and click Copy Graph.
  2. Open or switch to Excel and move to where you want to paste the graph.
  3. Select Edit and click Paste.

These steps should also work in other packages that accept input in this manner.

5. Common problems

The following section is from material by Ted Anagnoson, California State Los Angeles; Dan Chandler, Trinidad, CA; Ronan Conroy, Royal College of Surgeons, Dublin; David Moore, Hartman Group; Paul Wicks, South Manchester University Hospitals Trust; Eric Wruck, Positive Economics; and Colin Fischbacher, University of Edinburgh.

The problems mentioned in it are primarily with respect to text-based methods of importing data from Excel to Stata, such as copying and pasting and import delimited. import excel handles most of these issues.

5.1 Nonnumeric characters

One cell containing a nonnumeric character, such as a letter, within a column of data is enough for Stata to make that variable a string variable. It is often easiest to fix this in Excel. Within Stata, suppose the problematic string variable is named foo. Here are three alternative ways to identify the rogue observations:
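A minimal sketch of those checks, assuming a string variable named foo holding a few made-up values. The Stata function real() returns missing for any string it cannot read as a number, so missing(real(foo)) picks out the rogue cells (note that it also flags cells that are genuinely empty). The edit line is commented out because it opens the Data Editor window.

```stata
* Toy data (hypothetical): a column that should be numeric but holds "n/a"
clear
input str6 foo
"12"
"7.5"
"n/a"
"40"
end

* 1. Tabulate the values that cannot be read as numbers
tabulate foo if missing(real(foo))

* 2. List them together with their observation numbers
list foo if missing(real(foo))

* 3. Inspect them in the Data Editor (interactive sessions only)
* edit foo if missing(real(foo))
```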

If appropriate, they can be replaced by missing, and then the variable as a whole can be converted to numeric by typing:
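One way to do this, sketched with the same kind of hypothetical foo variable: blank out every entry that real() cannot interpret, then let destring convert the whole column.

```stata
* Toy data (hypothetical): foo was read as string because of one stray entry
clear
input str6 foo
"12"
"7.5"
"n/a"
"40"
end

* Replace anything that is not a readable number with the empty string...
replace foo = "" if missing(real(foo))

* ...then convert the variable from string to numeric (empty becomes missing)
destring foo, replace
```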

destring includes an option for stripping commas, dollar signs, percent signs, and other nonnumeric characters. It also allows automatic conversion of percentage data.
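A short sketch of those two destring options; the variable names price and share and their values are hypothetical. ignore() lists characters to strip before conversion (dollar signs can be listed there in the same way as commas), and percent removes % signs and divides the values by 100.

```stata
* Toy data (hypothetical): numbers stored with thousands separators and % signs
clear
input str8 price str4 share
"1,200" "45%"
"950" "5%"
end

* Strip the commas, then convert to numeric
destring price, replace ignore(",")

* Remove the % signs and convert to proportions (45% becomes 0.45)
destring share, replace percent
```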

5.2 Spaces

What appear to be purely numeric data in Excel are often treated by Stata as string variables because they include spaces. People may inadvertently enter space characters in cells that are otherwise empty. Although Excel strips leading and trailing spaces from numeric entries, it does not trim spaces from character entries. One or more space characters by themselves constitute a valid character entry and are stored as such. Stata dutifully reads the entire column as a string variable.

Excel has a search-and-replace capability that lets you delete these stray spaces, or you can run a text editor or text-processing program over the text file. You can also use the solutions in section 5.1, Nonnumeric characters.
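Within Stata, a minimal fix is to trim the strings before converting them. This sketch assumes a hypothetical string variable score in which one "empty" cell actually holds a space; strtrim() (named trim() in releases before Stata 14) removes leading and trailing blanks, so a space-only cell becomes the empty string, which destring turns into missing.

```stata
* Toy data (hypothetical): the second cell looks empty but contains a space
clear
input str6 score
"15"
" "
"22"
end

* Remove leading and trailing blanks, then convert to numeric
replace score = strtrim(score)
destring score, replace
```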

5.3 Cell formats

Much of the formatting in Excel interferes with Stata’s ability to interpret the data reasonably. Just before saving the data as a text file, make sure that all formatting is turned off, at least temporarily. You can do this by highlighting the entire spreadsheet, selecting Format, then selecting Cells, and clicking General.

However, no solution solves all problems. Here is a cautionary tale. A text file included address data. One column included house numbers, and a few were in the form 4/2. Excel decided these few were dates and converted them to 4 February. Setting all cells to General format does not help, because that converts these unwanted dates to five-digit Excel date codes. One solution is to apply a Text format to the offending column when offered the option during Excel’s text import process. But even this works only if you have manageably few columns to look through and are aware that the problem can occur.

5.4 Variable names

Stata limits variable names to 32 characters and does not allow any characters that it uses as operators or delimiters. Also, variable names should start with a letter. People who are Excel users first and Stata users second are often creative with the names they apply to columns. Stata converts illegal column (field) names to labels and makes a best guess at a sensible variable name. Stata’s best guess, however, may not be as good as the name a user would choose knowing Stata’s naming restrictions.

For example, Stata will make variable names using the first 32 characters of the column name and use the rest for a label. If the first 32 characters are not unique, subsequent occurrences will be called var1, var2, etc., or v1, v2, etc. (If you paste the data, the variable stub is var; if you use insheet, the stub is v, so be careful when writing do-files.)
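When you want to control the names yourself, a sketch like the following may help. The header "2019 income" and the variables v1 and v2 are hypothetical. strtoname() is a Stata function that turns an arbitrary string into a legal name (it prefixes an underscore when the string starts with a digit and replaces illegal characters with underscores); it is shown here to illustrate the naming rules, not as the exact routine Stata uses on import. rename then replaces automatically generated names with meaningful ones.

```stata
* How a header starting with a digit and containing a space can be made legal
display strtoname("2019 income")

* Toy data (hypothetical) with automatically generated names
clear
input v1 v2
1 57
2 68
end

* Give the variables meaningful, legal names
rename (v1 v2) (id income2019)
```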

5.5 Missing rows and columns

Stata completely ignores empty rows in a spreadsheet but not completely empty columns. A completely empty column gets read in as a variable with missing values for every observation. Of course, no harm is done in either case, but spreadsheet users who wish to add blank columns and/or rows to enhance legibility may wish to note this difference.

It is best if the first row of data is complete with no missing data. If necessary, add a dummy row with every value present, and then once in Stata type
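A minimal sketch of that step, assuming hypothetical data in which row 1 is the dummy row typed in Excel so that every column has a value in its first data row; once Stata has assigned the variable types, the dummy row is simply dropped.

```stata
* Toy data (hypothetical): observation 1 is the complete dummy row
clear
input id str5 name score
0 "xxxxx" 0
1 "Ann" 54
2 "" .
end

* The variable types are now set, so discard the dummy row
drop in 1
```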

The missings command by Nicholas J. Cox, which allows variables or observations that are all missing to be easily dropped, was published in Stata Journal 15(4). Type search dm0085 for information on this command.
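If that package is not installed, the same idea for variables can be sketched with built-in commands only. The loop below is an illustration, not the missings implementation; the variable names are hypothetical. It drops every variable whose values are all missing, such as one created from a blank spreadsheet column.

```stata
* Toy data (hypothetical): "empty" came from a blank spreadsheet column
clear
input id empty score
1 . 54
2 . 61
end

* Drop each variable for which every observation is missing
foreach v of varlist _all {
    capture assert missing(`v')
    if _rc == 0 {
        drop `v'
    }
}
```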

5.6 Leading zeros

With integer-like codes, such as ICD-9 codes or U.S. Social Security numbers, that do not contain a dash, leading zeros will get dropped when pasted into Stata from Excel. One solution is to flag the variable as a string in the first line: add a nonnumeric character in Excel on that line, and then remove it in Stata.

The missing leading zeros can also be replaced in a conversion to string with one Stata command line; for example,
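A sketch of such a command line, assuming a hypothetical numeric variable nvar holding 9-digit codes. The values are stored as double because a float holds integers exactly only up to 16,777,216, which is too small for every 9-digit code.

```stata
* Toy data (hypothetical): codes that lost their leading zero on import
clear
input double nvar
12345678
987654321
end

* %09.0f writes each number as 9 digits, padding with leading zeros
generate str9 svar = string(nvar, "%09.0f")
list
```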

The second argument on the right-hand side of this command is a format specifying leading zeros on conversion of nvar to its string equivalent. For more details on formats, see format.

5.7 Filename and folder

Confirm the name and location of the file you are trying to read. Use File Explorer on Windows, Finder on a Mac, or an equivalent to check.

Stata Class Notes: Entering Data

cd                Change directory
dir or ls         Show files in current directory
insheet           Read ASCII (text) data created by a spreadsheet
import delimited  Read delimited ASCII (text) data; supersedes insheet
infile            Read unformatted ASCII (text) data
infix             Read ASCII (text) data in fixed format
input             Enter data from keyboard
import excel      Import Excel .xls or .xlsx file
describe          Describe contents of data in memory or on disk
compress          Compress data in memory
save              Store the dataset currently in memory on disk in Stata data format
use               Load a Stata-format dataset
count             Show the number of observations
list              List values of variables
clear             Clear the entire dataset and everything else

2.0 Demonstration and explanation

A) Preparing the workspace

A1. Changing the working directory

We start by changing the working directory, which is the default directory (folder) from which Stata will read files and to which Stata will write files. We can read/write to a directory different from the working directory by specifying a full path name when reading/writing files. We use the cd command to change directories and then dir or ls to list the contents of the directory.

A2. No dataset can be loaded while another dataset is in memory

In Stata, only one dataset can be loaded in memory at a time. If the data in memory have changed since they were last saved, Stata refuses to load another dataset until we erase them from memory with the clear command. We can also clear memory while loading another dataset by using the clear option on one of the data-loading commands (see below).
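The behavior can be seen with auto.dta, an example dataset that ships with Stata and is loaded here with sysuse. This is only an illustration; the variable flag is a hypothetical change made so that Stata considers the data unsaved.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Change the data so that they now differ from the saved copy
generate byte flag = 1

* Without clear, the load is refused (the return code is kept for inspection)
capture sysuse auto
local rc = _rc
display "return code: `rc'"

* With the clear option, memory is emptied first and the load succeeds
sysuse auto, clear
```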

B) Use import delimited to read in delimited data from other sources

B1. Comma-separated file with variable names

Our first data will come as a spreadsheet, the kind of file often created or managed by programs such as Excel. In Excel, for example, we can save data as a comma-separated-values (.csv) file, which is a text file with fields separated by commas. Here is how a .csv file might appear:

The import delimited command can read text files in which the fields are separated by any character, such as spaces, commas, or tabs. The command reads the first line of the data file to identify automatically the character used as the separator (the separator can be specified explicitly with the delimiters() option). Imagine we have a data file, hs0.csv, located in our current working directory. Here are the Stata commands to read these data. We use the describe command to check whether the input was successful.
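Because hs0.csv itself is not reproduced here, the sketch below first writes a tiny stand-in file, hs0_demo.csv, to the working directory. The file name, variable names, and values are all hypothetical; the first line holds the variable names and the remaining lines hold the data.

```stata
* Write a small comma-separated file with a header row (hypothetical values)
tempname fh
file open `fh' using "hs0_demo.csv", write text replace
file write `fh' "id,female,read,write" _n
file write `fh' "70,0,57,52" _n
file write `fh' "121,1,68,59" _n
file write `fh' "86,0,44,33" _n
file close `fh'

* Read it; the names come from the first line, and clear replaces data in memory
import delimited using "hs0_demo.csv", clear
describe
```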

B2. Comma-separated file without variable names

If the first line of the data does not contain the variable names, we must supply the names to the import delimited command. Let’s try to read such a file called hs0_noname.csv.
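A sketch with a hypothetical stand-in file, hs0_noname_demo.csv, whose first line is already data. The names are listed before using, in column order, and varnames(nonames) states explicitly that the file has no header row.

```stata
* Write a small comma-separated file without a header row (hypothetical values)
tempname fh
file open `fh' using "hs0_noname_demo.csv", write text replace
file write `fh' "70,0,57" _n
file write `fh' "121,1,68" _n
file close `fh'

* Supply the variable names ourselves
import delimited id female read using "hs0_noname_demo.csv", varnames(nonames) clear
describe
```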

B3. Delimited files in general

We can use the import delimited command to read text files where the fields are separated by any character, such as spaces or tabs. Here is a snapshot of the datafile, hs0.raw.

The columns are left-justified, suggesting that the file is tab-delimited. However, some columns (namely columns 6 and 7) may have one or two tabs between them, and it can be hard to tell by visual inspection. We tell Stata explicitly that the delimiter is a tab by using the delimiters() option, and we use the suboption collapse to treat multiple consecutive tabs as one delimiter. This file has no variable names, so we must supply them again:
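The effect of collapse can be checked with a hypothetical stand-in file, hs0_demo.raw, in which the second record has two tabs between its last two columns. "\t" denotes a tab in delimiters(); with collapse, the double tab counts as one delimiter, so both records yield three variables.

```stata
* Write a tab-delimited file; record 2 has a doubled tab (hypothetical values)
tempname fh
file open `fh' using "hs0_demo.raw", write text replace
file write `fh' "70" _tab "0" _tab "57" _n
file write `fh' "121" _tab "1" _tab(2) "68" _n
file close `fh'

* Tab delimiter; collapse treats runs of tabs as a single delimiter
import delimited id female read using "hs0_demo.raw", delimiters("\t", collapse) varnames(nonames) clear
list
```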

C) Use infix to read in fixed format files

Another format in which data can be stored is fixed format. Reading it always requires a codebook specifying which column(s) correspond to which variable. Here is a small example of this type of data with a codebook. Notice how we use the codebook in the infix command below. We will use the schdat.fix data file.

variable name   column number
id              1-2
a1              3-4
t1              5-6
gender          7
a2              8-9
t2              10-11
tgender         12

Below we use the infix command, where we specify each variable name and the column numbers its values occupy.
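As a self-contained sketch, the code below writes two hypothetical 12-character records to schdat_demo.fix (a stand-in for schdat.fix) and reads them with the column positions from the codebook above.

```stata
* Write two fixed-format records laid out as in the codebook (hypothetical values)
tempname fh
file open `fh' using "schdat_demo.fix", write text replace
file write `fh' "013045020551" _n
file write `fh' "022840125500" _n
file close `fh'

* Each variable is read from the columns given in the codebook
infix id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12 using "schdat_demo.fix", clear
list
```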

D) Use import excel to read in Excel files

The import excel command was introduced in Stata 12. Here is what the file hsbdemo.xlsx looks like.

On the import excel command below, we specify the sheet where the data are located with the sheet() option and that the variable names are contained in the first row using the firstrow option.
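A self-contained sketch of the same pattern: it first uses export excel to create a small workbook, hsbdemo_demo.xlsx, with a sheet named hsbdemo (file name, sheet contents, and values are hypothetical), then reads it back with sheet() and firstrow.

```stata
* Build a small workbook with variable names in row 1 (hypothetical values)
clear
input id female read
70 0 57
121 1 68
end
export excel using "hsbdemo_demo.xlsx", sheet("hsbdemo") firstrow(variables) replace

* sheet() picks the worksheet; firstrow takes variable names from row 1
import excel using "hsbdemo_demo.xlsx", sheet("hsbdemo") firstrow clear
describe
```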

E) Use input to enter data from the keyboard or a do-file

We can also use the do-file editor to input data. The do-file editor is used for writing a sequence of commands and running them all at once. You can copy and paste the following Stata syntax into the do-file editor and run it, or paste it directly into the Command window.
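The original listing is not reproduced here; a minimal input block in the same spirit might look like this, with hypothetical variables and values. A storage type such as str10 before a name makes that variable a string, and end finishes the data entry.

```stata
* Enter a few observations by hand (hypothetical values)
clear
input id female str10 prog read
1 0 "general" 57
2 1 "academic" 68
3 0 "vocation" 44
end
```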

After running the above program, we can issue the describe command to get a general idea about the data set.

F) The save command stores data as Stata data (.dta) files, and the use command loads Stata data files

We can save the data set we just created to disk by issuing the save command. This creates a .dta file when no extension is specified.

We can then load the data we just saved using the use command.
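A short sketch of the round trip, using hypothetical data and a hypothetical file name, mydemo. Because no extension is given, save writes mydemo.dta; replace allows an existing file to be overwritten, and clear on use discards whatever is in memory.

```stata
* Hypothetical data to save
clear
input id read
1 57
2 68
end

* Writes mydemo.dta in the working directory
save mydemo, replace

* Load it back
use mydemo, clear
describe
```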

G) The use command can load files over the internet

The use command can also be used to read a data file over the internet, which we will do throughout this seminar.
