Converting plain text citations in Word into unformatted EndNote citations

Situation: you have a Word document that has no ‘live’ EndNote references in it – they were either typed manually or if EndNote or other referencing software was used, then the live references have been converted to plain text at some point and you do not have an earlier version of the document where they’re live.

This might also be useful if you’ve got a document containing live references from different referencing software that doesn’t offer ‘unformatted citations’.

You have an EndNote library of the references that are used in the document (or, perhaps, you don’t yet, but you would like to get this too – see note at the end); you just want to link the EndNote library to the citations in the document.

I.e.: you have a Word document with plain text references and you want to make them live EndNote references.

HOW DO YOU DO IT?

First, if you have numbered citations, you don’t really have much to go on. You could probably do a Find and Replace to change them all to a general {Author, 1000} format, but literally that (i.e. not specific authors and years) – running the ‘Update Citations and Bibliography’ function would identify them as points where a citation is required but you’d need to then manually add the details of the reference to search and select the correct reference to add.

However, if you have some kind of author-date citations, you can convert (Smith, 2024) into {Smith, 2024} – meaning that the process will be at least partly automated.

If round brackets are not used at all in the rest of the document, other than in citations (this is unlikely unfortunately!), then simply use Find and Replace (Ctrl-H) to replace ( with { and then ) with }.

However, otherwise, some more complex Find & Replace functions may be necessary. In these cases, ensure ‘Use wildcards’ (in the Find and Replace box) is selected.

Find:

\((*, [0-9]{4})\)

Replace with:

{\1}

This is probably the simplest form, assuming a (Smith, 2024) format (i.e. a single author surname, then comma, space, year – no initials or et al or other authors).

More complex:

Find:

\(([!0-9]@, )([!0-9]@, )([0-9]{4})\)

Replace with:

{\1\3}

Will convert (Smith, J., 2024) or (Smith, et al, 2024) or (Smith, J. et al, 2024) to {Smith, 2024}

Find:

\(([!0-9]@)( and [!0-9]@)(, [0-9]{4})\)

Replace with:

{\1\3}

Will convert (Smith and Jones, 2023) to {Smith, 2023} (though you’d need to watch out for institutions with the word ‘and’ in their name)

I would always do ‘Replace’ one at a time, rather than ‘Replace All’ – these searches can end up doing weird things (because the author part of the pattern can match any number of characters), so it feels safer to check each one hasn’t somehow selected a whole paragraph to replace. If you do have a mix of reference formats, I’d work through the searches in the reverse of the order they’re listed here (the ‘and’ ones first, then initials and et als, then the simple single-author names that remain), which should work. You may need to tweak them if you have other formats!
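If you’d rather test the patterns outside Word first, here is a rough Python sketch of the same three searches, run in the order suggested above. The pattern names and the to_unformatted function are my own inventions for illustration, and the regular expressions are approximations of the Word wildcard searches rather than exact translations (they also refuse to match across line breaks or other brackets, which Word’s versions don’t).

```python
import re

# Approximate Python versions of the three Word wildcard searches.
# The character classes exclude digits, brackets and line breaks so a
# match cannot run on into the next bracket or paragraph.
AND_PATTERN = re.compile(r"\(([^0-9(),\n]+?) and [^0-9()\n]+?, ([0-9]{4})\)")
EXTRA_PATTERN = re.compile(r"\(([^0-9(),\n]+?), [^0-9()\n]+?, ([0-9]{4})\)")
SIMPLE_PATTERN = re.compile(r"\(([^0-9(),\n]+?), ([0-9]{4})\)")


def to_unformatted(text: str) -> str:
    """Rewrite (Author, Year) style citations as {Author, Year}."""
    # Order matters: 'and' citations first, then initials / et al,
    # then the plain single-author form.
    for pattern in (AND_PATTERN, EXTRA_PATTERN, SIMPLE_PATTERN):
        text = pattern.sub(r"{\1, \2}", text)
    return text


sample = (
    "Seen before (Smith and Jones, 2023), and again "
    "(Smith, J. et al, 2024) and (Brown, 2020) (see Table 2)."
)
print(to_unformatted(sample))
```

Brackets without a four-digit year, like (see Table 2), are left alone, but the caveat about institutions with ‘and’ in their name applies here just as it does in Word.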

Once all the references are in the format {Smith, 2024}, hopefully, if you’re confident your EndNote library has the right references, then it should be relatively straightforward (though it will still involve some manual selecting) to ‘Update Citations and Bibliography’ and get everything linked up and functional.

IF YOU DON’T HAVE THE REFERENCES IN ENDNOTE: this is rather more work, but it is possible to automate this to some extent, if you’ve got a lot of journal references. The exact sequence of processing will depend on the reference format, but you’re aiming to convert e.g.

Smith, A., Jones, B. (2024) ‘Article about interesting things’. Journal of interesting things 123(45):67-68.

Into something like:

(smith*.au AND “article about interesting things”.ti)

That you can then compile into a long string and run as searches on a database.

It *is* possible, just arduous, so only worth it for large numbers. For middling numbers, just manually copy the first author and title of each reference (titles are often safer minus punctuation) into two columns in Excel. Then use an Excel function to put them all into the right format to search your chosen database (you may do this several different ways for several different databases), copy that column into Word and run a Find and Replace for a paragraph mark (which is ^p) to be replaced by ‘ OR ‘, and then you’ll have your search string. (The individual bracketed searches need combining with OR rather than AND, since each one is looking for a different reference.)
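The Excel-and-Word step could also be done in a few lines of Python. This is only a sketch: build_clause and build_search are hypothetical names, the .au and .ti suffixes are copied from the example above and will differ between databases, and the clauses are joined with OR on the assumption that each one is meant to find a single, different reference.

```python
import re


def build_clause(first_author: str, title: str) -> str:
    """Turn one author/title pair into a bracketed search clause."""
    # Keep only the surname (the text before the first comma).
    surname = first_author.split(",")[0].strip().lower()
    # Strip punctuation from the title, then collapse repeated spaces.
    clean_title = re.sub(r"[^\w\s]", " ", title.lower())
    clean_title = " ".join(clean_title.split())
    return f'({surname}*.au AND "{clean_title}".ti)'


def build_search(rows) -> str:
    """Join the clauses for many references into one search string."""
    return " OR ".join(build_clause(author, title) for author, title in rows)


rows = [
    ("Smith, A.", "Article about interesting things"),
    ("Jones, B.", "Another: study!"),
]
print(build_search(rows))
```

Straight double quotes are used around the title because that is what most database search boxes expect; if you paste via Word, watch out for it turning them into curly ones.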

ALTERNATIVELY, depending on format, it’s sometimes possible to actually set rules to split references up (Find and Replace in Word, looking for punctuation usually, replacing with tab-spaces) so that ALL the information can be pasted into Excel – then you can use the process to add references in Excel via a tab-delimited format into EndNote. This can be tough, but if your references are all of the same type and punctuated in distinctive ways, it is possible!
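As a sketch of what ‘setting rules to split references up’ might look like outside Word, here is a Python regular expression written for exactly the reference format in the example above, and nothing else. REF_PATTERN, FIELDS and to_tab_row are made-up names, and any reference that is punctuated differently will simply be returned as None so it can be handled by hand.

```python
import re
from typing import Optional

# Matches: Authors (Year) 'Title'. Journal Volume(Issue):Pages.
# Written for the single example format above; other layouts need
# their own pattern.
REF_PATTERN = re.compile(
    r"^(?P<authors>.+?) \((?P<year>\d{4})\) [‘'](?P<title>.+?)[’']\. "
    r"(?P<journal>.+?) (?P<volume>\d+)\((?P<issue>\d+)\):(?P<pages>[\d-]+)\.?$"
)

FIELDS = ["authors", "year", "title", "journal", "volume", "issue", "pages"]


def to_tab_row(reference: str) -> Optional[str]:
    """Return the reference as one tab-separated row, or None if it
    does not fit the expected format."""
    match = REF_PATTERN.match(reference.strip())
    if match is None:
        return None  # leave unusual references for manual handling
    # The author string is passed through untouched, not split up.
    return "\t".join(match.group(name) for name in FIELDS)


example = (
    "Smith, A., Jones, B. (2024) ‘Article about interesting things’. "
    "Journal of interesting things 123(45):67-68."
)
print(to_tab_row(example))
```

The rows can then be pasted into Excel, given column headings and tidied before importing into EndNote.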

Combining reference download files to import into EndNote in one go

If you’re working with a source that lets you download references for import into reference management software like EndNote, but which only allows you to download a small number of references at a time (perhaps even just one at a time), is there a convenient way to avoid having to import each file individually?

As long as the downloaded files are text files (or something equivalent), then: yes!

Let’s take an example of EMF Portal – this downloads references one at a time in .ris format. (However, the process could equally be applied to, for example, EudraCT EU Clinical Trials Register, which permits you to download a page of results at a time, in .txt format – although you’d need to have created your own import filter for this (that will be covered here at some point in the future!)). These instructions are for PC.

– First download all the references – the default file format will be bibliography.ris, bibliography(1).ris, bibliography(2).ris, etc.
– Find the files. Initially, they’ll probably be in your download folder; however, if there are any other files there called ‘bibliography’, you may wish to copy and paste all the new files into another folder (anywhere else, just for this process, everything can be deleted afterwards).
– Wherever you put the files, view them in that folder and click in the location box (where it says e.g. C:\Users\username\Documents\folder) – but just click in the box, don’t click any of the parent folders or it’ll just navigate there! If you click at the front or back of the location, it’ll be fine and highlight it.
– Type ‘cmd’ in the location box and press enter – this will open up the command line (there are various other ways of opening the command line dialogue, but this is the easiest way to have the correct folder already selected!)
– First the .ris files need renaming into .txt files (.ris files can’t be combined but .txt files can – even though .ris are basically just .txt files), so type: ren bib*.ris bib*.txt
– Then, to combine the files, type: copy bib*.txt combinedbib.txt
– Finally, to turn the combined file back into a .ris file, type: ren combinedbib.txt combinedbib.ris

This should have created one file with all the EndNote references in – double click it normally in File Explorer and the import into EndNote should start.
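If you’re not on a PC, or would rather avoid the command line, the same combining step can be sketched in Python. The combine_ris function is a hypothetical helper, and it assumes the downloaded files are UTF-8 text; if yours are in another encoding, the encoding arguments would need changing.

```python
from pathlib import Path


def combine_ris(folder: str, pattern: str = "bib*.ris",
                out_name: str = "combinedbib.ris") -> int:
    """Join every matching .ris file in `folder` into one file.

    Returns the number of files that were combined.
    """
    folder_path = Path(folder)
    out_path = folder_path / out_name
    # List the parts before creating the output, and never include
    # the output file itself.
    parts = sorted(p for p in folder_path.glob(pattern) if p.name != out_name)
    with out_path.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8-sig")
            # Make sure each file ends with exactly one line break so
            # records from neighbouring files do not run together.
            out.write(text.rstrip("\r\n") + "\n")
    return len(parts)
```

Calling combine_ris on the folder holding the downloads should leave a combinedbib.ris file there, which can be opened in the same way as the one made on the command line.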

Importing references in XML format from REHABDATA into EndNote

REHABDATA by NARIC is perhaps not top of the databases you’d consider essential to use for a literature search for most topics, but if you are using it (or indeed any other database that only exports in XML), how can you get your results into EndNote?

The following process I’ve come up with is based primarily on this helpful video’s process for converting XML files into an EndNote-friendly tab-delimited format.

Save your results from REHABDATA in XML format.

Open the file in Excel.

Edit the column headings to exactly match the EndNote fields you want the data to go into, and delete all unnecessary columns. So, for example, I’d suggest ending up with: Author, Title, Journal, Year, Volume, Issue, Pages, Abstract, Keywords, ISSN for journal articles.

If you’ve got book results, you’d probably want to cut and paste these out into a new Excel sheet, change the column headings to ones relevant to ‘Book’ reference type and repeat the process with a separate file.

Save the tidied up table as a Text (tab delimited) (*.txt) document.

Open this in Word.

Put the cursor at the start of the document, press return to get a blank line, and type:

*Journal article

(including the asterisk at the start)

Run a Find and Replace (Ctrl-H), for:

Find: | (vertical line – usually shift \ on UK keyboards)

Replace with: // (two forward-slashes)

Replace all

Find: (double-quote-mark symbol – usually shift 2 on UK keyboards)

Replace with: (nothing! leave blank)

Replace all

Save (as text file)

In EndNote, go to File > Import > File

Choose the edited text file.

Select Tab Delimited as the Import Option.

Go!

Success!

If you get an error saying you’ve got the wrong field names, this will be because one or more of the headings you edited in Excel does not exactly match the wording of the corresponding field name in EndNote. Try again!
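The Word part of the process (adding the reference type line, swapping | for // and stripping double quotes) could also be scripted. A minimal Python sketch, where prepare_for_endnote is a made-up name and the default reference type wording is simply the one used in the steps above:

```python
def prepare_for_endnote(tab_text: str, ref_type: str = "Journal article") -> str:
    """Apply the three Word edits to the text of a tab-delimited file."""
    # Vertical lines become double forward-slashes, and every
    # double-quote mark is removed.
    cleaned = tab_text.replace("|", "//").replace('"', "")
    # The reference type goes on its own first line, with an asterisk.
    return f"*{ref_type}\n{cleaned}"


sample = (
    "Author\tTitle\tYear\n"
    'Smith, A.|Jones, B.\t"A study, of things"\t2024\n'
)
print(prepare_for_endnote(sample))
```

To use it on a real file you would read the text Excel saved, pass it through the function and write the result to a new text file, using whichever text encoding Excel saved the file in.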

You may find some older results appear with all the journal details crammed in the title field. Unfortunately, that’s just how the data comes out of REHABDATA. Obviously you could do some cunning find/replaces (in Excel or Word or EndNote), but that’s not really part of the import process; I’ll leave that to you to work out if you have sufficient results of that format to warrant it!

Fixing broken/corrupted EndNote citations

Uh oh! EndNote citations that look active, but some have stopped responding to EndNote at all? Won’t format into a new style nor appear in the references and won’t convert into unformatted citations? But if you view field codes, they look okay? Mysterious and frustrating?

I think this is caused by editing on other word processors, but I’m also suspicious of Track Changes and the copy & pasting of formatted citations.

Solution? There doesn’t seem to be any way to fix these broken citations directly. Best straightforward(ish) option: unformat citations, remove field codes from the document (turning the broken ones into plain text), then go through and reinsert them from EndNote manually.

If you’d been using an author-date style, you can automate this slightly by changing EndNote’s temporary citation delimiters to round brackets, meaning EndNote will go through and pick up on all the defunct citations. However, it’ll also pick up on anything else in a bracket, plus it won’t match the citations directly with the EndNote library (et als, no record numbers), so you’ll need to select & insert each citation.

Not ideal! What if you’ve got loads of these corrupted citations? And you’ve got loads of other stuff in brackets and/or you’ve used a numbered style?

THERE IS SOMETHING THAT CAN BE DONE.

This is it:

Convert all still-functional citations to unformatted citations.

Press Alt+F9 (display field codes in the document) and you should see the broken citations as field codes, including a load of data about the reference – this is what can be used.

Press Alt+F9 to switch back. Unfortunately getting the actual text of the field codes is not straightforward. But someone has made something that will do it:

http://www.gmayor.com/export_field.htm

(I can’t guarantee that this isn’t some kind of cunning virus thing, but I’m fairly confident that that’s not the case.)

Once you’ve downloaded it, then installed it, you can access it from the ‘Developer’ tab in Word.

Go through the document, highlighting it and running the converter in chunks (I think it can only process a certain amount of text at a time). Also: avoid headings and other non-standard text, as it’ll clear the formatting.

So, your broken EndNote citations will now be weird long field code text. But you can modify them (with Word’s Find/Replace function – Ctrl+H) so that EndNote thinks they’re unformatted citations!

Semi-colons in multiple citations are a hurdle and a few of these Find/Replaces are for dealing with them. The others are designed to clean up at least the start of each field code so EndNote will pick them up.

Find/Replace these, in this order (if ‘?’ is used, activate wildcards for that search, otherwise don’t)

19??; WITH ~CHECK DATE~

20??; WITH ~CHECK DATE~

</Cite><Cite><Author> WITH }{

ADDIN EN.CITE <EndNote><Cite><Author> WITH NOTHING

</Author><Year> WITH COMMA SPACE

</Year><RecNum> WITH SPACE HASH

</RecNum> WITH @@

&???;  WITH NOTHING

&apos; WITH NOTHING

{ADDIN EN.CITE.DATA} WITH ~BROKEN CITATION~

 

(That last one is for totally unrecoverable ones that don’t have full field code data – they’ll need to be searched for later and reinserted manually.)

Then ‘Update Citations & Bibliography’ and cross your fingers.
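For anyone who would rather run that sequence on the exported text outside Word, here is a Python sketch of the same ten Find/Replaces, in the same order. REPLACEMENTS and repair_field_codes are names I have made up, and the sample string is a heavily simplified, invented fragment rather than a real EndNote field code (real ones carry much more data after the record number).

```python
import re

# (find, replace, uses_wildcards), in the order listed above. Word's "?"
# wildcard (any single character) is written as "." in the regex rows.
REPLACEMENTS = [
    (r"19..;", "~CHECK DATE~", True),
    (r"20..;", "~CHECK DATE~", True),
    ("</Cite><Cite><Author>", "}{", False),
    ("ADDIN EN.CITE <EndNote><Cite><Author>", "", False),
    ("</Author><Year>", ", ", False),
    ("</Year><RecNum>", " #", False),
    ("</RecNum>", "@@", False),
    (r"&...;", "", True),
    ("&apos;", "", False),
    ("{ADDIN EN.CITE.DATA}", "~BROKEN CITATION~", False),
]


def repair_field_codes(text: str) -> str:
    """Run every Find/Replace over the text, in order."""
    for find, replace, uses_wildcards in REPLACEMENTS:
        if uses_wildcards:
            text = re.sub(find, replace, text)
        else:
            text = text.replace(find, replace)
    return text


# Simplified, invented example of two citations in one field code.
sample = (
    "{ADDIN EN.CITE <EndNote><Cite><Author>Smith</Author><Year>2024</Year>"
    "<RecNum>12</RecNum></Cite><Cite><Author>Jones</Author><Year>2023</Year>"
    "<RecNum>7</RecNum></Cite></EndNote>}"
)
print(repair_field_codes(sample))
```

As in Word, this only tidies the start of each citation into the {Author, Year #RecNum@@ shape; whatever follows is left as it was.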

Adding Impact Factors to EndNote references

Based on this discussion, a solution seemed feasible.

Adding the Impact Factor to an empty field in the references themselves is probably possible, but the Find/Replace options within EndNote aren’t sophisticated enough to do it, so the process would involve exporting the Library out into Excel, running the replacements, then importing back – which is likely to run into a load of other problems en route. Plus: any new references added to the library would need the Impact Factor adding (or the whole process to be run again).

Better: use term lists to make the replacements. You will need: EndNote’s term list (in the case I was working on it was the medical one), and a full table of current Impact Factors downloaded in Excel format from Journal Citation Reports.

Instructions (I’m sure some of these could be refined if I was better at Excel, but this will work):

Open both the term list (as a UTF-8 tab-delimited text file) and the downloaded IF list in Excel (and paste them into the same file).

You can try and tidy up the IF list a little to match the term list formatting (e.g. a find/replace “fur” -> “für” to cover the German journal names).

In the term list, copy the full journal names into an additional column. Use Excel’s VLOOKUP function to replace that column with the appropriate value from the IF table.

Copy that column as values into a new column (then delete the old one) then replace “#N/A” with just “n/a” (or whatever you want to see if there is no IF for that journal). Fix any notable values that didn’t get picked up (e.g. BMJ/British Medical Journal).

So now hopefully you’ve got the term list with an extra column with IF values (or “n/a” or something if it doesn’t have one). The final steps depend how you want the IF to appear.
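If Excel isn’t your thing either, the VLOOKUP step can be sketched in Python as a dictionary lookup. This assumes the term list has already been read into rows whose first item is the full journal name, and the IF table into a name-to-value dictionary; add_impact_factors is a hypothetical helper. Names are compared ignoring case, which is also how VLOOKUP matches text.

```python
def add_impact_factors(term_rows, impact_factors, missing="n/a"):
    """Append an Impact Factor (or `missing`) to every term list row.

    term_rows: list of lists, full journal name first.
    impact_factors: dict mapping journal name to Impact Factor.
    """
    # Normalise the names once so the lookup ignores case and
    # stray spaces at either end.
    lookup = {
        name.strip().casefold(): value
        for name, value in impact_factors.items()
    }
    return [
        row + [lookup.get(row[0].strip().casefold(), missing)]
        for row in term_rows
    ]


terms = [
    ["Journal of Stuff", "J. Stuff", "J Stuff", ""],
    ["Obscure Letters", "Obsc. Lett.", "Obsc Lett", ""],
]
factors = {"JOURNAL OF STUFF": "3.021"}
print(add_impact_factors(terms, factors))
```

Journals that don’t match (the BMJ/British Medical Journal sort of case) will still come out as “n/a” and need fixing by hand.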

 

If you’re happy for it to appear with the journal name in the reference, e.g. Journal of Stuff (IF 2014: 3.021) 42(1) 567-578, then:

Create a new column that uses CONCATENATE to combine the IF column with the full journal name column to get the Journal of Stuff (IF 2014: 3.021) format (or however you’d like it).

Paste that column to replace the Abbreviation 1 column (and delete all columns beyond Full/Abbreviation 1/Abbreviation 2/Abbreviation 3).

Save as a text file. [To get the right formatting (to avoid weird glitches), I found I had to save it from Excel as an Excel file, then open it in Access and save it as a text file (without ” ” markers), then open in Notepad and save as a UTF-8 format text file. Madness.]

Then you can import the term list into your EndNote library as normal (remembering to delete anything already in the journal term list before you import), then finally edit your style to use Abbreviation 1 (without removing periods), and boom, you’re sorted.

If you want abbreviated journal titles in your references, then just use the Abbreviation 1 or Abbreviation 2 column (depending on whether you want dots or not) in the CONCATENATE stage. I’d still always finish by replacing the Abbreviation 1 column with the modified journal name + IF, since journal titles generally don’t appear in that format when they get imported into EndNote (i.e. that column is not needed for ‘recognising’ the journal name from your EndNote library references).
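The CONCATENATE and save-as-text steps for this first option might look like this in Python. It is a sketch under a few assumptions: build_term_lines and write_term_list are invented names, each input row is (full name, Abbreviation 2, Abbreviation 3, IF), and the file is written as plain UTF-8 with tabs between columns and no quote marks, which is what the Excel, Access and Notepad detour above was trying to achieve.

```python
from pathlib import Path


def build_term_lines(rows, year="2014"):
    """Build tab-delimited term list lines with the IF in Abbreviation 1.

    rows: (full_name, abbreviation_2, abbreviation_3, impact_factor)
    """
    lines = []
    for full_name, abbr2, abbr3, impact_factor in rows:
        # The equivalent of the CONCATENATE column.
        abbreviation_1 = f"{full_name} (IF {year}: {impact_factor})"
        lines.append("\t".join([full_name, abbreviation_1, abbr2, abbr3]))
    return lines


def write_term_list(lines, path):
    """Save the lines as a UTF-8 text file, one journal per line."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


rows = [("Journal of Stuff", "J Stuff", "", "3.021")]
print(build_term_lines(rows))
```

For the abbreviated-title version, you would build abbreviation_1 from one of the abbreviation columns instead of full_name.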

 

If you want the IF to appear separately to the journal name, e.g. Journal of Stuff 42(1) 567-578 (IF 2014: 3.021), then:

Paste just the IF (just the number) into the Abbreviation 1 column. Save as a UTF-8 text file without ” ” markers in the elaborate way mentioned above.

Now, in your EndNote library, first make sure the values in the ‘Journal’ field are correct for all references. So if you have results from PubMed (which for mysterious reasons includes an abbreviated journal title in the ‘Journal’ field and the full name in ‘Alternative Journal’), identify those references, select them all and use Tools > Change/Move/Copy fields… to move ‘Alternative Journal’ field values to ‘Journal’. Then use the Change/Move/Copy on the whole library to copy the ‘Journal’ field to an unused field (e.g. ‘Tertiary Title’).

Finally, import the IF-enhanced term list, then modify your style so that the journal article template features ‘Tertiary Title’ where the journal name needs to be and ‘Journal’ where the IF needs to be, and choose to use ‘Abbreviation 1’ for abbreviation format. Bosh!

 

Obviously, once you’re using any of these systems, if you spot any journal names that are missing the correct IF, just edit the term list appropriately.

 

Hmm, writing that all down, it sounds pretty bad, but honestly, it’s actually not too fiddly – and it works!