[gov2001-l] Problems with Optim

29 Mar 2008

Dear All,

Apologies for the long message, but we didn't think that our  
questions would make sense otherwise. We're having some problems  
getting optim to work with our model & we'd be grateful for advice.

The paper we're trying to replicate involves a twin study and  
compares identical (MZ) and non-identical (DZ) twins in an attempt to  
estimate the effects of genetic and environmental effects. Sets of  
twins are modelled using a bivariate normal with identical and non- 
identical twins having different but related distributions -  
basically the model is that identical and non-identical twins share  
genetic causes 100% and 50% respectively. Everything else  
(environment) is assumed to be equivalent. The model is a simple one  
- it assumes 3 latent causal variables: genetic, shared environment  
and non-shared environment. We are (at least initially) interested in  
the effects of individual traits (e.g., Speaking at age 7, as  
measured by teacher assessment.) If we get past this, then we'll try  
and replicate their longitudinal modeling. We are interested in the  
proportions of these effects on the variance & covariance.

So we've created a log-likelihood function based on this (see R code  
pasted below) and we want to produce a MLE for the three factors. But  
optim doesn't work - it produces the following error message:

Error in solve.default(vcvm) :
   Lapack routine dgesv: system is exactly singular

We've tried all the different methods that optim uses (except "L-BFGS- 
B") and all (except "SANN") produce the same error message (sometimes  
referring to vcvd.) We've left SANN going all night and it hasn't  
produced an answer. Now vcvd and vcvm are variance-covariance  
matrices and, our function does attempt to work out the inverse using  
solve, but the way in which we have set them up should not allow  
singular matrices to be produced (see the code.) However, we've tried  
a range of different starting values & we get the same errors.

Our data has some odd features. After excluding some cases, there's  
roughly 2000 identical or MZ twins and roughly 3500 non-identical or  
DZ twins with lots of missing values. But many of the traits that  
we're interested in have a very small scale - so Speaking at age 7 is  
just a scale of 3 levels (1, 2 & 3) - and of course the trait for  
many of the twins (about 75% of them) are exactly the same. We can  
argue about whether this is a good model - but our initial job is to  
replicate the results of the paper. There are several other traits  
that are more straightforward (so 'g' or intelligence is measured as  
z-scores.)

So the specific questions are:

1. Why is optim doing this and should we use something different? The  
help file on optim suggests that optim might be problematic for  
parameters with more than one dimension (which is really what we've  
got here since we use the 3 variance parameters to create a variance- 
covariance matrix)? The help files suggests using optimize, but it's  
not at all clear how we should use optimize - the arguments don't  
appear to us to be equivalent to optim (e.g., no par). Also we're  
uninterested in mu - should we actually just work with the actual  
mean for the trait from our data - which would presumably be faster.

2. Is there a way of avoiding the for loop in the code? It seems to  
us that this will be really inefficient & optim certainly takes a  
while to return the error message (> 2 minutes). We've tried using  
apply with a function we define ourselves:
mu <- c(1,1)
vcv <- matrix(c(1,0,0,1),2,2)
D <- matrix(c(1,2,3,4),2,2)
muvcv.fun <- function(x,mu,vcv) (x-mu) %*% vcv %*% (x-mu)
apply(D,1,FUN = muvcv.fun(mu = mu, vcv = vcv))
But we get the following error message:
Error in muvcv.fun(mu = mu, vcv = vcv) :
   argument "x" is missing, with no default

So we're unsure how the syntax for a parameter should be for a  
function inside optim. We've tried various options that seem  
intuitive / sensible to us, although doubtless we're missing  
something obvious.
3. We have to then calculate the root mean square error of  
approximation (RMSEA). In the original paper, this is calculated  
using the software the authors use - Mx, which seems to be pretty  
exclusively used for twins studies, although RMSEA is fairly  
commonplace. The formula we've found for RMSEA is sqrt[(chi^2/df - 1)/ 
(N-1)] but it's not very clear how to use this. We think we have 6  
parameters (the variances and covariances for the identical and non- 
identical twins] - 3 for the three variance proportions that we're  
estimating, hence 3 df. Then N is the total sample size. But to get  
chi^2, do we take (obs-exp)^2/exp?

Thanks

Jeremy & Jill (Hohenstein)

Here's the R-code:

# Implement log-likelihood function for MZ & DZ twins - simple  
comparison of a single trait based ACE model & bivariate normal  
distribution

ll.dbnmbn <- function(par,Yd,Ym,Xd,Xm){
	sa <- exp(par[2]) # Re-parameterize R -> R+ as sigma^2 genetic  
(hereditary) variance
	sc <- exp(par[3]) # Shared environment variance
	se <- exp(par[4]) # Non-shared environment

	mu <- par[1]

	vcvd <- matrix(c(sa+sc+se,sa/2+sc,sa/2+sc,sa+sc+se),2,2) # varcov  
matrix for DZ twins
	vcvm <- matrix(c(sa+sc+se,sa+sc,sa+sc,sa+sc+se),2,2) # varcov matrix  
for MZ twins

	D <- rbind(Yd,Xd) # 2 x N(DZ) matrix of bivariate DZ twins trait scores
	M <- rbind(Ym,Xm) # 2 x N(MZ) matrix of bivariate MZ twins trait scores

	d <- 0
	for (i in 1:length(Xd)){
		d <- d + (D[,i]-mu)%*%solve(vcvd)%*%(D[,i]-mu) # sum((X-mu) x  
varcov^-1 x (X-mu)) for DZ twins
		}

	m <- 0
	for (i in 1:length(Xm)){
		m <- m + (M[,i]-mu)%*%solve(vcvm)%*%(M[,i]-mu) # sum((X-mu) x  
varcov^-1 x (X-mu)) for MZ twins
		}

	# Log-likelihood func based on bivariate normal - related but  
different for DZ and MZ twins
	out <- -length(Xd)*log(2*pi*sqrt(det(vcvd)))-0.5*d - length(Xm)*log 
(2*pi*sqrt(det(vcvm)))-0.5*m
	return(out)
	}

fit.1 <- optim(par = c(0,2.3,1.5,13.5), ll.dbnmbn, Yd = Yd, Ym = Ym,  
Xd = Xd, Xm = Xm, method="BFGS", control=list(fnscale = -1), hessian  
= T)

Dr Jeremy Hodgen
Senior Lecturer in Mathematics Education
King's College London
Department of Education and Professional Studies
Franklin-Wilkins Building
Waterloo Bridge Wing
150 Stamford Street
London SE1 9NH

Tel: 020 7848 3102
Fax: 020 7848 3182
E-mail: jeremy.hodgen at kcl.ac.uk

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

[gov2001-l] Problems with Optim